Elements of all Study Templates
Study or Trait Templates are used to customize data collection and recording forms for different crops, study types and breeding projects. They are specially formatted Excel files. In the full version of the IB WorkBench they will be created by a Configuration tool, but they can also be created manually or with the existing WorkBook tool.
A Study Template specifies all the variables in a data set. There are two kinds of variables, Labels and Variates. Each variable is defined by a name (unique to the study), and a set of metadata determined and controlled by the Trait Dictionary.
This metadata includes the property which the variable represents (e.g. block in a field trial, level of nitrogen fertilizer applied, or the name of the trait being measured, such as plant height), the units or scale in which the property is represented (e.g. kg/ha or rust disease score on a specific scale), and the procedure by which the value is applied or measured (e.g. nitrogen fertilizer applied as urea at planting, or water use efficiency measured by a named procedure). These three items allow integration and comparison of data across studies. Further metadata specifies whether the variable is categorical (taking values from a finite set of levels) or not, and whether it is stored as a character string or as a number in the database. Finally, there are basic quality standards for each variable: the set of possible levels (and their meanings) for categorical variables, and maximum and minimum allowable values for non-categorical (continuous or discrete) numeric variables.
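The metadata items listed above can be pictured as a small record with a validity check. The following is only an illustrative sketch: the class name `VariableDefinition`, its field names and the example variables are assumptions made here and are not part of the IB WorkBench or the Trait Dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableDefinition:
    """Hypothetical record holding the metadata of one template variable."""
    name: str                 # unique within the study
    prop: str                 # property the variable represents
    scale: str                # units or scale of the values
    method: str               # procedure by which the value is applied or measured
    categorical: bool = False
    numeric: bool = True      # stored as a number (True) or a character string (False)
    levels: dict = field(default_factory=dict)  # level -> meaning (categorical only)
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def is_valid(self, value) -> bool:
        """Apply the basic quality standards to a single value."""
        if self.categorical:
            return value in self.levels
        if not self.numeric:
            return isinstance(value, str)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


# Illustrative variables; names, limits and score meanings are made up.
plant_height = VariableDefinition(
    "PH_cm", "Plant height", "cm", "Measured from soil surface to tip",
    min_value=0, max_value=400,
)
rust_score = VariableDefinition(
    "RUST", "Rust disease", "Score 1-3", "Visual score",
    categorical=True, levels={1: "none", 2: "moderate", 3: "severe"},
)
```

With these definitions, `plant_height.is_valid(120)` is accepted while `plant_height.is_valid(-5)` and `rust_score.is_valid(9)` are rejected.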
Labels define the context of the dataset. They are always categorical (take values from a finite set of levels). The sampling units or observation units of the dataset are uniquely indexed by levels of the Labels, but there is often redundancy in this indexing, so Label variables can be grouped into sets in which one member is termed the Factor for the set. The Label which defines the Factor specifies all possible levels for that factor set, and the other Labels in the set simply add context. For example, a data set will often have a variable specifying the Entry Number for germplasm in a trial and another specifying the designation or name of the entry. The first of these is the Factor (Entry Number) and the other is a Label of Entry Number. Note that Labels can attach the same level to different factor levels, whereas the Factor must uniquely define all levels. (In our example, two entries could have the same designation.)
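The relationship between a Factor and its Labels can be checked mechanically: each Factor level should carry one consistent set of Label values, while the same Label value may appear under several Factor levels. The sketch below assumes the data set is available as a list of dictionaries; the function name `inconsistent_levels` and the column names are hypothetical.

```python
def inconsistent_levels(rows, factor, labels):
    """Return Factor levels that appear with more than one set of Label values."""
    seen = {}
    problems = []
    for row in rows:
        level = row[factor]
        context = tuple(row[label] for label in labels)
        if level in seen and seen[level] != context and level not in problems:
            problems.append(level)
        seen.setdefault(level, context)
    return problems


# Entries 1 and 2 share a designation, which is allowed for a Label.
entries = [
    {"ENTRY_NO": 1, "DESIGNATION": "LINE-A"},
    {"ENTRY_NO": 2, "DESIGNATION": "LINE-A"},
    {"ENTRY_NO": 3, "DESIGNATION": "LINE-B"},
]
ok = inconsistent_levels(entries, "ENTRY_NO", ["DESIGNATION"])  # []

# Entry 1 appearing with two designations breaks the Factor's uniqueness.
bad = inconsistent_levels(
    entries + [{"ENTRY_NO": 1, "DESIGNATION": "LINE-C"}],
    "ENTRY_NO", ["DESIGNATION"],
)  # [1]
```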
Figure 1. Data Model showing Labels, Factors, Observation Units and Variates
Factors and Labels generally contain information that is known about the experiment before the data collection starts. They fall into two main categories: Design Factors, which specify the sampling layout and treatments, and Management Factors, which specify the conditions under which the data were collected (e.g. irrigated or not, planting date, etc.).
Some Labels have only one level for the entire data set (not necessarily for the whole experiment). These are called Conditions and they generally appear as headers to the data set (e.g. the investigator, the location where the whole data set was collected, the date of the data collection). Every data set has at least one Condition, the Study to which it belongs, and this Condition applies to all data sets in the study. Other Condition Labels which apply to all data sets in the study are called Study Labels. Other Conditions may take different values in different data sets. For example, the property Trial Instance could be represented by two Condition Labels, Trial Location ID and Trial Location Name. These could be Study Labels if the whole study was carried out at one location, but they could also be Trial Instance Conditions, indicating that the experiment is a multi-location experiment and that data sets will arrive with different levels of this factor.
Variates are the variables to be measured for the data set. Their values are not generally known before the data collection. They can be divided into two classes: Constants and other Variates. Constants are variates which take only one value for the whole data set (although not necessarily for the whole Study). They are separated out because they generally appear as headers in a Field Book. Examples are the pH of the site or the total rainfall during the trial. Other Variates take separate values for each sampling unit in the dataset and these generally appear as columns in the data set.
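The split between header-like variables (Conditions and Constants, one value per data set) and column variables (one value per sampling unit) can be illustrated on a small data set. This is a sketch under stated assumptions: the function name and column names are hypothetical, and in a real template the role of each variable is declared rather than inferred from the data, since a column variable could take a single value by chance.

```python
def split_headers_and_columns(rows):
    """Separate variables with a single value from those that vary by row."""
    if not rows:
        return [], []
    headers, columns = [], []
    for name in rows[0]:
        values = {row[name] for row in rows}
        if len(values) == 1:
            headers.append(name)   # candidate Condition or Constant
        else:
            columns.append(name)   # Factor, Label or Variate column
    return headers, columns


# Illustrative data set with two sampling units.
dataset = [
    {"LOCATION": "Site A", "SITE_PH": 6.5, "PLOT": 1, "YIELD": 4.2},
    {"LOCATION": "Site A", "SITE_PH": 6.5, "PLOT": 2, "YIELD": 3.9},
]
headers, columns = split_headers_and_columns(dataset)
# headers == ["LOCATION", "SITE_PH"], columns == ["PLOT", "YIELD"]
```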
The sampling units of the dataset are indexed by levels of the Factor Labels, that is, ignoring the redundancy attributable to the other Label variables in each factor set.
Specific structure of Trait Templates
The structure of a Study Template described above is very general and requires a lot of user input and user choices to turn it into a Field Book for data gathering. Specific types of study such as Field Trials have certain commonalities so we can make a Trial Template more informative for the user interface by adding some constraints to the general definition of a Study Template. An example of a multi-environment Field Trial template is shown in Figure 2.
Figure 2. Example of a Template for a Multi-Environment Trial
We insist that the following Factors and Labels are present in a template for a multi-environment field trial (MET):
- A Factor with property 'Study' - this is always assumed to be present anyway. There may also be some Study labels present to define the MET context.
- A Factor with property 'Trial Instance' and scale 'Number', and possibly some labels of this factor to define the context of the trial instances. Scale Number simply indicates levels 1, 2, 3, ..., n. Hence only the number of levels needs to be specified in order to define all levels.
- A Factor with property 'Germplasm Entry' and scale 'Entry ID', 'Number' or 'Nested Number'. If the scale is Entry ID, then the Entry IDs of the germplasm list are used as the factor levels. If the scale is Number, then sequential numbers 1, 2, ..., n are assigned in the order of the germplasm entries on the list. The scale Nested Number concatenates the level of a nesting factor with the level number of the nested factor, as formed for the scale Number, to create unique levels for the data set (e.g. entry 10 in trial 9 will have level 910). The nesting factor is indicated by name in the NESTED IN column of the template. For Trial templates it will be the name of the Trial Instance factor.
If there are labels of the Germplasm Entry factor (and there will almost certainly be some) then columns of the germplasm list are automatically mapped to labels with Property and Scale as in the following table, or, if the property and scale of the labels do not match those in the table, the user is asked for the label values.
||Germplasm List Record
- A Factor with property 'Field Plot' and scale 'Nested Number', which is nested in the Trial Instance factor (which must be named in the NESTED IN column of the template) and will have some labels with properties matching design parameters such as Rep and Block. We can assume (for simplicity of the Fieldbook user interface) that these have scale Number.
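The Number and Nested Number scales described in the list above can be sketched in a few lines. The function names are hypothetical. The text only states that the two level numbers are concatenated, so the sketch does exactly that; note that plain concatenation can produce the same level for different pairs (trial 1, entry 11 and trial 11, entry 1 both give 111), and the text does not say how such cases are handled.

```python
def number_levels(n):
    """Levels for scale Number: 1, 2, ..., n."""
    return list(range(1, n + 1))


def nested_number_level(nesting_level, nested_level):
    """Level for scale Nested Number: nesting level followed by nested level."""
    return int(f"{nesting_level}{nested_level}")


trial_levels = number_levels(3)           # [1, 2, 3]
entry_level = nested_number_level(9, 10)  # entry 10 in trial 9 -> 910

# Nested levels for two entries in each of two trial instances.
plot_levels = [
    nested_number_level(trial, entry)
    for trial in number_levels(2)
    for entry in number_levels(2)
]  # [11, 12, 21, 22]
```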
There may be other factors in the template, and we can assume, for the sake of interface simplicity, that they too have scale Number. These factors describe two additional features of MET fieldbooks:
- Other Treatment factors are factors with properties other than Study, Trial Instance, Germplasm Entry, Field Plot or Sampling Unit, and we can assume they have scale 'Number'.
- Factors with property 'Sampling Unit' and scale 'Nested Number' define one or more sub-sampling schemes for the MET. These are always nested in Field Plots.
Each of the Factors of types 3 to 6 in the lists above (Germplasm Entry, Field Plot, Other Treatment and Sampling Unit) could have associated Labels with different scales and methods.
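The required Factors for a MET template lend themselves to a simple presence check. The sketch below assumes each factor row of a template has been read into a dictionary with hypothetical keys `property` and `scale`; the table of required properties and allowed scales is taken from the lists above, and no scale is checked for Study because none is stated.

```python
# Required property -> allowed scales (None means no scale is stated in the text).
REQUIRED_FACTORS = {
    "Study": None,
    "Trial Instance": {"Number"},
    "Germplasm Entry": {"Entry ID", "Number", "Nested Number"},
    "Field Plot": {"Nested Number"},
}


def missing_required_factors(factors):
    """Return required properties with no factor of an allowed scale."""
    found = set()
    for factor in factors:
        prop = factor["property"]
        if prop in REQUIRED_FACTORS:
            allowed = REQUIRED_FACTORS[prop]
            if allowed is None or factor["scale"] in allowed:
                found.add(prop)
    return [prop for prop in REQUIRED_FACTORS if prop not in found]


# Illustrative template: Field Plot is absent and Trial Instance has the wrong scale.
template_factors = [
    {"property": "Study", "scale": "Name"},
    {"property": "Trial Instance", "scale": "Code"},
    {"property": "Germplasm Entry", "scale": "Entry ID"},
]
missing = missing_required_factors(template_factors)
# missing == ["Trial Instance", "Field Plot"]
```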
There are two types of measurement variables in a MET template, Constants and Variates. Each will be observed at a specific sampling level. For Constants, the possible levels are the Study level (one value for the whole study) or the Trial Instance level (one value for each trial instance). The choice is indicated in the Sample Unit column of the template. Similarly, Variates will be observed at the Field Plot level or at one of the Sampling Unit levels, and again the choice is indicated by the appropriate factor name in the Sample Unit column of the template.
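The rule for the Sample Unit column can be expressed as a lookup from the kind of measurement variable to the factor properties it may be observed at. This is a minimal sketch; the dictionary keys (`name`, `kind`, `sample_unit`), the factor names and the function name are all assumptions made for illustration.

```python
# Kind of measurement variable -> properties of factors it may be observed at.
ALLOWED_SAMPLE_LEVELS = {
    "CONSTANT": {"Study", "Trial Instance"},
    "VARIATE": {"Field Plot", "Sampling Unit"},
}


def sample_unit_errors(factor_properties, measurements):
    """Return names of measurement variables observed at a disallowed level."""
    errors = []
    for m in measurements:
        # The Sample Unit column holds a factor name; look up its property.
        prop = factor_properties.get(m["sample_unit"])
        if prop not in ALLOWED_SAMPLE_LEVELS[m["kind"]]:
            errors.append(m["name"])
    return errors


# Factor name -> property, as declared elsewhere in the (hypothetical) template.
factor_properties = {
    "STUDY": "Study",
    "TRIAL": "Trial Instance",
    "PLOT": "Field Plot",
    "PLANT": "Sampling Unit",
}
measurements = [
    {"name": "RAINFALL", "kind": "CONSTANT", "sample_unit": "TRIAL"},
    {"name": "YIELD", "kind": "VARIATE", "sample_unit": "PLOT"},
    {"name": "HEIGHT", "kind": "VARIATE", "sample_unit": "TRIAL"},
]
errors = sample_unit_errors(factor_properties, measurements)
# errors == ["HEIGHT"]: a Variate cannot be observed at the Trial Instance level
```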