Two ontologies are used to annotate different elements of phenotyping data so that they can be consistently stored in the CHADO ND schema and compatible data can be integrated across different phenotyping studies. First, the IBDB Structure ontology has terms defining properties of different elements of the data model and the relationships between them, as well as CLASS terms grouping terms of both ontologies into classes. This essentially extends the CHADO ND schema. Second, the Crop Ontology defines the variables which describe the context of the phenotyping experiment and which record the observations from the experiments. (Here we merge the CropResearch Ontology with a particular CropTrait ontology to make a single working Crop Ontology.)
OMS Schema (From CHADO)
The Crop Ontology contains several controlled vocabularies (CV):
- The PROPERTIES CV has terms describing the design and treatment factors applied in phenotyping experiments and the traits being measured in them.
- The METHODS CV has terms describing the protocols by which those properties are applied or measured in phenotyping experiments.
- The SCALES CV has terms describing the scales or units in which the values of the properties are recorded.
- The STANDARD VARIABLES CV has terms defined by combinations of one property term, one method term and one scale term which describe the variables recording the design and observations of phenotyping experiments.
- In addition, each categorical variable (one which has a scale constraining its values to a specific set of valid values) has a VALID VALUE CV with terms describing those valid values.
Each of these controlled vocabularies is stored in the cv table and the terms from each are in the cvterm table. VALID VALUE CVs for categorical variables are named with the cvterm_id of the categorical variable and described by the preferred name of that variable.
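As a rough illustration of this storage layout, the sketch below builds cut-down stand-ins for the cv and cvterm tables in an in-memory SQLite database and lists the terms of the VALID VALUE CV named after variable 8135. The column set is a simplified assumption (the real CHADO tables carry more columns), the cv_id values and the two value terms are invented placeholders, and `terms_in_cv` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-ins for the CHADO cv and cvterm tables (assumed columns only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE, definition TEXT);
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
        name TEXT NOT NULL
    );
""")

# cv_id values 1 and 2 and the two value terms are placeholders for illustration.
conn.executemany("INSERT INTO cv VALUES (?, ?, ?)", [
    (1, "STANDARD VARIABLES", None),
    # A VALID VALUE CV: named with the variable's cvterm_id,
    # described by the variable's preferred name.
    (2, "8135", "EXPERIMENTAL DESIGN - ASSIGNED (TYPE)"),
])
conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)", [
    (8135, 1, "EXPERIMENTAL DESIGN - ASSIGNED (TYPE)"),
    (9001, 2, "Design A"),
    (9002, 2, "Design B"),
])


def terms_in_cv(conn, cv_name):
    """Return (cvterm_id, name) pairs for every term in the named CV."""
    return conn.execute(
        "SELECT t.cvterm_id, t.name FROM cvterm AS t "
        "JOIN cv ON cv.cv_id = t.cv_id "
        "WHERE cv.name = ? ORDER BY t.cvterm_id",
        (cv_name,),
    ).fetchall()


print(terms_in_cv(conn, "8135"))
```

Looking a CV up by name is convenient for browsing, but it relies only on the naming convention.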
Figure 2. Controlled vocabularies for Phenotyping Data from Field Trials
The IBDB TERMS CV
Terms in the IBDB TERMS ontology belong to one of four classes: IBDB structure, IBDB class, IBDB data type or IBDB element. The relation of belonging to a class is recorded by a record in the cvterm_relationship table with type_id=1225 (is a). Terms in the class IBDB structure describe entities and relationships used in the schema. Terms in IBDB class group terms of an ontology in a hierarchy for easy browsing. Terms in IBDB data type describe the data type of each STANDARD VARIABLE. Terms in IBDB element indicate the element of the logical schema to which a STANDARD VARIABLE belongs, and where its values are stored.
IBDB TERMS which are class terms also have a 'type' relationship (type_id=1105) to term 1090 'Class'.
The PROPERTIES, METHODS AND SCALES CVS
Terms in the PROPERTIES CV (cv_id=1010) have an 'is a' (1225) relationship to an IBDB class which affords easy browsing.
Table 4. Terms in the IBDB class category
1345 Physiological, 1430 Yield components, 1440 Phenology, 1450 Post harvest
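Browsing properties by class could look roughly like the sketch below. It assumes the usual CHADO convention that subject_id holds the member term and object_id holds the class term. The class ids 1430 and 1440 come from Table 4; the cv_id given to the class terms (1000), the three PROPERTIES terms and the helper `properties_in_class` are invented for illustration.

```python
import sqlite3

IS_A = 1225           # 'is a' relationship type, as given in the text
PROPERTIES_CV = 1010  # cv_id of the PROPERTIES CV, as given in the text

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
    CREATE TABLE cvterm_relationship (
        subject_id INTEGER, type_id INTEGER, object_id INTEGER
    );
""")
# Class terms are from Table 4; cv_id 1000 and the property terms are invented.
conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)", [
    (1430, 1000, "Yield components"),
    (1440, 1000, "Phenology"),
    (20001, PROPERTIES_CV, "Grain yield"),
    (20002, PROPERTIES_CV, "Grain weight"),
    (20003, PROPERTIES_CV, "Days to flowering"),
])
# Assumed direction: subject (property) 'is a' object (class).
conn.executemany("INSERT INTO cvterm_relationship VALUES (?, ?, ?)", [
    (20001, IS_A, 1430),
    (20002, IS_A, 1430),
    (20003, IS_A, 1440),
])


def properties_in_class(conn, class_id):
    """Return the names of PROPERTIES terms that belong to the given class."""
    rows = conn.execute(
        "SELECT p.name FROM cvterm_relationship AS r "
        "JOIN cvterm AS p ON p.cvterm_id = r.subject_id "
        "WHERE r.type_id = ? AND r.object_id = ? AND p.cv_id = ? "
        "ORDER BY p.name",
        (IS_A, class_id, PROPERTIES_CV),
    ).fetchall()
    return [name for (name,) in rows]


print(properties_in_class(conn, 1430))
```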
Terms in the METHODS (cv_id=1020) and SCALES (cv_id=1030) CVs describe the methods or protocols by which properties are applied or measured and the scales or units in which they are reported. They do not have any specific relationships or properties.
The VARIABLES CV
Terms in the VARIABLES CV are each related to one PROPERTY term, one METHOD term, and one SCALE term. Each also has a type relationship (has type, type_id=1105) which defines the data type of its values.
Table 5: Each variable is related to one data type with the 'has type' (cvterm_id=1105) relationship.
Each variable also has a 'stored in' relationship (type_id=1044) which specifies the component of the schema that stores the values of that variable, and where. The list of possible storage elements for any variable is given in Table 6. This means that a STANDARD VARIABLE (a combination of PROPERTY, METHOD and SCALE) could appear more than once in the ontology (with a different name) because in one study it might belong to one element of the schema and in another to a different element. For example, NITROGEN FERTILIZER might be a trial management factor in one study but a treatment factor in another. In the first it belongs to the TRIAL ENVIRONMENT component and in the second it belongs to the TRIAL DESIGN component. This complicates data integration, but is a consequence of splitting the schema into different elements, which for the most part will not overlap. When such overlaps occur, the application layer will have to deal with the data integration.
However, the combination of a PROPERTY, METHOD, SCALE and storage ELEMENT is unique and affords integration of all data with these characteristics across all studies in the database. The challenge for the application layer is knowing the storage ELEMENT of each variable, which we will have to get from the application templates.
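The uniqueness rule can be pictured with a small sketch that groups variables by the four-part key. Everything here is hypothetical: the `StandardVariable` class, the function name, and all of the ids are placeholders chosen only to mirror the NITROGEN FERTILIZER example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardVariable:
    cvterm_id: int
    name: str
    property_id: int
    method_id: int
    scale_id: int
    stored_in_id: int  # object of the 'stored in' (1044) relationship

    @property
    def integration_key(self):
        """PROPERTY, METHOD, SCALE and storage ELEMENT together identify the data."""
        return (self.property_id, self.method_id, self.scale_id, self.stored_in_id)


def group_by_integration_key(variables):
    """Map each four-part key to the cvterm_ids of the variables that share it."""
    groups = defaultdict(list)
    for variable in variables:
        groups[variable.integration_key].append(variable.cvterm_id)
    return dict(groups)


# All ids below are invented placeholders.
variables = [
    StandardVariable(30001, "NITROGEN FERTILIZER (MANAGEMENT)", 501, 601, 701, 801),
    StandardVariable(30002, "NITROGEN FERTILIZER (TREATMENT)", 501, 601, 701, 802),
]
groups = group_by_integration_key(variables)
print(groups)
```

The two variables share PROPERTY, METHOD and SCALE but land in separate groups because their storage elements differ, which is the overlap case the application layer would have to reconcile.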
Table 6. Interpretation of 'stored in' (cvterm_id=1044) or 'role' relationship for terms in the VARIABLE CV
||type of variable||definition and storage location of the values||
| |Study element with values stored in projectprop.value|
| |Study name stored in project.name|
| |Study title stored in project.description|
| |Dataset element with values stored in projectprop.value|
| |Dataset name stored in project.name|
| |Dataset description stored in project.description|
| |Trial environment information stored in nd_geolocationprop.value|
| |Trial instance number stored in nd_geolocation.description|
| |Georeference data stored in nd_geolocation.latitude|
| |Georeference data stored in nd_geolocation.longitude|
| |Georeference geodetic datum stored in nd_geolocation.geodetic_datum|
| |Georeference altitude stored in nd_geolocation.altitude|
| |Field trial design and layout information stored in nd_experimentprop.value|
| |Germplasm entry information stored in stockprop.value|
| |Germplasm entry number unique within a study stored in stock.uniquename|
| |GMS germplasm identifier stored in stock.dbxref_id|
| |GMS germplasm name stored in stock.name|
| |Germplasm entry code assigned within a study stored in stock.value (with a type_id=8300)|
| |Phenotypic data stored in phenotype.value|
| |Categorical variate with values stored in phenotype.cvalue_id|
For example, term 8250 GERMPLASM IDENTIFIER - ASSIGNED (DBCV) has the relationships shown in Table 7.
Table 7: Relationships for a STANDARD VARIABLE term in the CRO
||interpretation of the relationship type||name of the related term||
|8250 is a| |
|8250 stored in| |
|8250 has type| |
|8250 has property| |
|8250 has method| |
|8250 has scale| |
The VALID VALUE CVS
Each variable of type 1130 (CATEGORICAL VARIABLE) spawns a VALID VALUE CV as shown in Figure 2. These CVs contain the valid values and their interpretation for the categorical variable. They are named in the cv table by the string value of the cvterm_id from the VARIABLES CV for the categorical variable to which they belong (although this is not the link, which is described below), and they are described in the cv table with the description of the categorical variable. For example, the valid values for the variable 8135 EXPERIMENTAL DESIGN - ASSIGNED (TYPE) are in cv 8135 as shown in Figure 3.
Figure 3: Valid Values of a categorical variable contained in a sub-cv of the CRO
Categorical variables sometimes require an ordering for their values, either because there is an intrinsic ordering (e.g. Low, Medium, High) or because it makes sense to present them to users (in pick lists, for example) in a certain order. The default order is the alphabetical order of cvterm.name (with numbers treated in character order). If a different ordering is required, each term should have a property in the cvtermprop table of type_id order (IBDB TERMS cvterm_id=1420) with the numerical sequence order for that term as its value.
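The ordering rule might be applied along these lines. This is a minimal sketch: `order_valid_values` is a hypothetical helper, the term ids are invented, and using the explicit order only when every term carries an 'order' property is an assumption about how incomplete data should be handled.

```python
def order_valid_values(terms, order_props=None):
    """Order (cvterm_id, name) pairs for display.

    order_props maps cvterm_id to the text value of the 'order' property
    (type_id=1420). The explicit order is used only when every term has one
    (an assumption); otherwise the default alphabetical order applies.
    """
    order_props = order_props or {}
    if terms and all(term_id in order_props for term_id, _ in terms):
        return sorted(terms, key=lambda term: float(order_props[term[0]]))
    # Plain string comparison: digits compare character by character,
    # so "10" sorts before "2".
    return sorted(terms, key=lambda term: term[1])


# Term ids are invented placeholders.
levels = [(9101, "Low"), (9102, "Medium"), (9103, "High")]
print(order_valid_values(levels))
print(order_valid_values(levels, {9101: "1", 9102: "2", 9103: "3"}))
```

Python compares strings by code point, so upper-case names sort before lower-case ones; a database collation may behave differently.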
The categorical variable in the VARIABLES CV has a 'has value' relationship (cvterm_id=1190) to each term in its VALID VALUE CV. For example, these relationships are shown in Table 8 for variable 8135.
Table 8: Relationships between a categorical variable and its valid values.
||interpretation of the relationship type||name of the related term||
|8135 HAS VALUE| |
|8136 HAS VALUE| |
|8137 HAS VALUE| |
|8138 HAS VALUE| |
|8139 HAS VALUE| |
|8140 HAS VALUE| |
|8141 HAS VALUE| |
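One way an application could use the 'has value' link is to resolve a user-supplied label to the cvterm_id that would go into phenotype.cvalue_id. The sketch below assumes the variable is the subject and the valid value is the object of the relationship; the value terms (9001, 9002), their names and the helper `resolve_cvalue_id` are invented for illustration.

```python
import sqlite3

HAS_VALUE = 1190  # 'has value' relationship type, as given in the text

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvterm_relationship (
        subject_id INTEGER, type_id INTEGER, object_id INTEGER
    );
""")
# The value terms and their names are invented placeholders.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [
    (8135, "EXPERIMENTAL DESIGN - ASSIGNED (TYPE)"),
    (9001, "Design A"),
    (9002, "Design B"),
])
conn.executemany("INSERT INTO cvterm_relationship VALUES (?, ?, ?)", [
    (8135, HAS_VALUE, 9001),
    (8135, HAS_VALUE, 9002),
])


def resolve_cvalue_id(conn, variable_id, value_name):
    """Return the cvterm_id of a valid value, or None if the label is not valid."""
    row = conn.execute(
        "SELECT v.cvterm_id FROM cvterm_relationship AS r "
        "JOIN cvterm AS v ON v.cvterm_id = r.object_id "
        "WHERE r.subject_id = ? AND r.type_id = ? AND v.name = ?",
        (variable_id, HAS_VALUE, value_name),
    ).fetchone()
    return row[0] if row else None


print(resolve_cvalue_id(conn, 8135, "Design B"))
```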
Terms of type 1110 (NUMERIC VARIABLES) may have MINIMUM and MAXIMUM allowable values specified in the cvtermprop table as shown in Table 9 for variable SOIL PH.
Table 9: MINIMUM and MAXIMUM allowable values for a numeric variable in the CRO
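A range check against these properties could be sketched as follows. The text does not give the type_id values for MINIMUM and MAXIMUM, so `MIN_TYPE_ID` and `MAX_TYPE_ID` are placeholders, and the bounds 0 and 14 for SOIL PH are illustrative only, not taken from Table 9.

```python
# Placeholder type ids: the real MINIMUM/MAXIMUM cvterm_ids are not given here.
MIN_TYPE_ID = 9101
MAX_TYPE_ID = 9102


def within_bounds(value, props):
    """Check a numeric value against (type_id, value_text) cvtermprop rows.

    A missing MINIMUM or MAXIMUM row means that side is unbounded.
    """
    bounds = {
        type_id: float(text)
        for type_id, text in props
        if type_id in (MIN_TYPE_ID, MAX_TYPE_ID)
    }
    minimum = bounds.get(MIN_TYPE_ID)
    maximum = bounds.get(MAX_TYPE_ID)
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# Illustrative bounds for SOIL PH (assumed values).
soil_ph_props = [(MIN_TYPE_ID, "0"), (MAX_TYPE_ID, "14")]
print(within_bounds(6.5, soil_ph_props))
print(within_bounds(15.0, soil_ph_props))
```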
SYNONYMS AND FOREIGN LANGUAGE TERM NAMES AND DESCRIPTIONS
Synonyms and foreign language names and descriptions for terms are stored in the cvtermsynonym table.
Figure 4. Synonyms and foreign language names for controlled vocabulary terms
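A lookup that accepts either a preferred name or a synonym might look like the sketch below. The tables are reduced to the columns needed here (the real cvtermsynonym table also carries a type_id), and the term id, the synonyms and the helper `find_term_ids` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvtermsynonym (cvterm_id INTEGER, synonym TEXT);
""")
# Term id and synonyms are invented placeholders.
conn.execute("INSERT INTO cvterm VALUES (?, ?)", (40001, "SOIL PH"))
conn.executemany("INSERT INTO cvtermsynonym VALUES (?, ?)", [
    (40001, "SOIL ACIDITY"),
    (40001, "PH DEL SUELO"),
])


def find_term_ids(conn, label):
    """Return cvterm_ids whose preferred name or any synonym equals the label."""
    rows = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = :label "
        "UNION "
        "SELECT cvterm_id FROM cvtermsynonym WHERE synonym = :label "
        "ORDER BY cvterm_id",
        {"label": label},
    ).fetchall()
    return [term_id for (term_id,) in rows]


print(find_term_ids(conn, "PH DEL SUELO"))
```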
The LINKS TO www.CropOntology.org
The Crop ontologies in IBDB are intended to be linked (and synchronized) with the terms on www.CropOntology.org. The term IDs from that site are carried as properties of the corresponding cvterms in the cvtermprop table with type_id=1226 (Crop ontology term ID).
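A synchronization check could start from a query like the one sketched here, which reads the Crop Ontology ID of a term and lists terms that have none. The tables are simplified, the terms are invented, the ID string "CO_000:0000001" is a made-up placeholder rather than a real identifier, and both helper functions are hypothetical.

```python
import sqlite3

CO_TERM_ID = 1226  # 'Crop ontology term ID' property type, as given in the text

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvtermprop (cvterm_id INTEGER, type_id INTEGER, value TEXT);
""")
# Terms and the ID string are invented placeholders.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [
    (20001, "Grain yield"),
    (20003, "Days to flowering"),
])
conn.execute(
    "INSERT INTO cvtermprop VALUES (?, ?, ?)",
    (20001, CO_TERM_ID, "CO_000:0000001"),
)


def crop_ontology_id(conn, cvterm_id):
    """Return the Crop Ontology term ID stored for a cvterm, or None."""
    row = conn.execute(
        "SELECT value FROM cvtermprop WHERE cvterm_id = ? AND type_id = ?",
        (cvterm_id, CO_TERM_ID),
    ).fetchone()
    return row[0] if row else None


def unlinked_terms(conn):
    """Return cvterm_ids that carry no Crop Ontology term ID property."""
    rows = conn.execute(
        "SELECT t.cvterm_id FROM cvterm AS t "
        "LEFT JOIN cvtermprop AS p "
        "  ON p.cvterm_id = t.cvterm_id AND p.type_id = ? "
        "WHERE p.cvterm_id IS NULL ORDER BY t.cvterm_id",
        (CO_TERM_ID,),
    ).fetchall()
    return [term_id for (term_id,) in rows]


print(crop_ontology_id(conn, 20001), unlinked_terms(conn))
```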