Application 2.2.1 Tool 2.10 Ontology Management System

INTRODUCTION

Two ontologies are used to annotate the different elements of phenotyping data so that the data can be stored consistently in the CHADO ND schema and compatible data can be integrated across phenotyping studies. First, the IBDB Structure ontology has terms defining the properties of the different elements of the data model and the relationships between them, as well as CLASS terms that group terms of both ontologies into classes. This essentially extends the CHADO ND schema. Second, the Crop Ontology defines the variables that describe the context of the phenotyping experiment and record the observations from the experiments. (Here we merge the CropResearch Ontology with a particular CropTrait ontology to make a single working Crop Ontology.)

OMS Schema (From CHADO)

The Crop Ontology contains several controlled vocabularies (CV):

  • The PROPERTIES CV has terms describing the design and treatment factors applied in phenotyping experiments and the traits being measured in them.
  • The METHODS CV has terms describing the protocols by which those properties are applied or measured in phenotyping experiments.
  • The SCALES CV has terms describing the scales or units in which the values of the properties are recorded.
  • The STANDARD VARIABLES CV has terms defined by combinations of one property term, one method term and one scale term which describe the variables recording the design and observations of phenotyping experiments.
  • In addition, each categorical variable (one whose scale constrains its values to a specific set of valid values) has a VALID VALUE CV with terms describing those valid values.

Each of these controlled vocabularies is stored in the cv table and the terms from each are in the cvterm table. VALID VALUE CVs for categorical variables are named with the cvterm_id of the categorical variable and described by the preferred name of that variable.
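The naming convention for VALID VALUE CVs can be sketched with an in-memory SQLite database. This is a minimal illustration, not the production schema: only a few columns of the cv and cvterm tables are modelled, the CV names are taken from the text above, and the cv_id values 1040 and 2000 are assumptions made for the example.

```python
import sqlite3

# Minimal stand-ins for the CHADO cv and cvterm tables (subset of columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE, definition TEXT);
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY,
                         cv_id INTEGER REFERENCES cv(cv_id),
                         name TEXT, definition TEXT);
""")
conn.executemany("INSERT INTO cv (cv_id, name, definition) VALUES (?, ?, ?)", [
    (1010, "PROPERTIES", None),
    (1020, "METHODS", None),
    (1030, "SCALES", None),
    (1040, "VARIABLES", None),  # cv_id assumed for illustration
    # VALID VALUE CV: named with the variable's cvterm_id, described by its name.
    (2000, "8135", "EXPERIMENTAL DESIGN - ASSIGNED (TYPE)"),  # cv_id assumed
])
conn.execute("INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (?, ?, ?)",
             (8135, 1040, "EXPERIMENTAL DESIGN - ASSIGNED (TYPE)"))


def find_valid_value_cv(conn, variable_id):
    """Return (cv_id, definition) of the CV named after the variable, or None."""
    return conn.execute(
        "SELECT cv_id, definition FROM cv WHERE name = ?", (str(variable_id),)
    ).fetchone()


print(find_valid_value_cv(conn, 8135))
```

A variable that is not categorical has no CV named after it, so the lookup returns None.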

Figure 2. Controlled vocabularies for Phenotyping Data from Field Trials

The IBDB TERMS CV

Terms in the IBDB TERMS ontology belong to one of four classes: IBDB structure, IBDB class, IBDB data type or IBDB element. The relation of belonging to a class is recorded by a record in the cvterm_relationship table with type_id=1225 (is a). Terms in the class IBDB structure describe entities and relationships used in the schema. Terms in IBDB class group terms of an ontology in a hierarchy for easy browsing. Terms in IBDB data type describe the data type of each STANDARD VARIABLE. Terms in IBDB element indicate the element of the logical schema to which a STANDARD VARIABLE belongs, and where its values are stored.

IBDB TERMS which are class terms also have a 'type' relationship (1105) to term 1090 'Class'.

The PROPERTIES, METHODS AND SCALES CVS

Terms in the PROPERTIES CV (cv_id=1010) have an 'is a' (1225) relationship to an IBDB class which affords easy browsing.

Table 4. Terms in the IBDB class category

1345 Physiological, 1430 Yield components, 1440 Phenology, 1450 Post harvest
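Browsing properties by class comes down to following 'is a' (1225) relationships in cvterm_relationship. The sketch below assumes a reduced version of the cvterm and cvterm_relationship tables. The two property terms (2100, 2110) and the cv_id of 1000 for the IBDB TERMS CV are hypothetical; the class terms and relationship type ids come from the text.

```python
import sqlite3

IS_A, HAS_TYPE, CLASS_TERM = 1225, 1105, 1090
PROPERTIES_CV = 1010

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
    CREATE TABLE cvterm_relationship (
        cvterm_relationship_id INTEGER PRIMARY KEY,
        type_id INTEGER, subject_id INTEGER, object_id INTEGER);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)", [
    (1430, 1000, "Yield components"),            # cv_id 1000 is assumed
    (1440, 1000, "Phenology"),
    (2100, PROPERTIES_CV, "Grain yield"),        # hypothetical property term
    (2110, PROPERTIES_CV, "Days to flowering"),  # hypothetical property term
])
conn.executemany(
    "INSERT INTO cvterm_relationship (type_id, subject_id, object_id) VALUES (?, ?, ?)",
    [
        (HAS_TYPE, 1430, CLASS_TERM),  # class terms are typed as 'Class'
        (HAS_TYPE, 1440, CLASS_TERM),
        (IS_A, 2100, 1430),            # property 'is a' class
        (IS_A, 2110, 1440),
    ],
)


def properties_in_class(conn, class_id):
    """Names of PROPERTIES terms with an 'is a' relationship to the given class."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM cvterm_relationship r
        JOIN cvterm p ON p.cvterm_id = r.subject_id
        WHERE r.type_id = ? AND r.object_id = ? AND p.cv_id = ?
        ORDER BY p.name
        """,
        (IS_A, class_id, PROPERTIES_CV),
    )
    return [name for (name,) in rows]


print(properties_in_class(conn, 1430))
```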

Terms in the METHODS (cv_id=1020) and SCALES (cv_id=1030) CVs describe the methods or protocols by which properties are applied or measured and the scales or units in which they are reported. They do not have any specific relationships or properties.

The VARIABLES CV

Each term in the VARIABLES CV is related to one PROPERTY term, one METHOD term, and one SCALE term. Each also has a type relationship (has type, type_id=1105) which defines the data type of its values.

Table 5: Each variable is related to one data type with the 'has type' (cvterm_id=1105) relationship.

Each variable also has a 'stored in' relationship (type_id=1044) which specifies the component of the schema that stores the values of that variable, and where. The list of possible storage elements for any variable is given in Table 6. This means that a STANDARD VARIABLE (a combination of PROPERTY, METHOD and SCALE) could appear more than once in the ontology (with a different name), because in one study it might belong to one element of the schema and in another study to a different element. For example, NITROGEN FERTILIZER might be a trial management factor in one study but a treatment factor in another. In the first it belongs to the TRIAL ENVIRONMENT component and in the second to the TRIAL DESIGN component. This complicates data integration, but it is a consequence of splitting the schema into different elements, which for the most part will not overlap. When such overlaps occur, the application layer will have to deal with the data integration.

However, the combination of a PROPERTY, METHOD, SCALE and storage ELEMENT is unique, and it affords integration of all data with these characteristics across all studies in the database. The challenge for the application layer is to know the storage ELEMENT of each variable, and this will have to come from the application templates.
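The four-part key and the overlap case can be illustrated with a small sketch. It assumes variables have already been loaded into memory. The StandardVariable type, the variable names, and all cvterm ids other than the storage elements 1020, 1030 and 1043 (taken from Table 6) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class StandardVariable(NamedTuple):
    cvterm_id: int
    name: str
    property_id: int
    method_id: int
    scale_id: int
    stored_in: int  # object of the 'stored in' (1044) relationship


def integration_key(v):
    """The combination that is unique across the ontology."""
    return (v.property_id, v.method_id, v.scale_id, v.stored_in)


def find_overlaps(variables):
    """Map each (property, method, scale) that is used with more than one
    storage element to the sorted list of those elements."""
    elements = defaultdict(set)
    for v in variables:
        elements[(v.property_id, v.method_id, v.scale_id)].add(v.stored_in)
    return {pms: sorted(e) for pms, e in elements.items() if len(e) > 1}


variables = [
    # Same property, method and scale, stored in two different elements.
    StandardVariable(9001, "NITROGEN FERTILIZER (MANAGEMENT)", 2500, 4100, 6100, 1020),
    StandardVariable(9002, "NITROGEN FERTILIZER (TREATMENT)", 2500, 4100, 6100, 1030),
    StandardVariable(9003, "PLANT HEIGHT", 2600, 4200, 6200, 1043),
]

keys = [integration_key(v) for v in variables]
assert len(keys) == len(set(keys))  # the four-part key is unique

print(find_overlaps(variables))
```

The overlaps reported here are the cases that the application layer would have to reconcile when integrating data across studies.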

Table 6. Interpretation of 'stored in' (cvterm_id=1044) or 'role' relationship for terms in the VARIABLE CV

projectprop.type_id | type of variable | definition and storage location of the values
1010 | Study Information | Study element with values stored in projectprop.value
1011 | Study name | Study name stored in project.name
1012 | Study title | Study title stored in project.description
1015 | Dataset Information | Dataset element with values stored in projectprop.value
1016 | Dataset name | Dataset name stored in project.name
1017 | Dataset description | Dataset description stored in project.description
1020 | Trial environment | Trial environment information stored in nd_geolocationprop.value
1021 | Trial instance | Trial instance number stored in nd_geolocation.description
1022 | Latitude | Georeference data stored in nd_geolocation.latitude
1023 | Longitude | Georeference data stored in nd_geolocation.longitude
1024 | Datum | Georeference geodetic datum stored in nd_geolocation.geodetic_datum
1025 | Altitude | Georeference altitude stored in nd_geolocation.altitude
1030 | Trial design | Field trial design and layout information stored in nd_experimentprop.value
1040 | Germplasm entry | Germplasm entry information stored in stockprop.value
1041 | Entry number | Germplasm entry number unique within a study stored in stock.uniquename
1042 | Entry GID | GMS germplasm identifier stored in stock.dbxref_id
1046 | Entry designation | GMS germplasm name stored in stock.name
1047 | Entry code | Germplasm entry code assigned within a study stored in stock.value (with a type_id=8300)
1043 | Observation variate | Phenotypic data stored in phenotype.value
1048 | Categorical variate | Categorical variate with values stored in phenotype.cvalue_id
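An application could turn Table 6 into a lookup from the 'stored in' term to a table and column. The following is a sketch under that assumption. The names STORAGE, storage_location and stored_as_property are hypothetical, and Entry code (1047) is left out because Table 6 gives it an extra type_id qualifier that a simple two-part mapping cannot express.

```python
# 'stored in' term id -> (table, column), transcribed from Table 6.
STORAGE = {
    1010: ("projectprop", "value"),
    1011: ("project", "name"),
    1012: ("project", "description"),
    1015: ("projectprop", "value"),
    1016: ("project", "name"),
    1017: ("project", "description"),
    1020: ("nd_geolocationprop", "value"),
    1021: ("nd_geolocation", "description"),
    1022: ("nd_geolocation", "latitude"),
    1023: ("nd_geolocation", "longitude"),
    1024: ("nd_geolocation", "geodetic_datum"),
    1025: ("nd_geolocation", "altitude"),
    1030: ("nd_experimentprop", "value"),
    1040: ("stockprop", "value"),
    1041: ("stock", "uniquename"),
    1042: ("stock", "dbxref_id"),
    1046: ("stock", "name"),
    1043: ("phenotype", "value"),
    1048: ("phenotype", "cvalue_id"),
}


def storage_location(stored_in_id):
    """Return (table, column) for a 'stored in' term id from Table 6."""
    try:
        return STORAGE[stored_in_id]
    except KeyError:
        raise ValueError(f"unknown storage element: {stored_in_id}") from None


def stored_as_property(stored_in_id):
    """True when the values live in one of the *prop tables."""
    table, _column = storage_location(stored_in_id)
    return table.endswith("prop")


print(storage_location(1022))
print(stored_as_property(1020))
```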


For example, term 8250 GERMPLASM IDENTIFIER - ASSIGNED (DBCV) has the relationships shown in Table 7.

Table 7: Relationships for a STANDARD VARIABLE term in the CRO

id | type_id | subject_id | object_id | interpretation of the relationship type | name of the related term
2420 | 1225 | 8250 | 1087 | 8250 is a | GERMPLASM TERM
3250 | 1044 | 8250 | 1040 | 8250 stored in | ENTRY ELEMENT
6560 | 1105 | 8250 | 1120 | 8250 has type | CHARACTER VARIABLE
8240 | 1200 | 8250 | 2205 | 8250 has property | GERMPLASM ID
8242 | 1210 | 8250 | 4030 | 8250 has method | ASSIGNED
8244 | 1220 | 8250 | 6000 | 8250 has scale | DBCV
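The rows of Table 7 are enough to reconstruct a readable description of the variable. The sketch below works on plain tuples rather than a database connection. The data is transcribed from Table 7, while the function name describe_variable is an assumption made for the example.

```python
# Relationship type ids and their names, as used in Table 7.
RELATION_NAMES = {
    1225: "is a",
    1044: "stored in",
    1105: "has type",
    1200: "has property",
    1210: "has method",
    1220: "has scale",
}

# (cvterm_relationship_id, type_id, subject_id, object_id) from Table 7.
RELATIONSHIPS = [
    (2420, 1225, 8250, 1087),
    (3250, 1044, 8250, 1040),
    (6560, 1105, 8250, 1120),
    (8240, 1200, 8250, 2205),
    (8242, 1210, 8250, 4030),
    (8244, 1220, 8250, 6000),
]

# Names of the related terms from Table 7.
TERM_NAMES = {
    1087: "GERMPLASM TERM",
    1040: "ENTRY ELEMENT",
    1120: "CHARACTER VARIABLE",
    2205: "GERMPLASM ID",
    4030: "ASSIGNED",
    6000: "DBCV",
}


def describe_variable(variable_id, relationships, term_names):
    """Map each relationship name to the name of the related term."""
    return {
        RELATION_NAMES[type_id]: term_names[object_id]
        for _rel_id, type_id, subject_id, object_id in relationships
        if subject_id == variable_id
    }


description = describe_variable(8250, RELATIONSHIPS, TERM_NAMES)
for relation, term in description.items():
    print(f"8250 {relation} {term}")
```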

The VALID VALUE CVS

Each variable of type 1130 (CATEGORICAL VARIABLE) spawns a VALID VALUE CV as shown in Figure 2. These CVs contain the valid values of the categorical variable and their interpretation. Each is named in the cv table with the string value of the cvterm_id of the categorical variable (from the VARIABLES CV) to which it belongs, and described in the cv table with the description of that variable. The name is a convention only; the actual link between a variable and its valid values is the 'has value' relationship described below. For example, the valid values for the variable 8135 EXPERIMENTAL DESIGN - ASSIGNED (TYPE) are in cv 8135 as shown in Figure 3.

Figure 3: Valid Values of a categorical variable contained in a sub-cv of the CRO

Categorical variables sometimes require an ordering for their values, either because there is an intrinsic ordering (e.g. Low, Medium, High) or because it makes sense to present them to users (in pick lists, for example) in a certain order. The default order is the alphabetical order of cvterm.name (with numbers treated in character order). If a different ordering is required, each term should have a property in the cvtermprop table of type 'order' (IBDB TERMS cvterm_id=1420), with the numerical sequence order for that term as its value.
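The ordering rule can be sketched as a small function. Two assumptions are made here that the text does not settle: the explicit order is used only when every term in the CV carries an 'order' property, and the default order is Python's plain string sort, which is case-sensitive. The term ids in the example are hypothetical.

```python
def order_valid_values(terms, order_props=None):
    """Order the valid values of a categorical variable for display.

    terms: list of (cvterm_id, name) pairs.
    order_props: optional {cvterm_id: value} taken from cvtermprop rows
        with type_id=1420; values are strings holding a number.
    """
    order_props = order_props or {}
    if terms and all(term_id in order_props for term_id, _ in terms):
        # Explicit ordering: sort by the numeric value of the 'order' property.
        ranked = sorted(terms, key=lambda t: float(order_props[t[0]]))
    else:
        # Default: character order of the name, so "10" sorts before "2".
        ranked = sorted(terms, key=lambda t: t[1])
    return [name for _, name in ranked]


levels = [(501, "Low"), (502, "Medium"), (503, "High")]  # hypothetical ids

print(order_valid_values(levels))
print(order_valid_values(levels, {501: "1", 502: "2", 503: "3"}))
print(order_valid_values([(1, "2"), (2, "10"), (3, "1")]))
```

Without the 'order' properties the intrinsic Low, Medium, High sequence is lost, which is the case the property is meant to cover.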

The categorical variable in the VARIABLES CV has a 'has value' relationship (cvterm_id=1190) to each term in its VALID VALUE CV. For example, these relationships are shown in Table 8 for variable 8135.

Table 8: Relationships between a categorical variable and its valid values.

id | type_id | subject_id | object_id | interpretation of the relationship type | name of the related term
10100 | 1190 | 8135 | 10100 | 8135 HAS VALUE | CRD
10110 | 1190 | 8135 | 10110 | 8135 HAS VALUE | RCBD
10120 | 1190 | 8135 | 10120 | 8135 HAS VALUE | ALPHA
10130 | 1190 | 8135 | 10130 | 8135 HAS VALUE | RIBD
10140 | 1190 | 8135 | 10140 | 8135 HAS VALUE | NRIBD
10150 | 1190 | 8135 | 10150 | 8135 HAS VALUE | NRRCD
10160 | 1190 | 8135 | 10160 | 8135 HAS VALUE | AUGMENTED
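Fetching the valid values of a categorical variable means following its 'has value' (1190) relationships. The sketch below loads the rows of Table 8 into a reduced, in-memory version of the cvterm and cvterm_relationship tables. The relationship ids are left to the database here, and the function names are assumptions made for the example.

```python
import sqlite3

HAS_VALUE = 1190

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvterm_relationship (
        cvterm_relationship_id INTEGER PRIMARY KEY,
        type_id INTEGER, subject_id INTEGER, object_id INTEGER);
""")

# Valid value terms for variable 8135, from Table 8.
values = [
    (10100, "CRD"), (10110, "RCBD"), (10120, "ALPHA"), (10130, "RIBD"),
    (10140, "NRIBD"), (10150, "NRRCD"), (10160, "AUGMENTED"),
]
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", values)
conn.executemany(
    "INSERT INTO cvterm_relationship (type_id, subject_id, object_id) VALUES (?, ?, ?)",
    [(HAS_VALUE, 8135, term_id) for term_id, _ in values],
)


def valid_values(conn, variable_id):
    """Return (cvterm_id, name) for each valid value of the variable."""
    return conn.execute(
        """
        SELECT t.cvterm_id, t.name
        FROM cvterm_relationship r
        JOIN cvterm t ON t.cvterm_id = r.object_id
        WHERE r.type_id = ? AND r.subject_id = ?
        ORDER BY t.cvterm_id
        """,
        (HAS_VALUE, variable_id),
    ).fetchall()


def is_valid_value(conn, variable_id, name):
    """Check a submitted value against the variable's valid values."""
    return name in {n for _, n in valid_values(conn, variable_id)}


print(valid_values(conn, 8135))
print(is_valid_value(conn, 8135, "RCBD"))
```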

VALID RANGES

Terms of type 1110 (NUMERIC VARIABLES) may have MINIMUM and MAXIMUM allowable values specified in the cvtermprop table as shown in Table 9 for variable SOIL PH.

Table 9: MINIMUM and MAXIMUM allowable values for a numeric variable in the CRO

cvtermprop_id | cvterm_id | type_id | value | rank
8000 | 8270 | 1113 | 1 | 0
8010 | 8270 | 1115 | 14 | 0
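A range check against these properties might look like the sketch below. It assumes that type_id 1113 marks the MINIMUM and 1115 the MAXIMUM, which is inferred from the values in Table 9 rather than stated, and that both bounds are inclusive. The function names are hypothetical.

```python
MIN_TYPE = 1113  # assumed to be the MINIMUM property type
MAX_TYPE = 1115  # assumed to be the MAXIMUM property type

# (cvtermprop_id, cvterm_id, type_id, value, rank) from Table 9.
CVTERMPROP = [
    (8000, 8270, 1113, "1", 0),
    (8010, 8270, 1115, "14", 0),
]


def valid_range(cvterm_id, props):
    """Return (minimum, maximum) for a variable; None where no limit is set."""
    low = high = None
    for _prop_id, term_id, type_id, value, _rank in props:
        if term_id != cvterm_id:
            continue
        if type_id == MIN_TYPE:
            low = float(value)
        elif type_id == MAX_TYPE:
            high = float(value)
    return low, high


def in_valid_range(cvterm_id, value, props):
    """True when the value lies within the variable's limits (inclusive)."""
    low, high = valid_range(cvterm_id, props)
    return (low is None or value >= low) and (high is None or value <= high)


print(valid_range(8270, CVTERMPROP))
print(in_valid_range(8270, 6.5, CVTERMPROP))
print(in_valid_range(8270, 15, CVTERMPROP))
```

A variable with no MINIMUM or MAXIMUM property is treated as unbounded on that side.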

SYNONYMS AND FOREIGN LANGUAGE TERM NAMES AND DESCRIPTIONS

Synonyms and foreign language names and descriptions for terms are stored in the cvtermsynonym table.

Figure 4. Synonyms and foreign language names for controlled vocabulary terms
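Looking up a term by its preferred name or by any synonym can be sketched as a UNION over cvterm and cvtermsynonym. The example uses a reduced, in-memory version of both tables. The two synonyms are hypothetical, and how the type_id column distinguishes synonyms from foreign language names is not described here, so it is left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvtermsynonym (
        cvtermsynonym_id INTEGER PRIMARY KEY,
        cvterm_id INTEGER REFERENCES cvterm(cvterm_id),
        synonym TEXT,
        type_id INTEGER);
""")
conn.execute("INSERT INTO cvterm VALUES (8270, 'SOIL PH')")
conn.executemany(
    "INSERT INTO cvtermsynonym (cvterm_id, synonym, type_id) VALUES (?, ?, ?)",
    [
        (8270, "SOIL ACIDITY", None),  # hypothetical synonym
        (8270, "PH DU SOL", None),     # hypothetical foreign language name
    ],
)


def find_terms(conn, label):
    """Return cvterm ids whose name or synonym matches, ignoring ASCII case."""
    rows = conn.execute(
        """
        SELECT cvterm_id FROM cvterm WHERE name = ? COLLATE NOCASE
        UNION
        SELECT cvterm_id FROM cvtermsynonym WHERE synonym = ? COLLATE NOCASE
        ORDER BY cvterm_id
        """,
        (label, label),
    )
    return [term_id for (term_id,) in rows]


print(find_terms(conn, "soil ph"))
print(find_terms(conn, "pH du sol"))
```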

The LINKS TO www.CropOntology.org

The Crop ontologies in IBDB are supposed to be linked (and synchronized) with the terms on www.CropOntology.org. The Term IDs from that site are carried as properties of the corresponding cvterms in the cvtermprop table with type_id=1226 (Crop ontology term ID).
