Access Keys:
Skip to content (Access Key - 0)

Business Rules for Mapping to Chado

The intent of the description is to be as readable as possible, so in some places the relationships are described at their logical level. For example, property type names may be stored in a separate table with links to the property table by id. For clarity a description of the property table would describe the type by its text name rather than id.

Primary keys are omitted in all descriptions of tables.

1.      Project Table (Study)

A Study is captured using the Project table. Information stored at this level describes properties relevant for all field trials in a Project (Study). Since it is important both that local breeders are free to use their nomenclature and that these local terms are mapped to a central ontology, the properties table maps all terms to the Ontology at the project level.  Example property terms include name of the Study, Start/End Date, Principal Investigator, Phenotypic data types, etc.

The project_properties table captures links to the ontology. Properties with the same rank represent a single compound property. The type column distinguishes the components of the compound property. Note that these entries are just the names of the properties with their ontology mappings. Values for properties are stored in the appropriate sub-module, usually in Experiment.

Table1: Example Project Properties for PI and Grain Yield

Project_id Type (would use an ID) Value Rank
1 Local Name Principal Investigator Name 1
1 Link to Ontology table 2 1
1 Short name PI 1
1 Local Name Grain yield 26
1 Link to Ontology table 12 26
1 Short Name YIELD 26

Question: Is the short name redundant?

2.      Geolocation table (Location)

The Geolocation table maps to the Location module of the logical data model. Information in this table corresponds to actual physical locations where Field Trials are conducted. Information about Locations is stored in the GMS database, outside the scope of the Phenotyping database. The Geolocation table has a link to the GMS location table in the existing schema via a property linked to the loc_id.

The Site Name (or Site Code if the Site Name is not available) is stored in the description field of the nd_geolocation table. The Site Name, Side Code, Site ID and loc_id are stored in the nd_gelocationprop table.

Locations are linked directly both to Project (study) and Experiment (Field Trial).

Location properties can store information about the location that don't affect the experiment outcome (see RULE OF THUMB note in experiment table). e.g. local name for location

3.      Stock table (Germplasm)

The Stock table maps to the Germplasm module of the logical data model.  Entries in the Stock table represent the Germplasm used in a field trail. Information about the Germplasm is stored in the GMS database, outside the scope of the Phenotyping database. Thus, the Stock table serves as a link to the information in the GMS.

The Stock table is used to represent "bags of seeds" rather than unique germplasm. Each entry in the Stock table represents a germplasm in the context of an experiment.

The Stock table is linked to the Germplasm table in GMS. The uniquename column contains the IBDB_GID. This column uniquely identifies a germplasm, but is not necessarily unique in the Stock table. Type = entry and the location-based name for the germplasm (e.g. entry_1) is stored in the name column.

The stock properties table contains information about stocks, for example whether a certain seed is a control/check in an experiment (field trial), source, where you got the seed, a label for the pedigree. QUESTION: How much to store here and how much to generate or replicate from the GMS. Perhaps replicate what is in GMS and allow users to "edit" this info and store their edits in the stock properties table.

4.      Experiment table (Field Trial)

The Experiment table maps to the Field Trial Module in the Logical Data Model. Experiments (Field Trials) and attached Observations are the heart of the Pheontyping database. The information stored in the Experiment and supporting tables describes the design of the Field Trials and/or the results of statistical analysis. The Germplasm included in an Experiment are indicated by links to the Stock table.

The Experiment table supports polymorphic behavior in the IBDB Phenotyping Database. It will store raw observations as well as summary statistics, means and even study templates. The different types are distinguished in the type column.

RULE OF THUMB: If a property affects the outcome of the experiment (e.g. fertilization, irrigation) it is stored in the experiment property table linked to an experiment of the appropriate type. For example, irrigation is sometimes applied to an entire location, in which case it will be linked to an experiment of type location. In other cases irrigation varies by plot in which case it will be linked to an experiment of type plot. For properties that don't affect the outcome (e.g. location name) they can be stored as a property of the appropriate entity.

Types are their usage

a.     Type = template, linked by proj_id, no links to the stock table. The purpose is to generate a "blank fieldbook" and capture properties of the experiment at the study level (e.g. irrigation)

b.     Type = location, linked by proj_id, no links to stock table. Usage of properties is similar to study above.

c.     Type = plot. A field trial. Linked by proj_id and links into the stock table. The properties describe the plot. Raw observations are linked from here in the phenotype table.

d.     Type = means. Linked by proj_id, links to the stock table, properties determined by what is being tracked. Means are linked from here in the phenotype table.

e.     Type = summary stats. Linked by proj_id, links to the stock table, properties similar to plot.

Mean sets indexed by different factors (e.g. all stocks by fert or all ferts by stock) are handled in the application layer by grouping means with the same properties.

UNDER ANALYSIS:

1. Multiple factor phenotypes (e.g. water table at a certain date) - handle by linking to a hierachy of cv_term - ? - example ontology would have "height at x" with children (1cm, 2cm, etc.).

Properties of the Experiments are dependent on the type of experiment. Some properties can occur in multiple experiment types. For example, a location may be irrigated or have only certain plots which are irrigated. In this example, the irrigation property would be tracked at the location and plot level respectively.

Example properties for Plots include the physical layout of Field Trials (e.g. Plot, SubPlot, Rep), for locations conditions of the trial (e.g. fertilized, irrigated), and  for templates the study level information (e.g. Principal Investigator, Objectives).  The type_id for a property maps it back to the appropriate key in the Project (Study) properties table.

Refer to mappings in Table 1: Project Properties.

Table 2 Experiment Property Entry Showing Principal Investigator

experiment_id type_id (links to "link to ontology" row in project_prop) value rank
1 2 Arllet 1



5.      Phenotype table (Observations)

Observations are recorded in the phenotype table. All rows are linked to the appropriate experiment row and type of observation is determined by type of experiment.

In the case of experiments of type Plot the observations in the phenotype table will be the directly observed raw data about plants in field trial (e.g. yield, plant height).

In the case of experiments of type Means or Summary Stats, the observations in the phenotype table will be the results of statistical analysis on the raw data from Plot experiments. These results will be stored rather than computed in the database due to the complexity of the analysis even for simple means calculations.

The type of observation is linked back to the Project Property table using the attr_id column. In the current understanding of the business rules, observable_id and attr_id are identical. The team will perform further analysis to determine if both columns are required.

Below is a sample entry from the Phenotype table with Yield 10.3, Plant Height 80 and BLB Resistance score 3. Units for each of the measures (e.g. cm) are also stored in the project_properties table but are deleted for brevity.

The corresponding entries from the project property table are also included for clarity. Note in this example the type_id is shown rather than the text for the ontology type.

Table 3 Phenotype Data and Project Properties

Observable_id Attr_id Value
12 12 10.3
13 13 80
15 15 3


Project_id Type_id Value Rank
2 1004 YIELD 26
2 1005 Grain yield 26
2 1006 12 26
2 1004 PHT 27
2 1005 Plant height 27
2 1006 13 27
2 1004 BLB 28
2 1005 BLB resistance score 28
2 1006 15 28



6.      CV_term table (Ontology)

The cv_term table maps to the Ontology module of the Logical Data Model. The Ontology module contains the standard data dictionary for all terms in the Phenotyping Data Model. As described above, project_properties are used to map these terms to local equivalents.

Linking to the OMS system. cv_term replaces OMS_STDVARS and cv_term_prop replaces OMS_Classes in the existing IBDB.

The dbxref_id links o thte Ontology module.

observable_id, attr_id and assay_id map to trait (Property in the Description sheet), scale (Scale in the Description sheet) and method (Method in the Description sheet) respectively.

NOTE: Some details of the use of the ontology are still under analysis. It is not anticipated there will be any changes that affect the schema.

7. Protocol table

FOR FUTURE ANALYSIS - During performance testing, there is an option to pull in the protocol if the size of experiment prop becomes too large. Protocols can be used to group together different experiment properties for re-use. The goal is just to remove redundancy in the experiment properties table.

Adaptavist Theme Builder (3.3.3-conf210) Powered by Atlassian Confluence 2.10.3, the Enterprise Wiki.
Free theme builder license