AACT Data Dictionary
AACT is composed of 54 tables that provide information related to clinical trials. The database contains multiple schemas, the main one being 'ctgov' which provides data retrieved from ClinicalTrials.gov. The main table in the ctgov schema is 'studies' which relates to all other ctgov tables through the NCT_ID. The ctgov schema also includes 2 tables that provide MeSH terms (Medical Subject Headings) which are published by the National Library of Medicine (NLM). The NLM has populated browse_conditions and browse_interventions tables with the MeSH terms they've determined help describe a study. The NLM updates the MeSH thesaurus each year. AACT provides some older versions of the MeSH thesaurus in the 'mesh_archive' schema.
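The NCT_ID relationship described above can be sketched as a plain join. This is a minimal illustration, not AACT's own code: it uses an in-memory SQLite database as a stand-in for the real PostgreSQL database (SQLite has no ctgov schema prefix), the column names brief_title and name are assumptions about the studies and conditions tables, and every row is invented.

```python
import sqlite3

# In-memory SQLite stand-in for AACT; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, brief_title TEXT);
    CREATE TABLE conditions (nct_id TEXT, name TEXT);
    INSERT INTO studies VALUES ('NCT00000001', 'Hypothetical study A');
    INSERT INTO studies VALUES ('NCT00000002', 'Hypothetical study B');
    INSERT INTO conditions VALUES ('NCT00000001', 'Asthma');
    INSERT INTO conditions VALUES ('NCT00000001', 'Allergy');
    INSERT INTO conditions VALUES ('NCT00000002', 'Diabetes');
""")

# Every child table carries nct_id, so it can be joined back to studies.
JOIN_SQL = """
    SELECT s.nct_id, s.brief_title, c.name
    FROM studies s
    JOIN conditions c ON c.nct_id = s.nct_id
    ORDER BY s.nct_id, c.name
"""
joined_rows = conn.execute(JOIN_SQL).fetchall()
for row in joined_rows:
    print(row)
```

Against the real database, the same join would presumably be written with schema-qualified names such as ctgov.studies and ctgov.conditions.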
AACT also includes a set of project schemas (prefixed with 'proj_') which contain datasets collected/curated by previous researchers who used the AACT database to conduct their study. These datasets enhance the value of the clinical trials data in a number of ways. Descriptions of project schema tables & columns are included in the data dictionary below. The Projects page provides more comprehensive information about these datasets.
Click here to view a listing of all 54 tables with description and row counts. Detailed information about all data elements included in these tables can be found in the data dictionary below.
AACT Data Elements
The Data Dictionary (table below) provides detailed information about each data element in the AACT database. Every study-related table/column in the AACT relational database is represented as a row in this table. Only 3 examples are presented for Enumerations. (AACT includes a few administrative tables that contain data not directly related to studies. Information about these tables is not included in the data dictionary.)
Click on the icon that appears at the beginning of the data element row to view the section of the NLM documentation that defines that particular data element. (The icon appears only for data elements defined by NLM.)
Sort: Click on a column header to sort the table alphabetically by the values in that column.
Search/Filter: An input box appears under each column name. Enter a search term in one or more boxes & press [Enter] to filter the display. You may enter values in multiple columns to further restrict a search. The filter ignores case and will find all data elements that 'include' the term you enter (i.e., your term does not need to be the complete value).
Search Support: The first column provides icons that support filtering. The larger magnifying glass in the top row toggles display of the filter row. The smaller magnifying glass below it can be clicked to launch a search (same as pressing [Enter]). Clicking the funnel icon will clear filter values.
Authoritative Source: Please refer to the National Library of Medicine's (NLM) documentation for official definitions of all study/protocol and results data elements. The icon that appears at the beginning of a row will open a tab in your browser to display NLM information about the data element on that row.
Horizontal Scroll: All columns do not fit on a page; you may scroll horizontally to view and filter on additional information about the data elements.
Source Column: This column identifies the XML tag path used to obtain values for the data element from the ClinicalTrials.gov API. Click here for the XML format specifications (xsd) and here to see an example of a study as provided by NLM via their API; this represents the source information used to populate the AACT database.
Row Count Column: Each table's primary key definition includes the table's current row count.
Enumerations Column: Some AACT table columns are used to store a discrete set of predefined terms (enumerations). For example, Public::Study.enrollment_type can only contain values: 'Anticipated', 'Actual' or null. If the table/column is for a set of enumerations, the data dictionary presents each of the enumeration values along with the corresponding number of rows with that value.
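The per-value row counts shown for an enumeration amount to a grouped count over the column. The sketch below is only an illustration under stated assumptions: an in-memory SQLite table stands in for the studies table (AACT itself is PostgreSQL), and the four rows are invented.

```python
import sqlite3

# In-memory SQLite stand-in for the studies table; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studies (nct_id TEXT PRIMARY KEY, enrollment_type TEXT)")
conn.executemany(
    "INSERT INTO studies VALUES (?, ?)",
    [
        ("NCT00000001", "Actual"),
        ("NCT00000002", "Anticipated"),
        ("NCT00000003", "Actual"),
        ("NCT00000004", None),  # null is a permitted value for this column
    ],
)

# One output row per distinct enumeration value, with its row count.
ENUM_SQL = """
    SELECT enrollment_type, COUNT(*) AS row_count
    FROM studies
    GROUP BY enrollment_type
"""
enum_counts = dict(conn.execute(ENUM_SQL).fetchall())
print(enum_counts)
```

Note that the null group is reported alongside 'Actual' and 'Anticipated', which mirrors the three permitted states named above.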
The list below describes the 54 study-related tables in the AACT database and provides current row counts for each. Project tables are also defined at the bottom of this table.
|Schema||Name||Row Count||Description||Domain||Rows per Study|
|ctgov||baseline_counts||139,556||Sample size at baseline for each study group; usually a count of participants but can represent other units of measure such as 'hands', 'hips', etc.||Results||many|
|ctgov||baseline_measurements||1,447,638||Summaries of demographic & baseline measures collected by arm or comparison group and for the entire population of participants in the clinical study.||Results||many|
|ctgov||brief_summaries||374,770||A single text column that provides a brief description of the study.||Protocol||one|
|ctgov||browse_conditions||633,066||NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the condition(s) being addressed by the clinical trial. This table provides the results of NLM's assessment.||Protocol||many|
|ctgov||browse_interventions||255,332||NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the intervention(s) being addressed by the clinical trial. This table provides the results of NLM's assessment.||Protocol||many|
|ctgov||calculated_values||375,612||An AACT-provided table that contains info that's been calculated from the information received from ClinicalTrials.gov. For example, number_of_facilities and actual_duration are provided in this table.||Protocol||one|
|ctgov||central_contacts||147,806||Contact info for people (primary & backup) who can answer questions concerning enrollment at any location of the study.||Protocol||many|
|ctgov||conditions||636,557||Name(s) of the disease(s) or condition(s) studied in the clinical study, or the focus of the clinical study. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms.||Protocol||many|
|ctgov||countries||530,128||Countries in which the study has facilities/sites.||Protocol||many|
|ctgov||design_group_interventions||820,143||A cross reference for groups/interventions. If a study has multiple groups and multiple interventions, this table shows which interventions are associated with which groups.||Protocol||many|
|ctgov||design_groups||668,350||Defines the protocol-specified group, subgroup, or cohort of participants in a clinical trial assigned to receive specific intervention(s) or observations according to a protocol.||Protocol||many|
|ctgov||design_outcomes||2,088,215||Description of planned outcome measures and observations that will describe patterns of diseases and traits/associations with exposures, risk factors or treatment.||Protocol||many|
|ctgov||designs||375,612||Description of how the study will be conducted, including comparison group design and strategies for masking and allocating participants.||Protocol||one|
|ctgov||detailed_descriptions||249,571||A single text column that provides a detailed description of the study protocol.||Protocol||one|
|ctgov||documents||9,733||The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time.||Results||many|
|ctgov||drop_withdrawals||357,711||Summarized information about how many participants withdrew from the study, when and why. This information explains disposition of participants relative to the numbers starting and completing the study (enumerated in the Milestones table).||Results||many|
|ctgov||eligibilities||375,612||Information about the criteria used to select participants; includes inclusion and exclusion criteria.||Protocol||one|
|ctgov||facilities||2,494,726||Name, address and recruiting status of the facilities participating in the study.||Protocol||many|
|ctgov||facility_contacts||334,112||Contact information for people (primary and backup) responsible for the study at each facility. Facility contact information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Contact information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment.||Protocol||many|
|ctgov||facility_investigators||224,737||Names of the investigators at each study facility. Facility investigator information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Investigator information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment.||Protocol||many|
|ctgov||id_information||506,342||Identifiers (other than the NCT ID) that uniquely identify the study such as that assigned by the sponsor, or an NCT ID that had previously been used for the study.||Protocol||many|
|ctgov||intervention_other_names||326,707||Terms or phrases that are synonymous with an intervention. (Each row is linked to one of the interventions associated with the study.)||Protocol||many|
|ctgov||interventions||643,512||The interventions or exposures (including drugs, medical devices, procedures, vaccines, and other products) of interest to the study, or associated with study arms/groups.||Protocol||many|
|ctgov||keywords||1,012,626||Provides words or phrases that best describe the protocol. Keywords help users find studies in the database. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms.||Protocol||many|
|ctgov||links||56,669||Web sites directly relevant to the protocol (i.e., links to educational, research, government, and other non-profit Web pages).||Protocol||many|
|ctgov||milestones||511,104||Information summarizing the progress of participants through each stage of a study, including the number of participants who started and completed the trial. Enumeration of participants not completing the study is included in the Drop_Withdrawals table.||Results||many|
|ctgov||outcome_analyses||206,592||Results of scientifically appropriate statistical analyses performed on primary and secondary study outcomes. Includes results for treatment effect estimates, confidence intervals and other measures of dispersion, and p-values.||Results||many|
|ctgov||outcome_analysis_groups||399,943||Identifies the comparison groups that were involved with each outcome analysis.||Results||many|
|ctgov||outcome_counts||907,566||Sample size included in analysis for each outcome for each study group; usually participants but can represent other units of measure such as 'eyes', 'lesions', etc.||Results||many|
|ctgov||outcome_measurements||2,907,984||Summary data for primary and secondary outcome measures for each study group. Includes parameter estimates and measures of dispersion/precision.||Results||many|
|ctgov||outcomes||379,655||Descriptions of outcomes, or observations that were measured to determine patterns of diseases or traits, or associations with exposures, risk factors, or treatment. Includes information such as time frame, population and units. (Specific measurement results are stored in the Outcome_Measurements table.)||Results||many|
|ctgov||overall_officials||374,697||People responsible for the overall scientific leadership of the protocol including the principal investigator.||Protocol||many|
|ctgov||participant_flows||48,700||Information relevant to the recruitment process & pre-assignment details (i.e., significant events in the study that occur after participant enrollment, but prior to assignment of participants). Information about participant flow that applies to all milestones.||Results||one|
|ctgov||pending_results||17,914||Provides information about events related to the submission of study results for quality control (QC) review before the results are publicly posted. Events reported: submissions, cancellations and returns for modifications. 'Unknown' is specified for cancellations that occurred before 05/08/2018 (when this data began being collected). When a study passes quality control review: 1) results_first_submitted_date is set to the study's first submission date, 2) results_first_submitted_qc_date is set to the submission date of the version of results that passed QC, 3) the study's pending_results rows are removed, and 4) the results are posted on ClinicalTrials.gov. The latest versions of all studies are posted every business day, but there can be unexpected delays. The results_first_posted_date value will usually be identified as an 'Estimate' when first posted. This will switch to 'Actual' (and the date may be adjusted) on the next posting cycle, when the true posting date is known.||Results||many|
|ctgov||provided_documents||23,790||The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time.||Results||many|
|ctgov||reported_events||6,481,059||Summary information about reported adverse events (any untoward or unfavorable medical occurrence to participants, including abnormal physical exams, laboratory findings, symptoms, or diseases), including serious adverse events, other adverse events, and mortality.||Results||many|
|ctgov||responsible_parties||356,790||People who have access to and control over the data from the study, have the right to publish study results, and have the ability to meet all of the requirements for the submission of study information.||Protocol||many|
|ctgov||result_agreements||48,700||Info about whether an agreement exists between the sponsor & the principal investigators (PIs) that restricts the PIs' ability to discuss study results at scientific meetings or other public or private forums, or to publish info concerning the study in scientific or academic journals after the study is completed.||Results||many|
|ctgov||result_contacts||48,700||Point of contact for scientific information about the clinical study results information.||Results||many|
|ctgov||result_groups||1,262,140||Consolidated, aggregate list of group titles/descriptions used for reporting summary results information.||Results||many|
|ctgov||search_results||5,496||This joins studies with saved queries within the study_searches table.||n/a||many|
|ctgov||sponsors||600,257||Name of study sponsors and collaborators. The sponsor is the entity or individual initiating the study. Collaborators are other organizations providing support, including funding, design, implementation, data analysis, and reporting.||Protocol||many|
|ctgov||studies||375,612||Basic info about study, including study title, date study registered with ClinicalTrials.gov, date results first posted to ClinicalTrials.gov, dates for study start and completion, phase of study, enrollment status, planned or actual enrollment, number of study arms/groups, etc.||Protocol & Results||one|
|ctgov||study_references||1,145,888||Citations to publications related to the study protocol and/or results. Includes PubMed Unique Identifier (PMID) and/or full bibliographic citation.||Protocol||many|
|ctgov||study_searches||1||These are saved queries that are used to search ClinicalTrials.gov.||n/a||none|
|proj_tag_nephrology||tagged_terms||115||Table of terms determined to be nephrology-related by a team of Duke clinicians. The source of the terms is the 2010 MeSH thesaurus as well as free-text terms/phrases used to identify interventional studies registered in ClinicalTrials.gov between 2007 and 2010.||Project||n/a|
|proj_tag_nephrology||analyzed_studies||13,327||Information identifying the studies used to support the 2014 JK Inrig publication in Am J Kidney Disease. The studies identified were determined to be nephrology-related in the course of conducting the investigation.||Project||n/a|
|proj_cdek_standard_orgs||cdek_organizations||24,749||Table of standard organization names that serve to||Project||n/a|
|proj_cdek_standard_orgs||cdek_synonyms||25,678||Table of organization names that have been entered into ClinicalTrials.gov, along with the preferred (standard) name as determined by CDEK.||Project||n/a|
|proj_results_reporting||analyzed_studies||13,327||Trials that were determined likely subject to FDAAA provisions (highly likely applicable clinical trials, or HLACTs) from 2008 through 2013. Regression models were used to examine characteristics associated with reporting at 12 months and throughout the 5-year study period.||Project||n/a|
|proj_tag_study_characteristics||oncology_studies||0||Trials determined to be oncology related for the purpose of the Study Characteristics investigation.||Project||n/a|
|proj_tag_study_characteristics||mental_health_studies||0||Trials determined to be mental health related for the purpose of the Study Characteristics investigation.||Project||n/a|
|proj_tag_study_characteristics||cardiovascular_studies||0||Trials determined to be cardiovascular related for the purpose of the Study Characteristics investigation.||Project||n/a|
|proj_tag_study_characteristics||tagged_terms||115||Table of terms determined to be either mental health, oncology or cardiovascular-related by a team of Duke clinicians. The source of the terms is the 2010 MeSH thesaurus as well as free-text terms/phrases used to identify interventional studies registered in ClinicalTrials.gov between 2007 and 2010.||Project||n/a|
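The design_group_interventions row above describes a cross-reference between groups and interventions. One way to read such a table is sketched below, using an in-memory SQLite stand-in for the PostgreSQL database. The key columns id, design_group_id and intervention_id, and the title and name columns, are assumptions made for illustration, and all rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in; column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE design_groups (id INTEGER PRIMARY KEY, nct_id TEXT, title TEXT);
    CREATE TABLE interventions (id INTEGER PRIMARY KEY, nct_id TEXT, name TEXT);
    CREATE TABLE design_group_interventions (
        nct_id TEXT, design_group_id INTEGER, intervention_id INTEGER
    );
    INSERT INTO design_groups VALUES (1, 'NCT00000001', 'Arm A');
    INSERT INTO design_groups VALUES (2, 'NCT00000001', 'Arm B');
    INSERT INTO interventions VALUES (10, 'NCT00000001', 'Drug X');
    INSERT INTO interventions VALUES (11, 'NCT00000001', 'Placebo');
    INSERT INTO design_group_interventions VALUES ('NCT00000001', 1, 10);
    INSERT INTO design_group_interventions VALUES ('NCT00000001', 2, 10);
    INSERT INTO design_group_interventions VALUES ('NCT00000001', 2, 11);
""")

# Resolve each cross-reference row to a (group title, intervention name) pair.
XREF_SQL = """
    SELECT g.title, i.name
    FROM design_group_interventions x
    JOIN design_groups g ON g.id = x.design_group_id
    JOIN interventions i ON i.id = x.intervention_id
    WHERE x.nct_id = ?
    ORDER BY g.title, i.name
"""
group_interventions = conn.execute(XREF_SQL, ("NCT00000001",)).fetchall()
for pair in group_interventions:
    print(pair)
```

In this toy data 'Drug X' is linked to both arms, which is the many-to-many case the cross-reference table exists to represent.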
AACT Views & Functions
Database views & functions have been provided to facilitate data retrieval. These features are described below. The first table identifies a set of views that provide concatenated sets of values per study. These views are useful for people who need to generate a spreadsheet that contains one row per study and would like to include info for which studies often have multiple values, such as conditions. For example, if you need a comma-separated list of the conditions associated with a study, you may get that from the all_conditions view. These views are in the ctgov schema; all include 2 columns: NCT_ID & Names.
Here is a sample query:
aact=# select * from ctgov.all_conditions where nct_id='NCT00000146';
|Schema||View/Function name||Description||Source Data||Data Returned||Example|
|ctgov||all_conditions||concatenated list of all conditions (MeSH term) identified for a study||browse_conditions.mesh_term||nct_id & names: string containing comma delimited list of conditions||select * from all_conditions where nct_id = '';|
|ctgov||all_countries||concatenates all countries associated with a study (excluding those identified as having been removed)||countries.name||nct_id & names: string containing comma delimited list of countries|
|ctgov||all_design_outcomes||concatenated list of all design outcome measures for a study||design_outcomes.measure||nct_id & names: string containing comma delimited list of outcome measures|
|ctgov||all_facilities||concatenated list of all facility names associated with a study||facilities.name||nct_id & names: string containing comma delimited list of facility names|
|ctgov||all_group_types||concatenated list of the arm/group types included in a study||design_groups.group_type||nct_id & names: string containing comma delimited list of group types|
|ctgov||all_id_information||concatenated list of the IDs associated with a study||id_information.id_value||nct_id & names: string containing comma delimited list of ids|
|ctgov||all_interventions||concatenated list of all interventions (MeSH term) identified for a study||browse_interventions.mesh_term||nct_id & names: string containing comma delimited list of mesh terms|
|ctgov||all_intervention_types||concatenated list of all intervention types for a study||interventions.intervention_type||nct_id & names: string containing comma delimited list of intervention types|
|ctgov||all_keywords||concatenated list of the keywords associated with a study||keywords.name||nct_id & names: string containing comma delimited list of keywords|
|ctgov||all_primary_outcome_measures||concatenated list of the primary outcome measures for a study||design_outcomes.measure where type='primary'||nct_id & names: string containing comma delimited list of measures|
|ctgov||all_secondary_outcome_measures||concatenated list of the secondary outcome measures for a study||design_outcomes.measure where type='secondary'||nct_id & names: string containing comma delimited list of measures|
|ctgov||all_sponsors||concatenated list of the study sponsors||sponsors.name||nct_id & names: string containing comma delimited list of sponsor names|
|ctgov||all_states||concatenated list of the states where the study is being conducted||facilities.state||nct_id & names: string containing comma delimited list of state names|
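To make the one-row-per-study shape of these views concrete, the sketch below builds a view that behaves like all_conditions on toy data. It is an approximation rather than AACT's actual view definition, which is not shown here: it runs on an in-memory SQLite database and uses SQLite's GROUP_CONCAT, whereas a PostgreSQL view would more likely use string_agg. All rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in; the view below only mimics the documented output shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT);
    INSERT INTO browse_conditions VALUES ('NCT00000001', 'Asthma');
    INSERT INTO browse_conditions VALUES ('NCT00000001', 'Hypersensitivity');
    INSERT INTO browse_conditions VALUES ('NCT00000002', 'Diabetes Mellitus');

    -- Two columns, nct_id and names, with one row per study.
    CREATE VIEW all_conditions AS
        SELECT nct_id, GROUP_CONCAT(mesh_term, ', ') AS names
        FROM browse_conditions
        GROUP BY nct_id;
""")

names_by_study = dict(conn.execute("SELECT nct_id, names FROM all_conditions").fetchall())
for nct_id, names in sorted(names_by_study.items()):
    print(nct_id, "->", names)
```

Because each study collapses to a single row, a view like this can be joined to studies without multiplying rows, which is what makes it convenient for spreadsheet exports.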
All ClinicalTrials.gov data are available in the ctgov schema of the AACT database.
ClinicalTrials.gov assigns each study two sets of MeSH terms: Browse Conditions for conditions studied in the trial & Browse Interventions for interventions used in the trial. Earlier versions of the data did not include the full MeSH hierarchy, but studies now contain the full hierarchy.
The column mesh_type indicates if the term is a leaf identified as “mesh-list” or an ancestor identified as “mesh-ancestor”.
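The mesh_type distinction above can be used to separate the terms assigned directly to a study from their ancestors in the hierarchy. The sketch below is a minimal illustration on an in-memory SQLite stand-in for the PostgreSQL database. The 'mesh-list' and 'mesh-ancestor' values come from the description above; the rows themselves are invented.

```python
import sqlite3

# In-memory SQLite stand-in for browse_conditions; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT, mesh_type TEXT);
    INSERT INTO browse_conditions VALUES ('NCT00000001', 'Asthma', 'mesh-list');
    INSERT INTO browse_conditions VALUES ('NCT00000001', 'Bronchial Diseases', 'mesh-ancestor');
    INSERT INTO browse_conditions VALUES ('NCT00000001', 'Respiratory Tract Diseases', 'mesh-ancestor');
""")

def terms_of_type(nct_id, mesh_type):
    """Return the study's MeSH terms with the given mesh_type, sorted by name."""
    cursor = conn.execute(
        "SELECT mesh_term FROM browse_conditions "
        "WHERE nct_id = ? AND mesh_type = ? ORDER BY mesh_term",
        (nct_id, mesh_type),
    )
    return [row[0] for row in cursor]

leaf_terms = terms_of_type("NCT00000001", "mesh-list")
ancestor_terms = terms_of_type("NCT00000001", "mesh-ancestor")
print(leaf_terms, ancestor_terms)
```

Filtering on 'mesh-list' keeps term counts comparable with data that predates the full hierarchy, while including 'mesh-ancestor' supports searches by broader category.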
When data submitters provide information to ClinicalTrials.gov about a study, they’re encouraged to use Medical Subject Heading (MeSH) terminology for interventions, conditions, and keywords. The browse_conditions and browse_interventions tables contain MeSH terms generated by an algorithm run by NLM. The NLM algorithm is re-run nightly on all studies in the ClinicalTrials.gov database, and sources the most up-to-date information in the study record, the latest version of the algorithm, and the version of the MeSH thesaurus in use at that time.
The National Library of Medicine updates the MeSH Thesaurus each year. Please refer to the National Library of Medicine's (NLM) documentation for authoritative definitions of the data elements:
The online data dictionary includes links to sections of this NLM documentation that describe specific data elements.