AACT Data Dictionary

AACT is composed of 51 tables that provide information related to clinical trials. The database contains multiple schemas, the main one being 'ctgov' which provides data retrieved from ClinicalTrials.gov. The main table in the ctgov schema is 'studies' which relates to all other ctgov tables through the NCT_ID. The ctgov schema also includes 2 tables that provide MeSH terms (Medical Subject Headings) which are published by the National Library of Medicine (NLM). The NLM has populated browse_conditions and browse_interventions tables with the MeSH terms they've determined help describe a study. The NLM updates the MeSH thesaurus each year. AACT provides some older versions of the MeSH thesaurus in the 'mesh_archive' schema.
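
As a rough illustration of how 'studies' relates to the other ctgov tables through NCT_ID, the sketch below builds a toy in-memory SQLite database and joins two tables on nct_id. It is only a stand-in: the real AACT database is PostgreSQL, the tables are reduced to a couple of columns (overall_status and keywords.name are mentioned elsewhere in this dictionary), and every row, including the NCT IDs, is made up.

```python
import sqlite3

# Toy in-memory stand-in for two ctgov tables. The real AACT database is
# PostgreSQL with many more columns; every row below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, overall_status TEXT);
    CREATE TABLE keywords (nct_id TEXT, name TEXT);
    INSERT INTO studies VALUES ('NCT99999991', 'Completed'),
                               ('NCT99999992', 'Recruiting');
    INSERT INTO keywords VALUES ('NCT99999991', 'asthma'),
                                ('NCT99999991', 'inhaler'),
                                ('NCT99999992', 'diabetes');
""")

# studies holds one row per trial; keywords holds many rows per trial.
# nct_id is the shared key that ties each keyword back to its study.
joined_rows = conn.execute("""
    SELECT s.nct_id, s.overall_status, k.name
    FROM studies AS s
    JOIN keywords AS k ON k.nct_id = s.nct_id
    ORDER BY s.nct_id, k.name
""").fetchall()

for row in joined_rows:
    print(row)
```

The same join shape should apply to any other ctgov table that carries an nct_id column.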

AACT also includes a set of project schemas (prefixed with 'proj_') which contain datasets collected/curated by previous researchers who used the AACT database to conduct their studies. These datasets enhance the value of the clinical trials data in a number of ways. Descriptions of project schema tables & columns are included in the data dictionary below. The Projects page provides more comprehensive information about these datasets.

Click here to view a listing of all 51 tables with description and row counts. Detailed information about all data elements included in these tables can be found in the data dictionary below.

AACT Data Elements

The Data Dictionary (table below) provides detailed information about each data element in the AACT database. Every study-related table/column in the AACT relational database is represented as a row in this table. Only 3 examples are presented for Enumerations. (AACT includes a few administrative tables that contain data not directly related to studies. Information about these tables is not included in the data dictionary.)

Click on the icon that appears at the beginning of the data element row to view the section of the NLM documentation that defines that particular data element. (The icon appears only for data elements defined by NLM.)

  • Sort: Click on a column header to sort the table alphabetically by the values in that column.

  • Search/Filter: An input box appears under each column name. Enter a search term in one or more boxes & press [Enter] to filter the display. You may enter values in multiple columns to further restrict a search. The filter ignores case and will find all data elements that 'include' the term you enter (i.e., your term does not need to be the complete value).

  • Search Support: The first column provides icons that support filtering. The larger magnifying glass in the top row toggles display of the filter row. The smaller magnifying glass below it can be clicked to launch a search (same as pressing [Enter]). Clicking the funnel icon will clear filter values.

  • Authoritative Source: Please refer to the National Library of Medicine's (NLM) documentation for official definitions of all study/protocol and results data elements. The icon that appears at the beginning of a row will open a tab in your browser to display NLM information about the data element on that row.

  • Horizontal Scroll: Not all columns fit on a page; you may scroll horizontally to view and filter on additional information about the data elements.

  • Source Column: This column identifies the XML tag path used to obtain values for the data element from the ClinicalTrials.gov API. Click here for the XML format specifications (xsd) and here to see an example of a study as provided by NLM via their API; this represents the source information used to populate the AACT database.

  • Row Count Column: Each table's primary key definition includes the table's current row count.

  • Enumerations Column: Some AACT table columns are used to store a discrete set of predefined terms (enumerations). For example, Public::Study.enrollment_type can only contain values: 'Anticipated', 'Actual' or null. If the table/column is for a set of enumerations, the data dictionary presents each of the enumeration values along with the corresponding number of rows with that value.
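
The per-value row counts described for the Enumerations column can be approximated with a simple GROUP BY. The following is a minimal sketch, assuming a toy SQLite table named studies with an enrollment_type column; the rows are invented, and this is not necessarily how the data dictionary itself computes its counts.

```python
import sqlite3

# Toy stand-in for the studies table; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, enrollment_type TEXT);
    INSERT INTO studies VALUES
        ('NCT99999991', 'Actual'),
        ('NCT99999992', 'Anticipated'),
        ('NCT99999993', 'Actual'),
        ('NCT99999994', NULL);
""")

# One output row per distinct enumeration value. COUNT(*) also counts the
# rows whose enrollment_type is NULL, which GROUP BY keeps as its own group.
enumeration_counts = dict(conn.execute("""
    SELECT enrollment_type, COUNT(*)
    FROM studies
    GROUP BY enrollment_type
""").fetchall())

for value, row_count in enumeration_counts.items():
    print(value, row_count)
```

Null values come back as Python None, so they stay visible next to 'Anticipated' and 'Actual'.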

AACT Tables

The list below describes the 51 study-related tables in the AACT database and provides current row counts for each. Project tables are also defined at the bottom of this table.

Schema Name Row Count Description Domain Rows per Study
ctgov baseline_counts 139,556 Sample size at baseline for each study group; usually a count of participants but can represent other units of measure such as 'hands', 'hips', etc. Results many
ctgov baseline_measurements 1,447,638 Summaries of demographic & baseline measures collected by arm or comparison group and for the entire population of participants in the clinical study.  Results many
ctgov brief_summaries 374,770 A single text column that provides a brief description of the study. Protocol one
ctgov browse_conditions 633,066 NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the condition(s) being addressed by the clinical trial. This table provides the results of NLM's assessment. Protocol many
ctgov browse_interventions 255,332 NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the intervention(s) being addressed by the clinical trial. This table provides the results of NLM's assessment. Protocol many
ctgov calculated_values 375,612 An AACT-provided table that contains info that's been calculated from the information received from ClinicalTrials.gov. For example, number_of_facilities and actual_duration are provided in this table. Protocol one
ctgov central_contacts 147,806 Contact info for people (primary & backup) who can answer questions concerning enrollment at any location of the study. Protocol many
ctgov conditions 636,557 Name(s) of the disease(s) or condition(s) studied in the clinical study, or the focus of the clinical study. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms. Protocol many
ctgov countries 530,128 Countries in which the study has facilities/sites. Protocol many
ctgov design_group_interventions 820,143 A cross reference for groups/interventions. If a study has multiple groups and multiple interventions, this table shows which interventions are associated with which groups. Protocol many
ctgov design_groups 668,350 Defines the protocol-specified group, subgroup, or cohort of participants in a clinical trial assigned to receive specific intervention(s) or observations according to a protocol. Protocol many
ctgov design_outcomes 2,088,215 Description of planned outcome measures and observations that will describe patterns of diseases and traits/associations with exposures, risk factors or treatment. Protocol many
ctgov designs 375,612 Description of how the study will be conducted, including comparison group design and strategies for masking and allocating participants. Protocol one
ctgov detailed_descriptions 249,571 A single text column that provides a detailed description of the study protocol. Protocol one
ctgov documents 9,733 The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time. Results many
ctgov drop_withdrawals 357,711 Summarized information about how many participants withdrew from the study, when and why. This information explains disposition of participants relative to the numbers starting and completing the study (enumerated in the Milestones table). Results many
ctgov eligibilities 375,612 Information about the criteria used to select participants; includes inclusion and exclusion criteria. Protocol one
ctgov facilities 2,494,726 Name, address and recruiting status of the facilities participating in the study. Protocol many
ctgov facility_contacts 334,112 Contact information for people (primary and backup) responsible for the study at each facility. Facility contact information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Contact information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment. Protocol many
ctgov facility_investigators 224,737 Names of the investigators at each study facility. Facility investigator information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Investigator information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment. Protocol many
ctgov id_information 506,342 Identifiers (other than the NCT ID) that uniquely identify the study such as that assigned by the sponsor, or an NCT ID that had previously been used for the study. Protocol many
ctgov intervention_other_names 326,707 Terms or phrases that are synonymous with an intervention. (Each row is linked to one of the interventions associated with the study.) Protocol many
ctgov interventions 643,512 The interventions or exposures (including drugs, medical devices, procedures, vaccines, and other products) of interest to the study, or associated with study arms/groups. Protocol many
ctgov keywords 1,012,626 Provides words or phrases that best describe the protocol. Keywords help users find studies in the database. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms. Protocol many
ctgov links 56,669 Web sites directly relevant to the protocol (i.e., links to educational, research, government, and other non-profit Web pages). Protocol many
ctgov milestones 511,104 Information summarizing the progress of participants through each stage of a study, including the number of participants who started and completed the trial. Enumeration of participants not completing the study is included in the Drop_Withdrawals table. Results many
ctgov outcome_analyses 206,592 Results of scientifically appropriate statistical analyses performed on primary and secondary study outcomes. Includes results for treatment effect estimates, confidence intervals and other measures of dispersion, and p-values. Results many
ctgov outcome_analysis_groups 399,943 Identifies the comparison groups that were involved with each outcome analysis. Results many
ctgov outcome_counts 907,566 Sample size included in analysis for each outcome for each study group; usually participants but can represent other units of measure such as 'eyes', 'lesions', etc. Results many
ctgov outcome_measurements 2,907,984 Summary data for primary and secondary outcome measures for each study group. Includes parameter estimates and measures of dispersion/precision. Results many
ctgov outcomes 379,655 Descriptions of outcomes, or observation that were measured to determine patterns of diseases or traits, or associations with exposures, risk factors, or treatment. Includes information such as time frame, population and units. (Specific measurement results are stored in the Outcome_Measurements table.) Results many
ctgov overall_officials 374,697 People responsible for the overall scientific leadership of the protocol including the principal investigator. Protocol many
ctgov participant_flows 48,700 Information relevant to the recruitment process & pre-assignment details (i.e., significant events in the study that occur after participant enrollment, but prior to assignment of participants). Also includes information about participant flow that applies to all milestones. Results one
ctgov pending_results 17,914 Provides information about events related to the submission of study results for quality control (QC) review before the results are publicly posted. Events reported: submissions, cancellations and returns for modifications. 'Unknown' is specified for cancellations that occurred before 05/08/2018 (when this data began being collected). When a study passes quality control review: 1) results_first_submitted_date is set to the study's first submission date, 2) results_first_submitted_qc_date is set to the submission date of the version of results that passed QC, 3) the study's pending_results rows are removed, and 4) the results are posted on ClinicalTrials.gov. The latest versions of all studies are posted every business day, but there can be unexpected delays. The results_first_posted_date value will usually be identified as an 'Estimate' when first posted. This will switch to 'Actual' (and the date may be adjusted) on the next posting cycle, when the true posting date is known. Results many
ctgov provided_documents 23,790 The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time.  Results many
ctgov reported_events 6,481,059 Summary information about reported adverse events (any untoward or unfavorable medical occurrence to participants, including abnormal physical exams, laboratory findings, symptoms, or diseases), including serious adverse events, other adverse events, and mortality. Results many
ctgov responsible_parties 356,790 People who have access to and control over the data from the study, have the right to publish study results, and have the ability to meet all of the requirements for the submission of study information. Protocol many
ctgov result_agreements 48,700 Info about whether an agreement exists between the sponsor & the principal investigators (PIs) that restricts the PIs ability to discuss study results at scientific meetings or other public or private forums, or to publish info concerning the study in scientific or academic journals after the study is completed. Results many
ctgov result_contacts 48,700 Point of contact for scientific information about the clinical study results information.  Results many
ctgov result_groups 1,262,140 Consolidated, aggregate list of group titles/descriptions used for reporting summary results information. Results many
ctgov search_results 5,496 This joins studies with saved queries within the study_searches table. n/a many
ctgov sponsors 600,257 Name of study sponsors and collaborators. The sponsor is the entity or individual initiating the study. Collaborators are other organizations providing support, including funding, design, implementation, data analysis, and reporting. Protocol many
ctgov studies 375,612 Basic info about the study, including study title, date study registered with ClinicalTrials.gov, date results first posted to ClinicalTrials.gov, dates for study start and completion, phase of study, enrollment status, planned or actual enrollment, number of study arms/groups, etc. Protocol & Results one
ctgov study_references 1,145,888 Citations to publications related to the study protocol and/or results. Includes PubMed Unique Identifier (PMID) and/or full bibliographic citation.  Protocol many
ctgov study_searches 1 These are saved queries that are used to search ClinicalTrials.gov n/a none
proj_tag_study_characteristics tagged_terms 115 Table of terms determined to be either mental health, oncology or cardiovascular-related by a team of Duke clinicians. The source of the terms is the 2010 MeSH thesaurus as well as free-text terms/phrases used to identify interventional studies registered in ClinicalTrials.gov between 2007 and 2010. Project n/a
ctgov ipd_information_types 30,155
ctgov mesh_headings 0
ctgov mesh_terms 0
ctgov reported_event_totals 0
ctgov retractions 0
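
The 'Rows per Study' column in the list above can be checked empirically by counting rows per nct_id in a given table. The sketch below shows one way to do that against a toy SQLite copy of two tables; the helper rows_per_study, the brief_summaries column name description, and all sample rows are assumptions made for illustration.

```python
import sqlite3

# Toy stand-ins for a "one" table and a "many" table; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brief_summaries (nct_id TEXT, description TEXT);
    CREATE TABLE countries (nct_id TEXT, name TEXT);
    INSERT INTO brief_summaries VALUES ('NCT99999991', 'Summary A'),
                                       ('NCT99999992', 'Summary B');
    INSERT INTO countries VALUES ('NCT99999991', 'Canada'),
                                 ('NCT99999991', 'France'),
                                 ('NCT99999992', 'Japan');
""")


def rows_per_study(connection, table_name):
    """Classify a table as 'one' or 'many' rows per study (hypothetical helper).

    table_name is interpolated into the SQL text, so it must come from a
    trusted, fixed list and never from user input.
    """
    (max_rows,) = connection.execute(
        f"SELECT MAX(n) FROM "
        f"(SELECT COUNT(*) AS n FROM {table_name} GROUP BY nct_id) AS per_study"
    ).fetchone()
    if max_rows is None:  # the table has no rows at all
        return "empty"
    return "one" if max_rows == 1 else "many"


classification = {
    name: rows_per_study(conn, name)
    for name in ("brief_summaries", "countries")
}
print(classification)
```

A result of 'one' only says that no study currently has more than one row in that table, so it is an observation about the data rather than a guarantee from the schema.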

AACT Views & Functions

Database views & functions have been provided to facilitate data retrieval. These features are described below. The first table identifies a set of views that provide concatenated sets of values per study. These views are useful for people who need to generate a spreadsheet that contains one row per study and would like to include info for which studies often have multiple values, such as conditions. For example, if you need a comma-separated list of the conditions associated with a study, you may get that from the all_conditions view. These views are in the ctgov schema; all include 2 columns: NCT_ID & Names.

Here is a sample query:

    

aact=# select * from ctgov.all_conditions where nct_id='NCT00000146';

Schema View/Function name Description Source Data Data Returned Example
ctgov all_conditions concatenated list of all conditions (MeSH term) identified for a study browse_conditions.mesh_term nct_id & names: string containing comma delimited list of conditions select * from all_conditions where nct_id = '';
ctgov all_countries concatenates all countries associated with a study (excluding those identified as having been removed) countries.name nct_id & names: string containing comma delimited list of countries
ctgov all_design_outcomes design_outcomes.measure nct_id & names: string containing comma delimited list of outcome measures
ctgov all_facilities concatenated list of all facility names associated with a study facilities.name nct_id & names: string containing comma delimited list of facility names
ctgov all_group_types concatenated list of the arm/group types included in a study design_groups.group_type nct_id & names: string containing comma delimited list of group types
ctgov all_id_information concatenated list of the IDs associated with a study id_information.id_value nct_id & names: string containing comma delimited list of ids
ctgov all_interventions concatenated list of all interventions (MeSH term) identified for a study browse_interventions.mesh_term nct_id & names: string containing comma delimited list of mesh terms
ctgov all_intervention_types concatenated list of all intervention types for a study interventions.intervention_type nct_id & names: string containing comma delimited list of intervention types
ctgov all_keywords concatenated list of the keywords associated with a study keywords.name nct_id & names: string containing comma delimited list of keywords
ctgov all_primary_outcome_measures concatenated list of the primary outcome measures for a study design_outcomes.measure where type='primary' nct_id & names: string containing comma delimited list of measures
ctgov all_secondary_outcome_measures concatenated list of the secondary outcome measures for a study design_outcomes.measure where type='secondary' nct_id & names: string containing comma delimited list of measures
ctgov all_sponsors concatenated list of the study sponsors sponsors.name nct_id & names: string containing comma delimited list of sponsor names
ctgov all_states concatenated list of the states where the study is being conducted facilities.state nct_id & names: string containing comma delimited list of state names
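
To make the idea of a concatenating view concrete, here is a minimal sketch that defines a view shaped like all_conditions over a toy browse_conditions table. It uses SQLite's group_concat so that it runs without a database server; in PostgreSQL the comparable aggregate is string_agg. The actual AACT view definitions are not shown in this document, so the view body, the delimiter and the sample rows are all assumptions.

```python
import sqlite3

# Toy stand-in for browse_conditions; NCT IDs and MeSH terms are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT);
    INSERT INTO browse_conditions VALUES
        ('NCT99999991', 'Asthma'),
        ('NCT99999991', 'Hypersensitivity'),
        ('NCT99999992', 'Diabetes Mellitus');

    -- One row per study: nct_id plus a comma-delimited 'names' string.
    CREATE VIEW all_conditions AS
        SELECT nct_id, group_concat(mesh_term, ',') AS names
        FROM browse_conditions
        GROUP BY nct_id;
""")

names_by_study = dict(
    conn.execute("SELECT nct_id, names FROM all_conditions").fetchall()
)

for nct_id, names in names_by_study.items():
    print(nct_id, names)
```

SQLite does not promise any particular order of terms inside the concatenated string, so code that depends on ordering should split and sort the values.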

MeSH Thesaurus

All ClinicalTrials.gov data are available in the ctgov schema of the AACT database.

ClinicalTrials.gov assigns each study two sets of MeSH terms: Browse Conditions for conditions studied in the trial, and Browse Interventions for interventions used in the trial. Earlier versions of the data did not include the full MeSH hierarchy, but studies now contain the full hierarchy.

The column mesh_type indicates if the term is a leaf identified as “mesh-list” or an ancestor identified as “mesh-ancestor”.
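
As an illustration of how mesh_type separates leaf terms from ancestors, the sketch below filters a toy browse_conditions table on that column. The table is a reduced SQLite stand-in, and the study ID and its terms are invented; only the two mesh_type values come from the description above.

```python
import sqlite3

# Toy stand-in for browse_conditions; the study and its terms are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT, mesh_type TEXT);
    INSERT INTO browse_conditions VALUES
        ('NCT99999991', 'Asthma', 'mesh-list'),
        ('NCT99999991', 'Lung Diseases', 'mesh-ancestor'),
        ('NCT99999991', 'Respiratory Tract Diseases', 'mesh-ancestor');
""")


def terms_of_type(connection, nct_id, mesh_type):
    """Return the MeSH terms of one mesh_type for a study (hypothetical helper)."""
    rows = connection.execute(
        "SELECT mesh_term FROM browse_conditions "
        "WHERE nct_id = ? AND mesh_type = ? ORDER BY mesh_term",
        (nct_id, mesh_type),
    ).fetchall()
    return [term for (term,) in rows]


leaf_terms = terms_of_type(conn, "NCT99999991", "mesh-list")
ancestor_terms = terms_of_type(conn, "NCT99999991", "mesh-ancestor")
print(leaf_terms, ancestor_terms)
```

Filtering on 'mesh-list' keeps only the leaf terms, which avoids counting a study under each of the broader ancestor terms as well.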

When data submitters provide information to ClinicalTrials.gov about a study, they’re encouraged to use Medical Subject Heading (MeSH) terminology for interventions, conditions, and keywords. The browse_conditions and browse_interventions tables contain MeSH terms generated by an algorithm run by NLM. The NLM algorithm is re-run nightly on all studies in the ClinicalTrials.gov database, and uses the most up-to-date information in the study record, the latest version of the algorithm, and the version of the MeSH thesaurus in use at that time.

The National Library of Medicine updates the MeSH Thesaurus each year. Please refer to the National Library of Medicine's (NLM) documentation for authoritative definitions of the data elements:

The online data dictionary links to the sections of this NLM documentation that describe specific data elements.