AACT Data Dictionary

AACT is composed of 52 tables that provide information related to clinical trials. The database contains multiple schemas, the main one being 'ctgov' which provides data retrieved from ClinicalTrials.gov. The main table in the ctgov schema is 'studies' which relates to all other ctgov tables through the NCT_ID. The ctgov schema also includes 2 tables that provide MeSH terms (Medical Subject Headings) which are published by the National Library of Medicine (NLM). The NLM has populated browse_conditions and browse_interventions tables with the MeSH terms they've determined help describe a study. The NLM updates the MeSH thesaurus each year. AACT provides some older versions of the MeSH thesaurus in the 'mesh_archive' schema.
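
The NCT_ID relationship described above can be sketched with a small stand-in for two ctgov tables. This is an illustration under stated assumptions, not the AACT schema itself: it uses Python's built-in sqlite3 module instead of the PostgreSQL AACT database, each table is reduced to two columns, and the overall_status value is made up for the demonstration.

```python
import sqlite3

# In-memory stand-in for two ctgov tables; the real AACT database is PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, overall_status TEXT);
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT);
    INSERT INTO studies VALUES ('NCT00000146', 'Completed');  -- status value is made up
    INSERT INTO browse_conditions VALUES
        ('NCT00000146', 'Multiple Sclerosis'),
        ('NCT00000146', 'Neuritis'),
        ('NCT00000146', 'Optic Neuritis');
""")

# One row in studies joins to many rows in browse_conditions through nct_id.
rows = conn.execute("""
    SELECT s.nct_id, s.overall_status, bc.mesh_term
    FROM studies s
    JOIN browse_conditions bc ON bc.nct_id = s.nct_id
    ORDER BY bc.mesh_term
""").fetchall()

for row in rows:
    print(row)
```

The same join pattern applies to any other ctgov table, since each one carries the study's nct_id.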

AACT also includes a set of project schemas (prefixed with 'proj_') which contain datasets collected/curated by previous researchers who used the AACT database to conduct their study. These datasets enhance the value of the clinical trials data in a number of ways. Descriptions of project schema tables & columns are included in the data dictionary below. The Projects page provides more comprehensive information about these datasets.

Click here to view a listing of all 52 tables with description and row counts. Detailed information about all data elements included in these tables can be found in the data dictionary below.

AACT Data Elements

The Data Dictionary (table below) provides detailed information about each data element in the AACT database. Every study-related table/column in the AACT relational database is represented as a row in this table. (AACT includes a few administrative tables that contain data not directly related to studies. Information about these tables is not included in the data dictionary.)

Click on the icon that appears at the beginning of the data element row to view the section of the NLM documentation that defines that particular data element. (The icon appears only for data elements defined by NLM.)

  • Sort: Click on a column header to sort the table alphabetically by the values in that column.

  • Search/Filter: An input box appears under each column name. Enter a search term in one or more boxes & press [Enter] to filter the display. You may enter values in multiple columns to further restrict a search. The filter ignores case and will find all data elements that 'include' the term you enter (i.e., your term does not need to be the complete value).

  • Search Support: The first column provides icons that support filtering. The larger magnifying glass in the top row toggles display of the filter row. The smaller magnifying glass below it can be clicked to launch a search (same as pressing [Enter]). Clicking the funnel icon will clear filter values.

  • Authoritative Source: Please refer to the National Library of Medicine's (NLM) documentation for official definitions of all study/protocol and results data elements. The icon that appears at the beginning of a row will open a tab in your browser to display NLM information about the data element on that row.

  • Horizontal Scroll: Not all columns fit on a page; you may scroll horizontally to view and filter on additional information about the data elements.

  • Source Column: This column identifies the XML tag path used to obtain values for the data element from the ClinicalTrials.gov API. Click here for the XML format specifications (xsd) and here to see an example of a study as provided by NLM via their API; this represents the source information used to populate the AACT database.

  • Row Count Column: Each table's primary key definition includes the table's current row count.

  • Enumerations Column: Some AACT table columns are used to store a discrete set of predefined terms (enumerations). For example, Public::Study.enrollment_type can only contain values: 'Anticipated', 'Actual' or null. If the table/column is for a set of enumerations, the data dictionary presents each of the enumeration values along with the corresponding number of rows with that value.
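
The per-value counts shown in the Enumerations column amount to a grouped count over the column in question. A minimal sketch follows, assuming an in-memory SQLite stand-in for the studies table (AACT itself is PostgreSQL); the NCT IDs and rows are invented for the demonstration, and how AACT actually computes these counts is not documented here.

```python
import sqlite3

# In-memory stand-in for ctgov.studies, reduced to the column of interest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, enrollment_type TEXT);
    -- All rows below are invented sample data.
    INSERT INTO studies VALUES
        ('NCT00000001', 'Actual'),
        ('NCT00000002', 'Actual'),
        ('NCT00000003', 'Anticipated'),
        ('NCT00000004', NULL);
""")

# One output row per distinct value (null included), with its number of rows.
enum_counts = conn.execute("""
    SELECT enrollment_type, COUNT(*) AS row_count
    FROM studies
    GROUP BY enrollment_type
    ORDER BY row_count DESC
""").fetchall()

for value, row_count in enum_counts:
    print(value, row_count)
```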

AACT Tables

The list below describes the 52 study-related tables in the AACT database and provides current row counts for each. Project tables are also defined at the bottom of this table.

Schema Name Row Count Description Domain Rows per Study
ctgov baseline_counts 108,409 Sample size at baseline for each study group; usually a count of participants but can represent other units of measure such as 'hands', 'hips', etc. Results many
ctgov baseline_measurements 1,031,638 Summaries of demographic & baseline measures collected by arm or comparison group and for the entire population of participants in the clinical study.  Results many
ctgov brief_summaries 313,239 A single text column that provides a brief description of the study. Protocol one
ctgov browse_conditions 559,925 NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the condition(s) being addressed by the clinical trial. This table provides the results of NLM's assessment. Protocol many
ctgov browse_interventions 319,371 NLM uses an internal algorithm to assess the data entered for a study and creates a list of standard MeSH terms that describe the intervention(s) being addressed by the clinical trial. This table provides the results of NLM's assessment. Protocol many
ctgov calculated_values 314,056 An AACT-provided table that contains info that's been calculated from the information received from ClinicalTrials.gov. For example, number_of_facilities and actual_duration are provided in this table. Protocol one
ctgov central_contacts 121,931 Contact info for people (primary & backup) who can answer questions concerning enrollment at any location of the study. Protocol many
ctgov conditions 520,304 Name(s) of the disease(s) or condition(s) studied in the clinical study, or the focus of the clinical study. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms. Protocol many
ctgov countries 452,339 Countries in which the study has facilities/sites. Protocol many
ctgov design_group_interventions 678,565 A cross reference for groups/interventions. If a study has multiple groups and multiple interventions, this table shows which interventions are associated with which groups. Protocol many
ctgov design_groups 549,527 Defines the protocol-specified group, subgroup, or cohort of participants in a clinical trial assigned to receive specific intervention(s) or observations according to a protocol. Protocol many
ctgov design_outcomes 1,621,669 Description of planned outcome measures and observations that will describe patterns of diseases and traits/associations with exposures, risk factors or treatment. Protocol many
ctgov designs 314,056 Description of how the study will be conducted, including comparison group design and strategies for masking and allocating participants. Protocol one
ctgov detailed_descriptions 205,267 A single text column that provides a detailed description of the study protocol. Protocol one
ctgov documents 10,021 The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time. Results many
ctgov drop_withdrawals 285,130 Summarized information about how many participants withdrew from the study, when and why. This information explains disposition of participants relative to the numbers starting and completing the study (enumerated in the Milestones table). Results many
ctgov eligibilities 314,056 Information about the criteria used to select participants; includes inclusion and exclusion criteria Protocol one
ctgov facilities 2,200,316 Name, address and recruiting status of the facilities participating in the study. Protocol many
ctgov facility_contacts 276,722 Contact information for people (primary and backup) responsible for the study at each facility. Facility contact information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Contact information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment. Protocol many
ctgov facility_investigators 195,278 Names of the investigators at each study facility. Facility investigator information is available if the facility status (Facilities.Status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. Investigator information is removed from the publicly available content at ClinicalTrials.gov when the facility is no longer recruiting, or when the overall study status (Studies.Overall_status) changes to indicate that the study has completed recruitment. Protocol many
ctgov id_information 431,755 Identifiers (other than the NCT ID) that uniquely identify the study such as that assigned by the sponsor, or an NCT ID that had previously been used for the study. Protocol many
ctgov intervention_other_names 278,204 Terms or phrases that are synonymous with an intervention. (Each row is linked to one of the interventions associated with the study.) Protocol many
ctgov interventions 543,769 The interventions or exposures (including drugs, medical devices, procedures, vaccines, and other products) of interest to the study, or associated with study arms/groups. Protocol many
ctgov keywords 875,816 Provides words or phrases that best describe the protocol. Keywords help users find studies in the database. Can include NLM's Medical Subject Heading (MeSH)-controlled vocabulary terms. Protocol many
ctgov links 49,664 Web sites directly relevant to the protocol (i.e., links to educational, research, government, and other non-profit web pages). Protocol many
ctgov milestones 394,134 Information summarizing the progress of participants through each stage of a study, including the number of participants who started and completed the trial. Enumeration of participants not completing the study is included in the Drop_Withdrawals table. Results many
ctgov outcome_analyses 160,356 Results of scientifically appropriate statistical analyses performed on primary and secondary study outcomes. Includes results for treatment effect estimates, confidence intervals and other measures of dispersion, and p-values. Results many
ctgov outcome_analysis_groups 310,018 Identifies the comparison groups that were involved with each outcome analysis. Results many
ctgov outcome_counts 687,348 Sample size included in analysis for each outcome for each study group; usually participants but can represent other units of measure such as 'eyes', 'lesions', etc. Results many
ctgov outcome_measurements 2,180,789 Summary data for primary and secondary outcome measures for each study group. Includes parameter estimates and measures of dispersion/precision. Results many
ctgov outcomes 290,058 Descriptions of outcomes, or observations, that were measured to determine patterns of diseases or traits, or associations with exposures, risk factors, or treatment. Includes information such as time frame, population and units. (Specific measurement results are stored in the Outcome_Measurements table.) Results many
ctgov overall_officials 322,006 People responsible for the overall scientific leadership of the protocol including the principal investigator. Protocol many
ctgov participant_flows 38,269 Information relevant to the recruitment process & pre-assignment details (i.e., significant events in the study that occur after participant enrollment but prior to assignment of participants). Information about participant flow that applies to all milestones. Results one
ctgov pending_results 21,038 Provides information about events related to the submission of study results for quality control (QC) review before the results are publicly posted. Events reported: submissions, cancellations and returns for modifications. 'Unknown' is specified for cancellations that occurred before 05/08/2018 (when collection of this data began). When a study passes quality control review: 1) results_first_submitted_date is set to the study's first submission date, 2) results_first_submitted_qc_date is set to the submission date of the version of results that passed QC, 3) the study's pending_results rows are removed, and 4) the results are posted on ClinicalTrials.gov. The latest versions of all studies are posted every business day, but there can be unexpected delays. The results_first_posted_date value will usually be identified as an 'Estimate' when first posted. This will switch to 'Actual' (and the date may be adjusted) on the next posting cycle, when the true posting date is known. Results many
ctgov provided_documents 9,716 The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time.  Results many
ctgov reported_events 4,920,252 Summary information about reported adverse events (any untoward or unfavorable medical occurrence to participants, including abnormal physical exams, laboratory findings, symptoms, or diseases), including serious adverse events, other adverse events, and mortality. Results many
ctgov responsible_parties 294,888 People who have access to and control over the data from the study, have the right to publish study results, and have the ability to meet all of the requirements for the submission of study information. Protocol many
ctgov result_agreements 38,269 Info about whether an agreement exists between the sponsor & the principal investigators (PIs) that restricts the PIs' ability to discuss study results at scientific meetings or other public or private forums, or to publish info concerning the study in scientific or academic journals after the study is completed. Results many
ctgov result_contacts 38,269 Point of contact for scientific information about the clinical study results. Results many
ctgov result_groups 963,628 Consolidated, aggregate list of group titles/descriptions used for reporting summary results information. Results many
ctgov sponsors 499,400 Name of study sponsors and collaborators. The sponsor is the entity or individual initiating the study. Collaborators are other organizations providing support, including funding, design, implementation, data analysis, and reporting. Protocol many
ctgov studies 314,056 Basic info about study, including study title, date study registered with ClinicalTrials.gov, date results first posted to ClinicalTrials.gov, dates for study start and completion, phase of study, enrollment status, planned or actual enrollment, number of study arms/groups, etc. Protocol & Results one
ctgov study_references 414,197 Citations to publications related to the study protocol and/or results. Includes PubMed Unique Identifier (PMID) and/or full bibliographic citation.  Protocol many
proj_tag_nephrology tagged_terms Table of terms determined to be nephrology-related by a team of Duke clinicians. The source of the terms is the 2010 MeSH thesaurus as well as free-text terms/phrases used to identify interventional studies registered in ClinicalTrials.gov between 2007 and 2010. Project n/a
proj_tag_nephrology analyzed_studies Information identifying the studies used to support the 2014 JK Inrig publication in Am J Kidney Disease. The studies identified were determined to be nephrology-related in the course of conducting the investigation. Project n/a
proj_cdek_standard_orgs cdek_organizations Table of standard organization names that serve to Project n/a
proj_cdek_standard_orgs cdek_synonyms Table of organization names that have been entered into ClinicalTrials.gov, along with the preferred (standard) name as determined by CDEK. Project n/a
proj_results_reporting analyzed_studies Trials that were determined likely subject to FDAAA provisions (highly likely applicable clinical trials, or HLACTs) from 2008 through 2013. Regression models were used to examine characteristics associated with reporting at 12 months and throughout the 5-year study period. Project n/a
proj_tag_study_characteristics oncology_studies 0 Trials determined to be oncology related for the purpose of the Study Characteristics investigation. Project n/a
proj_tag_study_characteristics mental_health_studies 0 Trials determined to be mental health related for the purpose of the Study Characteristics investigation. Project n/a
proj_tag_study_characteristics cardiovascular_studies 0 Trials determined to be cardiovascular related for the purpose of the Study Characteristics investigation. Project n/a
proj_tag_study_characteristics tagged_terms Table of terms determined to be either mental health, oncology or cardiovascular-related by a team of Duke clinicians. The source of the terms is the 2010 MeSH thesaurus as well as free-text terms/phrases used to identify interventional studies registered in ClinicalTrials.gov between 2007 and 2010. Project n/a
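
The 'Rows per Study' column above ('one' or 'many') can be checked for any table by counting rows per nct_id. The sketch below is a hedged illustration on an in-memory SQLite stand-in rather than the PostgreSQL AACT database; the helper name max_rows_per_study, the description column and the summary text are assumptions made for the example.

```python
import sqlite3

# In-memory stand-ins for one 'one' table and one 'many' table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brief_summaries (nct_id TEXT, description TEXT);
    CREATE TABLE countries (nct_id TEXT, name TEXT);
    INSERT INTO brief_summaries VALUES ('NCT00000919', 'Example summary text.');
    INSERT INTO countries VALUES
        ('NCT00000919', 'Italy'),
        ('NCT00000919', 'Puerto Rico'),
        ('NCT00000919', 'United States');
""")

KNOWN_TABLES = {"brief_summaries", "countries"}


def max_rows_per_study(table):
    """Return the largest number of rows any single study has in the table."""
    # Table names cannot be bound as query parameters, so only known names pass.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table: {table}")
    query = f"""
        SELECT MAX(n) FROM (
            SELECT COUNT(*) AS n FROM {table} GROUP BY nct_id
        )
    """
    return conn.execute(query).fetchone()[0]


print(max_rows_per_study("brief_summaries"))  # a 'one' table never exceeds 1
print(max_rows_per_study("countries"))        # a 'many' table can exceed 1
```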

AACT Views & Functions

Database views & functions have been provided to facilitate data retrieval. These features are described below. The first table identifies a set of views that provide concatenated sets of values per study. These views are useful for people who need to generate a spreadsheet that contains one row per study and would like to include info for which studies often have multiple values, such as conditions. For example, if you need a delimited list of the conditions associated with a study, you may get that from the all_conditions view. These views are in the ctgov schema; all include 2 columns: NCT_ID & Names.

Here is a sample query:

    

aact=# select * from ctgov.all_conditions where nct_id='NCT00000146';

View                            Source Data                                       Sample Output
all_conditions                  browse_conditions.mesh_term                       NCT00000146 | Multiple Sclerosis|Neuritis|Optic Neuritis
all_countries                   countries.name                                    NCT00000919 | Italy|Puerto Rico|United States
all_design_outcomes             design_outcomes.measure                           NCT00000208 | Drug use|Opiate craving|Withdrawal symptoms
all_facilities                  facilities.name                                   NCT00000403 | University of Arizona Arthritis Center|University of Pittsburgh Medical Center|Arthritis Research Center Foundation|Indiana University Medical Center|Northwestern University Medical Center|University of Alabama at Birmingham
all_group_types                 design_groups.group_type                          NCT00002569 | Active Comparator|Experimental
all_id_information              id_information.id_value                           NCT03803644 | CC-92480-CP-001|U1111-1224-6768
all_interventions               browse_interventions.mesh_term                    NCT03802422 | Vitamin B Complex|Hydroxocobalamin|Vitamin B 12|Vitamins
all_intervention_types          interventions.intervention_type                   NCT03802994 | Drug|Biological|Other
all_keywords                    keywords.name                                     NCT03801915 | Pancreas and Liver Resections|Recurrence Free Survival|Well Tolerated Agent
all_primary_outcome_measures    design_outcomes.measure (where type='primary')    NCT03802162 | AUCt of D324|AUCt of D797|Cmax of D324|Cmax of D797
all_secondary_outcome_measures  design_outcomes.measure (where type='secondary')  NCT03803397 | Overall Survival|Progression-Free Survival|Tumor Response
all_sponsors                    sponsors.name                                     NCT03803254 | Guangzhou No.12 People's Hospital|Kaiping Central Hospital|Sun Yat-sen University
all_states                      facilities.state                                  NCT03802877 | Alberta|Manitoba|Ontario|Quebec
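
To make the idea behind these views concrete, here is a rough sketch of how a view like all_conditions could be built. The actual AACT view definitions are not reproduced here, so treat this as an assumption: it runs on an in-memory SQLite stand-in and uses SQLite's group_concat, whereas a PostgreSQL database would more likely use string_agg. The second study (NCT99999999) and its term are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browse_conditions (nct_id TEXT, mesh_term TEXT);
    INSERT INTO browse_conditions VALUES
        ('NCT00000146', 'Multiple Sclerosis'),
        ('NCT00000146', 'Neuritis'),
        ('NCT00000146', 'Optic Neuritis'),
        ('NCT99999999', 'Asthma');  -- invented row, for contrast only

    -- Hypothetical definition: one row per study, terms joined with '|'.
    CREATE VIEW all_conditions AS
        SELECT nct_id, group_concat(mesh_term, '|') AS names
        FROM browse_conditions
        GROUP BY nct_id;
""")

nct_id, names = conn.execute(
    "SELECT nct_id, names FROM all_conditions WHERE nct_id = ?",
    ("NCT00000146",),
).fetchone()

# Splitting on the delimiter recovers the individual values.
terms = names.split("|")
print(nct_id, "|", names)
```

SQLite does not guarantee the order of values inside group_concat, so the order of terms may differ from the sample output shown in the table.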

MeSH Thesaurus

All ClinicalTrials.gov data are available in the ctgov schema of the AACT database. The ctgov schema also includes two tables from another source: mesh_terms & mesh_headings provide a recent copy of the Medical Subject Headings (MeSH terms & headings) published by the National Library of Medicine. These 2 tables are included to help people who use MeSH terms to categorize studies. (ClinicalTrials.gov uses browse_conditions & browse_interventions to link studies to related MeSH terms.)

The National Library of Medicine updates the MeSH Thesaurus each year. Select earlier annual versions that were previously included in AACT are available in the mesh_archive schema.

Please refer to the National Library of Medicine's (NLM) documentation for authoritative definitions of the data elements:
