What is AACT?
AACT is the database for Aggregate Analysis of ClinicalTrials.gov. This version of AACT is a PostgreSQL relational database containing information about clinical studies that have been registered at ClinicalTrials.gov. AACT includes all of the protocol and results data elements for studies that are publicly available at ClinicalTrials.gov. Content is downloaded daily from ClinicalTrials.gov and loaded into AACT.
What population of studies is represented in AACT?
All studies registered and publicly available in ClinicalTrials.gov are included in AACT. ClinicalTrials.gov was released for the registration of studies on February 29, 2000. The registry accepts interventional studies in which participants are assigned according to a research protocol to receive specific interventions, as well as observational studies. It also includes Expanded Access records, which describe the procedure for obtaining an experimental drug or device for patients who are not adequately treated by existing therapy and who are unable to participate in a controlled clinical study.
The registration of studies and reporting of results and adverse events has been mandated to a large extent by requirements (both legal and institutional) implemented as part of the Food and Drug Administration Amendments Act (FDAAA), as well as by requirements introduced by the International Committee of Medical Journal Editors (ICMJE), the European Medicines Agency (EMA) and the National Institutes of Health (NIH) regarding registration and reporting of results of clinical studies. Table 1 describes the scope of these requirements.
Table 1: Scope of Interventional Studies Covered by Major Reporting Policies*
NIH Policy
Every clinical trial funded in whole or in part by NIH is expected to be registered on ClinicalTrials.gov and have summary results information submitted and posted in a timely manner, whether subject to FDAAA 801 or not.
Timelines for registration and results/adverse event reporting are the same as for trials subject to FDAAA 801.
January 18, 2017. The policy is effective for applications for funding, including grants, other transactions, and contracts submitted on or after January 18, 2017. For the NIH intramural program, the policy applies to clinical trials initiated on or after January 18, 2017.
NCI Access Policy
The NCI issued its Policy Ensuring Public Availability of Results from NCI-supported Clinical Trials. Generally, for "all initiated or commenced NCI-Supported Interventional Clinical Trials whether extramural or intramural" (Covered Trials), "Final Trial Results are expected to be reported in a publicly accessible manner within 12 months of the Trial's Primary Completion Date regardless of whether the clinical trial was completed as planned or terminated earlier." This policy "will be incorporated as a Term and Condition of the award."
FDAAA and Final Rule
The following must be registered in ClinicalTrials.gov ('Applicable Clinical Trials', or ACTs):
- Interventional studies of drugs, biologics, or devices (whether or not approved for marketing)
- Studies in phases 2 through 4
- Studies with at least 1 US site or conducted under IND/IDE
Results and adverse event reporting is required for studies that meet the above registration requirements if they study drugs, biologics, or devices that are approved, licensed, or cleared by the FDA.
The Final Rule clarified the definition of an ACT, and expanded results and adverse events reporting requirements to include ACTs of unapproved products.
September 27, 2007. Studies initiated after this date, or with a completion date later than December 25, 2007 are subject to FDAAA requirements. Registration is required no later than 21 days after first patient is enrolled. Results and adverse events must be reported for these studies (if required) within 1 year of completing data collection for the pre-specified primary outcome.
For ACTs of devices not previously approved or cleared by FDA, public posting of registration information is delayed until after FDA approval/clearance.
September, 2008. Results reporting launched with optional adverse event reporting.
September, 2009. Adverse event information became required.
January 18, 2017. Final Rule for FDAAA 801 effective, with compliance expected as of April 18, 2017.
- Responsible parties of ACTs of devices not previously cleared or approved by FDA may authorize NIH to post registration information prior to FDA approval/clearance.
- For ACTs of unapproved products, results reporting may be delayed for up to 2 additional years (i.e., up to 3 years total after the primary completion date).
ICMJE
Interventional studies of any intervention type, phase, or geographical location must be registered in ClinicalTrials.gov or other approved registry.
No results reporting requirements.
July 1, 2005. Studies initiated after this date must be registered before first patient enrolled; studies initiated before this date must be retrospectively registered to be considered for publication.
EMA
The following must be registered in ClinicalTrials.gov or other approved registry:
- Interventional studies of drugs or biologics (whether or not approved for marketing)
- Pediatric phase 1 studies
- Studies in phases 2 through 4
- Studies taking place in at least 1 EU site
Results reporting required for all studies that meet registration requirements.
May 1, 2004. EMA launched EudraCT
March 22, 2011. EMA launched the EU Clinical Trials Register
October 11, 2013. EMA expanded EudraCT to include summary results.
* Adapted from 'The ClinicalTrials.gov results database – update and key issues' and the ClinicalTrials.gov summary of selected events, policies, and laws related to the development and expansion of ClinicalTrials.gov. For complete descriptions of policy requirements, see the references cited. EMA denotes European Medicines Agency; EU, European Union; FDAAA, Food and Drug Administration Amendments Act; ICMJE, International Committee of Medical Journal Editors; IDE, investigational device exemption; IND, investigational new drug application; NCI, National Cancer Institute; NIH, National Institutes of Health; US, United States.
Based on these policies, the following are examples of characteristics that may influence the likelihood that a study is included in the ClinicalTrials.gov registry:
- Interventional studies are more likely to be registered than observational studies.
- Studies that began before the ICMJE requirement in July, 2005 are less likely to be registered, especially if their results are unpublished (e.g., negative studies).
- Studies with drug, biological, or device interventions are more likely to be registered than studies of other interventions.
- Studies with at least one site in the United States or European Union are more likely to be registered than studies with no such sites.
- Studies involving a drug or device that is manufactured in the United States are more likely to be registered than studies involving a drug or device manufactured outside of the United States.
- Studies subject to an IND or IDE are more likely to be registered (i.e., if the study is intended to support approval for marketing in the United States).
- Phase 1 adult drug studies or small feasibility studies of devices are less likely to be registered.
- Studies in pediatric populations may be more likely to be registered.
- Studies with NIH funding are likely to be registered, regardless of intervention and phase.
- Although studies of FDA regulated devices are likely to be registered, the registration records for studies of devices not yet approved or cleared may not be publicly posted until after the device is cleared or approved.
Is the information in AACT up-to-date?
The AACT database is updated every night at midnight, so the information in AACT is one day behind that which appears in ClinicalTrials.gov. The nightly update process uses the ClinicalTrials.gov RSS feed to identify studies that were added or changed in ClinicalTrials.gov the previous day, and then uses the ClinicalTrials.gov API to retrieve current data for just these studies. Only new/changed studies are created/modified. This process takes about one and a half hours, depending on how many studies were added/changed in ClinicalTrials.gov the previous day.
Occasionally, studies must be removed from ClinicalTrials.gov, such as those that were accidentally entered twice. The nightly update process does not detect removed studies, so once a month the AACT database is dropped and completely refreshed.
Both the nightly (incremental) and monthly (full) updates are performed on a 'background' database so that the publicly accessible database remains accessible while updates take place. When the process completes, the updated version is copied to the public database. This 'copy step' takes about 5 minutes, during which time the database is inaccessible. Since the nightly load starts at midnight and takes about one and a half hours, this 5-minute downtime usually occurs sometime between 1am & 2am.
Please refer to the posted update schedule for more details.
How are unique studies identified in AACT?
Studies registered at ClinicalTrials.gov are identified by a unique identifier, the NCT_ID. Because of the quality assurance measures applied by ClinicalTrials.gov staff on registration entries, we can be reasonably certain that each study (i.e., NCT_ID) entered in ClinicalTrials.gov refers to a unique clinical study. However, a small number of duplicate records may exist in the database.
How does content in AACT compare to what is in ClinicalTrials.gov?
AACT includes all protocol and results data for every study that is publicly available at ClinicalTrials.gov. All publicly available content from the current record for a study is included in AACT as it appears in ClinicalTrials.gov. The AACT database preserves the content from the source XML files downloaded from the ClinicalTrials.gov API; content is not cleaned or manipulated in any way. However, to help facilitate queries using AACT, several additional variables derived from the raw content are included in AACT's Calculated_Values table.
The history of changes to a study record (available at the ClinicalTrials.gov archive site) is not included in the current version of AACT.
What types of questions can be investigated using ClinicalTrials.gov data?
The AACT database contains both ‘study protocol’ and ‘results data’ elements. The protocol (or registration) records describe the study characteristics including sponsor, disease condition, type of intervention, participant eligibility, anticipated enrollment, study design, locations, and outcome measures. Summary results data elements including participant flow, baseline characteristics, outcome results, and frequencies of serious and other adverse events are included in AACT. The article by Tse et al. may be helpful in understanding the components of the basic results that are reported at ClinicalTrials.gov.
How can protocol/registration data be used?
We anticipate that investigators will use the current database to explore the characteristics of selected subsets of clinical studies (e.g., typical enrollment for a phase 3 study in breast cancer patients), and to compare and contrast these characteristics across different subgroups of studies (e.g., sponsor; device versus drug intervention; or prevention versus treatment).
How can results and adverse events data be used?
Researchers may be able to use the basic results and adverse events summary data reported at ClinicalTrials.gov for meta-analysis or systematic review (e.g., to compare the efficacy and safety of different types of diabetes therapies). However, because only a small subset of studies registered at ClinicalTrials.gov are required to report results, the results data from ClinicalTrials.gov will most likely be a useful supplement to traditional data sources used for a meta-analysis or systematic review, such as published and unpublished manuscripts and abstracts, rather than the core data source. Standard techniques for valid meta-analysis or systematic review (e.g., PRISMA statement) should be used when determining how to appropriately identify and aggregate summary data gleaned from ClinicalTrials.gov and/or literature.
How should data elements be interpreted?
When interpreting this information, you’re encouraged to refer to the authoritative definitions provided by the National Library of Medicine (NLM). The most recent data element definitions are available on the NLM site for studies and results data. Data interpretation may depend on:
- How the question was phrased. For example, the definition of “Sponsor” does not necessarily imply that the sponsor is the agency paying for the clinical study, as might be expected from the common use of the term.
- Whether the respondent can enter a free-text answer to a specific question, or is restricted to a fixed set of possible responses. Note that the definition of a data element and the available responses may have changed over time. The most recent data element definitions are available at the ClinicalTrials.gov site for study and results data. A history of changes through September 2011 for the study definitions can be viewed in the AACT 2011 Data Dictionary. Monthly copies of the AACT database are bundled with the data element definitions, AACT schema, and AACT data dictionary that were current at the time the copy was created.
- Whether there is dependence between fields. Certain data elements need to be interpreted together with other data elements. For example, data elements such as enrollment date and completion date have a companion data element that indicates whether the value in the first field is an anticipated or actual value.
Note that the study record may be updated by the owner of the record at any time. Fields such as enrollment type may be changed from anticipated to actual, indicating that the value entered now reflects the actual rather than the planned enrollment. When data are downloaded, the result is a static copy of the database at that particular time point, and the history of changes made to the field is lost.
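As a minimal sketch of interpreting a field together with its companion field, the example below summarizes enrollment separately for anticipated and actual values instead of pooling them. The records, the field names (enrollment, enrollment_type) and the type labels are assumptions made for illustration; check the schema and data dictionary for the real names.

```python
from statistics import median

# Hypothetical rows; field names and type labels are assumptions for illustration.
studies = [
    {"nct_id": "NCT-A", "enrollment": 120, "enrollment_type": "Actual"},
    {"nct_id": "NCT-B", "enrollment": 500, "enrollment_type": "Anticipated"},
    {"nct_id": "NCT-C", "enrollment": 80, "enrollment_type": "Actual"},
    {"nct_id": "NCT-D", "enrollment": 300, "enrollment_type": None},
]


def median_enrollment_by_type(rows):
    """Group enrollment values by their companion type field, then take medians."""
    grouped = {}
    for row in rows:
        # Records with no companion value get their own bucket rather than a guess.
        key = row["enrollment_type"] or "Unknown"
        grouped.setdefault(key, []).append(row["enrollment"])
    return {key: median(values) for key, values in grouped.items()}


summary = median_enrollment_by_type(studies)
print(summary)
```

Keeping the "Unknown" bucket visible makes it clear how many records could not be classified as anticipated or actual.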
How complete and accurate are the data?
The presence of a record in a table indicates that information was submitted to ClinicalTrials.gov for at least one element in that table before the data were downloaded from ClinicalTrials.gov. Some data elements are more/less likely than others to have missing information, depending on several known factors. For example:
- The data element being required by the FDAAA and/or the ClinicalTrials.gov website. Refer to NLM's study and results data element definitions for specifics regarding these requirements. Requirements may have changed over the history of the ClinicalTrials.gov database.
- The date when the data element was introduced. Not all data elements were included in the database at the time of its launch in 2000; some were added later. Studies registered after FDAAA went into effect must meet more requirements than studies registered earlier in the life of ClinicalTrials.gov.
- The branching structure of questions. The availability of certain questions to the person submitting data depends on answers to previous questions. For example, questions about bio-specimen retention are only available for observational studies. Therefore, interventional studies should be excluded when analyzing data elements pertaining to bio-specimens.
- The list of possible answers for data elements with a fixed set of responses. For example, questions that include “N/A” as a possible response are likely to have fewer missing values than questions that do not provide a “N/A” response.
“Missingness” of data may also depend on other unknown factors. Regardless of the cause of missing data, users of ClinicalTrials.gov data sets are encouraged to specify clearly how missing values and “N/A” values are handled in their statistical analysis. For example, are studies with missing values excluded from statistics summarizing that data element, or are they included? In some cases, missing values may be imputed based on other fields (e.g., if a study has a single arm, it cannot employ a randomized design).
Although the FDAAA and other requirements do not apply to all fields in the database, users might consider including only studies registered post-FDAAA (September 2007), or studies with a primary completion date after December 2007. This will help to limit the number of missing values across many data elements. Users could also consider annotating data elements used in analysis according to whether or not they are FDAAA-required fields, if the user believes this might affect the extent of missing data.
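The two suggestions above (state how missing and “N/A” values are handled, and consider restricting to studies registered post-FDAAA) can be sketched as follows. This is only an illustration: the records and field names (registered, allocation) are hypothetical, and the cutoff uses the September 27, 2007 date given in Table 1.

```python
from datetime import date

FDAAA_ENACTED = date(2007, 9, 27)  # date taken from Table 1

# Hypothetical records; field names are assumptions for illustration.
studies = [
    {"nct_id": "NCT-A", "registered": date(2006, 3, 1), "allocation": "Randomized"},
    {"nct_id": "NCT-B", "registered": date(2009, 5, 12), "allocation": None},
    {"nct_id": "NCT-C", "registered": date(2012, 1, 20), "allocation": "N/A"},
    {"nct_id": "NCT-D", "registered": date(2015, 8, 3), "allocation": "Randomized"},
]


def tabulate(rows, field):
    """Count values of `field`, keeping missing and 'N/A' as explicit categories."""
    counts = {}
    for row in rows:
        value = row[field]
        label = "Missing" if value is None else value
        counts[label] = counts.get(label, 0) + 1
    return counts


# Restrict to studies registered on or after the FDAAA date before summarizing.
post_fdaaa = [row for row in studies if row["registered"] >= FDAAA_ENACTED]
counts = tabulate(post_fdaaa, "allocation")
print(counts)
```

Reporting "Missing" and "N/A" as their own rows, rather than silently dropping them, lets readers see which denominator each percentage uses.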
Even when the data elements for a particular study are complete, users are cautioned to have modest expectations about their accuracy. In particular, results data posted at ClinicalTrials.gov may not be subject to the same level of critical scrutiny as results published in a peer-reviewed journal. As described by Zarin and colleagues in 'The ClinicalTrials.gov results database – update and key issues', ClinicalTrials.gov has implemented several measures to ensure data quality. For example, NLM staff apply automated business rules that alert data-providers when required elements are missing or inconsistent. In addition, some manual review is performed by NLM, and a record may be returned to the data-provider if revision is required. However, ClinicalTrials.gov staff cannot always validate the accuracy of submitted data (e.g., against an independent source). As Zarin et al. note, “… individual record review has inherent limitations, and posting does not guarantee that the record is fully compliant with either ClinicalTrials.gov or legal requirements.”
During our own analysis of the ClinicalTrials.gov database, several extreme values for numeric data elements were encountered, such as an anticipated enrollment of several million subjects. Before proceeding with aggregate analysis, users are encouraged to review data distributions in order to select appropriate analysis methods, and to run their own consistency checks (e.g., to compare whether the number of arm descriptions provided for the study matches the data element that quantifies the number of arms in the study design) as needed.
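A rough sketch of the two checks mentioned above: comparing the declared number of arms against the arm descriptions supplied, and flagging implausibly large enrollment values. The data structures and the plausibility limit are hypothetical; the analyst would choose a limit suited to the question at hand.

```python
# Hypothetical data: declared arm counts and the arm descriptions supplied per study.
declared_arms = {"NCT-A": 2, "NCT-B": 3, "NCT-C": 1}
arm_descriptions = {
    "NCT-A": ["Drug", "Placebo"],
    "NCT-B": ["Low dose", "High dose"],
    "NCT-C": ["Single arm"],
}
enrollment = {"NCT-A": 150, "NCT-B": 4_000_000, "NCT-C": 40}


def arm_count_mismatches(declared, descriptions):
    """Return study ids whose declared arm count differs from the descriptions found."""
    return sorted(
        nct_id
        for nct_id, count in declared.items()
        if count != len(descriptions.get(nct_id, []))
    )


def extreme_values(values, upper_limit):
    """Return ids whose value exceeds an analyst-chosen plausibility limit."""
    return sorted(key for key, value in values.items() if value > upper_limit)


mismatched = arm_count_mismatches(declared_arms, arm_descriptions)
implausible = extreme_values(enrollment, upper_limit=1_000_000)
print(mismatched, implausible)
```

Flagged records are candidates for manual review, not automatic exclusion; an extreme value may still be correct.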
Use of appropriate statistical inference
If the AACT results data are to be used to support a meta-analysis or systematic review of the safety or efficacy of a particular intervention, then standard methods of meta-analysis or systematic review (e.g., the PRISMA statement) should be used to appropriately account for study-to-study variability and other sources of uncertainty or bias. We recommend that authors consider the following points when deciding whether to report p-values, confidence intervals, or other probability-based inference when performing aggregate analysis of the ClinicalTrials.gov database:
Is the data-generating mechanism random?
Methods of statistical inference such as p-values and 95% confidence intervals are most appropriate when used to quantify the uncertainty of estimates or comparisons due to a random process that generates the data. Examples of such processes include selection of a random sample of subjects from a broader population, randomly assigning a treatment to a cohort of subjects, or a coin toss about which we aim to predict future results.
In the following examples, we recommend against reporting p-values and 95% confidence intervals because the data generating mechanism is not random.
Example 1: Descriptive analysis of studies registered in the ClinicalTrials.gov database. In this case, the “sample” equals the “population” (i.e., the group about which we are making conclusions) and there is no role for statistical inference because there is no sample-vs-population uncertainty to be quantified.
Example 2: Descriptive analysis of the “clinical trials enterprise” as characterized by the studies registered in ClinicalTrials.gov. Despite mandates for study registration (Table 1), it may be that some studies that are required to be registered are not. In this case the sample (studies registered in ClinicalTrials.gov) may not equal the population (clinical trials enterprise). However, it is likely that those studies not registered are not excluded at random, and therefore neither p-values nor confidence intervals are helpful to support extrapolation from the sample to the population. To support such extrapolation, we recommend careful consideration of the studies that are highly likely to be registered (see section above on Population), and to limit inference to this population so that sample-vs-population uncertainty is minimal.
How can I objectively identify important differences?
In practice, p-values and confidence intervals are often employed even when there is no random data generating process to highlight differences that are larger than “noise” (e.g., authors may want to highlight differences with a p-value < .001). While this practice may not have a strong foundation in statistical philosophy, we acknowledge that many audiences (e.g., journal peer reviewers) may demand p-values because they appear to provide objective criteria for identifying larger-than-expected signals in the data. While we don’t encourage reporting of p-values for this purpose, we do encourage analysts to specify objective criteria for evaluating signals in the data. Examples are provided:
a) Prior to examining the data, specify comparisons of major interest, or quantities to be estimated.
b) Determine the magnitude of differences that would have practical significance (e.g., a 25% difference in source of funding between studies of 2 pediatric conditions, or a difference in enrollment of 100 participants).
c) Determine appropriate formulas for quantifying differences between groups or summarizing population variability. This quantification could take into account the observed difference, variability in the data, and the number of observations. Examples are provided:
- When summarizing a continuous characteristic such as enrollment, the analyst might choose to report the median and 5th to 95th percentiles.
- To quantify signal to noise, the analyst could calculate a t-statistic or a chi-squared statistic (without the p-value) and rank differences between 2 groups based on these values. The analyst might pre-specify a threshold (e.g., absolute value of 3) to flag notable differences.
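The two examples above might be computed along these lines. This is a sketch under stated assumptions: the enrollment figures are made up, Welch's unequal-variance form of the t-statistic is one possible choice among several, and the threshold of 3 is the illustrative value mentioned above.

```python
import math
import statistics

THRESHOLD = 3  # pre-specified before looking at the data


def summarize(values):
    """Median with 5th and 95th percentiles (19 cut points split the data into 20 parts)."""
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {"p5": cuts[0], "median": statistics.median(values), "p95": cuts[-1]}


def welch_t(group_a, group_b):
    """Welch's t-statistic: difference in means divided by its estimated standard error."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(var_a + var_b)


# Hypothetical enrollment counts for two groups of studies.
drug = [120, 150, 90, 200, 160, 140]
device = [40, 60, 55, 35, 70, 50]

drug_summary = summarize(drug)
t_value = welch_t(drug, device)
notable = abs(t_value) >= THRESHOLD  # flag the difference; no p-value is reported
print(drug_summary, round(t_value, 2), notable)
```

Ranking many comparisons by the absolute value of the statistic gives a consistent ordering of signals without implying a random data-generating process.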
Specific tips for working with the AACT database
- Users are encouraged to use the Schema Diagram to determine relationships between different AACT tables. These relationships determine how tables may be linked using tools such as SAS, R & SQL.
- The nct_id uniquely identifies each study; it serves as the primary key in the Studies table. Each record in the Studies table has a unique nct_id value. nct_id also appears in every table related to Studies so that every record in every table can link back to the study to which it refers.
- Every table other than Studies has a primary key named id, which provides an integer that uniquely identifies each row in that table. (The Studies table uses nct_id instead of id as the unique identifier for each row.)
- To link table information to the study to which it refers, you simply match on nct_id. For example, every record in the Conditions table with an NCT_ID of ‘NCT0000001’ refers to the study with the NCT_ID: ‘NCT0000001’ (saved in Studies.nct_id). The Conditions table may contain multiple records with an NCT_ID of ‘NCT0000001’ which means this study was defined in ClinicalTrials.gov as being associated with the Conditions listed in Conditions with that NCT_ID.
- Information in several tables is also related to information in other tables. In this case, the table that belongs to another table will include a foreign key that identifies the record to which it belongs. Foreign keys are always named according to a simple rule: the singular name of the related table followed by ‘_id’. For example, Facility_Contacts includes a data element, facility_id, which is the foreign key to Facilities.id.
- Each record’s foreign key (e.g., facility_id) contains the value of the unique identifier (id) of the record in the other table to which it belongs. For example, a facility may have multiple contacts. To find the contacts for a particular facility, look for the records in Facility_Contacts where the value in facility_id is the same as the value in id for that facility in Facilities. In short, tables are related to each other with this pattern: child.<parent_name>_id = parent.id
- Note that the ID assigned to a particular record (e.g., to a record in the Facilities table) is merely the method used to identify unique records in the database table, and to facilitate linking of records between database tables. The ID does not identify unique facilities in the real world. For example, if studies A and B are both enrolling patients at Duke University Medical Center, there will be one instance of Duke University Medical Center for each study, and these records will have different ID values, even though they may be the same physical research site.
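The linking rules in this list can be tried out on a small stand-in database. The sketch below uses an in-memory SQLite database from Python only so that it runs anywhere; AACT itself is PostgreSQL, and the tables here contain just the columns needed for the joins. The name columns and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE studies (nct_id TEXT PRIMARY KEY);
CREATE TABLE conditions (id INTEGER PRIMARY KEY, nct_id TEXT, name TEXT);
CREATE TABLE facilities (id INTEGER PRIMARY KEY, nct_id TEXT, name TEXT);
CREATE TABLE facility_contacts (id INTEGER PRIMARY KEY, facility_id INTEGER, name TEXT);
INSERT INTO studies VALUES ('NCT-A'), ('NCT-B');
INSERT INTO conditions VALUES (1, 'NCT-A', 'Asthma'), (2, 'NCT-A', 'Allergy'), (3, 'NCT-B', 'Asthma');
INSERT INTO facilities VALUES (10, 'NCT-A', 'Site X'), (11, 'NCT-B', 'Site X');
INSERT INTO facility_contacts VALUES (100, 10, 'Contact 1'), (101, 10, 'Contact 2'), (102, 11, 'Contact 3');
""")

# Link rows to their study by matching on nct_id.
conditions_a = [row[0] for row in conn.execute(
    "SELECT c.name FROM studies s JOIN conditions c ON c.nct_id = s.nct_id "
    "WHERE s.nct_id = ? ORDER BY c.id", ("NCT-A",))]

# Link a child table to its parent with child.<parent_name>_id = parent.id.
contacts_a = [row[0] for row in conn.execute(
    "SELECT fc.name FROM facilities f JOIN facility_contacts fc ON fc.facility_id = f.id "
    "WHERE f.nct_id = ? ORDER BY fc.id", ("NCT-A",))]

# The same physical site appears once per study, each time with a different id.
site_ids = [row[0] for row in conn.execute(
    "SELECT id FROM facilities WHERE name = 'Site X' ORDER BY id")]

print(conditions_a, contacts_a, site_ids)
```

The last query shows why the id column should not be used to count distinct real-world facilities: one site yields one row, and one id, per study.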
What were the primary considerations when designing the database?
When designing the database, we tried to balance the following objectives:
- Present data exactly as it exists in ClinicalTrials.gov.
- Make the information as easy to understand & analyze as possible.
- Use consistent names and structures throughout the database. Make it predictable; minimize uncertainty.
- Provide value-added attributes, identify them as such, and keep them separate from the raw ClinicalTrials.gov content. (The Calculated_Values table contains data elements that were derived from existing data.)
- Table names are all plural. (e.g., studies, facilities, interventions, etc.)
- Column names are all singular. (e.g., description, phase, name, etc.)
- Table/column names derived from multiple words are delimited with underscores. (e.g., mesh_term, study_first_submitted_date, number_of_groups, etc.)
- Case (upper vs lower) is not relevant in unquoted names since PostgreSQL folds unquoted identifiers to lowercase. Studies, STUDIES and studies all represent the same table and can be used interchangeably.
- Information about study design entered into ClinicalTrials.gov during registration is stored in AACT tables prefixed with Design_ to distinguish it from the results data. For example, the Design_Groups table contains registry information about anticipated participant groups, whereas the Result_Groups table contains information that was entered after the study has completed to describe actual participant groups. Design_Outcomes contains information about the outcomes to be measured and Outcomes contains info about the actual outcomes reported when the study completed.
- Where possible, tables & columns are given fully qualified names; abbreviations are avoided. (e.g., description rather than desc; category rather than ctgry)
- Unnecessary and duplicate verbiage is avoided. For example: Studies.source instead of Studies.study_source
- Columns that end with _id represent foreign keys. The prefix to the _id suffix is always the singular name of the parent table to which the child table is related. These foreign keys always link to the id column of the parent table.
Child_Table.parent_table_id = Parent_Tables.id
For example, a row in Facility_Contacts links to its facility through the facility_id column.
Facility_Contacts.facility_id = Facilities.id
While we tried to rigorously adhere to these conventions, reality occasionally failed to cooperate, so compromises were made and exceptions to these rules exist. For example, to limit duplicate verbiage, we preferred the table name References over Study_References; however, the word 'References' is a PostgreSQL reserved word and cannot be used as an unquoted table name, so Study_References it is.
How are arms/groups identified?
Considerable thought went into how to present arm and group information to facilitate analysis by simplifying naming and data structures while retaining data fidelity.
NLM defines groups/arms this way:
- Arm: A pre-specified group or subgroup of participant(s) in a clinical trial assigned to receive specific intervention(s) (or no intervention) according to a protocol.
- Group: The predefined participant groups (cohorts) to be studied, corresponding to Number of Groups specified under Study Design (for single-group studies.
In short, observational studies use the term ‘groups’; interventional studies use ‘arms’, though for the purpose of analysis, they both refer to the same thing. Because 'group' is more intuitive to the general public, AACT standardized on the term 'group(s)' and does not use the term 'arms'.
Participant Groups: Registry vs Results
When a study is registered in ClinicalTrials.gov, information is entered about how the study defines participant groups. In AACT, this information is stored in the Design_Groups table, while info about actual groups that is entered after the study has completed is stored in the Result_Groups table. (AACT has not attempted to link data between these 2 tables.)
Result information, for the most part, is organized in ClinicalTrials.gov by participant group. Result_Contacts & Result_Agreements are the only result tables not associated with groups. This section describes how AACT has structured group-related results data.
AACT provides four general categories of result information:
- Baseline Measures
- Outcomes
- Participant Flow (Milestones & Drop/Withdrawals)
- Reported Events
The Result_Groups table represents an aggregate list of all groups associated with these result types. All result tables (Outcomes, Outcome_Counts, Baseline_Measures, Reported_Events, etc.) relate to Result_Groups via the foreign key result_group_id.
For example, Outcomes.result_group_id = Result_Groups.id.
ClinicalTrials.gov assigns an identifier to each group/result that is unique within the study. The identifier includes a leading character that represents the type of result (B for Baseline, O for Outcomes, E for Reported Event, and P for Participant Flow) followed by a number that uniquely identifies the group in that context. To illustrate: Study NCT001 had 2 groups, experimental & control, and reported multiple baseline measures, outcome measures, reported events and milestone/drop-withdrawals for each group. The following table illustrates how the Result_Groups table organizes the group information received from ClinicalTrials.gov in this case:
ctgov_group_code | Group | Rows that link to this Result_Groups row
B1 | experimental | All Baseline_Measures associated with this study's experimental group link to this row.
B2 | control | All Baseline_Measures associated with this study's control group link to this row.
O2 | experimental | All Outcome_Measures associated with this study's experimental group link to this row.
O1 | control | All Outcome_Measures associated with this study's control group link to this row.
E1 | experimental | All Reported_Events associated with this study's experimental group link to this row.
E2 | control | All Reported_Events associated with this study's control group link to this row.
P1 | experimental | All Milestones & Drop_Withdrawals associated with this study's experimental group link to this row.
P2 | control | All Milestones & Drop_Withdrawals associated with this study's control group link to this row.
Notice that the integer in the code provided by ClinicalTrials.gov (ctgov_group_code) is often the same for one group across the different result types, but this is not always the case. In the example above, B1, E1 & P1 all represent the 'experimental group', so you're tempted to think that '1' equates to the 'experimental group' for this study. However, for Outcomes, O1 represents the control group. In short, the number in the ctgov_group_code often links the same group across all result types in a study, but for about 25% of studies, this is not the case, so it can't be counted on to indicate this relationship. (We had hoped to use a single row in Result_Groups to uniquely represent a participant group in the study and link all related results data (from the various tables) to that one row; however, this was not possible. Therefore, one group will typically be represented multiple times in the Result_Groups table: once for each type of result data.)
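Before relying on the number in ctgov_group_code for a given study, an analyst could check whether it maps consistently to one group. The sketch below splits each code into its letter and number and tests whether every number corresponds to a single group title. The rows are hypothetical and mirror the example described above; comparing on a group title is itself an assumption, since titles may not be worded identically across result types.

```python
import re

# Hypothetical Result_Groups rows for one study: (ctgov_group_code, group title).
result_groups = [
    ("B1", "Experimental"), ("B2", "Control"),
    ("O1", "Control"), ("O2", "Experimental"),
    ("E1", "Experimental"), ("E2", "Control"),
    ("P1", "Experimental"), ("P2", "Control"),
]


def split_code(code):
    """Split a code such as 'B1' into its result-type letter and group number."""
    match = re.fullmatch(r"([A-Z])(\d+)", code)
    if match is None:
        raise ValueError(f"unexpected group code: {code!r}")
    return match.group(1), int(match.group(2))


def numbers_are_consistent(rows):
    """True only if each group number maps to a single title across result types."""
    titles_by_number = {}
    for code, title in rows:
        _, number = split_code(code)
        titles_by_number.setdefault(number, set()).add(title)
    return all(len(titles) == 1 for titles in titles_by_number.values())


consistent = numbers_are_consistent(result_groups)
print(consistent)
```

For this study the check fails because O1 is the control group, so the number alone should not be used to line up groups across result types.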
Information about dates
ClinicalTrials.gov has historically provided the month/year (without day) for several date values including start date, completion date, primary completion date and verification date. Because the 'day' was not provided, AACT stored these dates in the Studies table as character-type rather than date-type values. Character-type dates are of limited utility in an analytic database because they can't be used to perform standard date calculations such as determining study duration or the average number of months for someone to report results or identifying studies registered before/after a certain date.
NLM recently reported that ClinicalTrials.gov will start providing full date values (mm/dd/yy) for these date elements. However, this only applies to new studies; studies entered prior to this announcement will continue to have only month/year date values. We considered various alternatives to handle dates given this issue and decided to provide 2 columns in the Studies table for each date element: 1) a character-type column that displays the value exactly as it was received from ClinicalTrials.gov & 2) a date-type column that can be used for date calculations. If the date received from ClinicalTrials.gov has only month/year, it is assigned the first day of the month so that the string can be converted to a date. For example, a study with start date June, 2014 will have June, 2014 in the start_month_year column and 06/01/14 in the start_date column.
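The conversion rule described above can be sketched in Python as follows. This is only an illustration of the first-day-of-month rule, not the code AACT itself runs. The function name to_date and the three accepted input formats are assumptions, and month names are parsed with the default (English) locale.

```python
from datetime import date, datetime


def to_date(raw):
    """Convert a ClinicalTrials.gov-style date string to a date.

    Values that carry only month and year are assigned the first day
    of the month. Returns None if the string matches no known format.
    """
    text = raw.strip()
    # Try a full date first, then month/year with and without a comma.
    for fmt in ("%B %d, %Y", "%B, %Y", "%B %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


# Character-type value as received, and its date-type counterpart.
start_month_year = "June, 2014"
start_date = to_date(start_month_year)
print(start_month_year, "->", start_date.strftime("%m/%d/%y"))
```

Once both ends of an interval are date-type values, ordinary date arithmetic applies; for example, (to_date("March 15, 2016") - to_date("June, 2014")).days gives a study duration in days. Keep in mind that durations computed from month/year-only values can be off by up to a month at each end.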
Information about dates related to 'pending results'
On May 9, 2018, the NLM added a new section of date information to the ClinicalTrials.gov API. The section is labeled "pending_results" and provides information about result submission activity while the results await quality control review.
NLM provides result submission date(s) for studies that have results awaiting quality control (QC) review. The results themselves are not publicly posted until the review is complete. The dates for three types of events related to results submission are reported in the Pending_Results table:
- Submission: The date(s) that study results were submitted to NLM for QC review.
- Submission Canceled: The date(s) that such submissions were canceled by the data provider. (Note: this value is set to "Unknown" if the cancellation occurred before 05/08/2018 when this data started to be collected).
- Returned: The date(s) that study results were returned to the data provider because they required modification.
The NLM reports that the following updates occur to this information when a study passes the quality control review:
- Studies.results_first_submitted_date is populated with the earliest 'submitted date' from Pending_Results.
- Studies.results_first_submitted_qc_date is populated with the submitted date of the version of results that passed QC.
- Result tables (Reported_Events, Outcomes, Baseline_Measurements, etc.) are populated with the result information that passed QC.
- All rows for the study with reviewed/approved results are removed from the Pending_Results table.
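The effect of these updates on the date fields can be sketched in Python as shown below. This is a simplified model, not NLM's or AACT's actual process. The row layout (nct_id, event, event_date), the event labels, the sample dates and the function name apply_qc_pass are all hypothetical, and the sketch assumes that the most recent submission is the version that passed QC.

```python
from datetime import date

# Hypothetical Pending_Results rows for one study (illustrative values).
pending = [
    {"nct_id": "NCT001", "event": "submitted", "event_date": date(2018, 6, 1)},
    {"nct_id": "NCT001", "event": "returned", "event_date": date(2018, 7, 10)},
    {"nct_id": "NCT001", "event": "submitted", "event_date": date(2018, 8, 20)},
]


def apply_qc_pass(nct_id, pending_rows):
    """Model the date updates made when a study's results pass QC.

    Returns the two study-level dates and the pending rows that remain
    after the study's own rows have been removed.
    """
    submitted = sorted(
        row["event_date"]
        for row in pending_rows
        if row["nct_id"] == nct_id and row["event"] == "submitted"
    )
    if not submitted:
        raise ValueError(f"no submissions recorded for {nct_id}")
    study_dates = {
        # Earliest submission date on record.
        "results_first_submitted_date": submitted[0],
        # Assumption: the latest submission is the one that passed QC.
        "results_first_submitted_qc_date": submitted[-1],
    }
    # All pending rows for the reviewed study are dropped.
    remaining = [row for row in pending_rows if row["nct_id"] != nct_id]
    return study_dates, remaining


study_dates, remaining = apply_qc_pass("NCT001", pending)
print(study_dates)
print(len(remaining), "pending rows left")
```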
More information about the quality control (QC) review process and how this information is presented in ClinicalTrials.gov can be found in the December, 2017 NLM Technical Bulletin.
Information about trial sites (Facilities and Countries)
Information about organizations where the study is/was conducted (aka. facilities, trial sites) is stored in the Facilities table. This represents the facility information that was included in the study record on the date that information was downloaded from ClinicalTrials.gov.
The name and email/phone for the contact person (and optionally, a backup contact) at a facility is available if the facility status (Facilities.status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. This information is stored in AACT in the Facility_Contacts table, which is a ‘child’ of the Facilities table. Facility-level contact information is not required if a central contact has been provided. Contact information is removed from the publicly available content at ClinicalTrials.gov (and therefore from AACT) when the facility is no longer recruiting, or when the overall study status (Studies.overall_status) changes to indicate that the study has completed recruitment.
Similarly, the names and roles of investigators at the facility are available if the facility status (Facilities.status) is ‘Recruiting’ or ‘Not yet recruiting’, and if the data provider has provided such information. This information is stored in AACT in the Facility_Investigators table, which is a ‘child’ of the Facilities table. Facility-level investigator information is optional. Facility-level investigator information is removed from the publicly available content at ClinicalTrials.gov (and therefore from AACT) when the facility is no longer recruiting, or when the overall study status (Studies.overall_status) changes to indicate that the study has completed recruitment.
AACT includes a Countries table, which contains one record per unique country per study. The Countries table includes countries currently & previously associated with the study. The removed column identifies those countries that are no longer associated with the study. NLM uses facilities information to create a list of unique countries associated with the study. In some cases, ClinicalTrials.gov data submitters subsequently remove facilities that were entered when the study was registered. Naturally these will not appear in AACT's Facilities table. If all of a country’s facilities have been removed from a study, NLM flags the country as ‘Removed’ which appears in AACT as Countries.removed = true.
The reasons facilities are removed are varied and unknown. A site may have been removed because it was never initiated or because it was entered with incorrect information. The recommended action for sites that have completed or have terminated enrollment is to change the enrollment status to “Completed” or “Terminated”; however, such sites are sometimes deleted from the study record by the responsible party. Data analysts may consider using Countries where removed is set to true to supplement the information about trial locations that is contained in Facilities, particularly for studies that have completed enrollment and have no records in Facilities.
Users who are interested in identifying countries where participants are being/were enrolled may use either the Facilities table or the Countries table (where Countries.removed is not true) with equivalent results.
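The relationship between the two tables can be sketched in Python with a few made-up rows. The sample data and the helper names are hypothetical, and the column names used here (Facilities.country, Countries.name, Countries.removed) are assumptions about the table layout that should be checked against the AACT data dictionary.

```python
# Hypothetical rows for one study (illustrative values only).
facilities = [
    {"nct_id": "NCT001", "name": "Site A", "country": "United States"},
    {"nct_id": "NCT001", "name": "Site B", "country": "United States"},
    {"nct_id": "NCT001", "name": "Site C", "country": "Canada"},
]
countries = [
    {"nct_id": "NCT001", "name": "United States", "removed": False},
    {"nct_id": "NCT001", "name": "Canada", "removed": None},
    {"nct_id": "NCT001", "name": "Germany", "removed": True},
]


def countries_from_facilities(nct_id, rows):
    """Unique countries that still have at least one facility listed."""
    return {row["country"] for row in rows if row["nct_id"] == nct_id}


def current_countries(nct_id, rows):
    """Countries whose removed flag is not true (false or missing)."""
    return {
        row["name"]
        for row in rows
        if row["nct_id"] == nct_id and row["removed"] is not True
    }


def removed_countries(nct_id, rows):
    """Countries flagged as removed, i.e. with no facilities left."""
    return {
        row["name"]
        for row in rows
        if row["nct_id"] == nct_id and row["removed"] is True
    }


print(countries_from_facilities("NCT001", facilities))
print(current_countries("NCT001", countries))
print(removed_countries("NCT001", countries))
```

The first two calls yield the same set of countries, which is the equivalence described above. The third call returns the countries that no longer appear in Facilities and that an analyst might use to supplement location information. The filter tests for "not true" rather than "false" so that rows with a missing removed value are still counted as current.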
MeSH terms in Browse_Conditions and Browse_Interventions
When data submitters provide information to ClinicalTrials.gov about a study, they’re encouraged to use Medical Subject Heading (MeSH) terminology for interventions, conditions, and keywords. The Browse_Conditions and Browse_Interventions tables contain MeSH terms generated by an algorithm run by NLM. The NLM algorithm is re-run nightly on all studies in the ClinicalTrials.gov database, and sources the most up-to-date information in the study record, the latest version of the algorithm, and the version of the MeSH thesaurus in use at that time.
“Delayed Results” data elements are available in AACT
A responsible party of an applicable clinical trial may delay the deadline for submitting results information to ClinicalTrials.gov for up to two additional years if one of the following two certification conditions applies to the trial:
- Initial approval: trial completed before a drug, biologic or device studied in the trial is initially approved, licensed or cleared by the FDA for any use.
- New use: the manufacturer of a drug, biologic or device is the sponsor of the trial and has filed or will file within one year, an application seeking FDA approval, licensure, or clearance of the new use studied in the trial.
A responsible party may also request, for good cause, an extension of the deadline for the submission of results.
Studies for which a certification or extension request has been submitted include the date of the first certification or extension request in the data element: Studies.disposition_first_submitted_date.
In general, the content that is contained in the AACT database preserves the content in the source XML files that are downloaded from ClinicalTrials.gov.