The AACT database now includes a set of supplemental schemas that present datasets collected & curated during previous AACT-based research. By including these data within the AACT database, the public can benefit from work that has been performed by other investigators. Since the information is directly accessible, it may be incorporated into queries on current clinical trials. It also serves to make previous research more transparent and help AACT users better understand assertions made by the previous investigators.
Database schemas are used to differentiate project-related data from ClinicalTrials.gov data. Data from ClinicalTrials.gov remain available in the ctgov schema, and each project has its own database schema in which the datasets for that project are available. All project schemas are prefixed with 'proj_'. With the release of AACT 4.1.0, all users of the live AACT database have immediate access to this information.
Datasets from the following three AACT-based research projects have been made available in this release:
proj_results_reporting: Anderson ML, Chiswell K, Peterson ED, Tasneem A, Topping J, Califf RM. Compliance with results reporting at ClinicalTrials.gov. New England Journal of Medicine. 2015 Mar 12;372(11):1031-9.
proj_tag_nephrology: Inrig JK, Califf RM, Tasneem A, Vegunta RK, Molina C, Stanifer JW, Chiswell K, Patel UD. The landscape of clinical trials in nephrology: a systematic review of Clinicaltrials.gov. American Journal of Kidney Diseases. 2014 May 1;63(5):771-80.
This feature will continue to be developed and your feedback is appreciated. Please email the AACT team with questions and suggestions.
This feature has been implemented as a separate Ruby on Rails application. AACT now comprises three applications: 1) AACT Core, 2) AACT Admin & 3) AACT Projects. All code for these three components is publicly available on GitHub. Note: Implementing this feature required some changes to the way ClinicalTrials.gov data is loaded into AACT. Details about these changes are available upon request.
The National Library of Medicine (NLM) updates the MeSH thesaurus each year. To facilitate access to the set of terms used by previous research projects, a new schema named mesh_archive has been added to the live AACT database. Tables in this schema are named yYYYY_mesh_terms where YYYY identifies the version of that set of terms. For example, the 2010 set of MeSH terms is available in mesh_archive.y2010_mesh_terms.
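The naming convention above can be captured in a small helper. This is a minimal sketch, not part of AACT itself; the function names are hypothetical, and only the schema name and the yYYYY_mesh_terms pattern come from the text above.

```python
def mesh_archive_table(year: int) -> str:
    """Return the schema-qualified table name for one year's MeSH terms.

    Follows the yYYYY_mesh_terms convention in the mesh_archive schema.
    """
    if not 1000 <= year <= 9999:
        raise ValueError("year must have four digits")
    return f"mesh_archive.y{year}_mesh_terms"


def mesh_terms_query(year: int) -> str:
    """Build a simple SELECT against the archived terms for `year`."""
    # The year is range-checked as a number above, so interpolating it into
    # the identifier cannot introduce arbitrary SQL.
    return f"SELECT * FROM {mesh_archive_table(year)}"
```

For example, mesh_archive_table(2010) yields the table named in the paragraph above.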
The Calculated_Values table has 3 new columns that provide the number of primary, secondary & other outcome measures:
These are integer columns. Values are calculated by summing the number of rows in the design_outcomes table per study where outcome_type is primary/secondary/other.
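The calculation can be illustrated in a few lines of Python. This is a sketch of the counting logic only, assuming each design_outcomes row is reduced to an (nct_id, outcome_type) pair; the function name and the keys of the returned dictionaries are hypothetical and are not the actual column names.

```python
from collections import Counter, defaultdict

OUTCOME_TYPES = ("primary", "secondary", "other")


def count_outcomes(design_outcomes):
    """Count design_outcomes rows per study, grouped by outcome_type.

    `design_outcomes` is an iterable of (nct_id, outcome_type) pairs.
    Returns {nct_id: {"primary": n, "secondary": n, "other": n}}.
    """
    counts = defaultdict(Counter)
    for nct_id, outcome_type in design_outcomes:
        counts[nct_id][outcome_type.lower()] += 1
    # A Counter returns 0 for a missing key, so every study gets all three
    # integer values even when it has no rows of a given type.
    return {
        nct_id: {kind: per_study[kind] for kind in OUTCOME_TYPES}
        for nct_id, per_study in counts.items()
    }
```

A study with no rows in design_outcomes does not appear in the result at all; how AACT represents that case is not covered by this sketch.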
The 'Row Count' & 'DB Section' columns have been removed from the data dictionary because this information is displayed further down on the same page in the section that defines AACT tables. (The information is table-specific, not column specific, so belongs in the section that describes the tables.) A column has been added to the data dictionary to display the database schema name. Although the schema name is also table-specific (not column-specific), it is presented in the data dictionary because, as a searchable column, it can be used to filter on all rows associated with a certain schema/project.
The data dictionary now includes rows to describe all project-related tables and columns.
The pagination numbers at the bottom of the data dictionary table were scrunched together. This has been fixed.
For AACT Admins Only: The AACT website page that lists information about all users has been enhanced; the information is now sortable & the table includes pagination. When the option to download user information as a CSV or Excel file is selected, the download contains only information of potential interest; attributes containing encrypted values no longer appear in the file. (This page is only accessible to AACT administrators.)
To simplify the management of user database accounts, we have created a role named 'read_only' in the AACT database and now assign all AACT users to this role. With this change, we are able to grant/revoke privileges to/from this one role rather than having to do it for each individual database user. (The search path must be specified for each individual user, however, since it is not inheritable via the associated role.)
A process now records the total number of times each username submits a call to the public database. Currently, the process only counts calls; it does not track the actual queries. Every Sunday, a shell script parses the public database logs, counts the number of times each user posted a database event, and saves this information to the db_user_activities table in the aact_admin database.
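The counting step might look roughly like the following. AACT uses a shell script for this; the version below is a Python sketch of the same idea, and it assumes each log line carries a `user=NAME` field, which depends on how the server's log prefix is configured. The pattern and function name are hypothetical.

```python
import re
from collections import Counter

# Assumption: each log line contains a "user=<name>" field. The real format
# depends on the PostgreSQL log prefix configured on the server.
USER_PATTERN = re.compile(r"\buser=([A-Za-z0-9_]+)")


def count_events_by_user(log_lines):
    """Return a Counter mapping each username to its number of log events."""
    counts = Counter()
    for line in log_lines:
        match = USER_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Only the username is extracted; the statement text is ignored, which matches the note that queries themselves are not tracked.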
Until now, the AACT database has provided two of the six data elements related to individual participant data (IPD) sharing: 1) a yes/no value indicating whether the study planned to share this information and 2) a description of the plan. On August 24, 2018, the National Library of Medicine (NLM) added the other four IPD-related attributes to the ClinicalTrials.gov API, so they are now available in the Studies table of the AACT database.
The 'has_us_facility' value saved to the CalculatedValues table is now set to 'true' for studies that have at least one facility in the United States or a US Territory. The decision to include US Territories was based on NIH's 'Checklist for Evaluating Whether a Clinical Trial or Study is an Applicable Clinical Trial (ACT) Under 42 CFR 11.22(b) for Clinical Trials Initiated on or After January 18, 2017'. (A country is considered a US Territory if it is one of those defined by the World Atlas.)
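In outline, the rule is a membership test over a study's facility countries. The sketch below is illustrative only: the set of names is a hypothetical sample, not the list AACT actually uses, and the spellings used in the facilities data may differ.

```python
# Illustrative sample only; this is NOT the authoritative list used by AACT.
US_AND_TERRITORIES = frozenset({
    "United States",
    "Puerto Rico",
    "Guam",
    "American Samoa",
    "Northern Mariana Islands",
    "Virgin Islands (U.S.)",
})


def has_us_facility(facility_countries):
    """True if at least one facility is in the US or a US Territory."""
    return any(country in US_AND_TERRITORIES for country in facility_countries)
```

A study with no facilities listed evaluates to False under this sketch.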
After the full database refresh that happens on the first of each month, we delete the previous month's set of daily static database copies and pipe-delimited file sets. Until now, this process has been manual. A process has been implemented to automatically remove these files on the first of the month.
With the release of AACT 4.0.0, we divided AACT into two separate applications: AACT & AACT-ADMIN. Some unnecessary code was left over in both apps. We've gone through and cleaned up the apps to remove superfluous code.
Previously, user information was backed up as one of the final steps in the nightly data load process. Now administrative tasks are performed by the AACT-ADMIN application, so user info backups are no longer a part of the data load process. (The AACT-ADMIN application is now responsible for backing up user info and the AACT application for loading the database.) A cron job has been set up to back up user information every morning at 4am.
When user information is backed up each morning, AACT administrators receive an email message that includes the backup file attachments and instructions about how to recover info from these files. The instructions in this email have been improved.
To simplify the process to grant/revoke user access to the public database, shell scripts have been created that can be quickly run to perform these tasks. The scripts are also used by rspec tests to confirm user maintenance functionality.
A user noted a critical error in the website documentation that describes how to create a local copy of the AACT database. The command to restore the database from a dump file downloaded from the website identified the default database 'postgres' rather than the aact database. The command has been corrected:
-> pg_restore -e -v -O -x --dbname=aact --no-owner --clean --create ~/Downloads/postgres_data.dmp
AACT has been divided into two applications: one solely dedicated to populating the AACT relational database with data from ClinicalTrials.gov and the other to manage all other supporting functionality such as maintaining user accounts and hosting this website. Both applications use Ruby on Rails and PostgreSQL, and are publicly available on GitHub:
Users will not be directly affected by this change; it simply makes it easier to support the system and positions AACT to be more easily replicated by other organizations/people.
To comply with Article 17 of the General Data Protection Regulation (aka 'The Right to be Forgotten'), we have verified that AACT does not save any information about a user who has chosen to be removed from AACT.
Tables added to AACT in version 3.1.2 (Documents & Pending_Results) are now defined in the Table Definition table on the Data Dictionary page of the AACT website.
PostgreSQL recognizes mixed-case objects and requires double quotes when managing such objects. To avoid confusion and complexity, we now prevent the creation of mixed-case database usernames.
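A check of this kind could be as simple as the sketch below. The exact rule AACT applies is not described here, so this assumes that any uppercase letter makes a name unacceptable, since PostgreSQL folds unquoted identifiers to lowercase. The function name is hypothetical.

```python
def is_valid_db_username(username: str) -> bool:
    """Accept only non-empty usernames that contain no uppercase letters.

    Assumption: any name that differs from its lowercase form would need
    double quotes in PostgreSQL, so it is rejected.
    """
    return username != "" and username == username.lower()
```

With this rule, 'jsmith' passes and 'JSmith' is rejected, so the account can always be referenced without double quotes.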
Added a page for technical documentation. (Accessible to AACT administrators only)
Added a page of instructions to stand up an instance of AACT on a Windows 10 machine.
On May 9, 2018, the National Library of Medicine (NLM) added data about 'pending results' to the ClinicalTrials.gov API. A Pending_Results table has been added to the AACT database to present this new information.
NLM provides result submission date(s) for studies that have results awaiting quality control (QC) review. The results themselves are not publicly posted until the review is complete. The dates for three types of events related to results submission are reported in the Pending_Results table:
The NLM reports that the following updates occur to this information when a study passes the quality control review:
The ClinicalTrials.gov API provides information about & links to documents related to a study. NLM provides the following information about these data:
The full study protocol and statistical analysis plan must be uploaded as part of results information submission, for studies with a Primary Completion Date on or after January 18, 2017. The protocol and statistical analysis plan may be optionally uploaded before results information submission and updated with new versions, as needed. Informed consent forms may optionally be uploaded at any time.
On May 3, 2018, NLM posted this comment to their API schema documentation:
As promised in 08/30/2017 entry above, old redundant date names have been retired and their tags removed. Please update systems to stop using the date on the left in favor of the date on the right.
|obsolete tag|replacement tag|
|<firstreceived_date>|<study_first_submitted>|
|<firstreceived_results_date>|<results_first_submitted>|
|<firstreceived_results_disposition_date>|<disposition_first_submitted>|
|<lastchanged_date>|<last_update_submitted>|
All these date attributes are stored in the Studies table. On January 22, 2018, the obsolete date tags/columns were identified as deprecated and new columns were added that mimic the new labels defined by NLM. The columns are:
|Deprecated Column||Replacement Column|
With this release, the deprecated columns have been removed.
ClinicalTrials.gov has made changes to the API (adding new tags; removing deprecated tags), so we needed to update the studies used by automated test scripts; the tests need to use data that accurately represents the current structure of the ClinicalTrials.gov API. The latest versions of all test studies were downloaded and test scripts were updated to address all changes.
To ensure we're able to recover user account information if necessary, we have added a step to the nightly update process that extracts all data from user-related tables and user account information and emails this to AACT Administrators along with instructions about how to run the scripts to restore the information.
A page to display all registered users has been added. It is only accessible to AACT administrators.
The documentation that explains how to use SAS to connect to AACT needed to be tweaked. The sample script was missing the line that identifies the user's password. We also fixed some awkward-looking fonts.
All data retrieved from ClinicalTrials.gov is saved into a schema named 'ctgov'. Before, when standing up a new instance of the AACT database, we needed to manually create the ctgov schema, grant privileges to the database administrator and define 'ctgov' as the default schema. We have now modified the database initialization process to create the ctgov schema automatically, so the tables, views and indexes are saved there without requiring any extra manual steps.
If a user forgot their password and clicked the link to receive an email to reset it, the process raised an error after they entered their password and confirmation password. This bug has been fixed.
Prior to Version 3.1.0, the AACT database did not own any data; all information in AACT was retrieved from ClinicalTrials.gov. The database could be (and frequently was) wiped out and recreated from this data source.
With the introduction of a user registration feature, AACT is now the system of record for user account information and must therefore ensure copies of user-related information are backed up and can be restored if necessary. We've set up a daily pg_dump process to create copies of the admin database (which contains a table of Users), and a pg_dumpall --globals-only process to save the database accounts (username/password/access rights) created in the publicly accessible AACT database.
As noted, the only reason to back up the public AACT database is to ensure we have restorable copies of user accounts. Since the actual content of the database can be recovered from ClinicalTrials.gov, only account usernames, encrypted passwords and ACL information are backed up.
With this release, users of the live AACT database will need to register and receive an individual user account to access the database. Individual accounts will replace the single common login-name/password (aact/aact) that has been used until now. To register and get a database account, please visit the AACT website and click Sign-Up in the upper right corner of any page.
The registration process is automated, using standard methods to verify the email address you provide. This should take about 5 minutes. If you have questions or encounter problems, please send email with the word 'registration' in the subject line to firstname.lastname@example.org.
While your login-name & password will change from aact/aact to the login-name/password you define, all other connection information (hostname, database name, and port number) will remain the same.
The previous login-name/password (aact/aact) will remain active for several weeks while people become aware of this new requirement and have the chance to create and test their new database account.
User registration will allow us to contact people about scheduled downtimes and other events. It also helps us monitor and manage database activity.
You can download static copies of the database and the pipe-delimited flat file sets without creating an account; if you only use these resources, you need not register unless you wish to receive email notifications.
In preparation for future enhancements that will provide supplemental information to enhance/annotate ClinicalTrials.gov data, all current AACT tables (i.e., tables containing only data retrieved from ClinicalTrials.gov) have been moved to a schema named 'ctgov'. All database user accounts will define 'ctgov' as the default schema, so SQL queries need not specify this.
Queries created to run against the previous version of AACT that do not explicitly prefix table names with 'public.' should continue to run without needing any change. If, however, your queries have prepended 'public.' to the table names, you will need to either remove these prefixes or change them to 'ctgov.'
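For saved queries that use the 'public.' prefix, the substitution can be scripted. The following is a minimal sketch using a regular expression; the function name is hypothetical, and because it is plain text replacement it would also rewrite 'public.' inside string literals or comments, so results should be reviewed.

```python
import re

# Match "public." only as a whole word followed by an identifier character.
_PUBLIC_PREFIX = re.compile(r"\bpublic\.(?=[A-Za-z_])", re.IGNORECASE)


def use_ctgov_schema(sql: str) -> str:
    """Replace 'public.' table prefixes with 'ctgov.' in a SQL string."""
    return _PUBLIC_PREFIX.sub("ctgov.", sql)
```

Queries without the prefix pass through unchanged, which mirrors the behavior described above.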
Note: This change has no impact on users of the pipe-delimited flat file extracts.
Until now, downloadable copies of the AACT database (a static pg_dump copy and a set of 40 pipe-delimited flat files) have been created once a month and made available on the download page of the AACT website. Several people have expressed interest in getting these downloadable resources more frequently. As of this release, a static copy of the database and a set of pipe-delimited files are created & published to the download page after each nightly load.
To prevent the accumulation of hundreds of copies of the database through the year, these daily copies will be available for download only until the end of the month. Downloadable copies made on the first of the month will continue to be archived and made permanently available via the website. Both daily and monthly downloadable files can be retrieved from the download page of the AACT website.
(Prior to January, 2018, downloadable copies were created monthly, but not on the first. Going forward, these should be consistently created and dated on the first of each month.)
Some columns contain a limited number of possible values; several such columns are enumerated in the data dictionary, displaying the total number of rows with each value and the percent distribution. For example: on February 28, 2018 the enumeration summary for Designs.primary_purpose was:
We are now saving enumeration information to an administrative table so that trends can be identified with the passage of time. This information will also help us verify the accuracy of updates by comparing current percent distributions to previous distributions. If values change dramatically, an alert is sent to AACT administrators.
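The comparison between saved and current distributions can be sketched as follows. The threshold is an assumption: the text above does not define what counts as a dramatic change, so 5 percentage points is used purely for illustration, and the function names are hypothetical.

```python
def percent_distribution(counts):
    """Convert {value: row_count} into {value: percent_of_total}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.items()}


def dramatic_changes(previous, current, threshold=5.0):
    """Return {value: change_in_percentage_points} for large shifts.

    `threshold` is a hypothetical cutoff in percentage points.
    """
    prev = percent_distribution(previous)
    cur = percent_distribution(current)
    flagged = {}
    for value in set(prev) | set(cur):
        # A value missing from one side is treated as 0 percent there.
        delta = cur.get(value, 0.0) - prev.get(value, 0.0)
        if abs(delta) > threshold:
            flagged[value] = delta
    return flagged
```

A non-empty result would be the trigger for alerting administrators.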
We have improved the process that updates the AACT database by making the following changes:
Each night we refresh a 'background' copy of the AACT database and then use pg_restore to copy it to the publicly accessible database. In the past, if people were logged into the public database, the pg_restore command hung and the refresh failed. To prevent this, all database sessions are now terminated before the update process runs the pg_restore command. This typically occurs around 1am EST.
To prevent users from logging in while the refresh is under way, the process locks the public database before starting. Until now, if the refresh terminated unexpectedly, the database remained locked and inaccessible. This has been fixed. Now we automatically detect when the process fails, unlock the public database, and send an email notification to AACT administrators to report the failure.
A validation test has been added to prevent the public database from being refreshed if the number of studies in the updated database appears to have decreased.
The email notification that is automatically sent to AACT administrators after every database refresh now provides the list of NCT IDs that were added or updated. If the refresh failed, this is now noted in the subject line.
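The refresh safeguards described above can be summarized in one control-flow sketch. It is not the AACT implementation: every step is passed in as a hypothetical callable, and the ordering of the validation relative to the lock is an assumption.

```python
def refresh_public_database(count_public, count_background, lock, unlock,
                            terminate_sessions, restore, notify):
    """Refresh the public database from the background copy.

    All arguments are caller-supplied callables (hypothetical names).
    Returns True on success, False if the refresh was skipped or failed.
    """
    # Validation: do not refresh if the study count appears to have dropped.
    if count_background() < count_public():
        notify("refresh skipped: study count decreased")
        return False

    lock()  # keep users from logging in during the refresh
    try:
        terminate_sessions()  # open sessions would make the restore hang
        restore()             # e.g. a wrapper around pg_restore
    except Exception as error:
        notify(f"refresh failed: {error}")
        return False
    finally:
        unlock()  # always unlock, even when the restore fails
    notify("refresh succeeded")
    return True
```

Placing the unlock in a `finally` clause is what prevents the database from staying locked after an unexpected failure.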
Every table has an NCT_ID column that serves as the foreign key to the Studies table. These columns need to be indexed so that queries run within a reasonable amount of time. Until now, these indexes were missing.
The database server's 60 GB of disk space was inadequate, with usage exceeding 90%. We have upgraded the server's resources as follows:
SSD Disk: 60 GB increased to 200 GB
Memory: 16 GB increased to 32 GB
CPUs: 6 vCPUs increased to 16 vCPUs
This 'Release Notes' page has been enhanced to include past release notes and facilitate documentation of future updates.
If a date value includes only the month & year (no day), we save that value as a string in a column; these string-type columns have the suffix 'month_year'. The value is also saved as a date-type value in a column with a _date suffix. (Example: Studies.start_month_year & Studies.start_date) We had been setting the day to the first day of the month in these date-type conversions. A user noted that the last day of the month is a preferred value. They noted: “these dates (start, completion, primary_completion) define when registration & results are due. A missing day value that defaults to the 1st of the month is the most restrictive and the last of the month is the most generous – for the purposes of compliance assessments”. To be consistent, we made this change for all data elements that can provide just month/year.
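The month/year conversion can be sketched with the standard library. This assumes the string looks like 'March 2018' (full English month name, then a four-digit year); the actual stored format is not specified above, and the function name is hypothetical.

```python
import calendar
from datetime import date, datetime


def month_year_to_date(month_year: str) -> date:
    """Convert a 'Month YYYY' string to the last day of that month.

    Assumption: the input uses a full English month name, e.g. 'March 2018'.
    """
    parsed = datetime.strptime(month_year, "%B %Y")
    # monthrange returns (weekday_of_first_day, number_of_days_in_month).
    last_day = calendar.monthrange(parsed.year, parsed.month)[1]
    return date(parsed.year, parsed.month, last_day)
```

Using calendar.monthrange handles leap years, so a February value resolves to the 28th or 29th as appropriate.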
While changing the data value for month_year data elements, we noticed that the date-type value for Outcomes.anticipated_posting_date was not being provided. We have added this column to the Outcomes table.
On February 9th, we decommissioned the AACT database hosted on Amazon Web Services and the AACT website hosted on Heroku.
The primary objective for this release is to move the website, database and related code to servers hosted by Duke University and DigitalOcean in order to provide users with a static IP address for the database and to reduce monthly costs for hosting platforms. Below is a more detailed list of changes.
The previous version of the AACT database was hosted on the Amazon Web Services (AWS) Relational Database Service (RDS); the AACT website and data processes were hosted on a Heroku server. As of January 22, 2018, the AACT public database will now be hosted on a DigitalOcean server and the website, supporting databases and all system software will reside on virtual Linux servers maintained by Duke University's Office of Information Technology.
Website and Data Processing Server:
Advantages of this configuration:
Static IP Address: The AACT database will have a static IP address which is needed by organizations that employ whitelists to secure their local area networks. (Some firewalls are configured to only allow data-traffic to/from certain IP-addresses.) While AWS users can set up static IP addresses for their virtual private networks, AWS does not provide a way to define a static IP address for a specific database instance. The lack of a static IP address was a significant problem for several users.
Support: The Duke University Office of Information Technology (OIT) has a team of highly qualified server administrators who use established practices and tested procedures to ensure upgrades and patches are applied and servers remain up-to-date and secure. A service agreement is in place to guarantee on-going support.
Positioned for Growth: We need to address performance issues as more people discover and query AACT. By using a third-party service like DigitalOcean, we can easily replicate the database server to distribute the load across machines. If organizations or individuals want a dedicated instance of the database because they need reliably fast response times or would like to enhance the database with custom views, triggers, procedures, etc., we can help stand up 'private' servers and have processes refresh them nightly so they remain updated along with the 'public' database.
Reduced Cost: We expect the new configuration to significantly reduce monthly overhead costs.
In the previous version of AACT, the public database was taken down each evening for about one hour to apply all the changes that had been made in ClinicalTrials.gov that day. Periodically, a full refresh of the database was conducted; this process took approximately 15 hours during which time the database was inaccessible. To minimize such downtime, the load process has been reconfigured so that a background database is updated while the publicly accessible AACT database remains available. When the process completes, the publicly accessible version of the database is restored (via pg_restore) which takes less than 5 minutes. This model also allows us to verify that the load process was successful before the public database is updated.
On August 30, 2017, the National Library of Medicine (NLM) began providing a new set of dates for each clinical trial via the ClinicalTrials.gov API. The Studies table in AACT has been adapted to include these new date-type data elements:
String-type data elements added:
NLM deprecated four date elements (displayed in the left column of the table below) and recommended that users start using the alternative date element (on the right). NLM wrote: "Some existing dates are now redundant. They will be kept for some time to provide an opportunity for users of the XML to update their systems before being removed at a later date, probably in 2018."
AACT continues to provide the deprecated data elements. They will continue to be available in AACT until NLM removes them from their API.
AACT has been upgraded to Ruby 2.4.0 & Rails 4.2.9 (Previously: Ruby 2.2.3 & Rails 18.104.22.168)
We now reboot the database before launching the full load to disconnect user connections. Previously, the full load would hang if active sessions were running, waiting for a quiet database before it would start.
A Use Case Gallery has been added to the AACT website.
References on the website to static copies of the AACT database are now called 'static database copies' instead of 'snapshots'. Using 'snapshots' to refer to static copies of the database was confusing because this term has always been used to refer to the annual set of visualizations that summarize (snapshot) the 'state of clinical trials'.
The database refresh failed when executing the final step, which retrieves logging information from AWS. When it tried to read the log file error/postgresql.log.2017-03-08-20, AWS raised the error: This file contains binary data and should be downloaded instead of viewed. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterValue; Request ID: c3ff20fc-05a1-11e7-96d9-2dc5508b92a3) We now catch this error and skip over it.
We reviewed database activity to identify suspicious activity and created a preliminary instance of the AWS suppression list to block potential hackers.
The footer on each page of the AACT website includes: 'Read our Citation Policy here', but the actual link (https://www.ctti-clinicaltrials.org/briefing-room/citation-policy) was missing. This has been fixed.
A Public Announcement feature has been added to provide AACT administrators with the ability to dynamically publish temporary information on the AACT website. For example, when the database is temporarily down because it's being refreshed, we now notify users by posting a public announcement for the duration of the downtime.
A feature to interrogate AWS database log files has been added which saves information about database activity to an administrative table in AACT. We are now better able to monitor database use.
All administrative tables have been moved out of the public AACT database and into a separate database (aact_admin) which is accessible to AACT administrators only. Admin tables are:
CalculatedValue.has_us_facility was incorrectly set to false during incremental/nightly loads. This has been fixed.
The nightly incremental load was not finding all the added & changed trials from the ClinicalTrials.gov RSS feed. We now send 2 RSS calls to ClinicalTrials.gov to get them all. Also, if a call to the ClinicalTrials.gov API times out, it now tries 5 times before giving up.
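The retry behavior can be sketched generically. This is an illustration, not the AACT code (which is written in Ruby): `fetch` stands for any hypothetical callable that performs the API request, and treating TimeoutError as the retryable condition is an assumption.

```python
def fetch_with_retries(fetch, attempts=5, retry_on=(TimeoutError,)):
    """Call `fetch` until it succeeds, trying at most `attempts` times.

    Re-raises the last retryable error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except retry_on as error:
            last_error = error  # remember the failure and try again
    raise last_error
```

A production version would likely pause between attempts; that detail is omitted here because the text above does not describe it.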
The set of pipe-delimited files was not getting generated as expected because the process aborted when it tried to create an index on a non-existent column: Calculated_Values.sponsor_type. This problem has been fixed.
We have added a table to the Data Dictionary page to summarize all AACT database tables and provide their current row counts.
An enhancement has been made to the Data Dictionary page: the enumerations column in the table now displays the percentage for each element in the dropdown.
The Guide for Researchers now provides the effective date (January 18, 2017) for the NIH's recently published policy.
Mailgun was re-configured to belong to CTTI. It had previously been registered under StudyCo.
This release represents a significant upgrade that aims to make AACT easier to access and use. Since 2010, the AACT database has been published twice a year as a package that would be current as of a particular date: March 27 for the first annual installation, and September 27 for the second.
The package contained the content of ClinicalTrials.gov as 1) an Oracle database instance, 2) a set of SAS cport files & 3) a set of pipe-delimited files. It also included documentation in spreadsheets. Each package was made available to the public on the CTTI website. These packages remain available here. Until now, the use of AACT involved download/setup that required relatively sophisticated technical skills. AACT users also reported that the information was not current enough and the documentation difficult to use.
The code that generated the database had been proprietary and inaccessible to others who might want to replicate the process. The code used to create the AACT database and website is now publicly available on GitHub. In summary, we have rewritten AACT to make it easier to access and understand, and to encourage others to replicate and make use of any aspect of it.
The AACT database is immediately accessible in the cloud, eliminating the need for users to download and install the data.
Each month, a static copy of the AACT database is saved and made available for download. The database platform, PostgreSQL, is a popular, free, open source database and requires less technical know-how to set up than larger platforms such as Oracle.
The database schema has been simplified and employs consistent naming and design conventions.
Documentation has been moved from spreadsheets to this website and provides instructions on how to access and use AACT with a variety of popular desktop applications including SAS, R, Tableau, and PostgreSQL tools.
A 'Calculated Values' table provides commonly-used, pre-computed values for each study such as total number of facilities and number of months to report results.
The public is free to download and recreate the full system or any part of it. All related code (Ruby on Rails) is available on GitHub. This includes the processes that pull data from ClinicalTrials.gov and populate the PostgreSQL database.
Providing the public with direct, queryable access to a database in the cloud is not a common model, and we have yet to determine how well it will serve hundreds or thousands of simultaneous users. However, AWS cloud services provide the most promising alternative for scalable solutions. Another notable challenge has been the time required (~15 hours) to load 220,000+ studies. With recent regulatory changes, it’s likely the amount of data in ClinicalTrials.gov will grow at a faster rate; therefore CTTI continues to investigate ways to improve performance and reliability.
A beta version was released on October 1, 2016. Existing AACT users were asked to test the new version and their advice/suggestions were considered and implemented through the end of 2016. The official launch occurred January 31, 2017, just in time for the HHS ‘final rules’ to take effect.