Full Refresh: A full refresh of the database is scheduled to run at midnight on the first day of each month. This process takes 15-20 hours. The database remains available during this time; however, it must be locked down for about 5 minutes at the end, while it is restored from the newly refreshed copy of the database.
The full load begins by retrieving all data from ClinicalTrials.gov via the ClinicalTrials.gov API. The AACT database created on the first of the month therefore contains data as it existed in ClinicalTrials.gov as of midnight the previous day. The date on which each study record was released via the ClinicalTrials.gov API is stored in the Studies.NLM_Download_Date_Description field.
Why do a full refresh? ClinicalTrials.gov occasionally needs to merge studies that have been identified as duplicates, deleting one of the studies in the process. The nightly incremental update in AACT cannot detect when studies have been removed from ClinicalTrials.gov, so we wipe out the database and run a full refresh to prevent AACT from accumulating these obsolete records. Note that a study deleted from ClinicalTrials.gov will not be removed from AACT until the first day of the following month.
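The reasoning above can be illustrated with a minimal sketch. It is not the AACT load code: the function names, the study identifiers, and the version labels are all made up for illustration, and each database is modeled as a plain dictionary keyed by study id.

```python
def incremental_update(local, changed_upstream):
    """Apply added/changed studies only. Nothing here reports deletions,
    so records removed upstream stay in the local copy."""
    updated = dict(local)
    updated.update(changed_upstream)
    return updated


def full_refresh(all_upstream):
    """Discard the local copy and rebuild it from everything upstream."""
    return dict(all_upstream)


# Hypothetical state: the local copy holds three studies.
local = {"NCT001": "v1", "NCT002": "v1", "NCT003": "v1"}

# Upstream then merges NCT003 into NCT002: NCT003 is deleted, NCT002 changes.
all_upstream = {"NCT001": "v1", "NCT002": "v2"}
changed_upstream = {"NCT002": "v2"}

after_incremental = incremental_update(local, changed_upstream)
after_full = full_refresh(all_upstream)

# Records that linger locally although they no longer exist upstream.
stale = sorted(set(after_incremental) - set(all_upstream))
print(stale)
```

In this toy model the incremental path keeps the deleted study, while the full refresh drops it, which is why the obsolete record survives until the next monthly refresh.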
Incremental Update: An incremental update is scheduled to run each night at midnight and takes about one and a half hours to complete. As with the full refresh, the database remains available throughout most of this process (all but the roughly 5 minutes needed for the last step, restoring from the updated background database). This process uses the ClinicalTrials.gov RSS feed to retrieve studies that have been added or changed. The URLs used to retrieve this information from ClinicalTrials.gov are:
https://clinicaltrials.gov/ct2/results/rss.xml?rcv_d=2&count=10000 (retrieves new studies)
https://clinicaltrials.gov/ct2/results/rss.xml?lup_d=2&count=10000 (retrieves changed studies)
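As a small illustration, the two feed URLs above can be assembled from their parts. The helper name rss_url and its arguments are hypothetical; the parameter names rcv_d, lup_d, and count and their values are taken from the URLs listed above. Reading the value 2 as a look-back window is an assumption, not something stated here. The sketch only builds the strings and makes no network request.

```python
from urllib.parse import urlencode

RSS_BASE = "https://clinicaltrials.gov/ct2/results/rss.xml"

# Query parameter used for each kind of feed, as seen in the URLs above.
FEED_PARAM = {"new": "rcv_d", "changed": "lup_d"}


def rss_url(kind, window=2, count=10000):
    """Build the feed URL for 'new' or 'changed' studies (hypothetical helper).

    `window` is assumed to be a look-back period; `count` caps the results.
    """
    query = urlencode({FEED_PARAM[kind]: window, "count": count})
    return f"{RSS_BASE}?{query}"


print(rss_url("new"))
print(rss_url("changed"))
```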
Static Database Copies: The PostgreSQL pg_dump command is used to create a static copy of the database after each update. The copy is zipped up with current documentation (the data dictionary, database schema, and related information from the National Library of Medicine) and made available on the download static copies page. The database copies created after the nightly incremental loads remain available for download until the end of the month. Database copies created after the full refresh on the first of the month are archived and made permanently available for download. (Note: archived copies of the database created before January 2018 were not created on the first of the month.)
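The retention rule described above can be written down as a short check. This is only a sketch of the stated policy, not the code that manages the download page: the function name is hypothetical, and it deliberately ignores the exception for archives created before January 2018, which were not made on the first of the month.

```python
from datetime import date


def is_still_available(created, today):
    """Return True if a copy created on `created` should still be
    downloadable on `today` under the policy described above."""
    if created > today:
        return False  # a copy cannot exist before it is created
    if created.day == 1:
        return True   # first-of-month copies are archived permanently
    # Nightly copies remain only until the end of the month they were made in.
    return (created.year, created.month) == (today.year, today.month)


print(is_still_available(date(2019, 3, 1), date(2020, 1, 15)))
print(is_still_available(date(2019, 3, 14), date(2019, 3, 31)))
print(is_still_available(date(2019, 3, 14), date(2019, 4, 1)))
```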
Pipe-Delimited Flat Files: After each database update, all data from the database are extracted into a set of pipe-delimited flat files, one file per database table. These file sets can be obtained from the pipe-delimited flat file download page. Like the static database copies, file sets created after the nightly incremental load remain available for download until the end of the month. File sets created after the full refresh on the first of the month are archived and permanently available. (Note: archived copies of pipe-delimited files created before January 2018 were not created on the first of the month.)
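A pipe-delimited file can be read with Python's standard csv module by setting the delimiter. The sketch below makes several assumptions: it assumes the first line of each file is a header row, and the column names and sample rows are invented for illustration rather than taken from an actual AACT file. The helper name is also hypothetical.

```python
import csv
import io


def read_pipe_delimited(handle):
    """Parse an open text handle of pipe-delimited data into a list of dicts,
    assuming the first line holds the column names."""
    return list(csv.DictReader(handle, delimiter="|"))


# Made-up sample content standing in for one table's flat file.
sample = io.StringIO(
    "nct_id|brief_title|overall_status\n"
    "NCT00000001|Example study A|Completed\n"
    "NCT00000002|Example study B|Recruiting\n"
)

rows = read_pipe_delimited(sample)
for row in rows:
    print(row["nct_id"], row["overall_status"])
```

For a real file, the same helper could be called on a handle opened with open(path, newline=""), which is the mode the csv module recommends for file input.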