Connect to AACT using SAS

This method for accessing the AACT database is recommended for SAS users wanting to run analyses or queries using the live database.

Note:

  • This method can also be used to create a local SAS copy of a particular instance of the database, for example by using PROC COPY as illustrated below in Example 3.e. However this is not recommended if a user has limited internet bandwidth or computational power (e.g., working on PC SAS). Two alternative methods for SAS users wanting to work with a static copy of the database are available and are recommended.

  • For SAS users with SAS/ACCESS interface to PostgreSQL and PostgreSQL software installed on their machine, a copy of the database can be downloaded and used to populate a local version of the database. SAS can then be used to connect to, and query, the local database. See instructions: “SAS – downloading and connecting to a local copy of the PostgreSQL AACT database”.

  • For SAS users without SAS/ACCESS interface to PostgreSQL, the AACT database tables can be downloaded in delimited text file format. See instructions: “SAS – downloading and working with the delimited text file version of a static copy of the AACT database”.

1

SAS Installation requirements

SAS software must be installed. SAS/ACCESS interface to PostgreSQL is required.

Note: These instructions were tested on SAS 9.4 (TS1M2) installed on a PC and on Linux.

If the steps described below do not work, the SAS/ACCESS interface to ODBC may need to be configured. For help, contact your SAS administrator, or refer to the SAS knowledgebase.

2

Assign a SAS libref to the PostreSQL database.

Using the LIBNAME statement in SAS, assign a libref (a shortcut name) to the PostgreSQL database, and specify the database connection options. The access credentials for the live AACT database are provided at the AACT Connect webpage.

For example:

      

libname cloud /* my libref */ postgres /* postgres engine */ user= (sign up/in to get a username) /* supply your login credentials */ password="your AACT password" database="aact" server="aact-db.ctti-clinicaltrials.org" port=5432 dbmax_text=32767 /* sets the length of long text variables */ preserve_tab_names=yes access=readonly ;

Note: In the PostgreSQL database, character variables are stored either as varchar(n) which is a ‘variable-length’ character type with a maximum limit on length, or as text character type which has unlimited length. SAS only has one type of character variable. For character variables of varchar(n) type, SAS will use the length limit assigned in the PostgreSQL database (default limit is n=255) for the variable. The dbmax_text= option in the libname statement is used to control the length assigned to character variables of text type. The maximum length available in SAS is 32767. Setting dbmax_text=32767 will minimize the risk of truncating information contained within these long variables. However, it may also result in excessively long variables that will increase storage and memory requirements, and reduce computational performance. When working with a particular dataset, the maximum character variable length may be set for that dataset using the dbmax_text data set option (see example in Step 3, below).

Refer to SAS documentation for additional PostgreSQL options and arguments for the LIBNAME statement.

If you are working in PC SAS, then after submitting the above LIBNAME statement you will be able to view the available database tables as SAS datasets using the Explorer window.

Sas datasets available
3

Use the assigned libref to perform tasks in SAS.

In these examples, the assigned libref is called ‘cloud’.

Example 3.a - Get listing of contents of the database

              

proc contents data=cloud._all_; run;

Example3a

Example 3.b - Count number of studies by values of the study_type variable using SAS PROC FREQ

              

title 'Number of studies by type'; proc freq data=cloud.studies; tables study_type; run; title;

Example3b

Example 3.c – Example SAS PROC SQL queries

              

* - count number of records in STUDIES table; * - count number of records by STUDY_TYPE; proc sql; title 'Number of studies'; select count(*) from cloud.studies; title 'Number of studies by type'; select study_type, count(*) as Count from cloud.studies group by study_type; quit; title;

Example3a

Example 3.d – Use SAS DATA step to extract and save content as a permanent SAS dataset

        

* !!! WARNING - this may take a long time !!!; * Consider extracting just a subset of records (e.g., 1000) first; * In the example below, the maximum length for character variables stored as ‘text’ type in the postgreSQL database is set at 1000; libname sasdata "<my file path>" ; /* folder for storing dataset */ data sasdata.studies; set cloud.studies (dbmax_text=1000 obs=1000); run;

Running this code in SAS will save a SAS dataset called ‘studies.sas7bdat’ in the file path specified by the libname sasdata … statement. This dataset will contain only the first 1000 records (obs=1000 argument) from the studies table in the live database.

Example 3.e – Use SAS PROC COPY to extract and save several tables as permanent SAS datasets

        

* !!! WARNING – this may take a long time !!!; * For example, the following code took >2.5 h to run on PC SAS; * over a home internet connection; proc copy in = cloud out = sasdata ; select studies sponsors ; /* select a subset of database tables */ run;

Running this code in SAS will save two SAS datasets, one called ‘studies.sas7bdat’ and the second called ‘sponsors.sas7bdat’. These will be saved in the file path specified by the libname sasdata … statement.

4

Alternative connection method using SAS PROC SQL Pass-Through

For SAS users intending to use SAS PROC SQL to connect to and query the live AACT database, the following method may be used in place of the methods described in Step 2 and Step 3.

In general, the connection method described in Step 2 above that uses a SAS/ACCESS LIBNAME statement with postgres engine name for the PostgreSQL interface is recommended. However, in some situations, connection using the PROC SQL Pass-Through Facility may be needed. For a description of advantages and disadvantages of the two approaches, refer to this information provided by SAS.

The following example illustrates use of the Pass-Through Facility in SAS PROC SQL. The connect to statement establishes the connection to the live PostgreSQL database and provides the database access credentials; the disconnect from statement closes the connection. This first ‘select’ query returns the total count of studies in the database; the second ‘select’ query counts the number of studies by values of the study_type variable.

        

proc sql; connect to postgres as aact ( user=(sign up/in to get a username) database="aact" server="aact-db.ctti-clinicaltrials.org" port=5432 ); title 'Number of studies'; select count(*) from connection to aact (select nct_id from studies); title 'Number of studies by type'; select study_type, count(*) as Count from connection to aact (select study_type from studies) group by study_type; disconnect from aact; quit; title;

Credentials

You will need the following information to access the database directly in the cloud.

    

Hostname: aact-db.ctti-clinicaltrials.org Port: 5432 Database name: aact User name: (sign up/in to get a username) Password: 'your AACT password'