Clinical Trials scraping

danfowler · September 23, 2015, 12:33pm

For our Open Trials project, we are aiming to index and make links between different data sources on clinical trials, drugs, and health conditions. Toward this end, we’re looking to incorporate structured data from ClinicalTrials.gov. We know lots of work has been done on scraping Clinical Trials in the past (including by Open Knowledge). We’ve come up with the following list of past work. Does anyone have experience here? Any pitfalls to avoid?

https://wwwcf2.nlm.nih.gov/nlm_eresources/eresources/search_database.cfm
https://cran.r-project.org/web/packages/rclinicaltrials/vignettes/basics.html

https://github.com/tinfante/ClinicalTrialsScraper
https://classic.scraperwiki.com/views/clinicaltrialsgov_test/

Also this:

jgkim · September 23, 2015, 11:03pm

There is a project called LinkedCT, which crawls ClinicalTrials.gov and turns its data into linked data, making links between different datasets including DrugBank, DailyMed, PubMed, Wikipedia, etc. However, I guess the data on LinkedCT is not up to date.

You can find more details about the project in the paper at: ftp://ftp.cs.toronto.edu/csrg-technical-reports/596/LinkedCT.pdf

danfowler · September 24, 2015, 2:21pm

Thanks @jgkim, excellent suggestion!

tfmorris · September 28, 2015, 6:04pm

Why would you scrape it instead of just downloading and parsing the XML?

pwalsh · September 28, 2015, 6:31pm

For clinicaltrials.gov specifically, yes, that is what we will be doing. We are using “scraping” but we generally mean “data acquisition” :).
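
To illustrate the "download and parse the XML" route discussed above, here is a minimal sketch in Python using only the standard library. The sample record is hand-written and hypothetical, and the element names (`clinical_study`, `id_info/nct_id`, `brief_title`, `condition`, `intervention/intervention_name`) are assumptions about the record layout; check them against the actual files before relying on them. Downloading is left out on purpose, so the sketch only covers the parsing step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written record. The element names are assumptions
# about the XML layout, not something confirmed in this thread.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example study of a placeholder drug</brief_title>
  <condition>Example Condition A</condition>
  <condition>Example Condition B</condition>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>Placeholder Drug</intervention_name>
  </intervention>
</clinical_study>"""


def parse_trial(xml_text):
    """Extract a few fields from one trial record given as an XML string."""
    root = ET.fromstring(xml_text)
    return {
        # findtext returns None when the element is missing
        "nct_id": root.findtext("id_info/nct_id"),
        "title": root.findtext("brief_title"),
        # a record may list several conditions and interventions
        "conditions": [c.text for c in root.findall("condition")],
        "interventions": [
            i.findtext("intervention_name")
            for i in root.findall("intervention")
        ],
    }


if __name__ == "__main__":
    print(parse_trial(SAMPLE_XML))
```

Keeping conditions and interventions as lists makes it easier to link each value to other sources (drugs, health conditions) later, which is the stated aim of the project.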
