Clinical Trials scraping


For our Open Trials project, we are aiming to index and make links between different data sources on clinical trials, drugs, and health conditions. Toward this end, we’re looking to incorporate structured data from We know lots of work has been done on scraping Clinical Trials in the past (including by Open Knowledge :smile:). We’ve come up with the following list of past work. Does anyone have experience here? Any pitfalls to avoid?

Also this:

ContentMining and Clinical Trials from petermurrayrust


There is a project called LinkedCT, which crawls ClinicalTrials.gov and turns its data into linked data, making links between different datasets including DrugBank, DailyMed, PubMed, Wikipedia, etc. However, I guess the data on LinkedCT is not up to date.

You can find more details about the project in the paper at:


Thanks @jgkim, excellent suggestion!

Previous scraping work for trial sources other than

Why would you scrape it instead of just downloading and parsing the XML?


For specifically, yes, that is what we will be doing. We are using “scraping” but we generally mean “data acquisition” :).
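
To make the "download and parse the XML" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the sample record, the element names (`id_info/nct_id`, `brief_title`, `condition`, `intervention_name`), and the `parse_trial` helper are all hypothetical placeholders, not a confirmed schema for any particular registry. A real download would need to be checked against the source's own documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical trial record; element names are assumptions for illustration.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example trial title</brief_title>
  <condition>Hypertension</condition>
  <condition>Diabetes</condition>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>Example drug</intervention_name>
  </intervention>
</clinical_study>"""


def parse_trial(xml_text):
    """Extract a few fields from one trial record into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        # findtext returns None if the element is missing
        "id": root.findtext("id_info/nct_id"),
        "title": root.findtext("brief_title"),
        # a record may list several conditions and interventions
        "conditions": [c.text for c in root.findall("condition")],
        "interventions": [
            i.findtext("intervention_name")
            for i in root.findall("intervention")
        ],
    }


record = parse_trial(SAMPLE_XML)
print(record)
```

Keeping the conditions and interventions as lists is what would later let us link each trial to separate drug and condition datasets.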