Documentation for large-scale queries


#1

As a potential user, I would like to run massive batch queries over all trials in the full database (either through a web API or a local instance of your database).

Is there any documentation for such large queries?

The end-user documentation in https://github.com/opentrials/docs redirects to https://opentrials.net, which only allows limited queries.

More specifically:

a) Is there a web API, and is it documented?

b) Do I need a local copy of the database, as the code (and documentation) of https://github.com/opentrials/api seems to imply? Or is an instance of that API available for querying your instance of the OpenTrials database?

http://api.opentrials.net, which is given in https://github.com/opentrials/api, returns {"statusCode":404,"error":"Not Found"}. However, the actual HTTP status code of the response is 200, suggesting that api.opentrials.net might contain some sections that are not currently open…
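A mismatch like this (HTTP status 200, but a JSON body reporting 404) can be detected programmatically. The following is a minimal Python sketch using only the standard library; the function names `status_mismatch` and `check_url` are hypothetical helpers for illustration, and the sketch assumes the error body is a JSON object with a `statusCode` key, as in the response quoted above.

```python
import json
import urllib.error
import urllib.request


def status_mismatch(http_status: int, body: str) -> bool:
    """True if a JSON body reports a statusCode that differs from the HTTP status."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # body is not JSON, so there is nothing to compare
    if not isinstance(payload, dict):
        return False
    reported = payload.get("statusCode")
    return reported is not None and reported != http_status


def check_url(url: str) -> bool:
    """Fetch url and report whether its HTTP status and JSON statusCode disagree."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return status_mismatch(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the code and body
        return status_mismatch(err.code, err.read().decode("utf-8"))
```

For the response described above, `status_mismatch(200, '{"statusCode":404,"error":"Not Found"}')` returns `True`; `check_url` performs the same comparison against a live URL.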

c) Is there a download of the full database, and documentation for it?

[ d) Is there a policy on large-scale web scraping? (e.g. restricting it to weekends, or to certain times of day) ]


#2

Hi, I have some pointers that might help you out here. I’ll take your first two questions, since that’s the area I’ve mostly been working on.

a) First, let’s drop a link to the API docs we have so far: http://api.opentrials.net/v1/docs/
Let us know if anything is missing from it; if something should be in the Swagger docs and isn’t, it’s probably my fault.
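For pulling data on all trials through the public API, a minimal Python sketch might walk a paginated endpoint page by page. It assumes a search endpoint at `http://api.opentrials.net/v1/search` that accepts `page` and `per_page` query parameters and returns `items` and `total_count` in its JSON body; these names are assumptions, not confirmed here, so check them against the Swagger docs linked above. The fixed pause between pages is one conservative way to keep load on a shared server low while no scraping policy is stated.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# ASSUMPTION: endpoint path and parameter/field names; verify in the Swagger docs.
BASE_URL = "http://api.opentrials.net/v1/search"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_trials(fetch: Callable[[str], dict] = http_get_json,
                per_page: int = 100,
                delay_s: float = 1.0) -> Iterator[dict]:
    """Yield every trial by walking the (assumed) paginated search endpoint."""
    page = 1
    while True:
        query = urllib.parse.urlencode({"page": page, "per_page": per_page})
        data = fetch(f"{BASE_URL}?{query}")
        items = data.get("items", [])  # ASSUMPTION: results live under "items"
        yield from items
        # Stop on an empty page or once the reported total has been covered.
        if not items or page * per_page >= data.get("total_count", 0):
            break
        page += 1
        time.sleep(delay_s)  # pause between requests to be polite to the server
```

Passing `fetch` as a parameter keeps the paging logic separate from the network call, so it can be exercised against canned responses before pointing it at the real API.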

b) Local database, I’m afraid. Our database server is bound to the local network and only responds to requests from the API, which in turn exposes its own endpoints to the world. If you want to run your own copy of the API, you also need the database to connect it to.

Regarding the misleading pairing of a 200 status code with a 404 response body, I’ll look into it tomorrow over coffee. This might actually be an issue we weren’t aware of.


A couple of questions for you:

  • Is there any use case you’re trying to cover that would also make sense for the public OpenTrials site?
  • What do you mean by “large” / “massive” queries? Lots of criteria? Complicated table joins? Regex searches?

Thanks!


#3

Thank you for your reply, and the link to the API.

Primarily, I would like to obtain, for all trials, the drug (and later, via other resources, the drug target), the start date of the trial, the application scenario tested in the trial, and the trial’s current status.
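Once trial records are available (through the API or a database dump), the fields listed above could be flattened into one CSV row per trial. This is a minimal Python sketch; the keys `interventions`, `start_date`, `conditions` and `status` are hypothetical stand-ins for drug, trial start, application scenario and current state, and would need to be mapped onto the real trial schema.

```python
import csv
import io
from typing import Iterable

# HYPOTHETICAL field names; map them onto the actual trial schema.
COLUMNS = ["id", "drugs", "start_date", "conditions", "status"]


def trial_to_row(trial: dict) -> dict:
    """Flatten one trial record into the columns of interest."""
    return {
        "id": trial.get("id", ""),
        # Several interventions/conditions per trial are joined into one cell.
        "drugs": "; ".join(i.get("name", "") for i in trial.get("interventions", [])),
        "start_date": trial.get("start_date", ""),
        "conditions": "; ".join(c.get("name", "") for c in trial.get("conditions", [])),
        "status": trial.get("status", ""),
    }


def trials_to_csv(trials: Iterable[dict]) -> str:
    """Render an iterable of trial records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for trial in trials:
        writer.writerow(trial_to_row(trial))
    return buf.getvalue()
```

Because `trials_to_csv` accepts any iterable, it can consume records lazily, for example from a paging generator, without holding the whole database in memory.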

I have not yet checked the documentation in detail, but I assume the tables should be relatively easy to join. Regex is always useful, though I don’t currently expect to need it when interacting with your API, given the scheme outlined above (and the aim of getting data for all trials).


#4

Hi @tstoeger, we plan to provide a regularly updated DB dump. You can follow https://github.com/opentrials/opentrials/issues/488 to be notified when that’s done.