Academic/scientific publishers dataset


I’m looking for a list of academic/scientific publishers, either a dataset or something from which I could build one.

Do you know of something that could help me with this?



@marado - is this of any use?

@marado Also see this database from SHERPA/RoMEO Publishers - v2.sherpa

Thank you, those two will be a good starting point.


Once you have it, please share the outcome with this group, as it might be useful for further research in open ed.

Hi there, and sure.
In fact, let me give you a little background on why I asked this.

There are several datasets about scientific publications (research papers, for instance). They mostly contain only the information needed by whoever made the dataset, but there are many potentially interesting questions one can ask while looking at such a dataset. How many authors do those papers have? What’s their field? Who is the publisher? While a dataset usually answers one such question (the one that led to its creation), it rarely provides the information needed by those who, looking at it, ask the other questions.

To solve a couple of those “puzzles”, I looked for a way to complement that information. The common way to refer to a paper is by its DOI but, while there are manual tools to search for DOIs, I didn’t find anything scriptable that gathers metadata about a paper given its DOI. So this weekend I wrote a small tool (still a work in progress) that, given a DOI, outputs some metadata about that paper: GitHub - marado/DOIsh: A DOI search shell.

While testing DOIsh against one of the datasets I mentioned earlier, I noticed that, although DOIsh uses what is commonly accepted as the most comprehensive DOI search interface (crossref), a non-negligible number of papers still do not appear in crossref’s results. From what I’ve read on the web about how academic researchers (I’m not one) are meant to deal with these cases, the answer is always consistent, even if disturbing: search for the DOI on Google. And, true enough, in the few cases where I tried it, it was an effective way of reaching the data, even if not an efficient one.
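For anyone who wants to try the crossref lookup from a script, here is a minimal Python sketch. It is not how DOIsh is implemented; it only assumes the public Crossref REST API at `https://api.crossref.org/works/<DOI>`, whose JSON answer carries the record under `message`. The function names (`crossref_url`, `summarize`, `lookup`) are made up for this example.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Base URL of the public Crossref REST API "works" endpoint.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the lookup URL for a DOI (the slash inside the DOI is kept)."""
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")


def summarize(record):
    """Pick a few fields out of the 'message' object of a Crossref answer."""
    titles = record.get("title") or []    # Crossref returns titles as a list
    authors = record.get("author") or []  # list of author objects, may be absent
    return {
        "title": titles[0] if titles else None,
        "publisher": record.get("publisher"),
        "author_count": len(authors),
    }


def lookup(doi, timeout=10):
    """Fetch metadata for one DOI; return None when Crossref does not know it."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
            return summarize(json.load(resp)["message"])
    except urllib.error.HTTPError as err:
        if err.code == 404:  # DOI not found: the case where a fallback is needed
            return None
        raise
```

A `None` from `lookup` is the "not on crossref" case described above, which is where some other search would have to take over.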

So I’m now planning to enhance DOIsh so that, when crossref returns no results, it tries to find the information using Google search, but in an automated way. While my preliminary tests show this is crude but feasible, I bumped into another issue: Google Search will give me (more or less) the URL of the paper and its title, but the best way to find its publisher’s name seems to be to infer it from the paper’s URL. That seems simple, but I’d need to cross-reference the domains of those URLs with a dataset listing publishers and their websites…
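To make the cross-referencing idea concrete, here is a rough Python sketch of mapping a paper’s URL to a publisher through its domain. The three entries in the table are only illustrative and the helper name `publisher_for` is hypothetical; a real table would come from a publishers dataset.

```python
from urllib.parse import urlparse

# Illustrative entries only; a real table would be loaded from a dataset
# that lists publishers together with their website domains.
PUBLISHER_BY_DOMAIN = {
    "springer.com": "Springer",
    "sciencedirect.com": "Elsevier",
    "wiley.com": "Wiley",
}


def publisher_for(url, table=PUBLISHER_BY_DOMAIN):
    """Return the publisher whose domain matches the URL's host, or None."""
    # hostname is None when the string has no scheme or is not a URL at all
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then drop leading labels one at a time, so
    # "link.springer.com" also matches an entry for "springer.com".
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None


print(publisher_for("https://link.springer.com/article/10.1007/example"))
```

Matching on domain suffixes rather than exact hosts keeps the table small, at the cost of trusting that every subdomain of a listed domain belongs to the same publisher.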

…and here we are: I now want to build a dataset that has, at least, publishers and their websites, so I can later use it to enhance other, existing datasets :slight_smile:

Now that I’ve given the background, just a little disclaimer: I do this kind of stuff regularly, but in my free time, and I don’t have any timeline or deadline for it. On the other hand, all my code is free software, and contributions are generally accepted :wink:

I’ll post an update once a version of the dataset has been made.
Thank you all!

Hi there,

Initial work has been published here. Thank you all for your comments and for spreading the word.

Hi @marado, thanks for sharing this. A note regarding “@okfnedu is meant to have a list of all life sciences related journals, but it isn’t online…”: as far as I recall we never mentioned it; it was someone on Twitter. The only list we have is this short one pointing to where to publish research on open education in APC-free open access journals, so it would be great if you could fix this little detail: Honest and reliable Open Access Journals in Open and Distance Education | Thoughts on Open Education

Thanks for sharing it with us, we will spread the word via Twitter.