Reverse Engineering a Bibliography
November 23rd, 2009
Dear lazyweb,
I’ve been handed a file with the authors and titles of approximately 140 scientific papers, and would like to construct a proper bibliography (complete with journal titles, publication dates, DOIs, etc.). If there were just 20 entries, I’d do it by hand, but with 140, I’d like some kind of script. How would you do this? It can be as ugly as it needs to be…
Thanks in advance.
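One way to structure such a script is to parse each entry, hand it to a lookup function that queries whatever bibliographic service covers the field, and keep a list of misses to finish by hand. The post doesn't say how the file is laid out, so this sketch assumes (hypothetically) one paper per line as tab-separated authors and title. `parse_entries`, `build_bibliography` and the `lookup` callable are illustrative names, not any existing tool.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def parse_entries(lines: Iterable[str]) -> list[dict]:
    """Parse hypothetical 'authors<TAB>title' lines into dicts, skipping junk."""
    entries = []
    for line in lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) != 2 or not parts[1].strip():
            continue  # blank line, missing tab, or empty title
        entries.append({"authors": parts[0].strip(), "title": parts[1].strip()})
    return entries


def build_bibliography(
    entries: Iterable[dict],
    lookup: Callable[[dict], Optional[dict]],
) -> tuple[list[dict], list[dict]]:
    """Call lookup() on each entry; return (found records, entries left to do by hand)."""
    found, missing = [], []
    for entry in entries:
        record = lookup(entry)
        if record is None:
            missing.append(entry)
        else:
            # Merge looked-up metadata (DOI, journal, year, ...) over the original fields.
            found.append({**entry, **record})
    return found, missing
```

Only `lookup` touches the network, so it can be swapped between services. Whatever ends up in `missing` is the part you'd still do by hand.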
BibSonomy has a fairly good API; a good chunk of the papers are probably there.
Sadly, DBLP did not have a useful API the last time I looked, and neither does Google Scholar.
Sounds like a good 207 assignment.
It depends to some degree on which database you want to search. If they are biomedical papers, most languages have modules to search and retrieve records from PubMed, e.g. BioRuby’s Bio::PubMed.
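Modules like that talk to NCBI's E-utilities web service. Here is a rough Python sketch of calling it directly. It builds an `esearch.fcgi` query from the title, tagged with PubMed's `[ti]` field, plus an optional first author tagged `[au]`, then pulls the PubMed IDs out of the JSON reply. The helper names and the title-plus-first-author strategy are assumptions for illustration. The author should be given PubMed-style (e.g. `Smith J`). Exact-phrase title searches may miss titles with unusual punctuation. Check NCBI's usage guidelines on request rates before looping over 140 papers.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(title: str, first_author: str | None = None) -> str:
    """Build an E-utilities esearch URL for a title and optional author."""
    # Quote the title as a phrase and restrict it to the title field.
    term = f'"{title}"[ti]'
    if first_author:
        term += f" AND {first_author}[au]"
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def pmids_from_esearch(body: str) -> list[str]:
    """Extract the list of PubMed IDs from an esearch JSON response."""
    return json.loads(body).get("esearchresult", {}).get("idlist", [])


def search_pubmed(title: str, first_author: str | None = None) -> list[str]:
    """Network call: run the query and return matching PMIDs."""
    with urllib.request.urlopen(esearch_url(title, first_author), timeout=30) as resp:
        return pmids_from_esearch(resp.read().decode("utf-8"))
```

A single PMID back is a good sign. Zero or several means that entry probably needs a human.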
I’m guessing they are not, in which case JabRef has command-line options to search various databases by keyword; see the --fetch option.
There is a Python script, possibly still working, that you might be able to adapt to your needs: http://code.activestate.com/recipes/523047/
It basically screen-scrapes Google Scholar.
Pay someone.
Amazon’s Mechanical Turk should do the trick:
https://www.mturk.com/mturk/welcome
@Jon Do you have a budget for this? I don’t…
CrossRef provide a web service for just this purpose. You can even use it through Ubiquity. It’s at http://www.crossref.org/SimpleTextQuery/
This converts a citation to a DOI. Armed with the DOI, you can then look up the paper in PubMed, CrossRef (through an API), or some comp sci database.
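For scripting, CrossRef also has a REST API at api.crossref.org. It takes a free-text `query.bibliographic` parameter and returns JSON whose items carry `DOI`, `title`, `container-title` and `issued` fields. A rough sketch under that assumption: query with authors plus title and take the top hit. Accept it only if its title closely resembles the one you started from. The `difflib` similarity check and its 0.9 threshold are arbitrary choices for illustration, not part of CrossRef. With a DOI in hand, doi.org content negotiation with an `Accept: application/x-bibtex` header returns a BibTeX entry. All function names here are hypothetical.

```python
from __future__ import annotations

import difflib
import json
import urllib.parse
import urllib.request

WORKS = "https://api.crossref.org/works"


def crossref_url(authors: str, title: str) -> str:
    """Free-text bibliographic query asking only for the top-ranked hit."""
    params = {"query.bibliographic": f"{authors} {title}", "rows": "1"}
    return WORKS + "?" + urllib.parse.urlencode(params)


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(body: str, wanted_title: str, threshold: float = 0.9) -> dict | None:
    """Return metadata for the top CrossRef item if its title resembles ours."""
    items = json.loads(body).get("message", {}).get("items", [])
    if not items:
        return None
    item = items[0]
    found_title = (item.get("title") or [""])[0]
    if title_similarity(found_title, wanted_title) < threshold:
        return None  # probably the wrong paper; leave it for manual checking
    date_parts = item.get("issued", {}).get("date-parts", [[None]])
    return {
        "doi": item.get("DOI"),
        "title": found_title,
        "journal": (item.get("container-title") or [None])[0],
        "year": date_parts[0][0] if date_parts and date_parts[0] else None,
    }


def bibtex_for_doi(doi: str) -> str:
    """Network call: fetch a BibTeX entry via DOI content negotiation."""
    req = urllib.request.Request(
        "https://doi.org/" + urllib.parse.quote(doi, safe="/"),
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

The similarity check matters because a free-text query always returns *something*. Without it, a paper CrossRef doesn't know would silently get some other paper's DOI.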
There was recently a question about this on the Blue Obelisk Stack Exchange site, where Egon recommended cb2bib.