Using BibDesk to organize PDFs of research papers

The basic problem is excellently described by Olivia Judson in this article. If you've ever had 85 files on your computer entitled "sdarticle121.pdf" and no idea what is in any of them, you will understand.

Dr. Judson highlighted two pieces of software that solved her problem: Zotero and Papers. Zotero is a Firefox extension, which I'll return to in a bit, but it's more about gathering new papers (and making those new papers easier to keep organized) than about organizing things you already have.

Papers, on the other hand, seemed like the tool I'd been looking for, and it's used and loved by other scientists I know. However, before deciding to pony up the $40, I did a bit of Googling ("reference management software" was a winner) to investigate the alternatives.


"Like iTunes for your PDF collection"

It turned out that the open-source BibDesk could do everything I wanted, for free. Plus, it stores all its information in the BibTeX format, which (a) is human-readable, so even if the BibDesk project goes bust someday, you will still have all the bibliographic information available, and (b) is the format I already had all my bibliographies in anyway. I just had to open my existing .bib files from within BibDesk, then match up the .pdf files with the right .bib entries. (BibDesk will let you "link" as many files and/or URLs as you want to one bibliographic entry. The newer versions also keep track of the "Mac OS aliases" of the linked files, so you can move them around in Finder or whatever, and BibDesk will still know where they are. It's an awesome feature.)

This is pretty great, but it's still kind of tedious opening up a zillion BibDesk entries and hunting for the .pdf file. I ended up adapting other people's AppleScripts to make it easier. My personal filing system had previously been to create filenames including the year, journal name, and first author, so I wrote an AppleScript that called Unix "locate" to look for those. But actually the most effective approach for recent PDFs (especially the ones with cryptic filenames such as "sdarticleXX.pdf") was to use an AppleScript to call the Unix tools "pdftotext" and "sed" to search the text for the DOI, then to feed that to the fabulous ADS database to get the ADS-supplied BibTeX entry directly.
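To give a feel for the DOI-hunting step, here is a minimal sketch in plain shell. It is not the script itself: the function name find_doi is made up for illustration, the pattern is a simplified one ("10." plus four or more digits, a slash, then a run of non-space characters), and the DOI in the demo line is invented. The ADS lookup is left out, since the query details aren't spelled out here.

```sh
#!/bin/sh
# find_doi (hypothetical name): print the first DOI-like string on stdin.
# Simplified pattern, an assumption rather than the regex the real script uses:
# "10." + 4 or more digits + "/" + non-space characters.
find_doi() {
    grep -Eo '10\.[0-9]{4,}/[^[:space:]]+' |
        head -n 1 |
        sed 's/[.,;)]*$//'   # drop sentence punctuation stuck to the end
}

# Demo on a made-up DOI; prints 10.1234/example.5678
printf 'doi:10.1234/example.5678.\n' | find_doi

# With a real file, pdftotext can write the text to stdout ("-"):
# pdftotext paper.pdf - | find_doi
```

Taking only the first match is a deliberate shortcut: the paper's own DOI usually shows up on the first page, ahead of any DOIs in the reference list, but that is a rule of thumb and not a guarantee.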

New AppleScript (2012 revision)

In order for it to work, you must have pdftotext and curl installed on your system. I think curl comes with OS X; pdftotext can be installed from Fink. (If you have it installed another way, you will have to change "/sw/bin/pdftotext" in the AppleScript to the correct path. If you don't know the path, try typing "which pdftotext" in Terminal.)
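If you'd rather check both tools in one go, something like the following works in Terminal. It's only a sketch: missing_tools is a name I made up, and it relies on the standard shell builtin "command -v", which prints a command's path if it can find it (much like "which").

```sh
#!/bin/sh
# missing_tools (hypothetical name): print each named command
# that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing if both tools are available.
missing_tools pdftotext curl

# To see where pdftotext actually lives (the path to put in the script):
# command -v pdftotext
```

One caveat: a script launched from BibDesk may not see the same PATH as your Terminal session, which is presumably why the script uses a full path like "/sw/bin/pdftotext" in the first place.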

To use it:

  1. Open a new entry in BibDesk and drag the PDF you want the bibliographic information for onto it. (You can do this from either Finder or Preview -- if the paper is already open in Preview, you can drag the icon from the titlebar. This is handy if you've just downloaded-and-opened the PDF out of Firefox.)
  2. Then, either

DOI and regular expressions

As an aside, the "sed" commands used to search for the DOI are:

(1st try):
sed -n -e 's_.*[Dd][Oo][Ii][:)] *\([[:digit:]][[:digit:]]*\.[[:alnum:])(.-]*/[[:alnum:])(.-][[:alnum:]):(.-]*[[:alnum:]-]\).*_\1_p'
(2nd try, in case the paper does not place "DOI" in front of the identifier):
sed -n -e 's_.* *\([[:digit:]][[:digit:]]\.[[:alnum:])(.-]*/[[:alnum:])(.-][[:alnum:]):(.-]*[[:alnum:]-]\).*_\1_p'
Remember, if you use them in an AppleScript, you have to double all the backslashes ("\"). I ended up working the regexes out myself after Googling "doi regex" unexpectedly failed to produce anything useful, so I thought I'd save you the trouble.
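Doubling the backslashes by hand is error-prone, so here is a small sketch that does it mechanically. The function name double_backslashes is my own invention, and the short sed expression in the demo is just a stand-in for the real commands. It handles backslashes only; any double quotes in a command would still need escaping separately.

```sh
#!/bin/sh
# double_backslashes (hypothetical name): copy stdin to stdout with every
# backslash doubled, ready to paste into an AppleScript string.
double_backslashes() {
    sed 's/\\/\\\\/g'
}

# Demo on a short stand-in expression; prints s_\\(a\\)_\\1_p
printf '%s\n' 's_\(a\)_\1_p' | double_backslashes
```

The single quotes matter here: they keep the shell from interpreting the backslashes before sed ever sees them.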

Helpful BibDesk-related Resources

The original BibDesk ADS script

Official list of BibDesk AppleScripts

The State of Biomedical PDFs

DOI parsing

ADStoBibdesk (Safari only)


