papers.bib or: How I Learned to Stop Worrying and Love the Reference Manager
I recently completed and passed my PhD thesis proposal. During my time
struggling to get myself together and organized, I gave up on trying to manage
a BibTeX file by hand. Here, I’m going to describe the software and strict
workflow I’ve been using to manage a single thesis bibliography, papers.bib.
My criteria were the following:
- Cross-platform
- Open source as heck
- Keeps all machines in sync
- Easy to restore an old version
- On-demand opening/viewing of the source PDF
- Usable offline
Software
I use three primary programs to manage my papers.bib file:
- JabRef
  - I chose JabRef as the reference manager because it is open source, cross-platform, and can open PDFs or URLs directly.
  - As an added plus, it has support for plugins and a BibTeX downloader.
- git
  - git was the obvious choice for versioning papers.bib.
  - Easy to back up on GitHub/Bitbucket/whatever.
- Syncthing
  - Syncthing was chosen because it is a solid open-source replacement for Dropbox.
  - Used for keeping previously downloaded PDF files in sync.
I can satisfy all my criteria with a combination of these three tools.
Setup
The first thing I do (did?) is to start a git repository in ~/papers/. In
this folder I place my main papers.bib file for JabRef to manage.
JabRef
There’s only one essential configuration option I rely on: the BibTeX key
generator. In JabRef’s preferences, I set the default pattern to be
[auth.etal]_[year]. You can use whatever you fancy, but be sure to use
something with file system safe characters (e.g., avoid special characters like
“:”), because the key will later double as the PDF file name.
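A minimal Python sketch of how a key could be checked for file system safety. The allowed character set (letters, digits, dot, underscore, hyphen) is a conservative assumption rather than a rule taken from JabRef, and the sample keys are made up for illustration.

```python
import re

# Assumption: keys made only of these characters are safe as file names on
# Linux, macOS, and Windows. This is deliberately stricter than necessary.
SAFE_KEY = re.compile(r"[A-Za-z0-9._-]+")


def is_filesystem_safe(key: str) -> bool:
    """Return True if the whole key consists of conservative, safe characters."""
    return SAFE_KEY.fullmatch(key) is not None


# Hypothetical keys: the second one contains a colon and is rejected.
for key in ["Smith.etal_2014", "Smith:2014", "Doe_2011"]:
    print(key, is_filesystem_safe(key))
```

A colon is a good example of a character to avoid: it is not allowed in Windows file names, so a key containing one could not be reused as a PDF name on every machine.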
Syncthing
I set up ~/papers for syncing within Syncthing. In Syncthing, you can set up
ignore patterns for files it should skip during sync. I use these:
.git
.git*
papers.bib
This means Syncthing will skip git-related stuff and the main papers.bib, but
git will take care of all of those.
git
Likewise, I set git up to ignore the PDF files by placing this into the
.gitignore:
*.pdf
Syncthing will be managing those PDFs.
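To see how the two ignore lists divide the folder, here is a rough Python sketch. It approximates both tools' pattern matching with simple shell-style globbing on the file name, which is an assumption: the real matching rules in git and Syncthing are richer than this. The file names are hypothetical.

```python
from fnmatch import fnmatchcase

# The ignore patterns from the setup above.
SYNCTHING_IGNORE = [".git", ".git*", "papers.bib"]
GIT_IGNORE = ["*.pdf"]


def managers(filename: str) -> list[str]:
    """List the tools that do NOT ignore this file name (approximation only)."""
    tools = []
    if not any(fnmatchcase(filename, pattern) for pattern in GIT_IGNORE):
        tools.append("git")
    if not any(fnmatchcase(filename, pattern) for pattern in SYNCTHING_IGNORE):
        tools.append("syncthing")
    return tools


# Hypothetical folder contents.
for name in ["papers.bib", ".gitignore", "Smith.etal_2014.pdf"]:
    print(name, managers(name))
```

With these patterns, papers.bib and .gitignore are left to git alone and the PDF is left to Syncthing alone, so no file is handled by both tools.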
Workflow
When I run across a paper I want to cite, or even just read, I download all of the following:
- BibTeX: sometimes I copy in an entry directly from the web source, and other times I use the JabRef web search & downloading feature. Very rarely have I needed to make entries manually. Manual entries are usually theses or books.
- DOI/URL: All papers must have the DOI or URL. If I by chance lose a PDF, I can find the original source again.
- PDF: Every paper must have the PDF associated with it. The only exception is library sources; for those, I make sure, if possible, that the URL points to the library’s online catalog entry or to a place to buy it online (Amazon).
Once the BibTeX is in JabRef, the first thing I make sure to do is insert the DOI/URL if it doesn’t already have one. One thing I noticed is that sites like IEEE Xplore don’t include the DOI in the downloaded BibTeX, but do list it on the paper’s web page. I make sure to grab that.
The second thing I do is attach the PDF file. If you right-click an entry, “Attach file” will be in the menu. Normally, the downloaded PDF name is horrible and gross. Hence, I use a handy plugin to help with that.
renameFile
There’s a plugin that is critical for my JabRef use: renameFile.
renameFile comes with two configuration options: folder and name pattern. I use
both, leaving the “folder” blank (i.e., it uses whatever directory papers.bib
is in to place PDFs). Because of the way we’ve configured JabRef’s key
generation option, I leave the name pattern as [bibtexkey].
After attaching a file, I simply hit “rename” in the plugin window, verify the
file is being renamed as expected, and I’m done. It will rename the PDF file to
match the BibTeX key and move the file to the ~/papers/ folder. Yay!
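The per-paper rules above (a DOI or URL on every entry, and a PDF named after the BibTeX key) can be audited with a short script. What follows is a rough Python sketch, not part of JabRef or the plugin. It assumes one field per line, which is a simplification of real BibTeX syntax, and the sample entries and DOI are placeholders.

```python
import re

# Naive patterns: "@type{key," starts an entry, "name =" starts a field.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"^\s*(\w+)\s*=", re.MULTILINE)
NON_ENTRIES = {"comment", "string", "preamble"}


def parse_fields(bibtex: str) -> dict[str, set[str]]:
    """Map each BibTeX key to the lowercased field names found in its entry."""
    starts = list(ENTRY_START.finditer(bibtex))
    fields = {}
    for i, match in enumerate(starts):
        if match.group(1).lower() in NON_ENTRIES:
            continue
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex)
        body = bibtex[match.end():end]
        fields[match.group(2)] = {name.lower() for name in FIELD_NAME.findall(body)}
    return fields


def audit(bibtex: str, pdf_names: set[str]) -> dict[str, list[str]]:
    """Return {key: [problems]} for entries lacking a doi/url or a matching PDF."""
    problems = {}
    for key, fields in parse_fields(bibtex).items():
        issues = []
        if not fields & {"doi", "url"}:
            issues.append("no doi/url")
        if f"{key}.pdf" not in pdf_names:
            issues.append("no pdf")
        if issues:
            problems[key] = issues
    return problems


# Placeholder entries; the DOI is not a real one.
SAMPLE = """
@article{Smith.etal_2014,
  author = {Smith, A. and Jones, B. and Lee, C.},
  year = {2014},
  doi = {10.0000/example}
}

@inproceedings{Doe_2011,
  author = {Doe, J.},
  year = {2011}
}
"""

print(audit(SAMPLE, {"Smith.etal_2014.pdf"}))
```

For a real folder, pdf_names could be built from the file names of the PDFs sitting next to papers.bib. Library sources without a PDF will show up as "no pdf", which is expected for that exception.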
git commit
After I finish up adding new sources or finish writing for the day, I make sure
to check the papers.bib file into git. When committing, I always check the
git diff to make sure nothing was removed, only added. That last bit is
critical, because it can tell you when something is amiss. I also push the
changes to a public-facing GitHub repository.
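The "nothing removed, only added" check can be automated by scanning the diff text. Here is a small Python sketch, assuming the unified diff format that git diff prints by default. The sample diff is hand-written for illustration.

```python
def removed_lines(diff_text: str) -> list[str]:
    """Return the content of every line a unified diff deletes."""
    removed = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False  # a new file section begins; its header lines follow
        elif line.startswith("@@"):
            in_hunk = True  # hunk header; the lines after it are real changes
        elif in_hunk and line.startswith("-"):
            removed.append(line[1:])
    return removed


# Hand-written sample in the style of `git diff` output.
SAMPLE_DIFF = "\n".join([
    "diff --git a/papers.bib b/papers.bib",
    "--- a/papers.bib",
    "+++ b/papers.bib",
    "@@ -1,4 +1,5 @@",
    " @article{Smith.etal_2014,",
    "-  year = {2014}",
    "+  year = {2014},",
    "+  doi = {10.0000/example}",
    " }",
])

for line in removed_lines(SAMPLE_DIFF):
    print("removed:", line)
```

Note that an edited line shows up as one removal plus one addition, so a non-empty result means "go look at it", not necessarily "something was lost".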
Writing and collaboration
Having a single, dedicated papers.bib comes with one major caveat when trying
to collaborate: people are going to insert things into the working
bibliography, hence breaking the workflow of the central bibliography
entirely! Not sure there’s much to do about that, but here’s my current
workaround.
Each paper I work on has its own separate git repo. I always merge in my
papers.bib file as the “main” source and check it into git. That means git
is managing two separate versions of the “same file” in two separate repos,
which can certainly be confusing. Luckily, diff makes it easy to determine
the differences between the working and central bibliographies.
Whenever someone makes a change to the working bibliography, I make sure to
immediately merge the new entries into my central bibliography by following
the workflow I describe above. If it is going to be in a paper with my name on
it, I am going to have it for future reference. I do this by literally checking
diff -us ~/papers/papers.bib path/to/collab/papers.bib manually every time I
begin writing. I know, this part sucks. You could also make sure by checking
git whatchanged after a git pull.
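A line-by-line diff gets noisy once the central file carries extra DOI, URL, and file fields, so comparing BibTeX keys can be a useful complement. Here is a minimal Python sketch of that idea. The key-matching regex is a simplification of BibTeX syntax, and the two bibliographies are made-up examples.

```python
import re

# Naive pattern for "@type{key," at the start of an entry.
ENTRY_KEY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
NON_ENTRIES = {"comment", "string", "preamble"}


def bibtex_keys(bibtex: str) -> set[str]:
    """Collect the keys of all real entries in a BibTeX string."""
    return {
        match.group(2)
        for match in ENTRY_KEY.finditer(bibtex)
        if match.group(1).lower() not in NON_ENTRIES
    }


def keys_to_merge(central: str, working: str) -> list[str]:
    """Return keys present in the working bibliography but not the central one."""
    return sorted(bibtex_keys(working) - bibtex_keys(central))


# Made-up bibliographies: a collaborator added Doe_2011 to the working copy.
CENTRAL = "@article{Smith.etal_2014,\n  year = {2014}\n}\n"
WORKING = CENTRAL + "@book{Doe_2011,\n  year = {2011}\n}\n"

print(keys_to_merge(CENTRAL, WORKING))
```

This only catches entries under a new key. If a collaborator adds a paper you already have under a different key, it will still be listed and needs a manual look.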
After the new changes are merged into the central bibliography, I overwrite the working one with the central one. This ensures I can see whenever a change is introduced after I add in the DOI, URL, or PDF fields.
Summary
I know that seems like a lot of work – oh, it is – but trust me, it becomes so much easier to use after it is set up and working. Be vigilant in maintaining it and future you will thank you for having a central source for the references, along with links and PDFs.
One immediate need I’ve noted has to do with collaboration. While the workflow worked really well for my proposal as I was the only one working on it, collaboration immediately exposed flaws. For now, I’m manually working around this limitation.