Writing Academic Papers with Org-mode
Configs at the End
If you want to copy this workflow exactly, I will have all of the relevant configuration at the end of this document in a single source block. Just change the paths appropriately and you're good to go.
The Reason I'm Here
Rewind 1 year. I've got an i3 window with Vim on the left and a shell on the right up on my screen, with evince open on desktop 2 so I can see the output. I'm up to my ears in LaTeX, and things aren't making nearly as much sense as I think they should be. I'm designing the final exam which my students will sit in 1 week's time, and it's just not as easy as I'd like it to be.
Actually, though, the problem isn't LaTeX. Well, that's one of the problems, but it isn't the main one. The main problem I'm having is context switches.
Vim does not work smoothly with external processes. Now, you can (with work) make Vim work with external processes, but it isn't a simple task. If I were writing Vim's report card, it would include a sentence like, "Vim needs to practice playing cooperatively with others."
So when I knew that I would be embarking on a Masters, priority number 1 was finding a text-based flow that would let me write my thesis in plain text and export it for my professor. Enter Emacs.
Why Plaintext?
For the sake of those who stumbled onto this blog post by typing "How to write an academic paper" into your search engine of choice, let me explain briefly why I didn't want to write my paper in Google Docs or Microsoft Word.
Google Docs is a great Word Processor. We have used it at every school I have worked at in the last 5 years to great effect. It is simple, convenient, and reasonably quick.
It also excels at seamless collaboration. While a group of technically minded adults would have no problem collaborating using git, that isn't seamless and it isn't something you can get every single 6th grader in your school to do correctly every time.
But it falls down for long, complex documents which undergo multiple iterations.
I took my Masters from a list of possible ideas to a complete paper in a single document (plus a few supporting documents for doing statistical analysis).
Not only that, I can easily go back to any previous iteration of that process using git.
Finally, I can be certain that the charts and tables in my paper are correct and current every time because they are generated fresh from the actual data every time I export my paper.
Why org-mode?
Now all of this is also possible in LaTeX. In fact, LaTeX is the intermediary step that my paper goes through on the way to the finished product. So why not just write it in LaTeX?
Well, there are a few reasons why I went with org-mode over LaTeX.
Familiarity
I use org-mode every day. My schedule is in org-mode (exported to .ics so that my Google calendar is up to date and visible to my colleagues). My plans, projects, and TODO lists are in org-mode. I take notes during class in org-mode. I write this blog in org-mode.
LaTeX, on the other hand, I have very little experience with. I have used it a little bit for building tests and things, but not enough to be fluent in it. Since I would also be learning a new writing environment setup, I decided to reduce the number of new things I needed to learn.
Integrates with my TODO list
Because my paper is an org-mode document, I can simply put TODO at the beginning of a section header and that section shows up in my org-agenda task list. This allowed me to outline my document and schedule when I planned to research, write, and proofread each section.
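For the TODO headers to show up in the agenda, the paper has to be one of the files org-agenda scans. Here is a minimal sketch, assuming your paper lives at the hypothetical path ~/thesis/paper.org and that org-agenda-files is a list of files rather than a single file name:

```elisp
;; Hypothetical path: replace it with the location of your own paper.
;; Assumes `org-agenda-files' holds a list of files (its default is nil).
(with-eval-after-load 'org
  (add-to-list 'org-agenda-files "~/thesis/paper.org"))
```

After this, any header in the paper marked TODO should appear in the agenda's task list alongside everything else.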
Export to multiple formats
LaTeX is usually used to export to PDF. I believe there are ways to export LaTeX to HTML or other formats, but I haven't ever used them. Org-mode exports to almost everything.
org-ref
The deciding factor for me was the package org-ref. I'll talk more about it below, but it lets me use the Helm incremental search to filter my library for the exact source I want to cite, insert that citation in the right spot, and add it to the bibliography automatically. That was brilliant.
The Toolchain
Zotero for Library Management
While there are definitely Emacs tools for library management, Zotero excels at this particular task. With Zotero, I can take a PDF sourced from ResearchGate or another site, drag and drop it onto the Zotero window, and it will autopopulate the bibliographic information. Additionally, it can generate citations for books from just the ISBN, websites from just the web address, and lots of other sources. Finally, you can install the Zotero plugin for Firefox or Chrome and get citations into Zotero with the click of a button.
Installation
Download Zotero from their website. While you are there, go ahead and sign up for a free account. That way, you can easily transfer your library from computer to computer should you need to. You will also need to download the Zotero Better Bibtex Plugin. You may also want to grab the appropriate plugin for your browser of choice and the Zutilo plugin, but these two tools are optional.
Configuration
Now, you need to set up Zotero so that it creates the .bib file you plan to use for your paper. I have two bibliography files on my computer. A master file located in my home directory and a project-specific file located in my project's folder. The reason for this is two-fold.
- I want each project to have its own .bib file so that if someone downloads the project from the internet, they have the resources to build the PDF from the github repo.
- I want a fallback in case a specific project doesn't yet have a .bib file associated with it.
Whether you choose to have a master .bib file for all your projects or individual .bib files for each project, it is important that your .bib files stay in sync with your Zotero library. That's one of the main reasons for downloading the Better Bibtex plugin. One of the features of Better Bibtex is the ability to keep an exported .bib file up to date.
Here's what you need to do in order to get a .bib file in your project directory that stays up to date.
- In Zotero, click File -> Export Library
- For the format, be sure to select "Better Bibtex"
- Make sure you check the box "Keep Updated".
- For the save file dialog, put your file in your project directory with a reasonable name. I usually use library.bib.
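Once the export exists, Emacs needs to know where it is. The sketch below is one way to do that, not part of my tested config: the path ~/thesis/library.bib is hypothetical, and it assumes a helm-bibtex recent enough to read the bibtex-completion-bibliography variable.

```elisp
;; Hypothetical project path: point it at the file Zotero keeps updated.
(let ((bib (expand-file-name "~/thesis/library.bib")))
  (if (file-readable-p bib)
      ;; Assumes a helm-bibtex version that reads this variable.
      (setq bibtex-completion-bibliography (list bib))
    (message "Bibliography not found: %s" bib)))
```

The readability check is just there so that a typo in the path is reported instead of silently leaving you with an empty search list.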
Using Zotero
To get a citation into Zotero, the easiest way is to drag and drop the PDF of the paper or article onto the Zotero window. Zotero will then detect as much of the bibliographic data as possible (for older PDFs without OCR, this may be incomplete) and create a new entry. It will also copy the PDF into a folder in its own directory, so you can safely delete the PDF which you downloaded. Finally, if you have completed the configuration above, it will automatically export that library item into your library.bib file, making it available for searching and citing in Emacs.
Limitations of Zotero
Zotero is excellent for library management. But their notes interface leaves much to be desired for someone who is used to working with the Emacs/org-mode workflow. I would not recommend keeping any notes in Zotero. The whole goal of this toolchain is to use the best tool for each of the jobs. Zotero is the best tool for library management, but it is not the best tool for taking notes about the papers and books in your library.
PDF-Tools for Reading your Papers
Now that you've found some sources for your paper, you need to read them. Not only should you read them, you also need to keep notes on them to simplify writing your paper. For this, pdf-tools and helm-bibtex are excellent resources.
Installation (MacOS)
Installing PDF-tools on a Mac is, sadly, not as straight-forward as it should be. The instructions for doing so are found here.
The part that is missing (or at least potentially unclear) is where you should define the PKG_CONFIG_PATH environment variable.
This can be defined in your shell rc file (.bash_profile or .zshenv), but if you do that you will need to use exec-path-from-shell to bring it into Emacs.
Alternatively, this can be defined inside Emacs, but then it would not be available outside of Emacs.
I elected to define it in my .zshenv file, in case I end up needing it elsewhere.
In that case, you need the following in your init.el:
(use-package exec-path-from-shell
  :custom
  (shell-file-name "path/to/your/shell"
                   "This is necessary because some Emacs installs overwrite this variable")
  (exec-path-from-shell-variables
   '("PATH" "MANPATH" "PKG_CONFIG_PATH")
   "This adds PKG_CONFIG_PATH to the list of variables to grab. I prefer to set the list explicitly so I know exactly what is getting pulled in.")
  :init
  (if (string-equal system-type "darwin")
      (exec-path-from-shell-initialize)))
This is not included in the big init.el dump at the end because there's another way to get this into Emacs, simply:
(setenv "PKG_CONFIG_PATH" "/usr/local/Cellar/zlib/1.2.8/lib/pkgconfig:/usr/local/lib/pkgconfig:/opt/X11/lib/pkgconfig")
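Whichever route you take, it is worth confirming that the variable actually reached Emacs before you try to build pdf-tools. A quick check, which you can evaluate with M-: or in the scratch buffer:

```elisp
;; Returns the value Emacs sees for PKG_CONFIG_PATH.
;; A nil result means the variable never made it into Emacs.
(getenv "PKG_CONFIG_PATH")
```

If this returns nil under GUI Emacs on a Mac, the shell definition is not being imported and the pdf-tools build will most likely fail to find its libraries.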
Workflow
When in an org document (any document will do, but typically you would do this in your paper), pressing C-c ] will open the helm-bibtex menu.
From here, you'll be presented with a list of all of the items in your library.
Use the helm incremental search to find the item you're looking for.
This view is the center of your citation/annotation workflow. From here, you can choose a library item to insert as a citation. You can open it in your PDF Viewer (If you're using pdf-tools as I recommend, that will be Emacs). You can also open an associated notes file, which would open an org file. I originally used this workflow because I could not get pdf-tools working correctly on my Mac. But making highlights and annotations directly into the PDF has the advantage of being transferable to collaborators and other computers which may not have Emacs set up on them. So my workflow right now does not use the notes file.
That said, I did find it useful to be able to write the lit review for each paper directly into an org file and then use M-x org-copy-subtree to put it directly into my paper at the appropriate spot.
For now, though, collaborative concerns outweigh that convenience.
Since we are taking notes right now, we want to open the PDF. So we search for the paper, press <Tab>, and then <F2>. Assuming you have pdf-tools set up on your computer, you should now have the PDF in Emacs.
From here, you can read the document and make annotations directly in the PDF. This is the only part of my workflow which requires me to take my hands off the keyboard as pdf-tools interacts with the specific parts of the PDF via mouse events.
But in short, you can highlight a relevant passage and press C-c C-a h to add a highlight.
This pops up a mini-buffer where you can add your notes regarding the highlighted section.
Alternatively, you can press C-c C-a t to add a text annotation which appears as a small sticky note on the screen.
I found those useful for annotating charts and tables.
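If the three-key chords get tiring, the same commands can be bound to single keys in the PDF buffer. This is an optional sketch rather than part of my config. The command names are the ones pdf-tools puts behind C-c C-a h, C-c C-a t, and C-c C-a D, but the single-letter keys are my own assumption and will shadow whatever pdf-view-mode normally has on them.

```elisp
;; Optional shortcuts; the choice of h, t, and D is an assumption.
(with-eval-after-load 'pdf-annot
  ;; Highlight the selected region (same as C-c C-a h).
  (define-key pdf-view-mode-map (kbd "h")
    #'pdf-annot-add-highlight-markup-annotation)
  ;; Add a sticky-note style text annotation (same as C-c C-a t).
  (define-key pdf-view-mode-map (kbd "t")
    #'pdf-annot-add-text-annotation)
  ;; Delete an annotation (same as C-c C-a D).
  (define-key pdf-view-mode-map (kbd "D")
    #'pdf-annot-delete))
```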
Org-mode and org-ref for Writing Your Paper
You've found your sources, you've annotated them, and now it's time to write your paper. For this, org-mode is magnificent, especially when coupled with org-ref and helm-bibtex. I suspect the same would be true of ivy's bibtex plugin, but I like helm.
The Tools
Org-mode
Org-mode rocks, pure and simple. When writing a paper, you use the headers to represent the various sections, headers, and sub-headers of your paper. There are some modifications needed in order to export your work, especially if you're working in the humanities and need to publish in APA6 format. The modifications needed in your init.el are listed below in the code snippet, but you'll need a specific header for your document as well.
#+TITLE: <Insert Title Here>
#+AUTHOR: <Your Name Here>
#+BIBLIOGRAPHY: library.bib
#+LaTeX_CLASS: apa6
#+LaTeX_CLASS_OPTIONS: [a4paper]
#+LaTeX_HEADER: \affiliation{<Your school, think tank, etc>}
#+LaTeX_HEADER: \shorttitle{<A short version of the long title for page headers>}
#+LaTeX_HEADER: \usepackage{breakcites}
#+LaTeX_HEADER: \usepackage{apacite}
#+LaTeX_HEADER: \usepackage{paralist}
#+LaTeX_HEADER: \let\itemize\compactitem
#+LaTeX_HEADER: \let\description\compactdesc
#+LaTeX_HEADER: \let\enumerate\compactenum

#+BEGIN_ABSTRACT
*Abstract*

You cannot use an org-mode header here.
If you do, it trashes the table of contents for the apa6 document class.
That's why Abstract is bolded manually.
As you can see, I write my documents 1 sentence to a line.
This is because I keep these documents under version control.
A single English sentence is similar to a single line of code.
You wouldn't run lines of code together in a production codebase, so don't run sentences together in a VC'ed text document.
LaTeX and org-mode both interpret a single empty line as a paragraph break, so the fact that your source document is 1 sentence per line will not be visible to anybody other than you.
#+END_ABSTRACT

#+LaTeX: \tableofcontents
This initializes a number of LaTeX options and headers. Let's take them one by one.
- BIBLIOGRAPHY: This should be the path to the file that Zotero is exporting to. I always point this to the one inside the project directory rather than the master document saved in my home directory.
- LaTeX_CLASS and LaTeX_CLASS_OPTIONS: Together these define the LaTeX class of the document. They are used as follows in the command: \documentclass$LaTeX_CLASS_OPTIONS{$LaTeX_CLASS}. Note that LaTeX_CLASS_OPTIONS must be inside brackets.
- LaTeX_HEADER: These make macro calls or set variables which should be done in the header of the LaTeX document (in other words, before the content of the document begins). There are a number of these here. The variables are self-descriptive, but I will describe the packages below.
  - \usepackage{breakcites}: This allows citations to word wrap. It may not be strictly necessary, but I thought it made the paper look nicer.
  - \usepackage{apacite}: This is necessary for apa6 compliant citations.
  - \usepackage{paralist}: Default LaTeX lists take up far too much space. This package reduces that.
  - \let\itemize\compactitem: This replaces the default itemize environment with compactitem from paralist.
  - \let\description\compactdesc and \let\enumerate\compactenum: Same as above.
Next, you write your abstract.
Wrap it in #+begin_abstract and #+end_abstract so that the apa6 class can find it.
Finally, add #+LaTeX: \tableofcontents to place your table of contents.
Note that this is #+LaTeX, not #+LaTeX_HEADER.
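The last thing you should add is at the very end of your paper. You should add the following two lines so that org-ref can build your bibliography. With the apacite style and the library.bib file used in this post, they are:
bibliographystyle:apacite
bibliography:library.bib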
If your paper should use a different citation style, you should import different packages at the top and use a different bibliographystyle at the end.
If you are using the APA6 class, do not put the bibliography in its own header.
If you do, your PDF will have two headers for your bibliography.
It is annoying that your bibliography goes inside the final header of your Conclusions section, but it is necessary.
That may not be the case for other document classes.
Org-ref
Org-ref allows you to manage citations in org-mode. Getting started in org-ref is like getting started in helm or magit. It's intimidating at the beginning, but you don't need to understand all of it in order to handle writing your paper. In fact, much of the setup needed has already been described above (with the exception of init.el requirements).
To use org-ref, you'll press C-c ] as you did when preparing to annotate.
Search by author name, article name, publication date... Basically anything in the .bib entry for the article.
Should you need to select more than one article to cite, C-<space> marks an article for citation.
Once you've selected the article(s) you want to cite, you can press <Enter> to insert the default citation (which is typically what you want).
If you need an alternative citation format (perhaps one without parentheses), pressing C-u <Enter> will get you the list of all possible citation formats.
There are lots, and I didn't try them all.
Git to Track the Changes to Your Paper
Git is a distributed version control program. It allows you to track different versions of the files in a directory. When writing a paper, it allows you to go back to that version of the lit review that the professor liked, but keep all the work you've done on the methodology. It also lets you back up your paper easily (and for free) to GitHub, Bitbucket, GitLab, or other remote git forges. I had to reformat my Mac halfway through my paper. I was able to work on it using a school desktop while waiting to get the Mac back, and sync all the changes back to Bitbucket and the Mac easily when IT was finished.
Magit
I used Magit for all my git committing, pushing, etc when writing this paper. But that is beyond the scope of this post. At a later date, I will explain how I use Magit for work.
Exporting with LaTeX
Assuming you have followed the instructions to this point, when you are ready to export a version of your paper, you simply press C-c C-e l o and your new version will open.
Currently, mine opens in Apple's PDF viewer.
I believe that is because the LaTeX command calls an outside process, so it uses the system default PDF reader.
In any case, I don't typically annotate my own papers, so that's not a serious issue for me.
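If you would rather have the exported PDF open inside Emacs, one option is to tell org-mode to handle PDF files itself. Treat this as a sketch rather than part of my tested setup: it assumes the C-c C-e l o export opens its result through org-file-apps, and it only gives you pdf-tools if pdf-tools is installed and working.

```elisp
;; Ask org-mode to open PDF files in Emacs instead of the system viewer.
;; add-to-list puts the entry first, so it is found before the default one.
(with-eval-after-load 'org
  (add-to-list 'org-file-apps '("\\.pdf\\'" . emacs)))
```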
Setting it up
Below are all of the relevant parts of my init.el.
These differ slightly from the one posted earlier this week because they have been updated and tweaked to reflect changes I made as I reflected on how this process worked for me.
For packages which are mandatory for this to work but which don't have any configuration specific to this task, I have simply included the shortest possible configuration for them.
If you want to know more about how I use those packages, take a look at the blog entries specific to those packages.
This assumes that you use use-package. If you don't, you'll need to heavily adapt what you see here.
Initial Setup
package and use-package setup.
(require 'package)
(setq package-enable-at-startup nil)
(setq package-archives '(("org" . "http://orgmode.org/elpa/")
                         ("gnu" . "http://elpa.gnu.org/packages/")
                         ("melpa" . "http://melpa.org/packages/")))
(package-initialize)

;; Install use-package on first run if it is missing.
(unless (package-installed-p 'use-package)
  (package-refresh-contents)
  (package-install 'use-package))
(require 'use-package)
(setq use-package-always-ensure t)
Exec-path-from-shell
As part of their ongoing war against developers, the Captains of industry in Cupertino have designed Macs so that GUI Emacs only ever reads environment variables from the default shell. This is obviously user hostile behavior since their system shell (bash) is from 2007, so nobody should use that shell. This package is designed to work around this.
(use-package exec-path-from-shell
  :custom
  (shell-file-name "/usr/local/bin/zsh")
  (exec-path-from-shell-variables '("PATH" "MANPATH" "PKG_CONFIG_PATH"))
  :init
  ;; Only needed on macOS, where GUI Emacs does not inherit the shell environment.
  (when (eq system-type 'darwin)
    (exec-path-from-shell-initialize)))
Helm
(use-package helm
  :init
  (setq helm-split-window-default-side 'other)
  (helm-mode 1))
Org-mode
Org-mode is the thing that brought me to Emacs. There are a lot of customizations here.
Org QoL
This handles the Quality of life part of Org-mode. First, org-bullets beautifies the leading asterisks. Then, we hide the extra asterisks. Finally, we set the global shortcuts for org-store-link, org-agenda, and org-capture.
(use-package org-bullets
  :custom
  (org-hide-leading-stars t)
  ;; Turn on org-bullets-mode whenever an org buffer opens.
  :hook (org-mode . org-bullets-mode))
Academic Paper Writing
Here are the settings I have used for writing academic papers. I write my papers in orgmode, then export them to PDF via LaTeX. This is one of the most fleshed out areas of my dotfiles, in large part because 90% of my Emacs time for the last 6 months has been related to my Master's Thesis in some way.
This sets the default bibtex file. I rarely use this in real projects. Most projects set their own bibliography.
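(use-package helm-bibtex
  :custom
  (helm-bibtex-bibliography '("~/zotero.bib"))
  (reftex-default-bibliography '("~/zotero.bib"))
  (bibtex-completion-pdf-field "file")
  :hook
  ;; Hooks into AUCTeX's TeX-mode-hook; use tex-mode and tex-mode-map
  ;; instead if you rely on the built-in TeX mode.
  (TeX-mode . (lambda ()
                (define-key TeX-mode-map (kbd "C-c h") 'helm-bibtex))))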
I use org-ref to manage my citations in my papers. This is the section for the support and configuration for org-ref.
(use-package org-ref
  :custom
  ;; org-ref expects a list of .bib files here, not a bare string.
  (org-ref-default-bibliography '("~/zotero.bib")))
This block corrects the way that the TOC is displayed. It is SUPER important for the apa6 class that follows. APA6 has very strong opinions about how the TOC should be displayed, opinions that conflict directly with the default settings for exporting from orgmode.
(defun org-export-latex-no-toc (depth)
  (when depth
    (format "%% Org-mode is exporting headings to %s levels.\n" depth)))
(setq org-export-latex-format-toc-function 'org-export-latex-no-toc)
Add apa6 to the org-latex-classes export for writing academic papers in APA6 format.
;; org-latex-classes is defined by ox-latex, so wait until it has loaded.
(with-eval-after-load 'ox-latex
  (add-to-list 'org-latex-classes
               '("apa6"
                 "\\documentclass{apa6}"
                 ("\\section{%s}" . "\\section*{%s}")
                 ("\\subsection{%s}" . "\\subsection*{%s}")
                 ("\\subsubsection{%s}" . "\\subsubsection*{%s}")
                 ("\\paragraph{%s}" . "\\paragraph*{%s}")
                 ("\\subparagraph{%s}" . "\\subparagraph*{%s}"))))
This describes the export process from orgmode to LaTeX to PDF.
(setq org-latex-pdf-process
      '("latexmk -pdflatex='pdflatex -interaction nonstopmode' -pdf -bibtex -f %f"))
PDF-tools
Linux Install
The shell block below installs the libraries needed to run this on Linux. There is another workflow you have to use on MacOS, but I have not gotten it to work yet.
sudo apt install libpng-dev zlib1g-dev
sudo apt install libpoppler-glib-dev
sudo apt install libpoppler-private-dev
sudo apt install imagemagick
MacOS Install
This assumes you have homebrew installed.
brew install poppler automake
PDF-tools config
PDF-tools allows me to annotate and view pdfs INSIDE emacs. This ties in with helm-bibtex for lit reviews. It's super awesome when it works, but thanks to Apple....
(use-package pdf-tools
  :pin manual ;; manually update
  :config
  ;; initialise
  (pdf-tools-install)
  ;; open pdfs scaled to fit width
  (setq-default pdf-view-display-size 'fit-width)
  ;; use normal isearch
  (define-key pdf-view-mode-map (kbd "C-s") 'isearch-forward)
  :custom
  (pdf-annot-activate-created-annotations t "automatically annotate highlights"))