Emacs for Study: PDF Conversion and Editing
Sometimes while studying it is useful to take the papers you are researching, which usually come as PDFs, and convert them to an editable form. After you’ve finished editing, the final product could be PDF, Microsoft Word, web HTML, or something else. I’ve needed this process for a number of different reasons, so this tutorial covers the tools I use. Unlike other tutorials I’ve done, this process relies on tools beyond Emacs. In particular it uses the open-source pdftotext program, one of the command-line utilities that ship with the excellent Poppler PDF library. The process I’ll show is Linux-based, but the tools are all open-source, so adapting it for Windows or Mac should be relatively painless. I’d love to hear your Windows or Mac techniques in the comments.
The first step in both methods below is to install pdftotext. On my Fedora Linux system, this is done from the command line (on newer Fedora releases, dnf takes the place of yum):
sudo yum install poppler-utils
Next, of course, acquire the desired PDF and navigate to its directory. From there we have two methods.
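Before converting anything, it can help to confirm that the install worked. The following is only a small sketch; the function name check_pdftotext is my own invention, not part of Poppler.

```shell
# Hypothetical helper: report whether pdftotext is available on the PATH.
check_pdftotext() {
  if command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext found"
  else
    echo "pdftotext not found; install poppler-utils first" >&2
    return 1
  fi
}
```

Running check_pdftotext in the shell where you plan to work should tell you whether the methods below are ready to use.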
Method 1 (simpler): PDF -> Text -> Org
- Convert the PDF to text:
  pdftotext mypdf.pdf
  This produces the file mypdf.txt.
- Save the text file as a .org file. This can be done in at least two ways:
  - In Emacs, open the text file, use C-x C-w (write-file), and specify a .org extension.
  - In your shell, run
    mv mypdf.txt mypdf.org
    and THEN open the file in Emacs, which will now open it in Org mode.
- Use Org mode to make your edits.
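If you do this often, the first two steps of Method 1 can be wrapped in a small shell function. This is a minimal sketch, assuming the PDF name ends in .pdf; the name pdf_to_org is hypothetical.

```shell
# Hypothetical helper: convert one PDF to an Org file via the intermediate .txt file.
# The body runs in a subshell, so its variables do not leak into your session.
pdf_to_org() (
  pdf="$1"
  base="${pdf%.pdf}"          # strip the .pdf extension
  pdftotext "$pdf" || exit 1  # writes "$base.txt" next to the PDF
  mv "$base.txt" "$base.org"  # rename so Emacs opens it in Org mode
)
```

Calling pdf_to_org mypdf.pdf should leave you with mypdf.org, ready to open in Emacs.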
Method 2 (faster): PDF -> Org
If you are feeling bolder, you can skip the intermediate text file altogether, either going straight to a new Org file or appending to an existing one. This method uses shell redirection.
- Convert the PDF to text and send it straight into an Org file:
  pdftotext mypdf.pdf - >> mypdf.org
  The hyphen tells pdftotext to write to standard output instead of a file, and >> redirects that output to the specified file, creating it if necessary and appending if it already exists. This produces the file mypdf.org.
- Use Org mode to make your edits.
Note that because >> appends, you can create an Org file with preliminary content first, if you so desire, and leave a section open at the end for the text. You might want to do this if you already have notes on the paper, or have already specified a title, etc.
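As a rough sketch of that idea, the function below writes a title and a notes heading first and then appends the PDF text under a final heading. The function name pdf_into_org and the heading names are my own placeholders, and note that the first write uses > and so overwrites any existing Org file of the same name.

```shell
# Hypothetical helper: start an Org file with a title and a notes section,
# then append the PDF text under a final heading.
pdf_into_org() (
  pdf="$1"
  title="$2"
  org="${pdf%.pdf}.org"
  # Write the preliminary content first (this overwrites any existing file).
  printf '#+TITLE: %s\n\n* My notes\n\n* Source text\n' "$title" > "$org"
  # "-" sends the text to standard output; ">>" appends it to the Org file.
  pdftotext "$pdf" - >> "$org"
)
```

For example, pdf_into_org mypdf.pdf "My paper" should produce mypdf.org with the header lines at the top and the converted text at the bottom.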
Last step: Editing and export
Now we are back in Emacs, using Org mode, and you should be editing the file created above. Org mode makes hierarchical organization simple and has a wide variety of export options. To create a heading, start a line with as many asterisks as the desired heading level, followed by a space; the asterisks must begin at the very start of the line, with no whitespace before them. You can either type the asterisks manually, or place one asterisk and use M-RIGHT and M-LEFT to demote or promote the heading. In addition, the six text mark-up options covered in a previous video (bold, italics, etc.) are all available if needed.
Once you’ve edited the content as you wish, there are a couple of header options you may need.
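Converted PDF text sometimes carries stray indentation, so a line you meant as a heading may not start at the left margin. Here is a small sketch of a check for that; the function name find_indented_stars is hypothetical, and it simply lists lines where asterisks come after leading whitespace, which Org will not treat as headings.

```shell
# Hypothetical check: print (with line numbers) every line whose first
# non-blank character is an asterisk preceded by whitespace.
find_indented_stars() {
  grep -n '^[[:space:]][[:space:]]*\*' "$1"
}
```

Running find_indented_stars mypdf.org gives you a list of line numbers to review; not every hit is necessarily a heading you intended, so treat the output as a hint.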
Specifying a title and/or authors
Org mode lets you begin your document with the following lines to specify title and author fields for export, which are handled specially by whichever export format you choose.
#+TITLE: My paper
#+AUTHOR: Tory S. Anderson, Org Mode, PDFToText
Disabling header numbering
By default, Org mode exports your documents with sections numbered like 1.2.1, that is, section.subsection.subsubsection, and so on. Sometimes this is very handy, but other times I want unnumbered sections. This is accomplished by adding the following line to the header items.
#+OPTIONS: num:nil
Exporting
All in all, your file should now look something like this:
#+TITLE: My paper
#+AUTHOR: Tory S. Anderson, Org Mode, PDFToText
#+OPTIONS: num:nil
* Introduction
Intro text under level 1 header
** The chief problem
Sample excerpt
*** specially applying to
Text for Level 3 header, etc
Once you’ve tailored and annotated your document to your liking, you can choose from Org mode’s extensive export options using C-c C-e; the resulting menu will guide you through the selection. Note that if the format you want isn’t listed, its export backend may need to be loaded first. Some of what I add to my Emacs init file can be seen below.
;; Org to HTML export
(setq
 ;; remove preamble
 org-html-preamble nil
 ;; remove "made with org"
 org-html-postamble nil
 ;; remove table of contents
 org-export-with-toc nil)
(eval-after-load "org"
  '(require 'ox-odt nil t))    ;; enable export to ODT
(eval-after-load "org"
  '(require 'ox-md nil t))     ;; enable export to Markdown
(eval-after-load "org"
  '(require 'ox-beamer nil t)) ;; enable export to Beamer LaTeX PDF slideshow
Please be aware that with Org mode exporting, as with all things Emacs, you have a huge range of options and customizability; for all your other needs, you are encouraged to check out the Exporting section of the Org mode manual.
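If you would rather export from the shell than from the C-c C-e menu, Emacs can be run in batch mode. The following is a minimal sketch, assuming a reasonably recent Org mode is installed; the function name org_to_html is my own.

```shell
# Hypothetical helper: export an Org file to HTML without opening an Emacs window.
# --batch runs Emacs non-interactively; the HTML backend is loaded explicitly,
# the file is visited, and then the export command is called on that buffer.
org_to_html() {
  emacs --batch --eval "(require 'ox-html)" "$1" -f org-html-export-to-html
}
```

Calling org_to_html mypdf.org should write mypdf.html next to the Org file. Keep in mind that batch mode does not load your init file, so settings such as the preamble and postamble options above will not apply unless you pass them on the command line as well.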
Other Notes
- Paragraph breaks are exported wherever lines are separated by one or more empty lines
- Images can be exported, as shown at the end of my previous YouTube Video or its accompanying blog post.