In Reading, Extracting And Storing Scholarly Information To Supercharge The Writing Process, I wrote about how I extracted both the highlights and the full text of entire manuscripts to give myself granular access to information. Although I’ve continued extracting highlights, extracting the full text (by highlighting the entire document) proved much too time-consuming. Instead, I’ve been experimenting with a much quicker alternative (suggested by Andrew in the comments on that entry): saving the entire manuscript as single-page PDF documents. Here is what I’ve been doing.

After highlighting a manuscript in Highlights.app, I export my highlights (along with color tags) to DEVONthink Pro (DTP) using the built-in export function. By default, Highlights.app saves the extracted highlight files to the DTP Inbox. I move that folder from the DTP Inbox to my Desktop. Inside the moved folder I make two new sub-folders, HighlightsX and PDFx, and move the extracted Markdown files into the HighlightsX sub-folder. Then, within Bookends, I export the annotated PDF to my Desktop and open it with Adobe Acrobat (any app that can add headers and split documents will work).
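The folder setup above is simple enough to script if you do it often. Here is a minimal Python sketch of that step. It assumes the exported highlight files end in `.md` and sit at the top level of the exported folder; the function name `organize_export` is my own invention for illustration, not part of Highlights.app or DTP.

```python
from pathlib import Path
import shutil


def organize_export(export_dir: Path) -> tuple[Path, Path]:
    """Create HighlightsX and PDFx inside export_dir and file the Markdown notes."""
    highlights_dir = export_dir / "HighlightsX"
    pdf_dir = export_dir / "PDFx"
    highlights_dir.mkdir(exist_ok=True)
    pdf_dir.mkdir(exist_ok=True)

    # Assumption: highlight files use the .md extension. Only top-level files
    # are moved; anything else in the folder is left where it is.
    for md_file in sorted(export_dir.glob("*.md")):
        shutil.move(str(md_file), str(highlights_dir / md_file.name))

    return highlights_dir, pdf_dir
```

Calling `organize_export(Path.home() / "Desktop" / "MyPaper")` (a made-up folder name) would leave you with the two sub-folders, with the highlight files already sitting in HighlightsX and PDFx empty and waiting for the split pages.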

In the left header I put the Bookends citation (Bookends: Edit: Copy Citation), in the center header the DOI, and in the right header the Bookends link (Bookends: Edit: Copy Hypertext Link: Copy as Text). These headers are added to every page of the PDF, so each page stays traceable to its source after the split. I then split the manuscript into single-page documents and save them to the PDFx sub-folder.
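I do the splitting in Acrobat, but for anyone who would rather script it, here is a rough sketch of the split step only. It assumes the third-party `pypdf` package is installed and that the headers have already been added to the PDF. The `page_filename` naming scheme (zero-padded page numbers) is just my suggestion so the pages sort in order, not something Acrobat or DTP requires.

```python
from pathlib import Path


def page_filename(stem: str, page_number: int, total_pages: int) -> str:
    """Build a per-page file name, zero-padded so files sort in page order."""
    width = len(str(total_pages))
    return f"{stem}_p{page_number:0{width}d}.pdf"


def split_pdf(source: Path, out_dir: Path) -> list[Path]:
    """Write each page of source as its own single-page PDF in out_dir."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so that page_filename can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(source))
    total = len(reader.pages)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        target = out_dir / page_filename(source.stem, number, total)
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Pointing `split_pdf` at the annotated PDF and the PDFx sub-folder should produce one file per page. This sketch does not add the headers, so that part still happens in Acrobat or a similar app first.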

I then move the parent folder to my Dropbox Writing folder and index it using DTP. Within DTP, I make sure that tags are included for both the main folder and the sub-folders (Option-click a folder in DTP and make sure “Exclude from Tagging” is unchecked).

Although this method is faster, there are trade-offs. The “See Also” feature of DTP depends on the words in a document, and a document with too many words dilutes the accuracy of the semantic search. A page of text has far more words than an extracted paragraph, so it is slightly less accurate for finding granular information. The other trade-off is the amount of text that must be read when searching: it is faster to scan a paragraph than a whole page of text in a PDF. Regardless, the time saved with this method far outweighs the loss in accuracy.

Let me know what you think.