Tuesday, September 20, 2005

DEVONthink for Scientists: Part I

Managing information is a very important part of a modern scientist's life: lab notebooks, ideas scribbled on scraps of paper, proceedings, papers to be written, papers written 10 years ago, papers to be read, papers to be refereed and, of course, physics blogs. People cope with it in many different ways. Some are very tidy, with classified hanging folders and carefully maintained bibliographies, and some are very messy. I personally hate having a bunch of papers flying around, filling desk and shelf space. I can never find things there. What can I do? I was using Yahoo before I read my first paper! So for the last few years I have been searching for an effective way to collect information. My dear idea notebook served me for quite a while, but it was only just better than nothing. I couldn't easily cross-reference, and it didn't integrate with my paper folders. But finally I feel like the issue is settled for a long while. Today I will talk about my final decision: DEVONthink.

DT is (Mac-only) database management software for every kind of electronic document. It recently became well known (read: fashionable) thanks to Steven Johnson's NYTimes article and the blog posts that followed (here and here). Journalists and writers are probably the primary market for this software, but I believe it is an invaluable tool for scientists as well. So I'll try to give you a review (or rather a praise) of it.

The structure of the database for your documents in DT looks like the good old file structure of your hard drive, with folders and subfolders. You put in any kind of document (txt, rtf, pdf, html, tex, as well as image and QuickTime files and, yes, doc!) and classify it in a dedicated place where you can search inside the documents, edit them and link them to each other (with wiki capabilities). Up to this point there are many alternatives that can do the same. Where DT really shines is its unmatched, intelligent text indexing.

DT looks at (at least) two things in your database: which words go with which words inside your documents, and which files go with which files inside the folders you have set up. Using this data it decides which words are characteristic of a file or a folder. It uses this information for the two magic buttons. The first one is "Classify". When you add a new document, "Classify" suggests where to put it in your database. I "usually" know where I want to put it, but just for fun I always click and see with joy that my folder is among the top suggestions, and it really helps when I am undecided. The second one is "See Also". This button is like your personal librarian. It suggests documents related to the current one. After a while it becomes common for DT to remind you of little notes taken from a book, or of an abstract from the arXiv that you read months ago and have already forgotten. I hadn't realized how much I forget until I started using this, and I am only 24.
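To make the idea a bit more concrete, here is a toy sketch in Python of how buttons like these could work. This is only my illustration, not DT's actual algorithm (which I don't know): it assumes plain word counts and cosine similarity, and the names `classify` and `see_also` are my own inventions.

```python
import math
import re
from collections import Counter


def words(text):
    # Lowercase the text and split it into alphabetic words.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(new_text, folders):
    # folders maps a folder name to the list of document texts inside it.
    # Returns folder names ranked from most to least similar to new_text.
    target = Counter(words(new_text))
    scores = {}
    for name, docs in folders.items():
        profile = Counter()
        for doc in docs:
            profile.update(words(doc))
        scores[name] = cosine(target, profile)
    return sorted(scores, key=scores.get, reverse=True)


def see_also(text, documents):
    # documents maps a title to its text.
    # Returns titles ranked from most to least similar to text.
    target = Counter(words(text))
    return sorted(
        documents,
        key=lambda title: cosine(target, Counter(words(documents[title]))),
        reverse=True,
    )
```

Calling `classify("gauge theory scattering", folders)` with a physics folder and a cooking folder would rank the physics folder first, simply because it shares more words with the new note. A real tool surely weights words more cleverly than raw counts do.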

Both functions start to give reasonable results after a few tens of documents have been placed in a few folders, and they get better and better as your structure gets bigger and better organized into a few levels of subfolders. I am very satisfied with the speed. It works seamlessly on my 1.5GHz, 512MB PowerBook with my primary database, which has about 1.5K documents with 75K unique and 1.7M total words. (There are so many unique words because I have Turkish documents in the database as well. I am planning to move them into a separate database later. And I should say that it also works perfectly in Turkish. It sometimes guesses the name of a columnist (my folder name) from just the text, and more often than chance would allow!)

You can also ask for documents related to just the selected part of a text, which is very useful. Unfortunately it cannot do the reverse yet; I mean, it cannot suggest a particular paragraph of a document. So putting in very long pieces (like books) probably won't be very useful (except maybe for search). Steven Johnson suggests that the ideal length is 50-500 words, but I think physics papers are OK in length. Getting a relevant paper is enough most of the time; you can figure out where to read (which is the conclusion).

What do I put in? Notes for my ideas, notes from books and papers I read, daily musings, abstracts (sometimes whole papers), blogs, news, columns, chapter summaries for Sakurai and Peskin, the This Week's Finds in Mathematical Physics archives, theorems from Wikipedia etc. etc.

One more unique feature is the fuzzy search. When you turn it on, it searches not only for the words you supplied but also for "related" words and for words that look like yours, with different suffixes or misspellings.
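Here is a small Python sketch of the "words that look like yours" part, using the standard `difflib` module. Again, this is my guess at the general idea and not how DT does it; the function name `fuzzy_search` and the `cutoff` value are my own assumptions, and it does not attempt the "related" words part at all.

```python
import difflib
import re


def fuzzy_search(query, documents, cutoff=0.8):
    # documents maps a title to its text.
    # A document matches when every query word has at least one
    # similar-looking word (suffix variant or misspelling) in its text.
    hits = []
    for title, text in documents.items():
        vocabulary = sorted(set(re.findall(r"[a-z]+", text.lower())))
        if all(
            difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
            for term in query.lower().split()
        ):
            hits.append(title)
    return hits
```

With this, a search for "renormalisation" would still find a note that spells it "renormalization". Lowering `cutoff` makes the search more forgiving but also noisier.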

What else do I like about it?

* There is a full-screen edit mode. It takes you away from all the distractions of the internet. Just you and your thoughts. I write everything there now. Plain text can be shown green on a black background, which is very easy on the eye.

* If you like adding equations (preferably LaTeX) to your notes (like any physicist), you can use the universal RTF format and Equation Service to insert inline formulas as mini PDFs created from LaTeX. You can go back and forth between LaTeX and PDF as many times as you like. The Mac Services menu turns this process into a single key combination. (I use cmd+.)

* It integrates perfectly with Safari. Again thanks to the Services menu, any selection goes into your database with a single key combination (I use cmd+[ for txt and cmd+] for rtf). There is also a browser in DT, good for html capture, and a site sucker for whole-site downloads.

* You can read your RSS and Atom feeds and save items as separate files with a script. There is great AppleScript support if you know how to use it; if not, there is a big library of ready-made scripts.

* You can have replicates (synchronized mirrors) of files in different folders.

* You can export files just as you put them in. This is very important if you want to move somewhere else one day. For example, you can't do that with OneNote.

* They have fast tech support and lively forums with lots of writers, journalists, students and academics.

* And many other things I haven't used yet, like index, summarize, concordance, searchable file comments etc.

DT comes in two flavors: Personal (single database, no site sucker, no RSS etc.) for $40 and Pro (everything above) for $75. There is a 15% discount for students (yes, grad students too). You can try it free for 150 hours of use.

I hope I have convinced you that you have never seen software like this before. Highly recommended. I do wish there were some competition.

In the second part of my review (praise) I will give additional examples of using it as a scientist. Please share your experience in the comments if you are a DT user.

UPDATE: Here is the second part.
