DEVONthink for Scientists: Part II
In a recent post I gave an introduction to DT. Now let me give some examples of how to use it.
Here is one of the last things I did with my personal database. I added a new quotation from Sean Carroll. It was a small paragraph including the sentence "Things are not intrinsically interesting, they are found to be interesting by people". DT immediately suggested a small clipping from David Deutsch including the sentence "In the long run, the distinction between what is interesting and what is boring is not a matter of subjective taste but an objective fact". This is immensely thought-provoking, and I would never have remembered it on my own. On the scientific side, it helps to locate standard references. For example, when I start writing something about renormalization, it reminds me to look at Peskin Chapter 12, which has a summary inside the database. I hand-picked my notes and archive and organized them carefully into folders, so this may not be very surprising.
Let's look at some bulk databases I built. Joanna Karczmarek of Rutgers kindly supplied me with the abstracts of all the hep-th papers up to July 2005, which makes 37,586 items, 110,720 unique words and 4,749,379 total words. This is not only a huge database but also a very disorganized one: abstracts are collected into folders according to submission month. This is probably the best thing you can do to confuse a contextual search engine. Still, DT did quite a nice job. Here is an example:
I chose the paper hep-th/9907001 by Parikh and Wilczek, called "Hawking Radiation as Tunneling". The first thing one notices is that the suggestions are dominated by papers from the same authors. This makes sense because authors' surnames are the most characteristic words for each document. But even this is not done blindly. DT offers a bunch of papers by Parikh, who wrote extensively about the same subject, but no other Wilczek papers. Finding the authors' other related papers is not very interesting or useful, because I can do that myself. So here are the top five papers from other authors: hep-th/0505266 (Hawking Radiation as Tunneling through the Quantum Horizon), hep-th/0110289 (Radiation via Tunneling in the Charged BTZ Black Hole), hep-th/0504188 (Tunneling through the quantum horizon), hep-th/0503081 (Hawking Radiation as Tunneling for Extremal and Rotating Black Holes), hep-th/0207247 (Radiation via Tunneling from a de Sitter Cosmological Horizon). It is nice to see that some of the above papers actually cite Parikh and Wilczek. The last one is also interesting, in the sense that it is related but I wouldn't expect a computer to recognize that.
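To see why surnames can end up being the most characteristic words, here is a minimal sketch using plain TF-IDF weighting and cosine similarity. This is only my guess at the general flavor of such a ranking, not DEVONthink's actual algorithm, which is not public. The toy "abstracts" and the surnames smith, jones, lee and kim are made up for illustration, and the function names are my own.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Words that occur in few documents get a large log factor.
        vectors.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy corpus (hypothetical, not real hep-th records): surname(s) + keywords.
docs = [
    "parikh wilczek hawking radiation tunneling black hole",
    "parikh tunneling black hole",
    "smith hawking radiation tunneling black hole",
    "jones hawking radiation black hole entropy",
    "lee string theory black hole entropy",
    "kim gauge theory anomalies",
]

vectors = tfidf_vectors(docs)
query = vectors[0]

# Words of the query document, most characteristic first.
top_words = sorted(query, key=query.get, reverse=True)

# Indices of the other documents, most similar first.
ranking = sorted(range(1, len(docs)),
                 key=lambda i: cosine(query, vectors[i]),
                 reverse=True)

print(top_words[:2])
print(ranking)
```

In this toy corpus the two surnames receive the largest weights, simply because they appear in fewer documents than words like "black" or "hole", and the other document sharing a surname comes out on top. A real system presumably does something more refined, which would fit the observation above that the author matching is not done blindly.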
Another example is the database I built from a snapshot of the open mathematics encyclopedia PlanetMath, with 4,570 entries, 31,007 unique words and 3,053,620 total words. This is a nice one because the entries are organized into subject folders. When you look at the entry for the Einstein field equations, here are the top five related entries: Ricci tensor, Einstein summation convention, Schrodinger's wave equation, Hartman-Grobman theorem, linear time invariant systems. The relation of the fourth one is not apparent to me; maybe this is something I should learn. Here is a different one: if I take Hamilton equations, it gives symplectic manifold, Hamiltonian vector field, cotangent bundle, Examples of symplectic manifolds, Poincare 1-form. (Yeah, I am too lazy to put a link for each word.) I leave it to you to decide whether these are useful enough. Maybe not, but the main purpose of the program is to help your memory, and it does this very well.
Some people argue that the best thing to do is to throw everything into a single database. Performance issues aside, if you put all of the bulk data above into your personal database, it will naturally dominate the suggestion results. So my advice is to go for the Pro version and experiment with several databases until you are satisfied with the combinations.
Here is my bottom line: DEVONthink may not be perfect (compared to a human who never forgets), but there is nothing better out there, or even anything comparable.
I know that there are some services which try to do the same with the whole internet. I tried blinkx and it gave awful results, nothing useful. I haven't tried Watson yet, since it is Windows-only. Still, I think this is a beautiful idea, and we will use and enjoy such tools soon. That will be the next Google.