Citing and Finding Data 

 How much slower would scientific progress be if the near-universal standards for scholarly citation of articles and books had never been developed?  Suppose that, shortly after publication, only some printed works could be reliably found by other scholars; or that researchers were permitted to read an article only if they first committed not to criticize it, or were required to coauthor with the original author any work that built on the original.  How many discoveries would never have been made if the titles of books and articles in libraries changed unpredictably, with no link back to the old title; if printed works existed in different libraries under different titles; if researchers routinely redistributed modified versions of other authors' works without changing the title or author listed; or if publishing new editions of books meant that earlier editions were destroyed?  How much less would we know about the natural, physical, and social worlds if the references at the back of most articles and books were replaced with casual mentions, in varying, unpredictable, and incomplete formats, of only a few of the works relied on?

 These questions are all obviously counterfactuals when it comes to printed matter, but remarkably they are entirely accurate descriptions of our [in]ability to reliably cite, access, and find quantitative data, all of which remain in an entirely primitive state.


 Micah Altman and I have just written a paper on this subject that may be of interest.  The title is "A Proposed Standard for the Scholarly Citation of Quantitative Data" and a copy can be found here.  The abstract follows.  Comments welcome!

 An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals.  A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books.  We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches.  Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.