From E-print to Journal Article – Concordance
How many e-prints on arXiv eventually appear in a scholarly journal or in conference proceedings? Of course, that varies from discipline to discipline, and even within disciplines. There are groups of scientists who feel that they can do without scholarly journals and just publish e-prints. The advantages of submitting e-prints are a subject for a separate blog. If you can’t wait, there is an article on The Effect of Use and Access on Citations and one discussing E-prints and journal articles in astronomy: a productive co-existence.
The diagram below shows measures of “concordance” for some arXiv categories. For each category, it shows the fraction of e-prints that appeared as a journal article. This concordance is the result of matching e-prints against records in the SAO/NASA Astrophysics Data System (ADS).

First of all, the “dip” for 2009 is artificial: a significant number of e-prints have not yet appeared in a journal or in conference proceedings. Conference proceedings in particular usually take a long time to materialize. Because of the completeness level of ADS in astronomy and astrophysics, it is reasonable to say that the “concordance fraction” observed for astro-ph is the actual concordance fraction. In short, close to 90% of astro-ph e-prints are actual “pre-prints”. Since astro-ph has been divided into sub-categories, it is interesting to see what the concordance is for each of them. Here are the numbers for 2009:
astro-ph.CO: 77%
astro-ph.EP: 81%
astro-ph.GA: 85%
astro-ph.HE: 70%
astro-ph.IM: 61%
astro-ph.SR: 86%
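For readers who want to compute this kind of number themselves, here is a minimal sketch in Python of a per-category concordance fraction: the number of e-prints with a matched publication divided by the total number of e-prints in that category. The record layout (a `category` field and a `published_bibcode` field that is empty when no match was found) and the sample records are made up for illustration; this is not the actual ADS matching procedure.

```python
from collections import defaultdict


def concordance_by_category(eprints):
    """Return {category: fraction of e-prints with a matched publication}."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for eprint in eprints:
        category = eprint["category"]
        totals[category] += 1
        # Hypothetical field: non-empty when a journal/proceedings record was matched.
        if eprint.get("published_bibcode"):
            matched[category] += 1
    return {cat: matched[cat] / totals[cat] for cat in totals}


# Made-up toy records, not real arXiv or ADS data.
sample = [
    {"category": "astro-ph.GA", "published_bibcode": "toy-record-1"},
    {"category": "astro-ph.GA", "published_bibcode": None},
    {"category": "astro-ph.IM", "published_bibcode": None},
    {"category": "astro-ph.IM", "published_bibcode": "toy-record-2"},
    {"category": "astro-ph.IM", "published_bibcode": None},
    {"category": "astro-ph.IM", "published_bibcode": None},
]

fractions = concordance_by_category(sample)
for category, fraction in sorted(fractions.items()):
    print(f"{category}: {fraction:.0%}")
```

With the toy records above this prints 50% for astro-ph.GA and 25% for astro-ph.IM. Note that an unmatched e-print counts against the fraction whether it was never published or simply not yet matched, which is exactly why recent years show an artificial dip.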
The relatively low percentages for “Cosmology and Extragalactic Astrophysics” (CO), “High Energy Astrophysical Phenomena” (HE) and “Instrumentation and Methods for Astrophysics” (IM) are interesting. I have heard that, especially in theoretical cosmology, there are groups of people who use arXiv as their only publication vehicle. The instrumentation section contains reports that probably never appeared anywhere else. These numbers could also indicate that some disciplines have relatively more conferences than others, and for conference proceedings the ADS usually depends on editor or organizer initiatives for metadata. I’m sure I’ll return to this subject at a later date. The publishing industry is (necessarily) going through a paradigm shift, and the role of e-prints is intimately connected to this process.