Linking to Data – Effect on Citation Rates in Astronomy
In the paper Effect of E-printing on Citation Rates in Astronomy and Physics we asked ourselves the question whether the introduction of the arXiv e-print repository had any influence on citation behavior. We found significant increases in citation rates for papers that appear as e-prints prior to being published in scholarly journals.
This is just one example of how publication practices influence article metrics (citation rates, usage, obsolescence, to name a few). Here we will be examining one practice that is very relevant to astronomy: is there a difference, from a bibliometric point of view, between articles that link to data and articles that do not? Specifically, is there a difference in citation rates between these classes of articles?
Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of “furthering science”. Data sharing not only helps the process of verification of claims, but also the discovery of new findings in archival data. There seems to be a consensus that sharing data is a Good Thing. Let’s ignore the “why” and “how”, and focus on the sharing. You need to have both a willingness and a publication mechanism in order to create a “practice”. This is where citation rates come in: if we can say that papers with links to data get higher citation rates, this might increase the willingness of scientists to take the extra steps of linking data sources to their publications.
Using the data holdings of the SAO/NASA Astrophysics Data System we can do the analysis and see if articles with links to data have different citation rates. For the analysis, we used the articles published in The Astrophysical Journal (including Letters and Supplement), The Astronomical Journal, The Monthly Notices of the R.A.S. and Astronomy & Astrophysics including Supplement), during the period 1995 through 2000. Next we determined the set of 50 most frequently used keywords in articles with data links. The articles to be used for the analysis were obtained by requiring that they have at least 3 keywords in common with that set of 50 keywords. This resulted in a set of 3814 articles with data links and 7218 articles without data links. A random selection of 3814 articles was extracted for this set of 7218 articles.
First, we’ll create a diagram just like the one in figure 4 of the paper Effect of E-printing on Citation Rates in Astronomy and Physics, which shows the number of citations after publication as an ensemble average. In this figure 4 we used the mean number of citations (over the entire data set) to normalize the citations. For our current analysis we will use the total number of citations for normalization.
Our analysis shows that articles with data links are indeed cited more than articles without these links. We can say a little bit more by looking at the cumulative citation distribution. The figure below shows this cumulative distribution, normalized by the total number of citations for articles without data links, 120 months after publication.