Linking to Data – Effect on Citation Rates in Astronomy

In the paper “Effect of E-printing on Citation Rates in Astronomy and Physics” we asked whether the introduction of the arXiv e-print repository had any influence on citation behavior. We found significant increases in citation rates for papers that appear as e-prints before being published in scholarly journals.

This is just one example of how publication practices influence article metrics (citation rates, usage, and obsolescence, to name a few). Here we examine one practice that is very relevant to astronomy: from a bibliometric point of view, is there a difference between articles that link to data and articles that do not? Specifically, do their citation rates differ?

Besides being interesting from a purely academic point of view, this question is also highly relevant to the process of “furthering science”. Data sharing helps not only to verify claims but also to make new discoveries in archival data. There seems to be a consensus that sharing data is a Good Thing. Let’s set aside the “why” and “how” and focus on the sharing itself. Creating a “practice” requires both willingness and a publication mechanism. This is where citation rates come in: if we can show that papers with links to data receive higher citation rates, this might increase scientists’ willingness to take the extra steps of linking data sources to their publications.

Using the data holdings of the SAO/NASA Astrophysics Data System (ADS), we can analyze whether articles with links to data have different citation rates. For the analysis we used articles published from 1995 through 2000 in The Astrophysical Journal (including Letters and Supplement), The Astronomical Journal, Monthly Notices of the Royal Astronomical Society, and Astronomy & Astrophysics (including Supplement). We started by determining the 50 keywords used most frequently in articles with data links. We then kept for the analysis only those articles that have at least 3 keywords in common with that set of 50. This resulted in 3814 articles with data links and 7218 articles without data links. From the latter set we drew a random selection of 3814 articles, so that both groups have the same size.
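
The selection procedure can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the analysis: the `Article` record, its fields, and the function name `select_samples` are hypothetical, and loading the actual ADS records is left out.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record; real ADS holdings are not modeled here.
    bibcode: str
    keywords: frozenset
    has_data_link: bool


def select_samples(articles, top_n=50, min_shared=3, seed=None):
    """Return (linked, control) samples using the keyword-overlap selection."""
    # Count keyword usage among articles that link to data.
    counts = Counter(kw for a in articles if a.has_data_link for kw in a.keywords)
    top_keywords = {kw for kw, _ in counts.most_common(top_n)}

    # Keep only articles sharing at least `min_shared` keywords with the top set.
    matched = [a for a in articles if len(a.keywords & top_keywords) >= min_shared]
    linked = [a for a in matched if a.has_data_link]
    unlinked = [a for a in matched if not a.has_data_link]

    # Draw a random control group of the same size as the data-linked group.
    rng = random.Random(seed)
    control = rng.sample(unlinked, min(len(linked), len(unlinked)))
    return linked, control
```

Passing a fixed `seed` makes the random control group reproducible, which helps when checking the two groups for homogeneity.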

First, we create a diagram like the one in figure 4 of the paper “Effect of E-printing on Citation Rates in Astronomy and Physics”, which shows the ensemble average of the number of citations as a function of time after publication. In that figure we normalized the citations by the mean number of citations over the entire data set; for the current analysis we normalize by the total number of citations.
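
A minimal sketch of the ensemble citation curve, assuming each article's citations are available as ages in whole months after publication. The function names are hypothetical, and dividing both curves by the combined citation total of the two groups is an assumption about which “total number of citations” is meant. Because both groups contain 3814 articles, summing over articles instead of averaging changes the curves only by a constant factor, which the normalization removes.

```python
def citation_curve(citation_ages, max_months=120):
    """Citation counts per month after publication, summed over a group.

    citation_ages: one list per article, holding the age (in whole months
    after publication) of each citation that article received.
    """
    counts = [0] * (max_months + 1)
    for ages in citation_ages:
        for age in ages:
            if 0 <= age <= max_months:  # ignore citations outside the window
                counts[age] += 1
    return counts


def normalized_curves(linked_ages, unlinked_ages, max_months=120):
    """Both groups' curves divided by a shared normalization constant."""
    linked = citation_curve(linked_ages, max_months)
    unlinked = citation_curve(unlinked_ages, max_months)
    # Assumption: normalize by the combined total of both groups.
    total = sum(linked) + sum(unlinked)
    if total == 0:
        raise ValueError("no citations in either group")
    return [c / total for c in linked], [c / total for c in unlinked]
```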

Our analysis shows that articles with data links are indeed cited more than articles without these links. We can say a bit more by looking at the cumulative citation distribution. The figure below shows this cumulative distribution, normalized by the total number of citations that articles without data links had received 120 months after publication.


This graph shows that, for this data set, articles with data links acquired 20% more citations than articles without these links.
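
The cumulative comparison can be sketched the same way: accumulate the per-month counts of each group and divide both curves by the cumulative total of the group without data links at 120 months. A value of 1.2 for the linked group at that horizon then corresponds to a 20% excess. This is a sketch under those assumptions; the function names are hypothetical, and the inputs are per-month citation counts (index = months after publication).

```python
from itertools import accumulate


def normalized_cumulative(linked_counts, unlinked_counts, horizon=120):
    """Cumulative curves, both scaled by the unlinked group's total at `horizon`."""
    if min(len(linked_counts), len(unlinked_counts)) < horizon + 1:
        raise ValueError("need per-month counts up to the horizon")
    cum_linked = list(accumulate(linked_counts[: horizon + 1]))
    cum_unlinked = list(accumulate(unlinked_counts[: horizon + 1]))
    norm = cum_unlinked[-1]  # citations of unlinked articles by the horizon
    if norm == 0:
        raise ValueError("unlinked group has no citations by the horizon")
    return [c / norm for c in cum_linked], [c / norm for c in cum_unlinked]


def relative_excess(linked_counts, unlinked_counts, horizon=120):
    """Fractional excess of the linked group at the horizon (0.2 means 20% more)."""
    linked_cum, _ = normalized_cumulative(linked_counts, unlinked_counts, horizon)
    return linked_cum[-1] - 1.0
```

By construction the unlinked curve ends at exactly 1.0, so the linked curve's end point reads directly as a ratio.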


Posted by anopisthographs on June 3, 2011.

4 Responses to “Linking to Data – Effect on Citation Rates in Astronomy”

  1. [...] Linking to Data – Effect on Citation Rates in Astronomy (via Meters, Metrics and More) I’m not a big fan of bibliometricism …but this is definitely Quite Interesting [...]

  2. I’d love it to be true that just by linking to data one can increase citation rates, but I suspect this isn’t comparing like with like. It would be good to see an analysis that showed citations for papers linking to data while excluding large ‘everyone will cite’ data *release* papers for major surveys.

    • Actually, in this analysis I feel I am comparing apples with apples. I checked the data set for homogeneity and it is homogeneous in every way I checked. The papers aren’t just those linking to big, important data sets. There is no “cherry picking” (for more “citable” data sets) involved, because that would show up as an inhomogeneity in the citation distribution. So, linking to data increases the probability of getting cited more, just like submitting a paper as e-print does, but linking to data does not guarantee you will get more citations. It will be interesting to see what will happen when self-citations are excluded.

  3. [...] to now be investigating the effect of data sharing practices on citation rates. A June 2011 post on Meters, Metrics and More (a blog by Henneken) claims that articles with links to underlying data receive [...]
