Indexing Matters – The Importance of Search Engine Behavior
What a search engine returns on a user query largely, if not completely, determines its usefulness for that user. Looking at usage bibliometrics allows to classify the behavior of different types of users, for example (see e.g. Usage Bibliometrics by Michael J. Kurtz and Johan Bollen). There are voices claiming that Google Scholar is a “threat” to scholarly information retrieval services (like the ADS and WoS, for example). The main reason why this is not the case becomes clear when we look at usage statistics. Here I will make a comparison of readership patterns from ADS and Google Scholar queries, as observed in ADS’s access logs. These readership patterns will give us the obsolescence of astronomy articles by ADS and Google Scholar users. In order to zoom in on people who use ADS professionally, I will only regard ADS users who query ADS 10 or more times per month. The journals I have used in the analysis are the main astronomy journals: Astrophysical Journal, Astronomical Journal, Monthly Notices of the R.A.S. and Astronomy & Astrophysics. In the figure below, a comparison is made between readership of frequent ADS users (read “professional astronomers”) and Google Scholar users.

Comparison of readership patterns from ADS and Google Scholar queries, as observed in ADS’s access logs. The red line marked with open circles shows the readership use by people using the ADS search engine. The blue line marked with 'x' corresponds with the readership use by people who used the Google Scholar engine. The orange line marked with closed circles shows the citation rate to the articles, while the purple line marked with ’+’ respresent their total number of citations.
All the quantities in the figure above are on a per article basis and have been normalized by the 1987 value. This was done so that we can compare apples with apples.
The fact that the obsolescence through Google Scholar is strongly correlated with the total number of citations is no coincidence: this is a direct consequence of the correlation between the PageRank and the total number of citations (see e.g. Chen et al. (2007) and Fortunato et al. (2006)). The consequence of this correlation is the following: Google Scholar does not provide what professional astronomers (and other frequent users) want. Google Scholar readership correlates with the reading habit of students. In short, Google Scholar currently is no threat to scholarly information retrieval services.
Bibliography
- Kurtz, Michael J. and Bollen, Johan (2010), “Usage Bibliometrics”, Annual Review of Information Science and Technology, vol 44, p. 3-64
- Henneken, E. et al. (2009), “Use of astronomical literature – A report on usage patterns”, Journal of Informetrics, vol. 3, iss. 1, p. 1
- Fortunato, S., Flammini, A., & Menczer, F. (2006), “Scale-Free Network Growth by Ranking”, Physical Review Letters, 96, 218701
- Chen P., Xie H., Maslov H., and Redner, S., (2007), “Finding scientific gems with Googles PageRank algorithm”, Journal of Informetrics, 1, 8