Through a glass darkly: citation rankings in paleontology
Note: this blog first appeared on the website for the Paleontological Society.
In the old days, before the advent of the internet and the personal computer, you kept up with the scientific literature by going to the library and examining the stacks of new journals that had just arrived. If you wanted to pursue an area of research and write your own paper, you searched the library stacks, guided by the list of references in the back of published papers and the huge published volumes, such as the Bibliography and Index of Geology, that catalogued the papers on the subject. You might also find out, perhaps guided by more senior scientists, that there were certain critical papers in your area that you needed to read.
All this, of course, has changed. You can receive emails from journals announcing newly published or soon-to-be-published papers. The development of massive online reference databases, such as Web of Science, GeoRef, and Google Scholar, has made searching for papers on a topic lightning fast, with the ability to immediately download the publication and enter it into a personal bibliographic database. And most usefully, the databases have links both to the papers in a paper’s reference list and to those that, in turn, cite the publication in question, with each paper’s number of citations being tabulated. To a first approximation, one can assume that the more citations a paper has, the more important it is to read.
This is all wonderful; it has made the due diligence of finding previous work in a field, especially key studies, far easier and less time-consuming. But as with so much else of the online universe, such as social media, it has been co-opted for other uses, some not so positive. In particular, it has allowed the quantification in science of what work is “important.” A paper that is heavily cited is assumed to be important, a journal that has lots of highly cited papers is said to have a “high impact factor,” and a researcher who publishes papers that receive lots of citations is clearly more important than one whose work seemingly receives little attention from their peers. The ability to quantify the “importance” of a scholar was a revelation to those who wanted to assess their worthiness, especially when it came to granting tenure and promotions.
The ability to count citations to compare among researchers has given rise to a plethora of metrics designed to measure their relative importance more accurately. Although the total citations of a scholar’s output may seem the most useful, a high total could represent either a single highly cited paper, with the remainder being ignored, or many papers with a few citations each. One popular metric that distinguishes these cases is the Hirsch h-index: the largest number x such that x of an author’s papers have received at least x citations each. A researcher with an h-index of 7 has seven papers with at least seven citations, whereas one with an h-index of 30 has thirty publications with at least thirty citations each. Obviously, increasing one’s h-index becomes increasingly difficult the higher it is.
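As a minimal sketch of the definition above, the h-index can be computed by sorting an author's papers by citation count and finding the last rank at which the count still matches or exceeds the rank. The function name `h_index` and the citation counts below are hypothetical and purely illustrative; they are not taken from the Ioannidis et al. database.

```python
def h_index(citations):
    """Return the largest x such that x papers have at least x citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Hypothetical authors illustrating why total citations can mislead.
one_hit = [500, 0, 0, 0]      # a single highly cited paper: 500 citations in total
steady = [4] * 10             # ten papers with four citations each: 40 in total

print(h_index(one_hit))       # 1
print(h_index(steady))        # 4
```

In this made-up comparison, the author with far fewer total citations has the higher h-index, which is the distinction the metric is meant to capture.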
But there are additional complexities. It is increasingly rare that a scientific paper is written by a single author; in some disciplines there may be tens or even hundreds of coauthors listed. Each of these coauthors is credited with a full citation whenever the paper is cited. Having one’s name on papers with multiple coauthors is therefore one way to increase one’s citation rate, and it favors scientists who work with larger collaborative research groups (Nielsen and Andersen 2021). So indices have been developed that account for multiple authors, such as Schreiber’s hm (Schreiber 2008).
In addition, it is not unusual for an author to cite their own previous work on a subject (I often do). Excessive self-citation, however, is one way to increase the seeming importance of your work. Another way to game the system is to put together a cohort of people who will cite each other’s work, whether relevant or not: the so-called “citation farms.” Finally, there are major differences in citation rates among scientific fields due to their size. A highly cited paper among paleontologists, for example, will have far fewer citations than even a sparsely cited publication by the far larger group of oncologists.
All this exposition is to provide the context for a related pair of recent papers that appeared in PLOS Biology (Ioannidis et al. 2019, Ioannidis et al. 2020). In the first paper, the authors review six citation metrics, describe a composite metric based on all of them, and present a database that includes the 100,000 most cited authors across all fields of science. In the second paper, and the one I will focus on here, they add nearly 60,000 more authors who rank within the top 2% within their disciplines. This allows the inclusion of fields, such as paleontology, that may have lower “citation densities.” Ioannidis et al. (2020) divide scientists among 22 fields and 176 subfields of research, with each researcher belonging to two subfields based on their publications, using a journal classification scheme called Science-Metrix. In addition to the various metrics, the authors include the percentage of papers that include self-citations, as well as metrics corrected for self-citation. Data are given for the period 1996–2019 (“career”) and for just the year 2019 (“snapshot”). Finally, they give the ranking of researchers not only overall, but in their own disciplines. All the records are derived from Elsevier’s massive Scopus database and include nearly 700,000 authors who have published five or more papers. Here, I discuss what their analyses reveal about the status of paleontology in the citation universe, focusing on the snapshot for the year 2019.
Paleontology is one of the identified subfields. Within that, there are 18,435 authors who have at least five papers. This is 0.26% of the total number of such authors across all disciplines. Paleontology ranks 96th of the 176 subfields. For comparison, Geochemistry and Geophysics ranks 3rd (70,197 authors) and Evolutionary Biology 84th. What is surprising, however, is that Geology ranks 115th, Oceanography 109th, and Archeology 120th. Something seems amiss. The prime suspect may be how journals, and thus the scientists, are classified. Nearly all the “geochemists and geophysicists” have Geology as their second subfield; I suspect many of them would consider themselves geologists first. The two top-ranked scientists in Paleontology are Wallace Broecker and Nicholas Shackleton. This strongly suggests that the subfield “Paleontology” also includes paleoclimatology and paleoceanography. (I am more than willing to have the wide tent.)
Restricting ourselves only to the “top 2%,” there are 493 researchers identified with Paleontology as their first subfield and an additional 474 who give it as their second subfield. Together these are 0.61% of the total “top scientists.” In Figure 1, I show the distribution of other subfields among those for whom paleontology is either the first or second subfield. Not surprisingly, the majority of these paleontologists are associated with one of the Earth science disciplines. There is also strong evidence of interactions with evolutionary biology, ecology, marine biology, and anthropology. The number of physics- and engineering-related researchers, I suspect, reflects the abiding interest of practitioners in those fields in topics such as mass extinction.
I also examined the roles of multiple authorship and of self-citation (citing one’s own papers). The distribution of single-authored papers by the 967 researchers is shown in Figure 2A. Among this group, publishing single-authored papers is rare; the median percentage is 8.8%.
Although a small proportion of authors do not self-cite (Fig. 2B), the vast majority do, with an average self-citation percentage of 18% (the percentage of their total citations that come from themselves). A few individuals, however, have self-citation percentages as high as 56%. As pointed out by Ioannidis et al. (2020), a “very high proportion of self-citations…may or may not be justifiable and may require a closer look at the citation practices of these scientists.”
An important issue concerning self-citation is whether it significantly inflates the ranking of those who self-cite to an excessive degree. The available data provide both overall rankings and rankings within the subfield, both with and without self-citations. I calculated the difference between the ranks with and without self-citations within the primary subfield paleontology (Figure 3); positive values represent cases where self-citations boost the individual’s ranking. Reassuringly, the median change was -1, suggesting that self-citation has little overall impact on rankings within the group, although it may have a significant effect for individual researchers.
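The rank comparison described above can be sketched as follows. This is only an illustration under stated assumptions: rank 1 is taken to be the top rank, the change is computed as the rank without self-citations minus the rank with them (so that a positive value means self-citations improve the ranking), and the function name `rank_changes` and all the rank values are hypothetical rather than drawn from the actual database.

```python
from statistics import median


def rank_changes(ranks_with_self, ranks_without_self):
    """Per-researcher change in subfield rank attributable to self-citations.

    Assumed convention: rank 1 is best, so a positive result means the
    researcher ranks better (a smaller number) when self-citations count.
    """
    return [
        without - with_self
        for with_self, without in zip(ranks_with_self, ranks_without_self)
    ]


# Hypothetical subfield ranks for five researchers (not real data).
with_self = [10, 25, 40, 55, 70]
without_self = [9, 27, 39, 54, 70]

changes = rank_changes(with_self, without_self)
print(changes)          # [-1, 2, -1, -1, 0]
print(median(changes))  # -1
```

In this made-up example the second researcher gains two places from self-citation, while most of the others shift slightly in the opposite direction because the researchers around them are reordered, which is how a group median near zero can coexist with noticeable effects for individuals.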
I also attempted a rough analysis of what might be called demographics. Based on given names, I identified researchers in the career dataset as likely either male or female. Out of 1013 scientists in this list, only 78 had identifiable female given names; this is certainly an undercount, since many individuals were listed only by initials. Even allowing for this, the lack of representation of women scientists is not unexpected, given the strong historical component of the career list (Stigall 2013, Warnock et al. 2020). This is supported by examining the data on first year of publication (the oldest cited reference assigned to a researcher). The median year of the first paper in this group is 1981, well before the rise in female numbers in the discipline in recent decades (Plotnick et al. 2014). Finally, again as would be expected, the vast majority of these highly cited scientists practice in the developed world, especially in English-speaking countries. The largest representation is from the United States (36%) and Great Britain (19%). Only Canada, Germany, France, and Australia account for at least 3.5% each. This may be part of an overall trend of citation inequality (Nielsen and Andersen 2021) and the lingering effects of colonialism (Manias 2021, Monarrez et al. 2021, Raja et al. 2021).
Why does all this matter? It does give us a fuzzy picture of what the most successful scientists, at least as recognized by the publication metrics, are like. They are overwhelmingly male and English-speaking, have been publishing for decades, rarely publish single-authored papers, and do self-cite, but not to excess. Given their high h-indices, it is unlikely that they publish only on narrow taxonomic groups, where their papers will only be cited by fellow specialists. It also gives an equally fuzzy overall picture of the field. Paleontology, in the broad sense, has intellectual ties across a large range of other scientific disciplines, a breadth I suspect is unmatched in most other sciences. It is also enduringly popular as a subject.
I have tried not to assign value judgements to this type of data; along with many of my colleagues, I am disturbed by the obsession with citation indices and rank when it comes time to assess the worth of a scientist or a discipline. And citation rate is only one way to measure a scientist’s importance (Aksnes et al. 2019). It omits, for example, the key role of advising and mentoring. Taxonomic papers, which individually may receive few citations, nevertheless are the vital lifeblood of our field. Many of these papers are used in large databases, such as the Paleobiology Database, or are buried in “supplementary data,” but do not receive citation credit for being there, creating a “citation gap” (Payne et al. 2012).
For better or worse, this is the academic environment we live in. The more we know about it, the better we can survive as a field and as individual researchers.
I would like to thank Alycia Stigall and Phil Novack-Gottshall for their highly useful comments and suggestions.
Aksnes, D. W., L. Langfeldt, and P. Wouters. 2019. Citations, Citation Indicators, and Research Quality: An Overview of Basic Concepts and Theories. SAGE Open 9(1):2158244019829575.
Ioannidis, J. P. A., J. Baas, R. Klavans, and K. W. Boyack. 2019. A standardized citation metrics author database annotated for scientific field. PLOS Biology 17(8):e3000384.
Ioannidis, J. P. A., K. W. Boyack, and J. Baas. 2020. Updated science-wide author databases of standardized citation indicators. PLOS Biology 18(10):e3000918.
Manias, C. 2021. Colonialism and palaeontology: connected histories. Palaeontological Association Newsletter 106:59–62.
Monarrez, P. M., J. B. Zimmt, A. M. Clement, W. Gearty, J. J. Jacisin, K. M. Jenkins, K. M. Kusnerik, A. W. Poust, S. V. Robson, J. A. Sclafani, K. T. Stilson, S. D. Tennakoon, and C. M. Thompson. 2021. Our past creates our present: a brief overview of racism and colonialism in Western paleontology. Paleobiology:1–13.
Nielsen, M. W., and J. P. Andersen. 2021. Global citation inequality is on the rise. Proceedings of the National Academy of Sciences 118(7):e2012208118.
Payne, J. L., F. A. Smith, M. Kowalewski, R. A. Krause Jr., A. G. Boyer, C. R. McClain, S. Finnegan, P. M. Novack-Gottshall, and L. Sheble. 2012. A lack of attribution: closing the citation gap through a reform of citation and indexing practices. Taxon 61:1349–1354.
Plotnick, R. E., A. L. Stigall, and I. Stefanescu. 2014. Evolution of paleontology: long-term gender trends in an earth science discipline. GSA Today 24(11):44–45.
Raja, N. B., E. M. Dunne, A. Matiwane, T. M. Khan, P. S. Nätscher, A. M. Ghilardi, and D. Chattopadhyay. 2021. Colonial history and global economics distort our understanding of deep-time biodiversity. Nature Ecology & Evolution.
Schreiber, M. 2008. A modification of the h-index: The hm-index accounts for multi-authored manuscripts. Journal of Informetrics 2(3):211–216.
Stigall, A. L. 2013. The Paleontological Society 2013: A snapshot in time. Priscum 20(2):1–4.
Warnock, R., E. Dunne, S. Giles, E. Saupe, L. Soul, and G. Lloyd. 2020. Special Report: Are we reaching gender parity among Palaeontology authors? Palaeontological Association Newsletter 103:40–50.