Prevalence of predatory journals in OpenAlex
Introduction
In the brief time since its founding in 2022, the open scholarly publications database OpenAlex (Priem, Piwowar, and Orr 2022) has established itself as a leading alternative to commercial competitors due to its broad coverage of literature, free access to basic data and services, and rapid development. Its vendor, the non-profit OurResearch, has also been successful at securing financial support, which has built user confidence in the long-term viability and availability of OpenAlex. Despite its ongoing development and the accompanying frequent changes in data, the suitability of OpenAlex as a data source for science studies has already been examined repeatedly (Alperin et al. 2024; Culbert et al. 2025; Thelwall and Jiang 2025; Hauschke and Nazarovets 2025; Céspedes et al. 2025; Zhang et al. 2024; Scheidsteger, Haunschild, and Bornmann 2025).
OpenAlex is named after the Great Library of Alexandria in Ptolemaic Egypt. The Great Library is said to have obtained its unprecedented collection by various means, including seizing any books found on ships entering the city’s port (Phillips 2010). Its administrators went to such lengths because their goal was to obtain a copy of every book ever written. OpenAlex is no less ambitious than its namesake in what it seeks to cover in its catalog, stating: “We strive to be as comprehensive and inclusive as possible, especially for works in other languages and the Global South.”1 Following these principles of comprehensiveness and inclusivity, OpenAlex has already succeeded in amassing a huge database of metadata on scholarly publications. It accomplishes this by indexing all content from large sources like CrossRef and by web crawls.
Its stated mission, however, also explicitly says that OpenAlex should become an “open replacement for industry-standard scientific knowledge bases like Elsevier’s Scopus and Clarivate’s Web of Science”2. These databases are characterized by principles that may be described as selectivity and exclusivity of indexed content. Both commercial scholarly databases pride themselves on their content selection: the deliberate choice to include some journals and other publication channels according to stated quality criteria, and the increasingly common suspension or delisting of sources which no longer meet those criteria. This is in contrast to OpenAlex, which intentionally makes no such selection but recognizes that some users and use cases may require content restrictions based on quality criteria3. For the time being, there are two on-board tools for OpenAlex users that may assist in content delimitation. The first is the selection of journals listed in the Directory of Open Access Journals (DOAJ). DOAJ does basic quality checks on journals that apply for inclusion.4 The second is the CWTS Core source filter. Core sources are defined as “international scientific journals and other scientific outlets in fields that are suitable for citation analysis” (Van Eck and Waltman 2024). Core journals are not defined by the scientific quality of the content they publish.
The availability of such filters, which may impose some basic scholarly quality thresholds, is a great advantage given that the scientific and scholarly publishing system has fallen into a deep crisis of trust, precipitated by fraudulent publishing. There now exist many ostensibly scientific journals that will publish any submission for a fee with no or only feigned peer review. Such journals, often called predatory, questionable, disreputable or fraudulent journals, frequently publish fake papers (papers reporting research that never actually took place and is wholly made up) or studies that were undertaken but lack the necessary rigor to pass peer review at reputable journals. Many of these journals have managed to become part of the mainstream scientific publishing environment by obtaining a valid ISSN and minting CrossRef DOIs for papers, giving them some of the appearance of regular journals.
Given that these journals are overwhelmingly fraudulent, only pretending to publish actual scientific research vetted by peer review, and that their number is continuously expanding, it is becoming ever more critical to be able to recognize these journals and exclude them from consideration in, e.g., scientific literature searches and bibliometric analyses, purposes for which OpenAlex is immensely useful. OpenAlex’s DOAJ filter only works for open access journals, but not all fraudulent journals are open access, nor is it viable for most use cases of OpenAlex to limit oneself to open access journals. The Core sources filter, on the other hand, deliberately excludes non-English literature and nationally oriented journals, greatly reducing the usefulness of OpenAlex data beyond internationally oriented scientific contributions. Predatory journals almost invariably publish English-language papers. Moreover, neither filter is explicitly designed to identify and exclude disreputable journals. Their ability to filter out such journals is unknown.
To the best of our knowledge, no study has yet investigated the presence of known disreputable journals in OpenAlex. It therefore remains an open question whether such sources are present in the OpenAlex corpus to a problematic degree. One may assume so because of the avowedly inclusive indexing policy of OpenAlex, but that does not settle how extensive the issue actually is. In this contribution we address this question by studying the presence of known fraudulent journals in OpenAlex. We use a sample of ‘predatory’ journals from Cabells Predatory Reports. We then check for these journals in the source search of OpenAlex.
Methods
We study the presence of predatory journals in OpenAlex by using a sample of journals identified as predatory by Cabells Predatory Reports (CPR). CPR is a commercial product designed to help scholars and research organizations to identify predatory journals. It is a catalog of journal reports that can be searched for titles. Each entry for an included journal contains the qualitative evidence that Cabells staff have collected to substantiate their decision to label a journal as predatory. The reports also contain identifying information such as the publisher name, ISSN and website. As an example, the report for the source ‘Endocrinology & Metabolism International Journal’ lists five minor or moderate violations, one of which is “The publisher displays prominent statements that promise rapid publication and/or unusually quick peer review (less than 4 weeks).” These violations are based on a published list of criteria5 (Teixeira da Silva et al. 2023).
Since Cabells is interested in customers subscribing to their knowledge base, we assume they have a sufficient incentive to produce accurate and trustworthy information. We therefore consider their data and classifications acceptably reliable. This does not mean we assume that their decisions are always correct or that they are able to identify all fraudulent journals. It also does not mean that all articles published in these journals are fraudulent or of poor quality without exception. But we have no reason to distrust the data that they have assembled and hence use it as our source of information on predatory journals.
CPR is essentially a searchable list. At the time of our licensed access to the service, May 2025, the list comprised reports on 19,771 predatory journals. To achieve a margin of error for percentages of +/- 5 percentage points at a 0.95 confidence level, we needed a sample of at least 385 observations. CPR displays its reports in pages of 10, and it is possible to navigate to any page by adjusting the page number in the URL. There were 1978 pages, from which we decided to sample at random. We took a random sample of 40 integers from the range 1 to 1978 and used these as page numbers by inserting them into the URL to open the respective page. We used all 10 reports on each page as sampled journals, which gave us a sample size of n=400.
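The sample size and the page draw described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used: it assumes the standard worst-case formula for a proportion (p = 0.5, z = 1.96, no finite population correction) and a draw without replacement. The function names and the seed are hypothetical.

```python
import math
import random

def required_sample_size(margin, z=1.96, p=0.5):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= margin.

    p = 0.5 is the worst case (largest variance); this is an assumption,
    the text only states the margin and the confidence level.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def sample_pages(n_pages, n_draws, seed=None):
    """Draw distinct page numbers from 1..n_pages (inclusive)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return sorted(rng.sample(range(1, n_pages + 1), n_draws))

n_min = required_sample_size(margin=0.05)
pages = sample_pages(n_pages=1978, n_draws=40, seed=2025)

print("minimum sample size:", n_min)
print("journals sampled at 10 reports per page:", len(pages) * 10)
```

With a margin of 0.05 the formula gives 384.16, which rounds up to the 385 observations mentioned in the text; 40 pages of 10 reports each exceed that threshold.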
On each page, we copied the name of each journal and, if available, the publisher name and stored them in an Excel spreadsheet. For each journal, we searched for its name in the sources search of OpenAlex and recorded whether OpenAlex contains the journal or not. In some cases, we found that the journal was included although no works records were ascribed to it. We counted these journals as present.
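The lookup described here was recorded by hand. For readers who want to script a similar check, the following sketch queries the public OpenAlex API `sources` endpoint with its `search` parameter. The helper names and the exact-title matching rule (case-insensitive) are assumptions made for illustration, not the criterion applied in this study.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_SOURCES = "https://api.openalex.org/sources"

def build_search_url(title, per_page=5):
    """Build a sources search URL for a journal title."""
    query = urllib.parse.urlencode({"search": title, "per-page": per_page})
    return f"{OPENALEX_SOURCES}?{query}"

def search_sources(title):
    """Return (display_name, works_count) pairs for the top search hits."""
    with urllib.request.urlopen(build_search_url(title), timeout=30) as resp:
        payload = json.load(resp)
    return [(s["display_name"], s["works_count"]) for s in payload["results"]]

def is_present(title, results):
    """Assumed matching rule: case-insensitive exact title match.

    A source with works_count == 0 still counts as present,
    mirroring the decision described in the text.
    """
    wanted = title.casefold().strip()
    return any(name.casefold().strip() == wanted for name, _ in results)

# Example call (requires network access):
# hits = search_sources("Endocrinology & Metabolism International Journal")
# print(is_present("Endocrinology & Metabolism International Journal", hits))
```

An automated exact match is stricter than a human check, which can recognize title variants, so results from such a script would need manual review.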
As this is a random sample, we used it to calculate the share of journals recognized as predatory by CPR included in OpenAlex and to calculate the confidence interval for the value of this share. We can also extrapolate to the likely absolute number of journals in OpenAlex which are recognized as predatory according to Cabells, again with a sample-based margin of error.
From the description of the method it should be clear that we cannot use this sample to estimate the overall share or number of disreputable journals in OpenAlex. This is because it is not likely that CPR includes all predatory journals. There may be predatory journals included in OpenAlex which are not covered by CPR. These remain invisible to this analysis.
Results
We sampled a total of 400 journal titles from CPR and looked them up by source search on OpenAlex. We were able to find 146 of these predatory journals in OpenAlex. This is a share of 36.5% with a 95% confidence interval of 32 to 41%. We can thus estimate that OpenAlex is likely to include between 32 and 41% of predatory journals as identified by CPR. In line with its stated policy based on inclusivity and comprehensiveness, OpenAlex includes information on a large share of known predatory journals.
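The share and its interval can be checked with a short calculation. The sketch below uses the normal-approximation (Wald) interval for a proportion. This is an assumption, since the text does not say which interval method was used, and the function name is hypothetical.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Point estimate and normal-approximation confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

share, low, high = wald_interval(successes=146, n=400)
print(f"share: {share:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```

Rounded to whole percentages, this agrees with the reported interval of 32 to 41%.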
From the total of 19,771 journals in CPR we can extrapolate that OpenAlex likely includes between about 6300 and 8200 of these journals (rounded to the nearest hundred). At the time of this research, OpenAlex included 209,800 journals as sources. This would mean that the number of known predatory journals is on the order of about 3% of all OpenAlex journal sources.
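The extrapolation amounts to multiplying the interval bounds by the size of the CPR list. The sketch below assumes a Wilson score interval, because the interval method is not stated in the text and the simple normal approximation would round the upper bound to 8,100 rather than 8,200. All names are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

CPR_TOTAL = 19_771           # journals listed in CPR, May 2025
OPENALEX_JOURNALS = 209_800  # journal sources in OpenAlex at the time

low, high = wilson_interval(successes=146, n=400)

# Scale the bounds to the full CPR list, rounded to the nearest hundred.
count_low = int(round(low * CPR_TOTAL, -2))
count_high = int(round(high * CPR_TOTAL, -2))

# Point estimate as a share of all OpenAlex journal sources.
share_of_openalex = (146 / 400) * CPR_TOTAL / OPENALEX_JOURNALS

print(count_low, count_high, f"{share_of_openalex:.1%}")
```

The point estimate of roughly 7,200 journals corresponds to a little over 3% of the 209,800 journal sources.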
An example of a predatory publisher with a large amount of indexed content in OpenAlex is OMICS Publishing Group from Hyderabad, India. CPR returns about 1000 results when searching for OMICS. OMICS was sued in the United States by the Federal Trade Commission for deceptive practices, including falsely claiming peer review activity, revealing fees only after acceptance, and falsely claiming Journal Impact Factors. By summary judgment, the court ordered OMICS to pay the US Government $50 million in fines. As of May 2025, OpenAlex includes 215,000 papers in OMICS journals. OMICS also operates under alternative names. Some of these subsidiaries, with their number of OpenAlex-covered publications in parentheses, are6:
- Allied Academies (c. 9000)
- Hilaris (c. 5000)
- Prime Scholars (c. 2000)
- Pulsus Group (c. 11,000)
- TradeScience (c. 1300)
- Insight Medical Publishing (c. 2600)
The fact that a single prominent predatory publisher (one is tempted to say publishing empire) is responsible for hundreds of thousands of records in OpenAlex should be considered a clear warning sign.
Discussion
Predatory and fraudulent scientific publishing has accelerated and grown to a scale where thousands of predatory journals are active globally. It is not yet known what consequences this development may have for the credibility and acceptance of science among the public. Our investigation has confirmed that OpenAlex includes a large number of journals identified as predatory by Cabells. This is not surprising, because the indexing policy of OpenAlex is as inclusive and comprehensive as possible, with no screening for publication outlet quality in place. OpenAlex is transparent and up front about this position, so the responsibility for dealing with possibly fraudulent content lies squarely with the users. As the total number of likely fraudulent, predatory journals in OpenAlex is quite substantial, we believe it is imperative for users to be acutely aware of the presence of this type of source. To summarize, OpenAlex likely includes thousands of predatory journals. Not all of these, however, also have publications indexed in OpenAlex; quite a lot seem to be empty journal records. The case of OMICS, the most egregious and well-known predatory publisher, is alarming, as over two hundred thousand publication records in OpenAlex are attributed to OMICS journals. At the same time, open and inclusive data sources such as OpenAlex constitute an invaluable resource for researchers investigating the prevalence, characteristics, and content of questionable journals.
Users of OpenAlex should be aware of the presence of many questionable or fraudulent journals in its data and consider methods to adapt their samples accordingly.
References
Footnotes
Citation
@online{donner2025,
author = {Donner, Paul},
title = {Prevalence of Predatory Journals in {OpenAlex}},
date = {2025-06-03},
url = {http://www.open-bibliometrics.de/posts/20250603-QuestionableJournals/},
langid = {en}
}