Get Retraction Watch metadata from Crossref’s API
- Learn how to identify the update metadata in records pulled from Crossref REST API.
- Understand what filters to use to retrieve records based on update metadata.
- Learn how to make sequential queries to check whether a list of records has retractions.
What you need to know
As you might know, the scholarly metadata that you can find in Crossref’s REST API includes updates made to specific works. More recently we also announced:
Retractions and corrections from Retraction Watch are now available in Crossref’s REST API. Back in September 2023, we announced the acquisition of the Retraction Watch database with an ongoing shared service. Since then, they have sent us regular updates, which are publicly available as a csv file. Our aim has always been to better integrate these retractions with our existing metadata, and today we’ve met that goal.1
In this tutorial we’ll show how to understand and retrieve this metadata. We need to understand how this information will be expressed in the REST API, so let’s take a look at both ends of the relationship: a record that has an update and a record that represents an update to another one:
Identify the update metadata
Let’s examine a record that contains update metadata: http://api.crossref.org/v1/works/10.1177/1758835920922055
In the following box you will see this record retrieved from the works endpoint in JSON format, as it’s normally returned from the REST API. You can collapse any key by clicking the little triangle to the left of the key name. Can you find the key called updated-by? Just click the magnifying glass icon at the top and type or paste updated-by.
You should see a lot of interesting information there, including the DOI that identifies the retraction. Let’s take a look at the complementary record; use the same tools to now find the update-to key:
If you expand the keys type and source you will understand why there are two entries in this record. In this case, the updates come from two sources: Retraction Watch and the publisher.
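To make the shape of these keys concrete, here is a minimal sketch in base R that lists the DOI, type and source of each entry under update-to. The record is a hand-written stand-in, not real API output: the DOIs are placeholders, the source values are assumed, and summarise_updates is a hypothetical helper name. It also assumes the JSON was parsed into nested lists (that is, without simplification).

```r
# Hand-written stand-in for the `message` part of a parsed record.
# DOIs are placeholders and the source values are assumed; only the
# key names (update-to, DOI, type, source) follow the tutorial.
record <- list(
  DOI = "10.1234/retraction-notice",
  `update-to` = list(
    list(DOI = "10.1234/original", type = "retraction", source = "publisher"),
    list(DOI = "10.1234/original", type = "retraction", source = "retraction-watch")
  )
)

# Hypothetical helper: one row per update entry found under `key`.
summarise_updates <- function(rec, key = "update-to") {
  entries <- rec[[key]]
  if (is.null(entries)) {
    # No such key on this record: return an empty table.
    return(data.frame(doi = character(), type = character(), source = character()))
  }
  data.frame(
    doi    = vapply(entries, function(e) e$DOI, character(1)),
    type   = vapply(entries, function(e) e$type, character(1)),
    source = vapply(entries, function(e) e$source, character(1))
  )
}

updates <- summarise_updates(record)
print(updates)
```

Calling summarise_updates(record, key = "updated-by") on the same stand-in returns an empty table, because that record only carries update-to.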
Retrieve many records with updates
You can use the works endpoint with the filter filter=update-type:retraction in your regular API call, e.g. http://api.crossref.org/v1/works?filter=update-type:retraction. Let’s use this in combination with rows=0, which returns no records but still reports the total count, to get a quick idea of the number of records with retractions:
Don’t forget to add the mailto parameter with your contact email address to direct your request to our Polite pool.
Code
library(listviewer)  # provides jsonedit()
jsonedit(jsonlite::fromJSON("http://api.crossref.org/v1/works?filter=update-type:retraction&rows=0&mailto=learning@crossref.org"))
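If you prefer to assemble the query from its parts and read the count programmatically, a minimal sketch could look like the following. Both helper names (build_works_url and count_results) are hypothetical, the email address is a placeholder, and the sketch does no URL encoding, so it only suits simple values like the ones used here.

```r
# Hypothetical helper: assemble a works query from a filter, a row count
# and a contact email. The base URL is the one used in this tutorial.
build_works_url <- function(filter, rows = 0, mailto) {
  base <- "http://api.crossref.org/v1/works"
  query <- c(
    filter = filter,
    rows   = as.character(rows),
    mailto = mailto
  )
  paste0(base, "?", paste0(names(query), "=", query, collapse = "&"))
}

# Hypothetical helper: read the hit count from a parsed response.
count_results <- function(parsed) {
  parsed[["message"]][["total-results"]]
}

url <- build_works_url("update-type:retraction", rows = 0, mailto = "you@example.org")
print(url)

# Not run here because it needs network access:
# count_results(jsonlite::fromJSON(url))
```

Keeping the URL construction separate from the request makes it easy to inspect the query before sending it.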
You can also go to our GitLab repository and download a .CSV file that you can open in your text or spreadsheet editor of choice: Retraction Watch data.
A practical example:
Let’s imagine that you want to check whether a list of references contains retractions. We can write some relatively simple functions to perform sequential queries and generate a table with our results. Let’s start with a short example of three DOIs:
Code
# DOIs to check for retractions
dois <- c(
  "10.1177/1758835920922055",
  "10.17227/pys.num60-20108",
  "10.1186/s12889-024-20318-x"
)
What we will do is pass this list to a function that creates a list of queries, includes a delay to make sure we stay under the Crossref API rate limits, and adds our email address so that the queries go to the Polite pool. Please make sure to add your own email address when building these queries.
Code
1. We will store our data in a variable called list_queries.
2. Then we will start our iteration…
3. This sets our queries to the works endpoint and specifies that we will retrieve individual DOIs.
4. Each request will include one DOI from our previous list.
5. We add our email to the query.
6. And we add a delay. This is paramount if we are using an extensive list of DOIs.
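The six annotated steps above can be sketched with httr2 and purrr as follows. This is a reconstruction under assumptions rather than the tutorial's original listing: the email address is a placeholder, the rate of one request per second is an illustrative value rather than a documented Crossref limit, and using req_throttle() for the delay is one reasonable choice among several. The numbers in the comments refer to the steps.

```r
library(httr2)
library(purrr)

# Same three DOIs as in the example above.
dois <- c(
  "10.1177/1758835920922055",
  "10.17227/pys.num60-20108",
  "10.1186/s12889-024-20318-x"
)

list_queries <- map(                                  # (1) store, (2) iterate
  dois,
  \(doi) {
    request("http://api.crossref.org/v1/works") |>   # (3) works endpoint
      req_url_path_append(doi) |>                     # (4) one DOI per request
      req_url_query(mailto = "you@example.org") |>    # (5) placeholder email
      req_throttle(rate = 1)                          # (6) assumed rate: 1 request/second
  }
)

# Nothing has been sent yet; each element is an unperformed request.
print(list_queries[[1]]$url)
```

Because the requests are only built here, you can inspect their URLs before sending anything to the API.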
Up to this point, we have built a list of queries, but these have not yet been submitted. We are separating these steps here for clarity, but you can write a single piece of code that covers the entire operation.
Code
library(httr2)
# Send the requests one after another and keep the responses
my_item <- list_queries |>
  req_perform_sequential(progress = TRUE)
After this operation we will get a list of responses that we can parse as JSON and then extract our data of interest. In this case, we will get a TRUE/FALSE result for each DOI depending on whether or not there is a retraction associated with it.
Code
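library(httr2)  # resp_body_json()
library(purrr)  # map(), pluck()
library(dplyr)  # rename()
library(gt)     # gt(), opt_interactive()

my_item |>
  map(\(x) {
    updates <- resp_body_json(x, simplifyVector = TRUE) |>
      pluck("message", "updated-by")
    # TRUE when the record carries an updated-by entry
    !is.null(updates)
  }) |>
  unlist() |>
  cbind(dois) |>
  as.data.frame() |>
  rename(
    "Has been retracted?" = V1,
    "DOI" = dois
  ) |>
  gt() |>
  opt_interactive()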
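The check above treats any updated-by entry as a retraction. Since updates can also record other kinds of change, such as corrections, a stricter variant might look at the type of each entry. The sketch below uses base R on hand-written stand-in records: the DOIs are placeholders, has_retraction is a hypothetical helper name, and the records are assumed to be parsed as nested lists (simplifyVector = FALSE).

```r
# Hypothetical helper: TRUE only if some updated-by entry is a retraction.
# Expects the `message` element of a record parsed as nested lists.
has_retraction <- function(msg) {
  updates <- msg[["updated-by"]]
  if (is.null(updates)) {
    return(FALSE)
  }
  any(vapply(updates, function(u) identical(u$type, "retraction"), logical(1)))
}

# Hand-written stand-ins for three parsed records (placeholder DOIs).
messages <- list(
  list(DOI = "10.1234/a", `updated-by` = list(list(type = "retraction"))),
  list(DOI = "10.1234/b", `updated-by` = list(list(type = "correction"))),
  list(DOI = "10.1234/c")
)

result <- data.frame(
  DOI       = vapply(messages, function(m) m$DOI, character(1)),
  retracted = vapply(messages, has_retraction, logical(1))
)
print(result)
```

Only the first stand-in record is flagged, because the second carries a correction and the third has no updates at all.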
- You can get Retraction Watch metadata from the Crossref REST API.
- Find the updated-by and/or update-to keys.
- Use the filter filter=update-type:retraction to get multiple records.