Get Retraction Watch metadata from Crossref’s API

Author: Luis M. Montilla
Affiliation: Crossref
Published: January 30, 2025

In this tutorial you will:
  • Learn how to identify the update metadata in records retrieved from the Crossref REST API.
  • Understand which filters to use to retrieve records based on their update metadata.
  • Learn how to make sequential queries to check whether any records in a list have retractions.

What you need to know

As you might know, the scholarly metadata available in Crossref’s REST API includes updates made to specific works. More recently, we also announced:

Retractions and corrections from Retraction Watch are now available in Crossref’s REST API. Back in September 2023, we announced the acquisition of the Retraction Watch database with an ongoing shared service. Since then, they have sent us regular updates, which are publicly available as a csv file. Our aim has always been to better integrate these retractions with our existing metadata, and today we’ve met that goal.[1]

In this tutorial, we’ll show you how to interpret and retrieve this metadata. First, we need to understand how this information is expressed in the REST API, so let’s look at both ends of the relationship: a record that has received an update, and a record that represents an update to another one.

Identify the update metadata

Let’s examine a record that contains update metadata: http://api.crossref.org/v1/works/10.1177/1758835920922055

In the following box, you will see this record in JSON format, as it is normally returned by the works endpoint of the REST API. You can collapse any key by clicking the little triangle to the left of its name. Can you find the key called updated-by? Click the magnifying glass icon at the top and type or paste updated-by.

You should find plenty of useful information there, including the DOI that identifies the retraction. Now let’s look at the complementary record, which represents the update itself. Use the same tools to find the update-to key.

If you expand the type and source keys, you will see why this record has two entries: the updates come from two sources, Retraction Watch and the publisher.
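If you prefer to inspect these entries in code rather than in the viewer, the sketch below flattens the updated-by entries of a record into a small table with one row per update. It assumes the record was parsed with jsonlite::fromJSON(..., simplifyVector = FALSE), so each entry is a named list. The helpers field_or_na() and updates_table() are hypothetical, the example record is a hand-written stand-in using Crossref’s test prefix 10.5555 (not real data), and the source values "publisher" and "retraction-watch" are assumptions about how the two sources are labeled.

```r
# Hypothetical helper: return a field as a string, or NA if it is missing.
field_or_na <- function(entry, name) {
  value <- entry[[name]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Hypothetical helper: flatten the "updated-by" entries of one parsed work
# into a data frame. Assumes `work` is the "message" element of a works
# response parsed with jsonlite::fromJSON(..., simplifyVector = FALSE).
updates_table <- function(work) {
  entries <- work[["updated-by"]]
  if (length(entries) == 0) {
    # No updates recorded: return an empty table with the same columns
    return(data.frame(DOI = character(0), type = character(0),
                      source = character(0)))
  }
  # One single-row data frame per update entry, then stack them
  rows <- lapply(entries, function(entry) {
    data.frame(DOI    = field_or_na(entry, "DOI"),
               type   = field_or_na(entry, "type"),
               source = field_or_na(entry, "source"))
  })
  do.call(rbind, rows)
}

# Hand-written stand-in, NOT real Crossref data: one retraction reported
# twice, once by the publisher and once by Retraction Watch.
example_work <- list(
  DOI = "10.5555/hypothetical-article",
  `updated-by` = list(
    list(DOI = "10.5555/hypothetical-notice", type = "retraction",
         source = "publisher"),
    list(DOI = "10.5555/hypothetical-notice", type = "retraction",
         source = "retraction-watch")
  )
)

updates_table(example_work)
```

Returning an empty table with the same columns for records without updates means the results for many records can be stacked with rbind() without special cases.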

Retrieve many records with updates

You can add the filter filter=update-type:retraction to a regular call to the works endpoint, e.g. http://api.crossref.org/v1/works?filter=update-type:retraction. Let’s combine it with rows=0, which returns the total number of matching records without listing them, to get a quick idea of how many records have retractions:

Tip

Don’t forget to add the mailto parameter with your contact email address to direct your request to our polite pool.

Code
library(listviewer)  # jsonedit(): interactive JSON viewer
jsonedit(jsonlite::fromJSON("http://api.crossref.org/v1/works?filter=update-type:retraction&rows=0&mailto=learning@crossref.org"))
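The same count query can also be assembled in code. Below is a minimal sketch with a hypothetical helper, works_query_url(), that builds the works URL from a filter, a number of rows and a mailto address. It percent-encodes each value with base R’s utils::URLencode(reserved = TRUE), so the colon in the filter and the @ in the email address become %3A and %40, which standard URL decoding turns back into the original characters.

```r
# Hypothetical helper: build a works-endpoint query URL from its parameters.
# Each value is percent-encoded so characters such as ":" and "@" are safe.
works_query_url <- function(filter, rows = 0, mailto,
                            base = "http://api.crossref.org/v1/works") {
  params <- c(filter = filter, rows = as.character(rows), mailto = mailto)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Replace the address with your own contact email
count_url <- works_query_url("update-type:retraction", rows = 0,
                             mailto = "learning@crossref.org")
count_url
```

With network access, jsonlite::fromJSON(count_url)$message$`total-results` should give the number of matching records.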
Tip

You can also go to our GitLab repository and download a CSV file that you can open in your text or spreadsheet editor of choice: Retraction Watch data.

A practical example

Imagine that you want to check whether a list of references contains any retracted works. We can write some relatively simple functions to perform sequential queries and build a table with the results. Let’s start with a short example of three DOIs:

Code
dois <- c(
  "10.1177/1758835920922055",
  "10.17227/pys.num60-20108",
  "10.1186/s12889-024-20318-x"
)

We will pass this vector to a function that creates a list of queries. Each query includes a delay, to make sure we stay under the Crossref API rate limits, and our email address, so that it is directed to the polite pool. Please make sure you use your own email address when building these queries.

Code
library(httr2)  # request(), req_url_path_append(), req_url_query(), req_throttle()
library(purrr)  # map()

# Store the queries in a variable called list_queries
list_queries <- dois |>
  # Iterate over the DOIs, building one request for each
  map(\(x) {
    # Query the works endpoint for an individual DOI
    request("http://api.crossref.org/works/doi/") |>
      # Append one DOI from our list to the path
      req_url_path_append(x) |>
      # Add your email address so the request goes to the polite pool
      req_url_query(mailto = "learning.hub@crossref.org") |>
      # Limit requests to 30 per minute; essential for long lists of DOIs
      req_throttle(rate = 30 / 60)
  })

Up to this point, we have built a list of queries, but they have not yet been submitted. We separate these steps here for clarity, but you could write a single pipeline that covers the entire operation.

Code
my_item <- list_queries |> 
  req_perform_sequential(progress = TRUE) 
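Performing throttled requests one after another boils down to a loop that waits between calls. The base-R sketch below is not httr2’s implementation; it is a simplified, hypothetical helper, run_sequentially(), that applies a user-supplied fetch function to each item and sleeps so that consecutive calls start at least min_interval seconds apart. With min_interval = 2, it mirrors the 30 requests per minute set with req_throttle(rate = 30/60). The fetch function in the example is a stand-in that makes no network calls.

```r
# Hypothetical helper: apply `fetch` to each element of `items`, one at a
# time, so that consecutive calls start at least `min_interval` seconds apart.
run_sequentially <- function(items, fetch, min_interval = 2) {
  results <- vector("list", length(items))
  last_start <- NULL
  for (i in seq_along(items)) {
    if (!is.null(last_start)) {
      # Wait only for the part of the interval that has not yet elapsed
      elapsed <- as.numeric(difftime(Sys.time(), last_start, units = "secs"))
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_start <- Sys.time()
    results[[i]] <- fetch(items[[i]])
  }
  results
}

# Stand-in fetch function: no network access needed
fake_fetch <- function(doi) paste("fetched", doi)
out <- run_sequentially(c("10.5555/a", "10.5555/b"), fake_fetch,
                        min_interval = 0.1)
```

In practice, fetch would perform the HTTP request for one DOI; keeping the waiting logic separate from the request makes it easy to test offline.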

This returns a list of responses, whose bodies we can parse as JSON to extract the data we are interested in. In this case, we get a TRUE/FALSE result for each DOI, depending on whether or not a retraction is associated with it.

Code
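library(httr2)  # resp_body_json()
library(purrr)  # map_lgl(), pluck()
library(gt)     # gt(), opt_interactive()

# TRUE when any updated-by entry of a record is a retraction; records
# without updates (or with only other update types) give FALSE
retracted <- my_item |>
  map_lgl(\(x) {
    update_types <- resp_body_json(x, simplifyVector = TRUE) |>
      pluck("message", "updated-by", "type")
    "retraction" %in% update_types
  })

# One row per DOI, rendered as an interactive table
data.frame(
  DOI = dois,
  "Has been retracted?" = retracted,
  check.names = FALSE
) |>
  gt() |>
  opt_interactive()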
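The updated-by key can also list other kinds of updates, such as corrections, so the presence of the key alone does not mean a work was retracted. The sketch below uses a hypothetical helper, is_retracted(), that checks the type of each entry instead. It assumes the work was parsed with simplifyVector = FALSE, and the three example records are hand-written stand-ins, not real Crossref data.

```r
# Hypothetical helper: TRUE if any "updated-by" entry of a parsed work is a
# retraction. Assumes `work` is the "message" element parsed with
# jsonlite::fromJSON(..., simplifyVector = FALSE).
is_retracted <- function(work) {
  types <- vapply(
    work[["updated-by"]],
    function(entry) {
      if (is.null(entry[["type"]])) NA_character_ else entry[["type"]]
    },
    character(1)
  )
  # Works without updates give character(0), and any() of nothing is FALSE
  any(types == "retraction", na.rm = TRUE)
}

# Hand-written stand-ins, NOT real records
retracted_work <- list(`updated-by` = list(
  list(type = "retraction", source = "retraction-watch")))
corrected_work <- list(`updated-by` = list(
  list(type = "correction", source = "publisher")))
plain_work <- list(DOI = "10.5555/no-updates")

vapply(list(retracted_work, corrected_work, plain_work), is_retracted,
       logical(1))
```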
In conclusion
  • You can get Retraction Watch metadata from the Crossref REST API.
  • Look for the updated-by key (in the updated work) and the update-to key (in the record that represents the update).
  • Use the filter filter=update-type:retraction to get multiple records.

Other resources