Incorrect DOI Retrieved via Crossref RESTful API

Hi,

We are using a Python script to retrieve the DOI for the following reference using the Crossref RESTful API, but the DOI returned does not appear to be correct.

Reference:

S. F. Yu, K. Yang, Y. Lei, and H. Yang, Grain refinement of heat affected zone of high strength low alloy steel by large heat input welding, Transactions of the China Welding Institution 29(03), 17-20 (2008)

Python script:
```python
from crossref.restful import Works

works = Works()
reference = (
    "S. F. Yu, K. Yang, Y. Lei, and H. Yang, Grain refinement of heat affected zone "
    "of high strength low alloy steel by large heat input welding, Transactions of "
    "the China Welding Institution 29(03), 17-20 (2008)"
)

# Bibliographic query plus an author field query, limited to 2008 journal articles
results = works.query(bibliographic=reference, author="Yu").filter(
    from_pub_date="2008", until_pub_date="2008", type="journal-article"
)

doi = None  # stays None if the API returns no results
for item in results:
    doi = item.get("DOI")  # take the top-ranked result
    break

print(doi)
```

Result: 10.4028/www.scientific.net/msf.575-578.741

We searched for this reference on Google and also used the Crossref Simple Text Query website, but no DOI was found. This suggests that the reference may not have a DOI. However, the RESTful API still returns one, which appears to be incorrect.

Question:
How can we improve our query to retrieve the correct DOI?

How can we improve our query to avoid retrieving an incorrect DOI?

Are there additional filters or strategies to validate or refine the search to improve accuracy?

Any suggestions or recommended best practices would be appreciated.

Thanks,
Perumal N.

Hi Perumal,

Thanks for your question.

A query.bibliographic API query will always return results, with the closest possible matches to your search terms ranked by relevancy score. That doesn’t mean the highest-ranked result is an accurate match for the reference you queried. You need to determine that yourself by comparing the returned metadata with the metadata you searched for.
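A minimal sketch of that comparison in Python, assuming each result is a plain dict shaped like a REST API work record (`title` as a list of strings, the publication date under `issued` / `date-parts`). The helper names and the 0.9 title-similarity threshold are hypothetical choices for illustration, not Crossref recommendations; `difflib.SequenceMatcher` is just one simple way to score string similarity.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, turn punctuation into spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def looks_like_match(item, expected_title, expected_year, min_title_similarity=0.9):
    """Compare one Crossref work record (a dict) with the metadata we searched for.

    The 0.9 default is a hypothetical threshold chosen for illustration.
    """
    # Crossref stores the title as a list of strings; it may be missing or empty.
    titles = item.get("title") or [""]
    title_score = max(similarity(t, expected_title) for t in titles)

    # The publication year lives in issued -> date-parts -> [[year, month, day]].
    date_parts = item.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None

    return title_score >= min_title_similarity and year == expected_year
```

In the script from the question, you would call `looks_like_match(item, expected_title, 2008)` inside the loop and keep the DOI only when it returns True, instead of taking the first result unconditionally.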

The REST API can’t really replicate the reference matching accuracy of the Simple Text Query UI.

Both the REST API and Simple Text Query do sort results by relevancy. STQ has a behind-the-scenes relevancy score cutoff where a match won’t be made if no results have a high enough relevancy score or if there are multiple identical options available. You could replicate that with the REST API, but you would have to make a decision on your own where the relevancy score cutoff should be; the API can’t do it for you. And where that cutoff is placed to minimize both false positives and false negatives can vary depending on the sort of queries you’re making.
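One way to apply such a cutoff yourself is sketched below. It assumes each result dict carries the numeric `score` field that relevance-ranked REST API results include, and it also rejects the top hit when the runner-up scores almost as high, loosely mirroring the "multiple identical options" case. `min_score` and `min_margin` are hypothetical placeholders. As far as I know, raw scores are not on a fixed scale across different queries, so you would need to tune both values against a sample of references whose correct DOIs you already know.

```python
def pick_confident_match(items, min_score=60.0, min_margin=5.0):
    """Return the top-ranked DOI only when it clears a relevancy cutoff.

    `items` are Crossref work records (dicts) that carry a numeric `score`.
    Both thresholds are hypothetical placeholders that need tuning.
    """
    if not items:
        return None

    # The API already sorts by relevance; re-sorting just makes that explicit.
    ranked = sorted(items, key=lambda it: it.get("score", 0.0), reverse=True)
    top_score = ranked[0].get("score", 0.0)

    # Nothing scored highly enough: report "no match" instead of a guess.
    if top_score < min_score:
        return None

    # A runner-up almost as relevant means the choice is ambiguous.
    if len(ranked) > 1 and top_score - ranked[1].get("score", 0.0) < min_margin:
        return None

    return ranked[0].get("DOI")
```

With the `crossref.restful` client from the question, something like `items = list(itertools.islice(results, 5))` would collect the top few records to pass in. Returning `None` for both the low-score and the ambiguous case trades some false negatives for fewer false positives.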

Beyond that, what STQ does that the API currently can’t is parse formatted references into their component parts. For example, the API can’t determine that “Grain refinement of heat affected zone of high strength low alloy steel by large heat input welding” is the article title while “Transactions of the China Welding Institution” is the journal title, and then compare each of those to the equivalent metadata element in the API record. The REST API can basically only search against a metadata record as a whole, whereas STQ can parse out the journal title, article title, authors, etc. and search against those specific metadata fields.
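To approximate that parsing step on the client side, you could split the reference yourself and check each piece against the corresponding field of the returned record (`title`, `container-title`, `volume`, `issue`, `issued`). The regular expression below is hypothetical and only handles this one citation style ("Authors, Title, Journal Volume(Issue), Pages (Year)"); real reference lists mix many styles, which is what makes general-purpose parsing hard. The field comparison is deliberately strict (exact match after normalization).

```python
import re

# Hypothetical pattern for ONE citation style only:
# "Authors, Title, Journal Volume(Issue), Pages (Year)"
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?\band\s+[^,]+),\s*"       # author list ending in "and X. Name"
    r"(?P<title>.+),\s*"                          # article title
    r"(?P<journal>[^,]+?)\s+"                     # journal title
    r"(?P<volume>\d+)\((?P<issue>\d+)\),\s*"      # volume(issue)
    r"(?P<pages>\d+\s*[-\u2013]\s*\d+)\s*"        # page range, hyphen or en dash
    r"\((?P<year>\d{4})\)$"                       # (year)
)


def parse_reference(ref):
    """Split a formatted reference into fields, or return None if it doesn't fit."""
    match = REFERENCE_PATTERN.match(ref.strip())
    return match.groupdict() if match else None


def _norm(text):
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", str(text).lower()).split())


def _first(item, key):
    """Crossref returns title/container-title as lists; take the first entry."""
    value = item.get(key) or [""]
    return value[0] if isinstance(value, list) else value


def mismatched_fields(parsed, item):
    """Names of parsed fields that disagree with a Crossref work record."""
    date_parts = item.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    checks = {
        "title": _norm(parsed["title"]) == _norm(_first(item, "title")),
        "journal": _norm(parsed["journal"]) == _norm(_first(item, "container-title")),
        # Compare numbers without leading zeros so "03" matches "3".
        "volume": parsed["volume"].lstrip("0") == str(item.get("volume", "")).lstrip("0"),
        "issue": parsed["issue"].lstrip("0") == str(item.get("issue", "")).lstrip("0"),
        "year": str(year) == parsed["year"],
    }
    return [name for name, ok in checks.items() if not ok]
```

A non-empty list from `mismatched_fields` (for instance a different journal title and volume) is a strong signal that the top result is not the reference you were looking for.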

The REST API can’t support that kind of string parsing currently. We do hope to make changes to improve reference matching through the REST API in the future.

One of our R&D Developers has written a very good series of blog posts about reference matching, which you might find interesting. I’ll link them below.

Best,
Shayn