Labs API

Legacy

Area: API
Language: Python
Description: The Labs API proxy system
Production URLs:
Production Heartbeats:
Quality: No Sentry, no SONAR
Upstream services:
Source Code:
Products:

The Labs API is a proxying annotation system that sits between users and the live REST API (Cayenne). The system aims to provide a bi-directional intercept to the live deposit system, prototyping new metadata schemas for members to test.

This is a prototype Crossref Labs system. It is not guaranteed to be stable and the metadata schema and behaviour may be subject to change at any time.

Infrastructure

Infrastructure can be found in the Crossref infrastructure repository, under crossref/research/micro.

Features

Deposit Proxy

The Crossref Labs API contains a deposit proxy that allows us to prototype schema changes.

Background

At present, changing the schema of Crossref deposits is a painful process for the metadata team. Community outreach can only describe in the abstract what we’d like to do and why, rather than show what it would look like. Changes must then be thoroughly planned and deployed to the live environment before end users can experiment with new fields. It is easier to persuade people of the merit of a schema change by demonstrating the new system.

The Crossref Labs API Deposit Proxy system is designed to allow us to prototype schema changes in a safe environment. It is a proxy that sits between the Crossref API and the Crossref deposit system. It provides a new API endpoint for depositing metadata, which we can use to prototype schema changes and which the community can use to test new features.

How It Works

Figure 1: a diagram showing the workflow of the deposit proxy

The workflow is as follows, correlating to Figure 1, above:

  1. The user sends a deposit request to the Crossref Labs API Deposit Proxy at http://api.labs.crossref.org/deposit/. The XML should declare the schema modification to apply via its namespace (e.g. http://www.crossref.org/schema/5.4.d0.name). See the example deposits in tests/live_deposit for more.
  2. The Deposit Proxy transforms the XML into standard 5.3.1, keeping track of the changes it has made to the deposit in a database (S3). This is handled in process_name.py (where “name” is the schema name).
  3. The Deposit Proxy then sends the transformed XML to the live Crossref deposit system.
  4. There is then a delay while the live REST API ingests newly deposited material.
  5. The user can then query the Labs API to see the new schema in action. Continuing the example of the “name” schema, the user might visit http://api.labs.crossref.org/works/10.5555/GRFG-ENGF?mailto=your@email.com to see the “name” schema in operation for the work 10.5555/GRFG-ENGF. Adding plain=true to the query string (here, by appending &plain=true) disables schema modifications.
  6. When the user queries this endpoint, the Labs API Deposit Proxy queries the live Crossref API for the work 10.5555/GRFG-ENGF. It then modifies the JSON output using the changes stored in step 2, essentially “undoing” the earlier transformation to 5.3.1.
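The round trip in steps 2 and 6 can be sketched in miniature. This is an illustration only, not the code in process_name.py: it uses plain dictionaries in place of deposit XML and API JSON, an in-memory dictionary in place of the S3 store, and hypothetical function names (to_standard, repatch_work).

```python
import copy

# Stand-in for the S3-backed record of changes (assumption: keyed by DOI).
change_store = {}


def to_standard(doi, record):
    """Step 2: strip prototype-only fields and remember what was removed."""
    standard = copy.deepcopy(record)
    removed = {}
    # Hypothetical prototype field that the standard schema does not carry.
    if "name" in standard:
        removed["name"] = standard.pop("name")
    change_store[doi] = removed
    return standard


def repatch_work(doi, live_json, plain=False):
    """Step 6: re-apply the stored fields to the live API's output."""
    if plain:
        # Mirrors plain=true: return the live data without modifications.
        return live_json
    patched = copy.deepcopy(live_json)
    patched.update(change_store.get(doi, {}))
    return patched


deposit = {"title": "An example work", "name": "Ada Lovelace"}
standard = to_standard("10.5555/GRFG-ENGF", deposit)
restored = repatch_work("10.5555/GRFG-ENGF", standard)
```

Here standard no longer carries the prototype field, while restored has it back, which is the "undoing" described in step 6.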

Example Usage

  1. Modify the tests/live_deposit/journal.article5.4.d0.name.xml file and deposit it (in standard Crossref format) with the Labs API Deposit Proxy at http://api.labs.crossref.org/deposit/. This will create a new deposit in the live system.
  2. Wait until your data has propagated to the live REST API. This usually happens within an hour, but can take 24 hours or more.
  3. Query the Labs API Deposit Proxy for your data at http://api.labs.crossref.org/works/insert_doi_prefix/insert_doi_suffix?mailto=your@email.com
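Step 3 can be scripted. The following is a minimal sketch using only the Python standard library; the helper names works_url and fetch_work are hypothetical, and only the URL pattern and the mailto and plain parameters come from the text above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

LABS_BASE = "http://api.labs.crossref.org"


def works_url(doi, mailto, plain=False):
    """Build the Labs API works URL for a DOI."""
    params = {"mailto": mailto}
    if plain:
        # plain=true disables schema modifications.
        params["plain"] = "true"
    # Keep the slash between DOI prefix and suffix; escape everything else.
    return f"{LABS_BASE}/works/{quote(doi, safe='/')}?{urlencode(params)}"


def fetch_work(doi, mailto, plain=False, timeout=30):
    """Fetch and decode the JSON for a work (performs a network request)."""
    with urlopen(works_url(doi, mailto, plain), timeout=timeout) as response:
        return json.load(response)


url = works_url("10.5555/GRFG-ENGF", "your@email.com")
# http://api.labs.crossref.org/works/10.5555/GRFG-ENGF?mailto=your%40email.com
```

Because ingestion can take an hour or more, a script calling fetch_work soon after depositing should expect to retry later rather than treat a missing record as an error.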

Known Weaknesses

There are a number of known weaknesses in the system.

Desynchronization of the Live API and the Deposit Proxy

The deposit proxy offers the ability to model new schemas, repatching the JSON data on return. However, at present there is a bug in the design of the Crossref Live API that prevents us from checking that the data we have stored for patching definitely matches the data in the live system.

Until this is fixed, the data we hold can fall out of sync with the live API. This will happen if a user does the following, in order:

  • Initially deposits using the Labs API proxy
  • Subsequently updates the metadata using the Live API
  • Requests the entry via the Labs API proxy

Example code showing how to resolve this is in the process_works function in process_name.py. It is currently commented out, pending the point at which the deposit timestamp returned by the live API matches the one submitted.

The workaround is simple: if you want the Labs API to work, always use Labs API endpoints. If you modify a metadata record in the live system, do not expect the Labs API to return good data; this is undefined behaviour.
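The kind of guard that would close this gap can be sketched as follows. This is not the commented-out code in process_works; the function name repatch_if_current and the keys deposit_timestamp and fields are assumptions made for illustration, and the check only becomes meaningful once the live API reports a timestamp that matches the one submitted.

```python
def repatch_if_current(live_work, stored):
    """Re-apply stored prototype fields only when both timestamps agree."""
    if stored is None:
        # Nothing was deposited through the Labs API for this work.
        return live_work
    stored_ts = stored.get("deposit_timestamp")
    live_ts = live_work.get("deposit_timestamp")
    if stored_ts is None or stored_ts != live_ts:
        # The record changed outside the Labs API (or cannot be checked),
        # so serve the live data untouched rather than patch stale values.
        return live_work
    return {**live_work, **stored["fields"]}


stored = {"deposit_timestamp": "20240101120000", "fields": {"name": "Ada Lovelace"}}
in_sync = repatch_if_current(
    {"deposit_timestamp": "20240101120000", "title": "An example work"}, stored
)
out_of_sync = repatch_if_current(
    {"deposit_timestamp": "20240202090000", "title": "An example work"}, stored
)
```

Falling back to the unmodified live record is one possible design choice; returning an explicit error would be another.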

Imperfect Transformations to 5.3.1

Backwards compatibility of schema to 5.3.1 is not perfect. Whenever a new schema is introduced, it is necessary to specify the transformation rules (e.g. “copy the value of ‘name’ into ‘first_name’ if ‘first_name’ is blank”).
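The example rule quoted above could be written as a small function. This is a sketch over a plain dictionary with a hypothetical function name; the real rules operate on deposit XML.

```python
def apply_name_rule(contributor):
    """Copy 'name' into 'first_name' when 'first_name' is blank."""
    result = dict(contributor)  # leave the caller's data untouched
    if not (result.get("first_name") or "").strip():
        result["first_name"] = result.get("name", "")
    return result


filled = apply_name_rule({"name": "Ada Lovelace", "first_name": ""})
kept = apply_name_rule({"name": "Ada Lovelace", "first_name": "Ada"})
```

The rule also shows why the mapping is imperfect: a full name ends up in a field meant for a given name, so the 5.3.1 record is only an approximation of what was deposited.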

Large Deposits

The Deposit Proxy uses the synchronous live API. It may not, therefore, be suitable for mass deposit and may time out if you attempt this.

Technical Notes

Infrastructure

The system is part of the broader Labs API, running in a container on AWS Fargate. It uses a Docker container to run a Python script that proxies requests from the Crossref API to the Crossref deposit system. The system is deployed using Terraform.

Adding New Schemas

Adding new schemas is done by populating a directory under src/plugins/depositor_schema. Modules to transform XML from the new schema back to 5.3.1 and to repatch the JSON on return should be put in the src/plugins/depositor_rules directory, with the filename process_<schema>.py (e.g. “process_name.py”). These processing files are loaded dynamically at runtime.

Processing files (e.g. “process_name.py”) should contain the methods “process” and “repatch”. See process_name.py for example signatures.
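One way such runtime loading could work is sketched below with importlib from the standard library. The loader name load_rules, the check it performs, and the one-argument signatures in the stand-in module are all assumptions for illustration; consult the existing processing file for the real signatures.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_rules(rules_dir, schema):
    """Load process_<schema>.py from rules_dir and check its interface."""
    path = Path(rules_dir) / f"process_{schema}.py"
    spec = importlib.util.spec_from_file_location(f"process_{schema}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for required in ("process", "repatch"):
        if not callable(getattr(module, required, None)):
            raise AttributeError(f"{path.name} must define {required}()")
    return module


# Demonstration with a stand-in rules file written to a temporary directory.
with tempfile.TemporaryDirectory() as rules_dir:
    Path(rules_dir, "process_name.py").write_text(
        "def process(xml):\n    return xml\n\n"
        "def repatch(work):\n    return work\n",
        encoding="utf-8",
    )
    rules = load_rules(rules_dir, "name")
    demo_result = rules.process("<doi_batch/>")
```

Validating the interface at load time means a schema with a missing function fails immediately, rather than on the first deposit that uses it.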

Static Delivery

The system uses a static delivery system to deliver some JSON files. This allows us to simulate a return value from the Live API so that we can build new schema modifications. For example, a request to the Labs API for the DOI 10.5555/n0HRokm will result in the contents of src/plugins/static_delivery/DOI/307b5a6be483fbcaae5daffe4fb02730.json being served to the user.
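A lookup of this kind might be sketched as follows. The 32-character hexadecimal filename suggests a digest of the DOI, and the sketch assumes an MD5 hash of the DOI string; the source does not confirm this, and the sketch has not been checked against the filename above. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path

STATIC_ROOT = Path("src/plugins/static_delivery/DOI")


def static_path(doi, root=STATIC_ROOT):
    """Map a DOI to a candidate static JSON file (assumed MD5 naming)."""
    digest = hashlib.md5(doi.encode("utf-8")).hexdigest()
    return Path(root) / f"{digest}.json"


def load_static(doi, root=STATIC_ROOT):
    """Return the static JSON for a DOI, or None if no file exists."""
    path = static_path(doi, root)
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


candidate = static_path("10.5555/n0HRokm")
```

A caller would try load_static first and fall through to the live API when it returns None.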

Current Annotation Systems

Journals Route

  • container-ids.json (container IDs, Airflow)
  • member-id.json (member IDs/Sugar data, Airflow)

Members Route

  • domains.json (TLDs, Airflow)
  • member-profile.json (member/Sugar data, Airflow)
  • preservation.json (digital preservation data, currently not automated)
  • resolution.json (resolution statistics, Airflow)

Works Route

  • updates.json (retraction data, Airflow)
  • preservation.json (preservation data, currently not automated)