Labs API
Legacy
Area | API |
Language | Python |
Description | The Labs API proxy system |
Production URLs | |
Production Heartbeats | |
Quality | No Sentry, no SONAR |
Upstream services | |
Source Code | |
Products | |
The Labs API is a proxying annotation system that sits between users and the live REST API (Cayenne). The system aims to provide a bi-directional intercept to the live deposit system, prototyping new metadata schemas for members to test.
This is a prototype Crossref Labs system. It is not guaranteed to be stable and the metadata schema and behaviour may be subject to change at any time.
Infrastructure
Infrastructure can be found in the Crossref infrastructure repository, under crossref/research/micro.
Features
- Proxy for Crossref Labs API with additional data fields. Example: http://api.labs.crossref.org/members/?mailto=your@email.com. Note the cr-labs- prefixed data annotations.
- Unpaginated routes (all records). To use these, set rows to a number greater than the number of records in the route, or to the value “all”. Results lag the live API by 24 hours. Example: http://api.labs.crossref.org/members/?mailto=your@email.com&rows=10000000
- “Simple” mode on the unpaginated members route, using select. You cannot select arbitrary fields; the only supported combinations are id and primary-title, or id, primary-title and names. Results lag the live API by 24 hours. Examples: http://api.labs.crossref.org/members/?mailto=your@email.com&rows=10000000&select=id,primary-title or http://api.labs.crossref.org/members/?mailto=your@email.com&rows=10000000&select=id,primary-title,names
- Deposit proxy for Crossref Labs API. See below.
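The members-route options above can be sketched as a small URL builder. This is an illustrative helper, not part of the Labs API codebase: the function name members_url and the constant ALLOWED_SELECTS are assumptions, and the strict ordering of select fields is a choice made for the sketch rather than documented API behaviour.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.labs.crossref.org"

# The only field combinations listed as supported for "simple" mode.
# Assumption: the sketch also enforces this field order.
ALLOWED_SELECTS = {
    ("id", "primary-title"),
    ("id", "primary-title", "names"),
}


def members_url(mailto, rows="all", select=None):
    """Build a URL for the unpaginated members route (hypothetical helper)."""
    params = {"mailto": mailto, "rows": rows}
    if select is not None:
        if tuple(select) not in ALLOWED_SELECTS:
            raise ValueError(f"unsupported select combination: {select}")
        params["select"] = ",".join(select)
    # safe="@," leaves the email address and the comma-separated list unescaped.
    return f"{BASE_URL}/members/?{urlencode(params, safe='@,')}"


print(members_url("your@email.com", select=["id", "primary-title"]))
```

Rejecting unsupported select combinations before the request is sent is one way to surface the restriction early; the API itself may respond differently.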
Deposit Proxy
The Crossref Labs API contains a deposit proxy that allows us to prototype schema changes.
Background
At present, when the metadata team want to change the schema of Crossref deposits, the process is painful. Community outreach can only be conducted in abstract terms of describing what we’d like to do and why, instead of showing what it would look like. These changes must then be thoroughly planned and deployed to the live environment before end-users can experiment with new fields etc. It is easier to persuade people of the merit of a schema change with a demonstration of the new system.
The Crossref Labs API Deposit Proxy system is designed to let us prototype schema changes in a safe environment. It is a proxy that sits between the Crossref API and the Crossref deposit system, exposing a new API endpoint for depositing metadata. We can use this endpoint to prototype schema changes, and the community can use it to test new features.
How It Works
Figure 1: a diagram showing the workflow of the deposit proxy
The workflow is as follows, correlating to Figure 1, above:
- The user sends a deposit request to the Crossref Labs API Deposit Proxy at http://api.labs.crossref.org/deposit/. The XML should declare the schema modification it wants applied via its namespace (e.g. http://www.crossref.org/schema/5.4.d0.name). See the example deposits in tests/live_deposit for more.
- The Deposit Proxy transforms the XML into standard 5.3.1, keeping track of the changes it has made to the deposit in a database (S3). This is handled in process_name.py (where “name” is the schema name).
- The Deposit Proxy then sends the transformed XML to the live Crossref deposit system.
- There is then a delay while the live REST API ingests newly deposited material.
- The user can then query the Labs API to see the new schema in action. Continuing the example of the “name” schema, the user might visit http://api.labs.crossref.org/works/10.5555/GRFG-ENGF?mailto=your@email.com to see the “name” schema in operation for the work 10.5555/GRFG-ENGF. You can add the query parameter plain=true (written as &plain=true when the URL already has a query string, as here) to disable schema modifications.
- When the user queries this endpoint, the Labs API Deposit Proxy will query the live Crossref API for the work 10.5555/GRFG-ENGF. It will then modify the JSON output using the earlier stored variables, essentially “undoing” the earlier transformation to 5.3.1.
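The round trip described in these steps can be sketched in miniature. This is a simplified illustration under stated assumptions, not the code in process_name.py: the toy XML (record, doi, name elements without namespaces), the in-memory STORE standing in for S3, and the function signatures are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the S3-backed store: DOI -> values removed during the transform.
STORE = {}


def process(xml_text):
    """Strip the prototype 'name' element from a toy deposit and remember it."""
    record = ET.fromstring(xml_text)
    doi = record.findtext("doi")
    name = record.find("name")
    if name is not None:
        STORE[doi] = {"name": name.text}  # kept for repatching later
        record.remove(name)
    return ET.tostring(record, encoding="unicode")


def repatch(doi, live_json, plain=False):
    """Re-apply stored prototype fields to JSON returned by the live API."""
    if plain or doi not in STORE:
        return live_json  # plain=True mirrors the plain=true parameter
    patched = dict(live_json)
    patched.update(STORE[doi])
    return patched


deposit = "<record><doi>10.5555/GRFG-ENGF</doi><name>Ada Lovelace</name></record>"
standard_xml = process(deposit)  # what would be sent on to the live deposit system
patched = repatch("10.5555/GRFG-ENGF", {"DOI": "10.5555/GRFG-ENGF"})
print(standard_xml)
print(patched)
```

The key design point is that the live system only ever sees standard XML; everything prototype-specific lives in the side store and is merged back in on read.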
Example Usage
- Modify and deposit (in standard Crossref format) the tests/live_deposit/journal.article5.4.d0.name.xml file to the Labs API Deposit Proxy at http://api.labs.crossref.org/deposit/. This will create a new deposit in the live system.
- Wait until your data has filtered into the live REST API. This usually happens within an hour, but can take 24 hours or more.
- Query the Labs API Deposit Proxy for your data at http://api.labs.crossref.org/works/insert_doi_prefix/insert_doi_suffix?mailto=your@email.com
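Because of the ingestion delay, a client may want to poll for the deposited work rather than query once. A minimal sketch, assuming a caller-supplied fetch function that returns the parsed JSON record or None while the work is not yet available; works_url, wait_for_work, fetch and the default timings are illustrative names and values, not part of the Labs API.

```python
import time
from urllib.parse import quote


def works_url(doi, mailto, plain=False):
    """Build the Labs API works URL for a DOI (hypothetical helper)."""
    url = (
        "http://api.labs.crossref.org/works/"
        f"{quote(doi)}?mailto={quote(mailto, safe='@')}"
    )
    if plain:
        url += "&plain=true"  # disables schema modifications
    return url


def wait_for_work(doi, mailto, fetch, attempts=24, delay_seconds=3600,
                  sleep=time.sleep):
    """Call fetch(url) until it returns a record, or give up after `attempts`."""
    url = works_url(doi, mailto)
    for attempt in range(attempts):
        record = fetch(url)
        if record is not None:
            return record
        if attempt < attempts - 1:
            sleep(delay_seconds)  # hourly by default, given the delay described above
    return None
```

Passing fetch and sleep in as arguments keeps the sketch independent of any particular HTTP library and makes it easy to exercise without network access.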
Known Weaknesses
There are a number of known weaknesses in the system.
Desynchronization of the Live API and the Deposit Proxy
The deposit proxy offers the ability to model new schemas, repatching the JSON data on return. However, at present there is a bug in the design of the Crossref Live API that prevents us from checking that the data we have stored for patching definitely matches the data in the live system.
Until this is fixed, the data we hold can fall out of sync with the live API. This will happen if a user:
- Initially deposits using the Labs API proxy
- Subsequently updates the metadata using the Live API
- Requests the entry via the Labs API proxy
Example code showing how to resolve this is in the process_works function in process_name.py, but it remains commented out until the live API returns a deposit timestamp that matches the one submitted.
Until then, the workaround is: if you want the Labs API to work, always use Labs API endpoints. If you modify a metadata record in the Live system, do not expect the Labs API to return good data. This is undefined behaviour.
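The kind of guard that a reliable timestamp would allow can be sketched as follows. This is not the commented-out code in process_works; the function name safe_repatch and the field names "deposit-timestamp" and "fields" are assumptions made purely for illustration.

```python
def safe_repatch(live_record, stored):
    """Apply stored prototype fields only if the live record is the one we deposited."""
    # Hypothetical field names: where the timestamp actually lives is not specified.
    if live_record.get("deposit-timestamp") != stored.get("deposit-timestamp"):
        return live_record  # out of sync: return the live data untouched
    patched = dict(live_record)
    patched.update(stored.get("fields", {}))
    return patched


stored = {"deposit-timestamp": 20240101120000, "fields": {"name": "Ada Lovelace"}}
in_sync = {"DOI": "10.5555/GRFG-ENGF", "deposit-timestamp": 20240101120000}
updated_elsewhere = {"DOI": "10.5555/GRFG-ENGF", "deposit-timestamp": 20240301090000}

print(safe_repatch(in_sync, stored))            # prototype fields re-applied
print(safe_repatch(updated_elsewhere, stored))  # returned unchanged
```

Falling back to the unmodified live record when the timestamps differ avoids patching stale values onto metadata that was later changed through the Live API.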
Imperfect Transformations to 5.3.1
Backwards compatibility of schema to 5.3.1 is not perfect. Whenever a new schema is introduced, it is necessary to specify the transformation rules (e.g. “copy the value of ‘name’ into ‘first_name’ if ‘first_name’ is blank”).
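A rule of the kind quoted above can be expressed as data plus a small function. This is a sketch on plain dictionaries rather than deposit XML; RULES and apply_fallback_rules are hypothetical names, and the real rules in the depositor_rules modules may be structured quite differently.

```python
# Each rule is (source field, target field): the source value is copied
# into the target only when the target is blank or missing.
RULES = [("name", "first_name")]


def apply_fallback_rules(record, rules=RULES):
    """Return a copy of record with blank targets filled from their sources."""
    result = dict(record)
    for source, target in rules:
        if not result.get(target) and result.get(source):
            result[target] = result[source]
    return result


print(apply_fallback_rules({"name": "Ada Lovelace"}))
print(apply_fallback_rules({"name": "Ada Lovelace", "first_name": "Ada"}))
```

In the first call the whole name ends up in first_name, which illustrates why such a mapping can only approximate the new schema in 5.3.1.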
Large Deposits
The Deposit Proxy uses the synchronous live API. It may not, therefore, be suitable for mass deposit and may time out if you attempt this.
Technical Notes
Infrastructure
The system is part of the broader Labs API and runs in a Docker container on AWS Fargate. The container runs a Python script that proxies requests from the Crossref API to the Crossref deposit system. The system is deployed using Terraform.
Adding New Schemas
Adding new schemas is done by populating a directory under src/plugins/depositor_schema. Modules to transform XML from the new schema back to 5.3.1 and to repatch the JSON on return should be put in the src/plugins/depositor_rules directory, with the filename process_<schema>.py (e.g. “process_name.py”). These processing files are loaded dynamically at runtime.
Processing files (e.g. “process_name.py”) should contain the functions “process” and “repatch”. See process_name.py for example signatures.
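Dynamic loading of a processing file can be sketched with the standard importlib machinery. This shows one common approach and is not necessarily how the Labs API loads its plugins; load_processor is a hypothetical name, and the check only confirms that process and repatch exist, since their real signatures are defined in process_name.py.

```python
import importlib.util
from pathlib import Path


def load_processor(rules_dir, schema):
    """Load process_<schema>.py from rules_dir and check its expected functions."""
    path = Path(rules_dir) / f"process_{schema}.py"
    spec = importlib.util.spec_from_file_location(f"process_{schema}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises FileNotFoundError if the file is missing
    for required in ("process", "repatch"):
        if not callable(getattr(module, required, None)):
            raise AttributeError(f"{path.name} must define {required}()")
    return module
```

A call such as load_processor("src/plugins/depositor_rules", "name") would then return the module for the “name” schema, failing early if either function is absent.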
Static Delivery
Some JSON files are served through a static delivery system. This lets us simulate a return value from the Live API so that we can build new schema modifications. For example, a request to the Labs API for the DOI 10.5555/n0HRokm will result in the contents of src/plugins/static_delivery/DOI/307b5a6be483fbcaae5daffe4fb02730.json being served to the user.
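The lookup can be sketched as follows. How the file name is derived from the DOI is not described here, so the sketch uses an explicit table containing only the example above; STATIC_FILES, static_path and load_static are hypothetical names.

```python
import json
from pathlib import Path

STATIC_DIR = Path("src/plugins/static_delivery/DOI")

# Explicit DOI -> file name table; the single entry is the example from the text.
STATIC_FILES = {
    "10.5555/n0HRokm": "307b5a6be483fbcaae5daffe4fb02730.json",
}


def static_path(doi, static_dir=STATIC_DIR):
    """Return the static file path for a DOI, or None if it is not served statically."""
    filename = STATIC_FILES.get(doi)
    return static_dir / filename if filename else None


def load_static(doi, static_dir=STATIC_DIR):
    """Return the parsed static JSON, or None so the caller can use the live API."""
    path = static_path(doi, static_dir)
    if path is None or not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Returning None for unknown DOIs lets the caller fall through to the normal proxy path, so static delivery only affects the records it explicitly covers.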
Current Annotation Systems
Journals Route
- container-ids.json (container IDs, Airflow)
- member-id.json (member IDs/Sugar data, Airflow)
Members Route
- domains.json (TLDs, Airflow)
- member-profile.json (member/Sugar data, Airflow)
- preservation.json (digital preservation data, currently not automated)
- resolution.json (resolution statistics, Airflow)
Works Route
- updates.json (retraction data, Airflow)
- preservation.json (preservation data, currently not automated)
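How annotations from these files end up on a record can be sketched as a simple merge under the cr-labs- prefix mentioned in the features list. The internal structure of the annotation files is not described here, so the function name annotate and the example keys and values below are purely illustrative.

```python
def annotate(record, annotations, prefix="cr-labs-"):
    """Return a copy of record with each annotation added under a prefixed key."""
    annotated = dict(record)
    for name, value in annotations.items():
        annotated[prefix + name] = value
    return annotated


# Hypothetical member record and annotation values.
member = {"id": 98}
extra = {"domains": ["example.org"], "preservation": None}
print(annotate(member, extra))
```

Prefixing every added key keeps Labs annotations visibly separate from the fields returned by the live API, and leaves the original record unmodified.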