Metadata Bucket
Area | greenfield |
Quality | No Sentry, no SONAR |
Upstream data | |
Downstream services | |
Downstream data | |
Related services | |
Tags |
The Metadata Bucket is a private S3 bucket containing metadata and updates. It is designed to feed the Cayenne REST API. The metadata has unrestricted references.
Bucket Structure
Data should be added to the Metadata Bucket using the following key structure:
{doi-hash}/{filename}
Assuming a bucket name of crossref-metadata-bucket-staging, some examples might be:
crossref-metadata-bucket-staging/8fd133785660bb26ebca632b8ca40104bef4ba7f/unixsd.xml
crossref-metadata-bucket-staging/8fd133785660bb26ebca632b8ca40104bef4ba7f/citation-update.json
Key Components
{doi-hash}
{doi-hash} is a sha1 hash of the lowercase DOI.
We use a hash for a number of reasons:
- DOIs can contain all kinds of characters, including non-printable ones, extra slashes, semicolons, question marks, etc.
- A hash supports much better prefix balancing than the literal DOI would.
We use sha1 for the hash because it is universal, improves distribution, and security isn’t a concern here.
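As an illustration of the key structure described above, here is a minimal Python sketch that derives the {doi-hash} and the full object key. The function names doi_hash and object_key are hypothetical, and UTF-8 encoding of the DOI before hashing is an assumption; this page does not specify an encoding.

```python
import hashlib


def doi_hash(doi: str) -> str:
    """Return the sha1 hex digest of the lowercase DOI."""
    # Assumption: the lowercase DOI is encoded as UTF-8 before hashing.
    return hashlib.sha1(doi.lower().encode("utf-8")).hexdigest()


def object_key(doi: str, filename: str) -> str:
    """Build the {doi-hash}/{filename} key for an object in the bucket."""
    return f"{doi_hash(doi)}/{filename}"
```

Because the DOI is lowercased first, differently cased spellings of the same DOI map to the same key prefix, and the resulting 40-character hex string contains only characters that are safe in an S3 key.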
{filename}
The filename of the metadata is based on the type of metadata, e.g. unixsd.xml, citation-update.json.
FAQ
Currently, a citation update file contains updates for multiple DOIs. Do we plan to keep one update file per DOI in this new architecture?
Yes, citation update files will only be for a single DOI with this model.
Metadata
Objects created in the Metadata Bucket should have an x-amz-meta-cr-doi metadata property added. This property should hold the lowercase DOI to which the object relates.
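One possible way to set this property at upload time is sketched below in Python. The helper put_metadata_object is hypothetical, and it assumes a boto3 S3 client is passed in. With boto3, entries in the Metadata argument of put_object are sent as x-amz-meta-* headers, so the key "cr-doi" is stored as x-amz-meta-cr-doi. UTF-8 encoding of the DOI before hashing is also an assumption.

```python
import hashlib


def put_metadata_object(s3_client, bucket: str, doi: str, filename: str, body: bytes):
    """Upload one object under {doi-hash}/{filename} with the cr-doi user metadata."""
    lowercase_doi = doi.lower()
    # Assumption: the lowercase DOI is encoded as UTF-8 before hashing.
    digest = hashlib.sha1(lowercase_doi.encode("utf-8")).hexdigest()
    return s3_client.put_object(
        Bucket=bucket,
        Key=f"{digest}/{filename}",
        Body=body,
        # boto3 adds the x-amz-meta- prefix to each user metadata key,
        # so this entry is stored as x-amz-meta-cr-doi.
        Metadata={"cr-doi": lowercase_doi},
    )
```

A call might look like put_metadata_object(boto3.client("s3"), "crossref-metadata-bucket-staging", "10.5555/EXAMPLE", "unixsd.xml", xml_bytes), where the DOI and xml_bytes are placeholders. S3 user metadata sent this way is expected to be ASCII, so DOIs containing other characters may need additional encoding, which this page does not cover.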