graph LR; ftp(ftp.crossref.org); ingest(Cayenne Task Ingest Journals); index(Cayenne ES Journals Index); api(Cayenne /v1/journals); ftp-->ingest; ingest-->index; index-->api;
Journals are ingested by Cayenne from a location configured as [:location :cr-titles-csv], currently http://ftp.crossref.org/titlelist/titleFile.csv. It is a CSV file containing information about journals, one per line.
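As a rough illustration of the "one journal per line" layout, the following Python sketch parses a small CSV string into one record per journal. The column names and sample rows are hypothetical placeholders, not the actual header of titleFile.csv, and Cayenne's own ingest code is not shown here.

```python
import csv
import io

# Hypothetical sample data: the real titleFile.csv may use different columns.
SAMPLE_CSV = """JournalTitle,JournalID,Publisher
Journal of Examples,1001,Example Press
Annals of Samples,1002,Sample House
"""


def read_journals(text):
    """Parse CSV text into a list of dicts, one per journal row.

    The first line is treated as the header; each following line
    becomes a dict keyed by the header names.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


journals = read_journals(SAMPLE_CSV)
for journal in journals:
    print(journal["JournalID"], journal["JournalTitle"])
```

Reading rows through `csv.DictReader` rather than splitting on commas keeps quoted fields (for example, titles containing commas) intact.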
The journals’ records are then indexed using Elasticsearch’s bulk API with a series of update actions. Each such action either adds a new record or updates the existing one. The update action is used rather than index so that we do not erase the journal subjects, which are added to journal records in a separate process (see funding data).
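To make the difference concrete, here is a minimal Python sketch that builds a bulk request body made of update actions. It assumes the add-or-update behavior is obtained with Elasticsearch's `doc_as_upsert` option, which may not be how Cayenne does it; the index name `journal`, the `id` and `title` fields, and the function name are all hypothetical.

```python
import json


def bulk_update_body(journals, index_name="journal"):
    """Build a newline-delimited bulk body of update actions.

    Each journal produces two lines: the action metadata and a partial
    document. With "doc_as_upsert", a missing document is created from
    the partial document; an existing one is merged with it, so fields
    absent from the partial document (such as subjects) are kept.
    """
    lines = []
    for journal in journals:
        # Action line: which document to update (index name is an assumption).
        lines.append(json.dumps({"update": {"_index": index_name, "_id": journal["id"]}}))
        # Source line: the partial document, inserted as-is if none exists yet.
        lines.append(json.dumps({"doc": journal, "doc_as_upsert": True}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_update_body([{"id": "1001", "title": "Journal of Examples"}])
print(body, end="")
```

An index action with the same `_id` would instead replace the whole stored document, dropping any field not present in the new source. That is the loss of subjects the update action avoids.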
Journals are ingested once a day.