XML S3 Bucket Pusher

Area: greenfield
Language: Java
Description: Tool to push XML (CLOB/unixsd) records to an S3 bucket
Quality: No Sentry, no SONAR
Upstream services:
Upstream data:
Downstream data:
Packages:
  • org.crossref.qs.crmds.S3ClobPusher
Source Code:

Running the tool

The tool must be built with the standard ant compile (or ./bin/nbant compile on a developer machine). It requires a default deployment property file and is currently set to use the developer myalter-1 property file (this may need to be revisited).

From a standard developer environment, run the following command to process the entire DOI corpus with all default settings: bin/j --cp ./java/org/crossref/qs/crmds/ org.crossref.qs.crmds.S3ClobPusher

Parameter options:

--startId # [default=0]
--endId # [default=500000000]
--awsSecretKey AWS Secret Key [default is read from AWS_KEY QS_CRMDS_XML_BUCKET_PUSHER_AWS_SECRET_KEY]
--awsAccessKey AWS Access Key [default is read from QS_CRMDS_XML_BUCKET_PUSHER_AWS_ACCESS_KEY]
--blockSize blockSize [default=1000]
--bucket S3 bucket name to use [default=api-metadata-repository-staging]
--concurrency # [default=10]
--help
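The option-handling code itself is not shown on this page. As a rough illustration only, the sketch below models the parameters above with their documented defaults. The class name PusherOptions and its parse method are hypothetical. It assumes the AWS key defaults come from environment variables with the names listed above (they might instead come from the deployment property file), and it assumes the ID range runs upward from startId 0 to endId 500000000, since that is the direction needed to cover the whole corpus.

```java
import java.util.Map;

// Hypothetical sketch of the S3ClobPusher options; NOT the tool's actual parsing code.
final class PusherOptions {
    // Assumed range defaults: upward from 0 to 500000000 to cover the whole corpus.
    long startId = 0L;
    long endId = 500_000_000L;
    int blockSize = 1000;
    int concurrency = 10;
    String bucket = "api-metadata-repository-staging";
    String awsAccessKey;
    String awsSecretKey;
    boolean help;

    static PusherOptions parse(String[] args, Map<String, String> env) {
        PusherOptions o = new PusherOptions();
        // Assumption: the key defaults are read from environment variables with these names.
        o.awsAccessKey = env.get("QS_CRMDS_XML_BUCKET_PUSHER_AWS_ACCESS_KEY");
        o.awsSecretKey = env.get("QS_CRMDS_XML_BUCKET_PUSHER_AWS_SECRET_KEY");
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (flag.equals("--help")) {
                o.help = true;
                continue;
            }
            // Every other option takes exactly one value.
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            String value = args[++i];
            switch (flag) {
                case "--startId" -> o.startId = Long.parseLong(value);
                case "--endId" -> o.endId = Long.parseLong(value);
                case "--blockSize" -> o.blockSize = Integer.parseInt(value);
                case "--concurrency" -> o.concurrency = Integer.parseInt(value);
                case "--bucket" -> o.bucket = value;
                case "--awsAccessKey" -> o.awsAccessKey = value;
                case "--awsSecretKey" -> o.awsSecretKey = value;
                default -> throw new IllegalArgumentException("Unknown option: " + flag);
            }
        }
        return o;
    }
}
```

For example, PusherOptions.parse(new String[]{"--startId", "0", "--endId", "1000"}, System.getenv()) overrides only the ID range and keeps every other default.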

Parameters are passed after the class name, for example: bin/j --cp ./java/org/crossref/qs/crmds/ org.crossref.qs.crmds.S3ClobPusher --startId 0 --endId 1000
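This page does not say how blockSize and concurrency are used. One plausible reading, sketched below purely as an assumption, is that the ID range is cut into blocks of blockSize IDs and up to concurrency blocks are pushed in parallel. BlockPlanner, plan, runAll and the pushBlock callback are hypothetical names, and the half-open [startId, endId) convention is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch of block-based, concurrent processing of an ID range.
final class BlockPlanner {
    /** One block of IDs, assumed half-open: [from, to). */
    record Block(long from, long to) {}

    /** Splits [startId, endId) into consecutive blocks of at most blockSize IDs. */
    static List<Block> plan(long startId, long endId, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        for (long from = startId; from < endId; from += blockSize) {
            blocks.add(new Block(from, Math.min(from + blockSize, endId)));
        }
        return blocks;
    }

    /** Runs pushBlock on every block using a fixed pool of `concurrency` threads. */
    static void runAll(List<Block> blocks, int concurrency, Consumer<Block> pushBlock)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Block b : blocks) {
                futures.add(pool.submit(() -> pushBlock.accept(b)));
            }
            // Wait for every block; get() rethrows the first failure it meets.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Under these assumptions, the example above (--startId 0 --endId 1000) with the default blockSize of 1000 gives a single block [0, 1000). For the full default range, plan would materialize 500,000 blocks up front; a real implementation might generate them lazily instead.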

Running the CBC pusher

There is a second tool for Cited By Count (CBC) pushes that is separate from the XML pusher. It runs in a similar way:

From a standard developer environment, run the following command to process the entire DOI corpus with all default settings: bin/j --cp ./java/org/crossref/qs/crmds/ org.crossref.qs.crmds.S3CbcPusher

Parameter options:

--awsSecretKey AWS Secret Key
--awsAccessKey AWS Access Key
--s3Bucket S3 bucket name to use
--startId citation ID to start with
--endId citation ID to end on

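To send output to both the screen and a log file, append the following to the end of the command: 2>&1 | tee -a 52-60Mrun.log (2>&1 merges standard error into standard output, and tee -a appends everything to the named log file while still printing it).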