Running CAW with AWS Batch
Maxime U. Garcia · 2017-11-16 · Last modified 2017-11-17
Following this post from the Nextflow blog, I'm writing a small guide on how I'm doing the same for CAW.
Useful documentation
- Working with Amazon S3 Buckets: AWS docs
- Getting started with AWS Batch: AWS docs
- Get set up for AWS Batch: AWS docs
- Amazon Cloud - AWS Batch: Nextflow docs
Things that had to be done
- New `--outDir` parameter for `CAW` to write output into an `S3` bucket
- New `S3` bucket for references: `s3://caw-references/`
- New `S3` bucket for test data: `s3://caw-test-data/`
- New `S3` bucket for results: `s3://caw-test-results/Results`
- New `S3` bucket for Nextflow work: `s3://caw-test-results/work`
- New TSV files for the test data:
  - in the test data bucket: `s3://caw-test-data/tsv/`
  - in the GitHub `CAW` repository
- Added AWS settings in my `~/.nextflow/config` file:
```groovy
// For AWS - Personal settings
aws {
    accessKey = 'xxx'
    secretKey = 'yyy'
    region = 'eu-west-1'
}
```
- New AWS-specific profile:
```groovy
profiles {
    // Docker images will be pulled automatically
    aws { // For running on AWS Batch with Docker
        includeConfig 'configuration/aws-batch.config'
        includeConfig 'configuration/docker.config'
        includeConfig 'configuration/containers.config'
        includeConfig 'configuration/genomes.config'
    }
}
```
- New AWS Batch-specific config file `aws-batch.config`
  - `bundleDir` has been replaced by `params.genome_base` in `genomes.config` for easier configuration.
  - Using `params.genome_base` for referencing the `S3` bucket:

```groovy
params {
  genome_base = params.genome == 'GRCh37' ? "s3://caw-references/grch37" :
                params.genome == 'GRCh38' ? "s3://caw-references/grch38" :
                "s3://caw-references/smallgrch37"
}
```
- Specify executor:

```groovy
process.executor = 'awsbatch'
```
- New spot fleet role `AmazonEC2SpotFleetRole`, cf AWS docs - Create Spot Fleet Role
- New job queue `caw-job-queue`, cf AWS docs - Creating a Job Queue
- New compute environment `caw-compute-environment`, cf AWS Batch console first-run wizard:
  - Unmanaged
  - No job submission
  - Provisioning Model for compute resources is: `Spot`
  - Spot fleet role is: `AmazonEC2SpotFleetRole`
- Specify job queue:

```groovy
process.queue = 'caw-job-queue'
```
Attempts
First try
```shell
nextflow run main.nf -profile awsbatch -w s3://caw-test-results/work --outDir s3://caw-test-results/Results --sample s3://caw-test-data/tsv/tiny-s3.tsv --verbose --genome smallGRCh37
```
Error:

```
Invalid AWS Batch job definition -- provide a Docker image name or a Batch job definition name
```
We do need a container for every process, even if the process does not need one.
Solution:
- Define containers for processes without one.
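In a Nextflow config this can be as simple as a default container for every process. A minimal sketch, not CAW's actual `containers.config`; the image name here is just an example:

```groovy
process {
    // Fallback Docker image, so every process yields a valid
    // AWS Batch job definition even if it needs no specific tool
    container = 'ubuntu:16.04'
}
```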
Second try
Error:

```
ERROR ~ Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
```
The problem might be due to a bad AMI configuration.
Solution:
- Created a custom AMI for CAW, cf Nextflow docs - Custom AMI and AWS docs - Creating a Compute Resource AMI.
- AMI is `ami-7ce35005`
Third try
Created a managed compute environment, with the user-specified AMI ID `ami-7ce35005`.
Error:

```
JobQueue caw-job-queue not found. (Service: AWSBatch; Status Code: 400; Error Code: ClientException; Request ID: 778694b4-cecf-11e7-b649-9f5e67e1cb80)
```
I forgot to create a job queue when creating the new compute environment.
Solution:
- Create job queue cf AWS docs - Creating a Job Queue
Fourth try
Error:

```
Likely it has been terminated by the external system
```
Nextflow docs:

> This may happen when Batch is unable to execute the process script. A common cause of this problem is that the Docker container image you have specified uses a non standard entrypoint which does not allow the execution of the BASH launcher script required by Nextflow to run the job. Check also the Job execution log in the AWS Batch dashboard for further error details.
Error in the AWS Batch dashboard is: `Essential container in task exited`
Most likely caused by:

```
bash: /home/ec2-user/miniconda/bin/aws: No such file or directory
```
Need to check if the AMI is correctly configured:
- `/home/ec2-user/miniconda/bin/aws` is in the AMI
- Check if the path to `awscli` is specified in the right place.
- Path seems to be in the right place
- Adding `executor.name = 'awsbatch'`
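Putting the pieces together, the executor part of `aws-batch.config` could then look something like this. This is a sketch based on the Nextflow AWS Batch docs of the time; in particular the `executor.awscli` setting is an assumption and must point at wherever the AWS CLI is installed in the custom AMI:

```groovy
// Run tasks through AWS Batch on the queue created above
process.executor = 'awsbatch'
process.queue = 'caw-job-queue'
executor.name = 'awsbatch'
// Assumed location of the AWS CLI baked into the custom AMI
executor.awscli = '/home/ec2-user/miniconda/bin/aws'
```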
Fifth try
Error:

```
Likely it has been terminated by the external system
```

Error in the AWS Batch dashboard is: `Essential container in task exited`
Most likely caused by:

```
download failed: s3://caw-test-results/work/58/0cd09f18e95da23336db35af5c39ee/.command.run to - An error occurred (403) when calling the HeadObject operation: Forbidden
upload failed: - to s3://caw-test-results/work/58/0cd09f18e95da23336db35af5c39ee/.command.log An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
```
`AmazonS3FullAccess` was missing as a policy attached to the Instance Role.
Solution:
- Add the `AmazonS3FullAccess` policy to `ecsInstanceRole`
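If you prefer the AWS CLI to the console, the same attachment can be done in one command (assuming your credentials carry sufficient IAM permissions; `ecsInstanceRole` is the role name used above):

```shell
aws iam attach-role-policy \
    --role-name ecsInstanceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```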
Sixth try
Error:

```
Missing output file(s) `*_fastqc.{zip,html}` expected by process `RunFastQC (1234-1234N_1)`
```
It seems that no files were copied from the S3 source bucket into the S3 work directory bucket.
Check if the files in question exist:
```shell
$ aws s3 ls s3://caw-test-data/tiny/normal/tiny_n_L001_R1_xxx.fastq.gz
2017-11-20 11:25:32      45010 tiny_n_L001_R1_xxx.fastq.gz
$ aws s3 ls s3://caw-test-data/tiny/normal/tiny_n_L001_R2_xxx.fastq.gz
2017-11-20 11:25:32      47703 tiny_n_L001_R2_xxx.fastq.gz
```
They do exist, so it is probably a permissions issue.
I tried to correct it with new Security Groups settings.
But nothing seemed to work, so I called Paolo to the rescue (cf: issue):
> @pditomaso: @MaxUlysse if still not working open an issue including NF output, `.nextflow.log` file and task `.command.run`, `.command.log`, `.command.out` and `.command.err`
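The files Paolo mentions all sit in the task's work directory on S3, so they can be pulled down with the AWS CLI. A sketch, using the task hash visible in the errors above:

```shell
# Pull .command.run, .command.log, .command.out and .command.err
# from the S3 work bucket for inspection
aws s3 cp --recursive \
    s3://caw-test-results/work/58/0cd09f18e95da23336db35af5c39ee/ ./task-debug/
```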
Seventh try
This last one was finally the successful one B-).