[![nf-core](https://maxulysse.github.io/assets/img/svg/nf-core_bytesize2_logo.svg "nf-core-bytesize")](https://nf-co.re)
## How does nf-core/configs work?
Maxime Garcia / [ @gau](https://twitter.com/gau) / [ @MaxUlysse](https://github.com/MaxUlysse)
[The Swedish Childhood Tumor Biobank](https://ki.se/forskning/barntumorbanken) / [National Genomics Infrastructure](https://ngisweden.scilifelab.se/)
[ nf-co.re/events/2021/bytesize-2-configs](https://nf-co.re/events/2021/bytesize-2-configs), online - 2021/02/09
Note:
I work for The Swedish Childhood Tumor Biobank, located at KI, and spend half of my time at the National Genomics Infrastructure at SciLifeLab
---
[![Nextflow](https://maxulysse.github.io/assets/img/slides/nextflow.png "Nextflow")](https://www.nextflow.io/)
* Workflow manager
* Data driven language
* __Portable__
  * executable on multiple platforms
* __Shareable and reproducible__
  * with containers or virtual environments
  * `Docker`, `Singularity` or `Conda`
Note:
I especially love Nextflow for its portability, shareability and reproducibility
---
## Run nf-core/eager on test data
* On a new machine (with `Docker` installed)
* Specify everything on the command line
```text
export NXF_VER=20.04.0 ; curl -s https://get.nextflow.io | bash
git clone --single-branch --branch eager https://github.com/nf-core/test-datasets.git data
nextflow run nf-core/eager -r 2.3.1 \
-with-docker nfcore/eager:2.3.1 \
--max_cpus 2 \
--max_memory 6.GB \
--genome false \
--fasta data/reference/Mammoth/Mammoth_MT_Krause.fasta \
--input data/testdata/Mammoth/mammoth_design_fastq.tsv
```
* container engine with `container:tag`
* available resources
* path to reference genome file
* path to input files
Note:
* Let's try a challenge and run nf-core/eager without any config file or profile
* And here is why it works
---
## With configs
[ nextflow.io/docs/latest/config.html](https://www.nextflow.io/docs/latest/config.html)
```text
nextflow run nf-core/eager -r 2.3.1 -c my_computer.config \
--genome false \
--fasta data/reference/Mammoth/Mammoth_MT_Krause.fasta \
--input data/testdata/Mammoth/mammoth_design_fastq.tsv
```
> `my_computer.config`
>
> ```groovy
> docker.enabled = true
> docker.fixOwnership = true
>
> params {
>   max_cpus = 2
>   max_memory = 6.GB
> }
> ```
>
> `nf-core/eager/nextflow.config`
>
> ```groovy
> process.container = 'nfcore/eager:2.3.1'
> ```
Note:
* I could have specified the genome and input files in a config file as well
* But as I'm planning to run this only once, I won't
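
For completeness, this is roughly what that could look like. It is only a sketch: it reuses `my_computer.config` and the parameter names already shown on the command line, nothing else is assumed.

```groovy
// my_computer.config, extended to also hold the reference and input paths
docker.enabled      = true
docker.fixOwnership = true

params {
  // Resource limits for this machine
  max_cpus   = 2
  max_memory = 6.GB

  // Same values as the --genome, --fasta and --input options
  genome = false
  fasta  = 'data/reference/Mammoth/Mammoth_MT_Krause.fasta'
  input  = 'data/testdata/Mammoth/mammoth_design_fastq.tsv'
}
```

The command line then shrinks to `nextflow run nf-core/eager -r 2.3.1 -c my_computer.config`.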
---
## With profiles
[ nextflow.io/docs/latest/config.html#config-profiles](https://www.nextflow.io/docs/latest/config.html#config-profiles)
```text
nextflow run nf-core/eager -r 2.3.1 -profile test_tsv,docker
```
> `docker`
>
> ```groovy
> docker.enabled = true
> docker.fixOwnership = true
> ```
>
> `test_tsv`
>
> ```groovy
> params {
> max_cpus = 2
> max_memory = 6.GB
>
> genome = false
> fasta = 'data/reference/Mammoth/Mammoth_MT_Krause.fasta'
> input = 'data/testdata/Mammoth/mammoth_design_fastq.tsv'
> }
> ```
Note:
* profiles are like aliases for configs
* `test_tsv` is a profile used for CI tests, so it is used very frequently, which is why it lives in a profile
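
To make the "alias" idea concrete, here is a minimal sketch of how the two profiles above could be declared in a `profiles` scope. The layout is an assumption for illustration: the pipeline's real `nextflow.config` may organise this differently, for instance by including separate files.

```groovy
profiles {
  // Selected with -profile docker
  docker {
    docker.enabled      = true
    docker.fixOwnership = true
  }
  // Selected with -profile test_tsv
  test_tsv {
    params {
      max_cpus   = 2
      max_memory = 6.GB

      genome = false
      fasta  = 'data/reference/Mammoth/Mammoth_MT_Krause.fasta'
      input  = 'data/testdata/Mammoth/mammoth_design_fastq.tsv'
    }
  }
}
```

With `-profile test_tsv,docker`, both blocks are applied on top of the default configuration.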
===
## With singularity
[ nextflow.io/docs/latest/config.html#config-profiles](https://www.nextflow.io/docs/latest/config.html#config-profiles)
```text
nextflow run nf-core/eager -r 2.3.1 -profile test_tsv,singularity
```
> `singularity`
>
> ```groovy
> singularity.enabled = true
> singularity.autoMounts = true
> ```
>
> `test_tsv`
>
> ```groovy
> params {
> max_cpus = 2
> max_memory = 6.GB
>
> genome = false
> fasta = 'data/reference/Mammoth/Mammoth_MT_Krause.fasta'
> input = 'data/testdata/Mammoth/mammoth_design_fastq.tsv'
> }
> ```
Note:
* If singularity is already installed, just change profiles, and voilà
* This is why I like Nextflow and nf-core, it's easy
---
## On my institutional HPC
* Which container/virtual environment engine?
* What are the available resources?
* Which `executor`?
* Where are the reference genome files?
* Where are the input files?
---
## Let's make a config file
```text
nextflow run nf-core/eager -r 2.3.1 -c my_hpc.config --project MUG_210209 \
--genome false \
--fasta /data1/maxime/workspace/nf-core/eager/data/reference/Mammoth/Mammoth_MT_Krause.fasta \
--input /data1/maxime/workspace/nf-core/eager/data/testdata/Mammoth/mammoth_design_fastq.tsv
```
`my_hpc.config`
```groovy
singularity {
  cacheDir = '/data0/containers/'
  enabled = true
  runOptions = '-B /scratch:/scratch -B /shared:/shared'
}
process {
  beforeScript = 'module load singularity/3.4.2'
  executor = 'slurm'
  clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }
}
params {
  max_memory = 125.GB
  max_cpus = 20
  max_time = 240.h
}
```
Note:
* This won't work for you: the paths, module and project are specific to my cluster
---
## Let's make a profile
[ github.com/nf-core/configs#adding-a-new-config](https://github.com/nf-core/configs#adding-a-new-config)
`nf-core/configs/conf/my_hpc.config`
`nf-core/configs/nfcore_custom.config`
```groovy
my_hpc { includeConfig "${params.custom_config_base}/conf/my_hpc.config" }
```
__NB:__ Don't forget docs and CI tests.
_cf_ [ github.com/MaxUlysse/nf-core_configs/tree/my_hpc](https://github.com/MaxUlysse/nf-core_configs/tree/my_hpc)
---
# Tips
All nf-core pipelines are designed for use on a typical HPC, with reasonable default resources for each process.
===
`conf/base.config`
```groovy
process {
  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
  time = { check_max( 24.h * task.attempt, 'time' ) }
  errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
  maxRetries = 3
  maxErrors = '-1'
  withLabel:process_low {
    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
    memory = { check_max( 12.GB * task.attempt, 'memory' ) }
    time = { check_max( 6.h * task.attempt, 'time' ) }
  }
  [...]
}
```
`nextflow.config`
```groovy
params {
  max_cpus = 16
  max_memory = 128.GB
  max_time = 240.h
}
```
===
## Max resources
* `max_cpus`, `max_memory` and `max_time` are only upper limits that no process may exceed
* Changing them changes the cap, not the resources each process requests

To change the base resources,
set the `cpus`, `memory` and `time` properties in the `process` scope.
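
To make the capping concrete, here is a simplified sketch of a `check_max`-style helper. It is an assumption about the general shape, not the pipeline's exact code: the real helper typically sits in the pipeline's `nextflow.config` and also handles invalid values.

```groovy
// Simplified sketch: return the requested resource, capped at the matching params.max_* value
def check_max(obj, type) {
  if (type == 'memory') {
    def max = params.max_memory as nextflow.util.MemoryUnit
    // Keep the request unless it is larger than the limit
    return obj.compareTo(max) > 0 ? max : obj
  }
  if (type == 'time') {
    def max = params.max_time as nextflow.util.Duration
    return obj.compareTo(max) > 0 ? max : obj
  }
  if (type == 'cpus') {
    return Math.min(obj, params.max_cpus as int)
  }
  // Unknown type: leave the request untouched
  return obj
}
```

With `max_memory = 6.GB`, the default request of `7.GB * task.attempt` is capped to 6 GB on every attempt. With `max_cpus = 2`, the default `1 * task.attempt` gives 1 CPU on the first attempt, reaches 2 on the second, and stays capped at 2 afterwards.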
===
## Change resource requirements
[ nextflow.io/docs/latest/config.html#process-selectors](https://www.nextflow.io/docs/latest/config.html#process-selectors)
```groovy
process {
  withName: PROCESS_NAME {
    maxRetries = 1
    memory = 725.GB
    cpus = 40
    time = 24.h
  }
  withLabel: PROCESS_LABEL {
    maxRetries = 3
    memory = 110.GB
    cpus = 20
    time = 24.h
  }
}
```
===
## Include a config file in a profile
[ nextflow.io/docs/latest/config.html#config-include](https://www.nextflow.io/docs/latest/config.html#config-include)
```groovy
includeConfig 'my_conf.config'
```
===
## Test online
```text
nextflow run nf-core/eager -r 2.3.1 -profile my_hpc --project MUG_210209 \
--custom_config_base https://raw.githubusercontent.com/MaxUlysse/nf-core_configs/my_hpc \
--genome false \
--fasta /data1/maxime/workspace/nf-core/eager/data/reference/Mammoth/Mammoth_MT_Krause.fasta \
--input /data1/maxime/workspace/nf-core/eager/data/testdata/Mammoth/mammoth_design_fastq.tsv
```
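
Overriding `--custom_config_base` works because the pipeline builds the location of `nfcore_custom.config` from that parameter. The sketch below shows one way this loading step could look in a pipeline's `nextflow.config`. The default URL and the warning text are assumptions for illustration, not copied from the pipeline.

```groovy
params {
  // Assumed default location of the shared configs; replaced by --custom_config_base
  custom_config_base = 'https://raw.githubusercontent.com/nf-core/configs/master'
}

// Load the file that declares the institutional profiles (my_hpc, ...)
try {
  includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
  // Keep going with the pipeline's own profiles if the file is unreachable
  System.err.println("WARNING: Could not load ${params.custom_config_base}/nfcore_custom.config")
}
```

Pointing the parameter at a fork or branch is enough to test a new profile before it is merged.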
---
## Stay at home message
* Read the docs -> [ nextflow.io/docs/latest/config.html](https://www.nextflow.io/docs/latest/config.html)
* Check out the repo -> [ github.com/nf-core/configs](https://github.com/nf-core/configs)
* Stay tuned for future `nf-core/bytesize`
Note:
* Read the docs, try things out, and don't hesitate to ask questions
* More talks are coming
---
## Get involved
[ nf-co.re/join](https://nf-co.re/join)
Note:
* Join us on Github
* Join our Slack
* Follow us on Twitter
* Follow us on Youtube
---
Note:
* Thank institutes and sponsors + collaborators
* Thank all nf-core contributors
---
## Join us
* [ nf-co.re/join](https://nf-co.re/join)
## References
* [ nextflow.io/docs/latest/config.html](https://www.nextflow.io/docs/latest/config.html)
* [ github.com/nf-core/configs](https://github.com/nf-core/configs)
* [ maxulysse.github.io/bytesize_2](https://maxulysse.github.io/bytesize_2)
* [ github.com/MaxUlysse/nf-core_configs/tree/my_hpc](https://github.com/MaxUlysse/nf-core_configs/tree/my_hpc)
Note:
Here are some important links, including the docs and the slides for this talk on my website
If you have any questions, now is the time