Running the Pipeline¶
This page explains how to run the MAD4HATTER pipeline, including which parameters are required, what workflows are available, and how to customize your run.
Understanding Parameters¶
Parameters with two hyphens (--) are pipeline-specific parameters defined by MAD4HATTER. Parameters with one hyphen (-) are Nextflow-defined parameters. For example, --readDIR is a pipeline parameter (two hyphens) and -profile is a Nextflow parameter (one hyphen).
Available Workflows¶
The MAD4HATTER pipeline offers three different workflows. Most users will run only the complete (default) workflow, however, in some cases you may choose one of the other two workflows.
1. Complete Workflow (Default) :¶
The primary and most commonly used workflow
What it does: Runs the full end-to-end workflow, from raw FASTQs through microhaplotype calling and drug-resistance profiling.
When to use: This is the standard workflow for processing your sequencing data end-to-end.
2. QC Only Workflow¶
What it does: Only the steps required to generate basic QC metrics.
When to use: Use this if you want to quickly check your data quality and coverage before running the full pipeline.
3. Postprocessing Only Workflow¶
What it does: Re-runs post-processing (alignment, filtering, masking, allele calling) on previously denoised data.
When to use: Use this if you want to re-analyze your DADA2 output with different parameters (e.g., different masking settings or alignment thresholds).
¶
Required Parameters¶
The parameters you need depend on which workflow you're running. To control which workflow you are running use --workflow_name. By default this is set to complete, so is only necessary when running qc or postprocessing
For Complete and QC Workflows¶
These parameters are required:
| Parameter | Description | Example |
|---|---|---|
--readDIR | Path to directory containing your FASTQ files (R1 and R2) | /path/to/data |
--pools | Comma-separated list of amplicon pools to analyze | D1,R1,R2 |
For Postprocessing Workflow¶
These parameters are required:
| Parameter | Description | Example |
|---|---|---|
--denoised_asvs | Path to dada2.clusters.txt from a previous run | /path/to/dada2.clusters.txt |
--pools | Comma-separated list of amplicon pools | D1,R1,R2 |
Runtime Profile (Required for all workflows)¶
You must specify a runtime profile using -profile:
docker- For local computersapptainerorconda- For HPC clusters- See Quick Start for details on choosing a profile
Pre-Configured Pools¶
The pipeline includes several pre-configured pools that are ready to use. These pools can be used individually or combined depending on your data.
| Pool Name | Description | Shorthand Name | Legacy pool name |
|---|---|---|---|
| MAD4HatTeR Pools | |||
| D1.1 | A primer pool targeting 165 high diversity targets and 5 loci targetting the ldh gene in P. falciparum and in 4 non-falciparum Plasmodium species (P. vivax, P. malariae, P. ovale, and P. knowlesi) | D1 | 1A |
| R1.1 | 82 targets informing drug and diagnostic resistance, Plasmodium species identification, immune-related targets | - | 1B |
| R1.2 | R1.2 is a reduced version of primer pool R1.1, containing 47 targets, designed to reduce primer dimers and increase sensitivity | R1 | 5 |
| R2.1 | 31 targets to provide information on drug and diagnostic resistance, Plasmodium species identification, immune-related targets. This complements R1, covering missing codons | R2 | 2 |
| PfPHAST Pools | |||
| M1.1 | 41 targets for drug and diagnostic resistance, Plasmodium spp. identification, csp, and 10 diversity targets. "Minimal" set for prioritary markers and TES classification | M1 | - |
| M2.1 | 15 targets for drug and diagnostic resistance, Plasmodium spp. identification, csp, and 10 diversity targets. "Minimal" set for prioritary markers and TES classification. Complements M1.1 for missing codons | M2 | - |
| M1.addon | 4 targets for P. vivax, P. malariae, P. ovale and P. knowlesii mitochondrial cytb targets for increased sensitivity in non-Pf detection (recommended to add to M1.1) | - | - |
| Other Pools/ Panels | |||
| 4cast | 4CAST is a small multiplex of four highly polymorphic antigenic loci: CSP, AMA1, SERA2 and TRAP. For more information, see LaVerriere et al. (2021). | - | - |
| ama1 | A single-target panel for analyzing diversity in the apical membrane antigen 1 (AMA1) gene, which is important for invasion of red blood cells by Plasmodium parasites. This panel is useful for studying AMA1 genetic diversity and population structure. For more information, see Miller et al. (2017). | - | - |
| AMPLseq | 129-locus panel composed of drug resistance markers, highly diverse loci for inferring relatedness, and a locus to detect Plasmodium vivax co-infection. For more information, see LaVerriere et al. (2021). | - | - |
| spotmalaria_grc1 | Spot malaria pool for GRC1 targets. For more information, see Jacob CG et al. (2021) | - | - |
| spotmalaria_grc2 | Spot malaria pool for GRC2 targets. For more information, see Jacob CG et al. (2021) | - | - |
| spotmalaria_spec | Spot malaria pool for species identification targets. For more information, see Jacob CG et al. (2021) | - | - |
| v1 | Version 1 of the MADHatTeR panel | - | - |
| v2 | Version 2 of the MADHatTeR panel | - | - |
Using Pre-Configured Pools¶
To use one or more pre-configured pools, simply specify them in the --pools parameter:
# Use a single pool
--pools D1.1
# Combine multiple pools
--pools D1.1,R1.2,R2.1
# Use shorthand names
--pools D1,R1,R2
The pipeline will automatically locate the configuration files for these pools. No additional setup is required.
Common Combos
The most common combination currently being used for MAD4HatTeR is D1.1, R1.2, and R2.1. This can be processed using --pools D1,R1,R2.
Combining Pools
You can combine any of the pre-configured pools by listing them as a comma-separated list. The pipeline will process all targets from the specified pools together.
Custom Pools
If you need to create your own custom pools or panels, see the Custom Pools page for detailed instructions.
Optional Parameters¶
These parameters can be customized but have defaults:
| Parameter | Description | Default | When Needed |
|---|---|---|---|
--outDIR | Output directory for results | results | To specify custom output location |
--workflow_name | Which workflow to run | complete | To run qc or postprocessing workflows |
--refseq_fasta | Path to targeted reference sequence | Auto-generated from pools | To use a custom reference |
--genome | Path to full genome file covering all targets | Not used | To generate reference from genome |
--amplicon_info | Path to custom amplicon info file | Auto-generated from pools | To use custom amplicon definitions |
Reference Sequences
You can provide either --refseq_fasta (targeted reference) or --genome (full genome). If neither is provided, the pipeline will automatically build a reference from the pool configurations. If both are provided, --refseq_fasta takes priority.
Workflow Examples¶
Complete Workflow (Default)¶
This is the most common workflow - it processes your data from start to finish.
nextflow run main.nf \
--readDIR /path/to/your/data \
--outDIR /path/to/results \
--pools D1,R1,R2 \
-profile docker
What happens:
- Reads are demultiplexed and adapters removed
- Sequences are inferred using DADA2
- Sequences are aligned, filtered, masked, and converted to allele tables including microhaplotypes and pseudo_cigars.
- Quality control metrics are calculated
- Resistance markers are extracted
Outputs: All pipeline outputs including allele_data.txt, coverage files, and quality reports. For more information, see Pipeline Outputs.
QC Only Workflow¶
Use this to quickly assess your data quality before running the full pipeline.
nextflow run main.nf \
--readDIR /path/to/your/data \
--outDIR /path/to/results \
--pools D1,R1,R2 \
--workflow_name qc \
-profile docker
What happens:
- Reads are demultiplexed and adapters removed
- Quality control metrics are calculated
Outputs: - amplicon_coverage.txt - Coverage per amplicon - sample_coverage.txt - Coverage per sample
Quick Quality Check
This workflow is much faster than the complete workflow and helps you verify your data is suitable before running the full analysis, but is not required.
Postprocessing Only Workflow¶
Use this to re-analyze DADA2 output with different parameters.
nextflow run main.nf \
--denoised_asvs /path/to/previous/run/dada2.clusters.txt \
--outDIR /path/to/new/results \
--pools D1,R1,R2 \
--workflow_name postprocessing \
-profile docker
What happens:
- Takes DADA2 output from a previous run as input
- Re-runs alignment, filtering, masking, and allele calling
- Generates new allele tables
Outputs: - allele_data.txt - Final allele table with updated parameters
No Raw Data Needed
This workflow doesn't require --readDIR since it uses pre-processed DADA2 output (--denoised_asvs).
Parameter Reference Quick Guide¶
Minimum Command (Complete Workflow)¶
With Custom Output Directory¶
nextflow run main.nf \
--readDIR /path/to/data \
--outDIR /custom/output/path \
--pools D1,R1,R2 \
-profile docker
With Custom Targeted Reference¶
nextflow run main.nf \
--readDIR /path/to/data \
--pools D1,R1,R2 \
--refseq_fasta /path/to/reference.fasta \
-profile docker
With Whole Genome Reference¶
nextflow run main.nf \
--readDIR /path/to/data \
--pools D1,R1,R2 \
--genome /path/to/whole_genome_reference.fasta \
-profile docker
Getting Help¶
To see all available parameters and their descriptions:
This displays the complete help message with all pipeline parameters.
Next Steps¶
- Review outputs: See Pipeline Outputs for details on what files are generated
- Customize parameters: See Advanced Pipeline Parameters for all parameter options
- Configure resources: See Pipeline Resources for computational resource settings
- Use custom pools: See Custom Pools for setting up bespoke amplicon panels
- Understand modules: See Core Pipeline Modules for details on each processing step