Using Bespoke Pools/Panels
This guide explains how to configure and use custom pools/panels that aren't included in the default pipeline configuration. If you think that the panel you are working with would be useful for others, then please raise an issue and label it as a feature request, or submit a pull request if you have already configured it for long-term use.
Overview
The pipeline uses pools to organize targets. Each pool requires two specific configuration files:
- amplicon_info.tsv - Defines amplicon locations, primers, and target information
- targeted_reference.fasta - Reference sequences for each amplicon. This can be generated within the pipeline by using the --genome flag with a full genome reference that covers all targets.
By default, the pipeline includes pre-configured pools (e.g., D1, R1, R2). See Pre-Configured Pools for a full list. This page explains how to add your own custom pools.
Pools vs. Panels
The term pools is used because MAD4HATTER is a modular panel that can be run by combining different pools depending on the use case. If your assay uses a single pool design, you can think of a pool as a panel and use one single pool to define the complete panel.
When to Use Custom Pools
Use custom pools when:
- You have a custom amplicon panel not included in the default configuration
- You want to add new amplicons to an existing panel
- You're developing a new panel design
Setting Up Custom Pools
There are two options for setting up a custom pool. Option 1 (Using Command-Line Parameters) is recommended for one-off or short-term analysis, such as testing a new panel design. Option 2 (Adding to the Pipeline Configuration) is recommended for long-term use and sharing functionality between users. If you think this panel would be useful for others, then please follow Option 2 or reach out for help.
Option 1: Using Command-Line Parameters
Step 1: Prepare Your Amplicon Info File
Create a tab-separated file describing the targets and primers within your panel/pool with the following columns.
Example File
See an example amplicon info file: D1.1_amplicon_info.tsv
| Column | Description | Example |
|---|---|---|
| target_id | Unique identifier for the target | Pf3D7_01_v3-insertStart-insertEnd |
| chrom | Chromosome name (must match genome/reference) | Pf3D7_01_v3 |
| insert_start | Start position of the amplicon insert | 1000 |
| insert_end | End position of the amplicon insert | 1200 |
| fwd_primer | Forward primer sequence | ATCGATCGATCG |
| rev_primer | Reverse primer sequence | GCTAGCTAGCTA |
| pool | Pool (or panel) name | MyCustomPool |
Coordinates
- All coordinates should be 0-based (e.g., insert_start).
- If using the resmarker module, make sure the coordinates are relative to the full 3D7 reference.
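A file can be checked against the column table above before it is handed to the pipeline. The following is a minimal sketch, not part of the pipeline: `validate_amplicon_info` is a hypothetical helper, the required-column list is copied from the table, and the rule `0 <= insert_start < insert_end` is an assumption about what a sensible row looks like.

```python
import csv
import io

# Column names copied from the table above (Option 1 includes `pool`).
REQUIRED_COLUMNS = ["target_id", "chrom", "insert_start", "insert_end",
                    "fwd_primer", "rev_primer", "pool"]


def validate_amplicon_info(handle):
    """Return a list of human-readable problems found in an amplicon info TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # A space-separated file parses as one wide column and ends up here.
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        target_id = row["target_id"]
        if target_id in seen:
            problems.append(f"line {line_no}: duplicate target_id {target_id}")
        seen.add(target_id)
        try:
            start, end = int(row["insert_start"]), int(row["insert_end"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: coordinates are not integers")
            continue
        # Assumption: inserts are non-empty and coordinates are non-negative.
        if start < 0 or end <= start:
            problems.append(
                f"line {line_no}: expected 0 <= insert_start < insert_end")
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "\t".join(REQUIRED_COLUMNS) + "\n"
        "Pf3D7_01_v3-1000-1200\tPf3D7_01_v3\t1000\t1200\t"
        "ATCGATCGATCG\tGCTAGCTAGCTA\tMyCustomPool\n"
    )
    print(validate_amplicon_info(sample))
```

For a real file, pass an open handle instead of the in-memory sample, for example `open("MyCustomPool_amplicon_info.tsv", newline="")`.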
Step 2: Generate a Reference
You can either create a targeted reference that includes only the reference sequences for each target, or supply a whole genome reference that covers all targets in your amplicon info file. If using a genome, the pipeline will use the amplicon info to create a targeted reference automatically.
Option A: Using a Genome Reference
- Ensure the genome covers all targets defined in the amplicon info file. If your panel includes targets for multiple species, you may need to combine genomes.
- Ensure the chromosome names in the genome match the chrom column in your amplicon info file.
- Supply to the pipeline using the --genome flag.
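To illustrate why the chromosome names and coordinates have to line up, here is a rough sketch of slicing insert sequences out of a genome. It is not the pipeline's implementation: `read_fasta` and `extract_inserts` are hypothetical helpers, sequence names are assumed to be the first whitespace-delimited token of each FASTA header, and coordinates are assumed to be 0-based with an exclusive end.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text into {name: sequence}, keyed by the first header token."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_inserts(amplicon_info, genome):
    """Yield (target_id, insert_sequence) for every row of the amplicon info.

    Assumes 0-based, end-exclusive coordinates; adjust the slice if your
    insert_end is inclusive.
    """
    chroms = read_fasta(genome)
    for row in csv.DictReader(amplicon_info, delimiter="\t"):
        chrom = row["chrom"]
        if chrom not in chroms:
            # This is the mismatch the second bullet above warns about.
            raise KeyError(f"{chrom} is not a sequence name in the genome")
        start, end = int(row["insert_start"]), int(row["insert_end"])
        yield row["target_id"], chroms[chrom][start:end]
```

A `KeyError` from this sketch points at a chrom value that does not appear in the genome headers, which is worth resolving before running with --genome.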
Option B: Using a Targeted Reference File
- FASTA file with reference sequences for each amplicon
- Header format: >target_id (must match target_id in amplicon_info.tsv)
- Supply to the pipeline using the --refseq_fasta flag.
Example File
See an example targeted reference file: D1.1_reference.fasta
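The header requirement for a targeted reference can be checked in a few lines. This is a minimal sketch with a hypothetical helper name, `compare_targets`; it assumes each FASTA header consists of the target_id and nothing else.

```python
import csv
import io


def compare_targets(amplicon_info, reference):
    """Return (targets lacking a reference, reference headers lacking a target)."""
    targets = {row["target_id"]
               for row in csv.DictReader(amplicon_info, delimiter="\t")}
    # Assumption: the full header text after '>' is the target_id.
    headers = {line[1:].strip() for line in reference if line.startswith(">")}
    return sorted(targets - headers), sorted(headers - targets)
```

Both returned lists should be empty. An entry in the first list is a target with no reference sequence; an entry in the second is usually a typo or a stale record in the FASTA file.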
Step 3: Run the Pipeline
Example: Using a genome reference
nextflow run main.nf \
--readDIR /path/to/data \
--pools MyCustomPool \
--amplicon_info /path/to/MyCustomPool_amplicon_info.tsv \
--genome /path/to/full_genome.fasta \
-profile docker
Example: Using a targeted reference
nextflow run main.nf \
--readDIR /path/to/data \
--pools MyCustomPool \
--amplicon_info /path/to/MyCustomPool_amplicon_info.tsv \
--refseq_fasta /path/to/MyCustomPool_reference.fasta \
-profile docker
Option 2: Adding to Pipeline Configuration
Step 1: Prepare Your Files
Two files are required to configure the pipeline: an amplicon info file and a targeted reference.
Create a tab-separated file called <your_pool_name>_amplicon_info.tsv describing the targets and primers within your panel/pool with the following columns.
Example Files
See example files from the D1.1 pool:
- D1.1_amplicon_info.tsv - Example amplicon info file
- D1.1_reference.fasta - Example targeted reference file
| Column | Description | Example |
|---|---|---|
| target_id | Unique identifier for the target | Pf3D7_01_v3-insertStart-insertEnd |
| chrom | Chromosome name (must match genome/reference) | Pf3D7_01_v3 |
| insert_start | Start position of the amplicon insert | 1000 |
| insert_end | End position of the amplicon insert | 1200 |
| fwd_primer | Forward primer sequence | ATCGATCGATCG |
| rev_primer | Reverse primer sequence | GCTAGCTAGCTA |
Coordinates
- All coordinates should be 0-based (e.g., insert_start).
- If using the resmarker module, make sure the coordinates are relative to the full 3D7 reference.
Create a targeted reference file called <your_pool_name>_reference.fasta:
- FASTA file with reference sequences for each amplicon
- Header format: >target_id (must match target_id in amplicon_info.tsv)
Step 2: Add Files to Configuration
Create a folder named after your pool/panel under panel_information/ and add the files you created in Step 1. For example:
panel_information/
└── MyCustomPool/
    ├── MyCustomPool_amplicon_info.tsv
    └── MyCustomPool_reference.fasta
Add your new pool to conf/panel.config:
params {
    pool_options = [
        'MyCustomPool': [
            amplicon_info_path: 'panel_information/MyCustomPool/MyCustomPool_amplicon_info.tsv',
            targeted_reference_path: 'panel_information/MyCustomPool/MyCustomPool_reference.fasta'
        ]
    ]
}
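Then run the pipeline, pointing -c at the config file that contains your pool definition (the example assumes it is saved as conf/custom.config):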
nextflow run main.nf \
--readDIR /path/to/data \
--pools MyCustomPool \
-c conf/custom.config \
-profile docker
Troubleshooting
Pool Not Found Error
If you see an error like "The following pools were not found in configuration":
- Check pool name spelling - Pool names are case-sensitive
- Verify config file - Ensure your config file is loaded with -c conf/custom.config
- Check file paths - Verify paths in your config are correct (relative to project root or absolute)
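Because pool names are case-sensitive, a mismatch that differs only in capitalization is easy to miss. The sketch below shows one way to spot it; `diagnose_pool_names` is a hypothetical helper, and the list of configured names is something you would copy by hand from your config file.

```python
def diagnose_pool_names(requested, configured):
    """Map each unknown pool name to a configured name that differs only by case.

    Names that have no case-insensitive match map to None.
    """
    by_lower = {name.lower(): name for name in configured}
    hints = {}
    for name in requested:
        if name in configured:
            continue  # exact match, nothing to report
        hints[name] = by_lower.get(name.lower())
    return hints


if __name__ == "__main__":
    print(diagnose_pool_names(["mycustompool"], ["D1", "R1", "R2", "MyCustomPool"]))
```

A non-empty hint means the pool exists under a different capitalization; `None` suggests the pool is missing from the loaded configuration altogether.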
File Path Issues
- Use relative paths from the project root, or
- Use absolute paths if files are outside the project directory
- Ensure file paths in config match actual file locations
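The path rules above can be checked mechanically. This is a small sketch under the assumption that relative paths are resolved against the project root; `resolve_pool_path` and `missing_pool_files` are hypothetical names, not pipeline functions.

```python
from pathlib import Path


def resolve_pool_path(path, project_root):
    """Return an absolute path as-is, or a relative path joined to the project root."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else Path(project_root) / candidate


def missing_pool_files(paths, project_root):
    """Return the configured paths that do not point at an existing file."""
    return [str(p) for p in paths
            if not resolve_pool_path(p, project_root).is_file()]
```

Calling `missing_pool_files` with the two paths from your pool entry and the directory containing main.nf should return an empty list.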
Amplicon Info Format Errors
- Verify all required columns are present
- Check for tab-separated format (not spaces)
- Ensure target_id values match between amplicon_info.tsv and reference.fasta headers
Best Practices
- Use descriptive pool names - Avoid spaces and special characters. Use the format chrom-insert_start-insert_end for target IDs to conform with MAD4HATTER naming conventions.
- Keep files organized - Use consistent naming: PoolName_amplicon_info.tsv and PoolName_reference.fasta
- Validate files first - Check your amplicon_info.tsv format before running
- Test with QC workflow - Run --workflow_name qc first to verify pool configuration
- Document your pools - Keep notes on pool design and amplicon targets
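The naming conventions above are simple enough to encode as helpers. The sketch below is illustrative only: the function names are hypothetical, and the set of characters treated as safe in a pool name (letters, digits, dot, underscore, hyphen) is an assumption, chosen so that existing names such as D1.1 pass.

```python
import re

# Assumption: "no spaces or special characters" means this character set.
POOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def make_target_id(chrom, insert_start, insert_end):
    """Build a target ID in the chrom-insert_start-insert_end format."""
    return f"{chrom}-{insert_start}-{insert_end}"


def is_safe_pool_name(name):
    """Check that a pool name uses only the assumed safe characters."""
    return POOL_NAME_PATTERN.fullmatch(name) is not None


def expected_file_names(pool):
    """Return the amplicon info and reference file names for a pool."""
    return f"{pool}_amplicon_info.tsv", f"{pool}_reference.fasta"
```

For example, `make_target_id("Pf3D7_01_v3", 1000, 1200)` gives `Pf3D7_01_v3-1000-1200`, matching the example row in the amplicon info table.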
Getting Help
- Validation errors: Check file formats and paths
- Pool configuration: Verify your config file syntax
- File format questions: See example files in the repository's panel_information/ directory
- Issues or questions: Report issues on GitHub