Using Bespoke Pools/Panels

This guide explains how to configure and use custom pools/panels that aren't included in the default pipeline configuration. If you think your panel would be useful to others, please raise an issue labeled as a feature request, or submit a pull request if you have already configured it for long-term use.

Overview

The pipeline uses pools to organize targets. Each pool requires two specific configuration files:

amplicon_info.tsv - Defines amplicon locations, primers, and target information
targeted_reference.fasta - Reference sequences for each amplicon. This can be generated within the pipeline by using the --genome flag with a full genome reference that covers all targets.

By default, the pipeline includes pre-configured pools (e.g., D1, R1, R2). See Pre-Configured Pools for a full list. This page explains how to add your own custom pools.

Pools vs. Panels

The term pools is used because MAD4HATTER is a modular panel that can be run by combining different pools depending on the use case. If your assay uses a single pool design, you can think of a pool as a panel and use one single pool to define the complete panel.

When to Use Custom Pools

Use custom pools when:

You have a custom amplicon panel not included in the default configuration
You want to add new amplicons to an existing panel
You're developing a new panel design

Setting Up Custom Pools

There are two options for setting up a custom pool. Option 1 (Using Command-Line Parameters) is recommended for one-off or short-term analysis, such as testing a new panel design. Option 2 (Adding to Pipeline Configuration) is recommended for long-term use and for sharing the pool with other users. If you think this panel would be useful to others, please follow Option 2 or reach out for help.

Option 1: Using Command-Line Parameters

Step 1: Prepare Your Amplicon Info File

Create a tab-separated file describing the targets and primers within your panel/pool with the following columns.

Example File

See an example amplicon info file: D1.1_amplicon_info.tsv

Column	Description	Example
`target_id`	Unique identifier for the target	`Pf3D7_01_v3-insertStart-insertEnd`
`chrom`	Chromosome name (must match genome/reference)	`Pf3D7_01_v3`
`insert_start`	Start position of the amplicon insert	`1000`
`insert_end`	End position of the amplicon insert	`1200`
`fwd_primer`	Forward primer sequence	`ATCGATCGATCG`
`rev_primer`	Reverse primer sequence	`GCTAGCTAGCTA`
`pool`	Pool (or panel) name	`MyCustomPool`

Coordinates

All coordinates should be 0-based (e.g., insert_start)
If using the resmarker module, make sure the coordinates are relative to the full 3D7 reference.
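Coordinate mistakes are easy to make when converting from other tools, so a quick sanity check before running can help. The sketch below is not part of the pipeline; `check_coordinates` is a hypothetical helper that only checks that `insert_start` and `insert_end` are non-negative integers and that the end lies after the start. The pipeline's own validation may apply different or stricter rules.

```python
def check_coordinates(rows):
    """Return a list of problems found in the coordinate columns.

    `rows` is an iterable of dicts keyed by column name, for example
    the rows produced by csv.DictReader(handle, delimiter="\t").
    """
    problems = []
    for row in rows:
        target = row.get("target_id") or "<missing target_id>"
        try:
            start = int(row["insert_start"])
            end = int(row["insert_end"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{target}: insert_start/insert_end must be integers")
            continue
        if start < 0:
            problems.append(f"{target}: insert_start is negative")
        if end <= start:
            problems.append(f"{target}: insert_end must be greater than insert_start")
    return problems
```

An empty list means no problems were detected by these checks; it does not confirm that the coordinates point at the intended region of the reference.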

Step 2: Generate a Reference

You can either create a targeted reference that includes only the reference sequences for each target, or supply a whole genome reference that covers all targets in your amplicon info file. If using a genome, the pipeline will use the amplicon info to create a targeted reference automatically.

Option A: Using a Genome Reference

Ensure the genome covers all targets defined in the amplicon info file. If your panel includes targets for multiple species, you may need to combine genomes.
Ensure the chromosome names in the genome match the chrom column in your amplicon info file.
Supply to the pipeline using the --genome flag.
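To illustrate what building a targeted reference from a genome involves, here is a minimal sketch. It is not the pipeline's implementation. It assumes that coordinates are 0-based and that `insert_end` is exclusive (a half-open interval); check this against your own data before relying on it. The function names `read_fasta` and `targeted_reference` are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}.

    The record name is the first whitespace-delimited word of the header.
    """
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def targeted_reference(genome, amplicons):
    """Yield (target_id, sequence) pairs cut out of the genome.

    ASSUMPTION: insert_start is 0-based and insert_end is exclusive.
    """
    for amp in amplicons:
        chrom = amp["chrom"]
        if chrom not in genome:
            # This is the mismatch the chrom-name check above guards against.
            raise KeyError(f"chrom {chrom!r} not found in genome")
        start, end = int(amp["insert_start"]), int(amp["insert_end"])
        yield amp["target_id"], genome[chrom][start:end]
```

The `KeyError` branch shows why the chromosome names in the genome must match the chrom column: a target whose chromosome is absent cannot be extracted.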

Option B: Using a Targeted Reference File

FASTA file with reference sequences for each amplicon
Header format: >target_id (must match target_id in amplicon_info.tsv)
Supply to the pipeline using the --refseq_fasta flag.

Example File

See an example targeted reference file: D1.1_reference.fasta
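Because each FASTA header must match a target_id in the amplicon info file, it can be worth comparing the two sets before a run. The following is a sketch of one way to do that; `compare_ids` is a hypothetical helper, and it assumes the header consists of the target ID alone, as described above.

```python
def compare_ids(amplicon_ids, fasta_lines):
    """Compare target IDs against FASTA headers.

    Returns (missing_from_fasta, not_in_amplicon_info), both sorted.
    The whole header line after '>' is treated as the ID.
    """
    headers = {line[1:].strip() for line in fasta_lines if line.startswith(">")}
    ids = set(amplicon_ids)
    return sorted(ids - headers), sorted(headers - ids)
```

Both returned lists should be empty when the two files agree.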

Step 3: Run the Pipeline

Example: Using a genome reference

nextflow run main.nf \
  --readDIR /path/to/data \
  --pools MyCustomPool \
  --amplicon_info /path/to/MyCustomPool_amplicon_info.tsv \
  --genome /path/to/full_genome.fasta \
  -profile docker

Example: Using a targeted reference

nextflow run main.nf \
  --readDIR /path/to/data \
  --pools MyCustomPool \
  --amplicon_info /path/to/MyCustomPool_amplicon_info.tsv \
  --refseq_fasta /path/to/MyCustomPool_reference.fasta \
  -profile docker

Option 2: Adding to Pipeline Configuration

Step 1: Prepare Your Files

Two files are required to configure the pipeline: an amplicon info file and a targeted reference.

Create a tab-separated file called <your_pool_name>_amplicon_info.tsv describing the targets and primers within your panel/pool with the following columns.

Example Files

See example files from the D1.1 pool:

D1.1_amplicon_info.tsv - Example amplicon info file
D1.1_reference.fasta - Example targeted reference file

Column	Description	Example
`target_id`	Unique identifier for the target	`Pf3D7_01_v3-insertStart-insertEnd`
`chrom`	Chromosome name (must match genome/reference)	`Pf3D7_01_v3`
`insert_start`	Start position of the amplicon insert	`1000`
`insert_end`	End position of the amplicon insert	`1200`
`fwd_primer`	Forward primer sequence	`ATCGATCGATCG`
`rev_primer`	Reverse primer sequence	`GCTAGCTAGCTA`

Coordinates

All coordinates should be 0-based (e.g., insert_start)
If using the resmarker module, make sure the coordinates are relative to the full 3D7 reference.

Create a targeted reference file called <your_pool_name>_reference.fasta:

FASTA file with reference sequences for each amplicon
Header format: >target_id (must match target_id in amplicon_info.tsv)

Step 2: Add Files to Configuration

Create a folder named after your pool/panel under panel_information/ and add the files you created in Step 1. For example:

panel_information/
└── MyCustomPool/
    ├── MyCustomPool_amplicon_info.tsv
    └── MyCustomPool_reference.fasta

Add your new pool to conf/panel.config:

params {
    pool_options = [
        'MyCustomPool': [
            amplicon_info_path: 'panel_information/MyCustomPool/MyCustomPool_amplicon_info.tsv',
            targeted_reference_path: 'panel_information/MyCustomPool/MyCustomPool_reference.fasta'
        ]
    ]
}
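The folder layout and the config paths follow the same naming pattern, so a small check can confirm that the files are where the config expects them. This is a sketch only; `expected_pool_files` and `missing_pool_files` are hypothetical helpers built on the layout shown above, not pipeline functions.

```python
from pathlib import Path


def expected_pool_files(project_root, pool):
    """Return the two paths the layout above implies for a pool."""
    pool_dir = Path(project_root) / "panel_information" / pool
    return [
        pool_dir / f"{pool}_amplicon_info.tsv",
        pool_dir / f"{pool}_reference.fasta",
    ]


def missing_pool_files(project_root, pool):
    """Return the expected files that do not exist on disk."""
    return [path for path in expected_pool_files(project_root, pool) if not path.is_file()]
```

For example, `missing_pool_files(".", "MyCustomPool")` run from the project root lists whichever of the two files has not been added yet.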

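Then run with the command below. The -c flag loads an additional config file; point it at the file that holds your pool definition (the example uses conf/custom.config; if you edited conf/panel.config as shown above, pass that path instead):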

nextflow run main.nf \
  --readDIR /path/to/data \
  --pools MyCustomPool \
  -c conf/custom.config \
  -profile docker

Troubleshooting

Pool Not Found Error

If you see an error like "The following pools were not found in configuration":

Check pool name spelling - Pool names are case-sensitive
Verify config file - Ensure the config file that defines your pool is loaded (e.g., with -c conf/custom.config)
Check file paths - Verify paths in your config are correct (relative to project root or absolute)
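Since pool names are case-sensitive, a common cause of this error is a name that differs only in capitalization. The sketch below shows one way to spot that; `diagnose_pools` is a hypothetical helper, and the list of configured names is something you would supply yourself from your config.

```python
def diagnose_pools(requested, configured):
    """Map each unknown pool name to a case-insensitive match, or None.

    Names that match a configured pool exactly are omitted from the result.
    """
    by_lower = {}
    for name in configured:
        by_lower.setdefault(name.lower(), name)
    report = {}
    for pool in requested:
        if pool in configured:
            continue  # exact, case-sensitive match
        report[pool] = by_lower.get(pool.lower())
    return report
```

A value of `None` means no configured pool resembles the requested name even when case is ignored, which points toward a missing config entry rather than a typo in capitalization.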

File Path Issues

Use relative paths from the project root, or
Use absolute paths if files are outside the project directory
Ensure file paths in config match actual file locations
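The relative-versus-absolute rule above can be expressed in a few lines. This is an illustrative sketch, with `resolve_config_path` as a hypothetical helper; it follows the convention stated here that relative paths are taken from the project root.

```python
from pathlib import Path


def resolve_config_path(path, project_root):
    """Return an absolute-style path for a value taken from the config.

    Absolute paths are returned unchanged; relative paths are joined
    onto the project root.
    """
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    return Path(project_root) / candidate
```

Calling `.is_file()` on the result is a quick way to confirm that a path in the config matches an actual file location.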

Amplicon Info Format Errors

Verify all required columns are present
Check for tab-separated format (not spaces)
Ensure target_id values match between amplicon_info.tsv and reference.fasta headers
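The first two checks can be automated by inspecting the header line. The sketch below is a hypothetical helper, not the pipeline's validator. It treats the six columns common to both tables in this guide as required; the `pool` column appears only in the Option 1 table, so it is left out of the required set here.

```python
REQUIRED_COLUMNS = [
    "target_id",
    "chrom",
    "insert_start",
    "insert_end",
    "fwd_primer",
    "rev_primer",
]


def check_header(header_line):
    """Return a list of problems found in the amplicon info header line."""
    line = header_line.rstrip("\r\n")
    if "\t" not in line:
        # A header without tabs usually means spaces or commas were used.
        return ["header contains no tab characters; is the file tab-separated?"]
    columns = line.split("\t")
    return [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in columns]
```

Passing the first line of the file to `check_header` returns an empty list when all six columns are present and tab-separated.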

Best Practices

Use descriptive pool names - Avoid spaces and special characters. Use the format chrom-insert_start-insert_end for target IDs to conform with MAD4HATTER naming conventions.
Keep files organized - Use consistent naming: PoolName_amplicon_info.tsv and PoolName_reference.fasta
Validate files first - Check your amplicon_info.tsv format before running
Test with QC workflow - Run --workflow_name qc first to verify pool configuration
Document your pools - Keep notes on pool design and amplicon targets
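The naming advice above can be checked mechanically. In this sketch, both functions are hypothetical helpers. The set of characters treated as safe (letters, digits, dot, underscore, hyphen) is an assumption chosen so that names such as D1.1 pass; the pipeline itself may accept a different set.

```python
import re


def is_safe_pool_name(name):
    """True if the name has only letters, digits, '.', '_' or '-'.

    ASSUMPTION: this character set is illustrative, not a pipeline rule.
    """
    return re.fullmatch(r"[A-Za-z0-9._-]+", name) is not None


def make_target_id(chrom, insert_start, insert_end):
    """Build a target ID in the chrom-insert_start-insert_end format."""
    return f"{chrom}-{insert_start}-{insert_end}"
```

For instance, `make_target_id("Pf3D7_01_v3", 1000, 1200)` gives `Pf3D7_01_v3-1000-1200`.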

Getting Help

Validation errors: Check file formats and paths
Pool configuration: Verify your config file syntax
File format questions: See example files in the repository's panel_information/ directory
Issues or questions: Report issues on GitHub