Getting Started
This guide will help you get started, even if you’re new to bioinformatics pipelines.
Prerequisites
Before you begin, you’ll need:
- Nextflow - The workflow management system that runs the pipeline
- Installation instructions: Nextflow website
- Alternative: Install via conda:
conda install -c bioconda nextflow
- Java 11 or higher - Required by Nextflow
- Check if you have Java:
java -version
- If not installed, download from Oracle or use your system’s package manager
- One of the following runtime environments (choose based on your setup):
- Docker - For local computers (recommended for beginners)
- Apptainer/Singularity - For HPC clusters
- Conda - Alternative dependency management
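A quick check like the following can confirm that the prerequisites above are installed. This is a minimal sketch and is not part of the pipeline. The functions check_tool and check_prerequisites are hypothetical names. The script only checks that each command is on your PATH. It does not check that Java is version 11 or higher; use java -version for that.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the pipeline): report which
# prerequisites are on PATH. It checks presence only, not versions.

check_tool() {
  # Print "found"/"missing" for one command; return non-zero if missing.
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_prerequisites() {
  local status=0 runtime found_runtime=0
  check_tool nextflow || status=1
  check_tool java || status=1
  # Only one runtime is required, so succeed if any of them is present.
  for runtime in docker apptainer singularity conda; do
    check_tool "$runtime" && found_runtime=1
  done
  if [ "$found_runtime" -eq 0 ]; then
    echo "No supported runtime (Docker, Apptainer/Singularity, Conda) found."
    status=1
  fi
  return "$status"
}
```

To use it, save it to a file such as check_prereqs.sh (a hypothetical name) and run:
. ./check_prereqs.sh && check_prerequisites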
Quick Start Guide
Step 1: Choose Your Environment
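Pick the option that matches where you will run the pipeline:
- Your own computer (Mac or Linux): Docker
- A high-performance computing (HPC) cluster: Apptainer
- You prefer conda for package management: Conda (limited support)
- You would rather not use the command line: Terra platform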
Step 2: Prepare Your Data
Make sure your sequencing data is organized:
- Forward reads (R1) and reverse reads (R2) in FASTQ format
- Files should be in a single folder (this is the folder you pass to --readDIR)
- Example:
/path/to/data/sample1_R1.fastq.gz
/path/to/data/sample1_R2.fastq.gz
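Each sample should have both a forward (R1) and a reverse (R2) file. It can therefore be worth checking for unpaired files before starting a long run. The sketch below assumes the sampleX_R1.fastq.gz / sampleX_R2.fastq.gz naming from the example. The helper check_pairs is hypothetical, and the pipeline itself may accept other naming patterns.

```shell
#!/bin/sh
# Hypothetical helper: verify that every sample in a folder has both an R1 and
# an R2 file. Assumes the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming;
# adjust the suffixes if your files are named differently.

check_pairs() {
  local dir="$1" f sample status=0 found=0
  for f in "$dir"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue  # no match: the glob is left as a literal pattern
    found=1
    sample="$(basename "$f" _R1.fastq.gz)"
    if [ -f "$dir/${sample}_R2.fastq.gz" ]; then
      echo "ok:      $sample"
    else
      echo "missing: $dir/${sample}_R2.fastq.gz"
      status=1
    fi
  done
  # Also catch R2 files whose R1 partner is absent.
  for f in "$dir"/*_R2.fastq.gz; do
    [ -e "$f" ] || continue
    sample="$(basename "$f" _R2.fastq.gz)"
    if [ ! -f "$dir/${sample}_R1.fastq.gz" ]; then
      echo "missing: $dir/${sample}_R1.fastq.gz"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no *_R1.fastq.gz files found in $dir"
    status=1
  fi
  return "$status"
}
```

For example, after saving it as check_pairs.sh (a hypothetical name):
. ./check_pairs.sh && check_pairs /path/to/data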
Step 3: Run the Pipeline
Follow the instructions for your chosen environment below.
Docker
Best for: Running on your own computer (Mac or Linux). Docker automatically handles all software dependencies. The pipeline will download the required Docker image automatically on first run - no manual setup needed!
Prerequisites
- Docker Desktop installed and running
Basic Command
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile docker
!!! tip "First Time Running?"
    The first time you run with Docker, it will download the pipeline image (this may take a few minutes). Subsequent runs will be much faster!
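The Docker prerequisite requires Docker Desktop to be running, not just installed. A pre-flight check can therefore save a failed launch. The docker_ready function below is a hypothetical sketch and is not part of the pipeline. It relies on docker info, which exits with an error when the Docker daemon cannot be reached.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the pipeline): confirm that the
# docker command exists and that the Docker daemon is reachable.

docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH; install Docker Desktop first." >&2
    return 1
  fi
  # "docker info" exits non-zero when the daemon is not running or unreachable.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker is installed but the daemon is not reachable (is Docker Desktop running?)." >&2
    return 1
  fi
  echo "Docker is ready."
}

# Example:
#   docker_ready && nextflow run main.nf \
#     --readDIR tests/example_data/example_fastq --pools D1,R1,R2 -profile docker
```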
Apptainer
Best for: High-performance computing clusters, grid computing, or shared computing resources
Prerequisites
- Apptainer installed on your cluster
- Access to a cluster with a job scheduler (the pipeline currently supports SGE and Slurm)
Step 1: Get the Apptainer Image
First, pull the container image onto your cluster:
apptainer pull mad4hatter.sif docker://eppicenter/mad4hatter:latest
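Alternatively, you can build the container image from scratch on your cluster. This uses the definition file named Apptainer, so run it from the directory that contains that file: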
apptainer build mad4hatter.sif Apptainer
This creates a file called mad4hatter.sif that contains all the software needed.
!!! note "One-Time Setup"
    You typically only need to build/pull the image once. After that, you can reuse the mad4hatter.sif file.
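If you script your cluster setup, you can make the pull step safe to re-run by skipping it when the image already exists. In this sketch, ensure_image is a hypothetical helper. Passing an output path to apptainer pull makes it write mad4hatter.sif. Without that path, apptainer derives a default name from the image (mad4hatter_latest.sif).

```shell
#!/bin/sh
# Hypothetical helper: pull the image only when the .sif file is not already
# present, so re-running a setup script does not download it again.

ensure_image() {
  local sif="${1:-mad4hatter.sif}"  # output file name; defaults to mad4hatter.sif
  if [ -s "$sif" ]; then
    echo "Reusing existing image: $sif"
    return 0
  fi
  echo "Pulling image into $sif (one-time step)..."
  # An explicit output path avoids the default name apptainer would choose.
  apptainer pull "$sif" docker://eppicenter/mad4hatter:latest
}
```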
Step 2: Run the Pipeline
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile sge,apptainer \
-c conf/custom.config
Important parameters:
- -profile sge,apptainer - Use a job scheduler (e.g., SGE or Slurm) with Apptainer
- -c conf/custom.config - Configuration file for resource allocation
!!! note "Job Scheduler"
    Replace sge with your cluster’s job scheduler if different. Contact your system administrator if you’re unsure which scheduler your cluster uses.
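If you are unsure which scheduler your cluster uses, the commands available on a login node give a rough hint. sbatch belongs to Slurm, and qsub belongs to SGE. The detect_scheduler function below is a hypothetical heuristic and is not part of the pipeline. qsub is also provided by PBS/Torque. The slurm profile name in the example comment is an assumption. Confirm both with your system administrator.

```shell
#!/bin/sh
# Hypothetical helper: guess the job scheduler from the submission commands on
# PATH. Heuristic only: qsub is also provided by PBS/Torque.

detect_scheduler() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "slurm"
  elif command -v qsub >/dev/null 2>&1; then
    echo "sge"
  else
    echo "unknown"
    return 1
  fi
}

# Example (assumes the pipeline defines a "slurm" profile alongside "sge"):
#   scheduler="$(detect_scheduler)" && nextflow run main.nf \
#     --readDIR tests/example_data/example_fastq --pools D1,R1,R2 \
#     -profile "${scheduler},apptainer" -c conf/custom.config
```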
Conda (Alternative Option)
Best for: Users who prefer conda for package management.
!!! warning "Using conda"
    Running the pipeline with conda is not recommended, and only limited support is available for it.
Prerequisites
- Conda installed and available on your PATH
Basic Command
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile conda
!!! note "Conda Environments"
    Conda will automatically create and manage the required software environments. This may take longer on the first run as it installs dependencies.
No-Code Option: Terra Platform
Don’t want to use the command line? The pipeline is also available on Terra, a cloud-based platform with a graphical interface.
- Access the workspace: Terra MAD4HATTER Workspace
- No installation required!
Next Steps
- Review the outputs - See Pipeline Outputs for details
- Understand pipeline usage - Check the Running the Pipeline page for more advanced ways of running the pipeline.
Getting Help
- Command-line help: See all available parameters and options by running:
nextflow run main.nf --help
This will display the complete help message with all pipeline parameters and their descriptions.
- Documentation: Browse the other pages in this documentation
- Getting in touch: Report bugs, feature requests, and questions as an issue on the GitHub repository. Alternatively, reach out to the EPPIcenter team (kathryn.murie@ucsf.edu).