Getting Started¶
This guide will help you get started, even if you're new to bioinformatics pipelines.
Prerequisites¶
Before you begin, you'll need:
- Nextflow - The workflow management system that runs the pipeline
    - Installation instructions: Nextflow website
    - Alternative: install via conda: conda install -c bioconda nextflow
- Java 11 or higher - Required by Nextflow
    - Check if you have Java: java -version
    - If not installed, download from Oracle or use your system's package manager
- One of the following runtime environments (choose based on your setup):
    - Docker - For local computers (recommended for beginners)
    - Apptainer/Singularity - For HPC clusters
    - Conda - Alternative dependency management
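As a quick sanity check before going further, a small shell loop can report which of the required tools are already on your PATH (the tool names match the prerequisites above; install anything reported as missing):

```shell
# Report which of the required tools are available on PATH.
for tool in java nextflow docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT found"
  fi
done
```

If you plan to use Apptainer or conda instead of Docker, check for those commands in the same way.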
Quick Start Guide¶
Step 1: Choose Your Environment¶
Are you running on your own computer? Docker is the easiest option. On an HPC cluster? Use Apptainer/Singularity. Prefer conda for package management? A conda profile is also available. Each option is covered below.
Step 2: Prepare Your Data¶
Make sure your sequencing data is organized:

- Forward reads (R1) and reverse reads (R2) in FASTQ format
- Files should be in a single folder
- Example: /path/to/data/sample1_R1.fastq.gz and /path/to/data/sample1_R2.fastq.gz
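As a quick sanity check, a short shell loop can confirm that every forward-read file has a matching reverse-read file (the DATA_DIR path is a placeholder; point it at your own FASTQ folder):

```shell
# Check that every forward-read (R1) file has a matching reverse-read (R2) file.
# DATA_DIR is a placeholder -- replace it with your own FASTQ folder.
DATA_DIR=/path/to/data
for r1 in "$DATA_DIR"/*_R1.fastq.gz; do
  r2=${r1/_R1/_R2}            # derive the expected mate's filename
  [ -f "$r2" ] || echo "Missing mate for: $r1"
done
```

No output means every R1 file has its R2 mate.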
Step 3: Run the Pipeline¶
Follow the instructions for your chosen environment below.
Docker¶
Best for: Running on your own computer (macOS or Linux). Docker automatically handles all software dependencies, and the pipeline downloads the required Docker image on first run - no manual setup needed!
Prerequisites¶
- Docker Desktop installed and running
Basic Command¶
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile docker
First Time Running?
The first time you run with Docker, it will download the pipeline image (this may take a few minutes). Subsequent runs will be much faster!
Apptainer¶
Best for: High-performance computing clusters, grid computing, or shared computing resources
Prerequisites¶
- Apptainer installed on your cluster
- Access to a cluster with a job scheduler (currently SGE and Slurm are supported)
Step 1: Build the Apptainer Image¶
First, pull the container image onto your cluster:
Alternatively, you can build the container image from scratch on your cluster:
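As a hedged sketch, the pull and build commands might look like the following; the registry URI and definition-file name are assumptions, so substitute the values published for your pipeline version:

```shell
# Option A: pull a pre-built image and save it as mad4hatter.sif.
# The docker:// URI below is an assumption -- use the published image path.
apptainer pull mad4hatter.sif docker://eppicenter/mad4hatter:latest

# Option B: build the image from a definition file in the repository.
# The definition-file name is an assumption -- check the repository.
apptainer build mad4hatter.sif Singularity.def
```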
This creates a file called mad4hatter.sif that contains all the software needed.
One-Time Setup
You typically only need to build/pull the image once. After that, you can reuse the mad4hatter.sif file.
Step 2: Run the Pipeline¶
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile sge,apptainer \
-c conf/custom.config
Important parameters:

- -profile sge,apptainer - Use a job scheduler (e.g., SGE or Slurm) with Apptainer
- -c conf/custom.config - Configuration file for resource allocation
Job Scheduler
Replace sge with your cluster's job scheduler if different. Contact your system administrator if you're unsure which scheduler your cluster uses.
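For reference, a minimal conf/custom.config might look like the sketch below. All values are illustrative assumptions rather than the pipeline's defaults; adjust them to your cluster's queue names and resource limits:

```groovy
// conf/custom.config -- illustrative sketch only (all values are assumptions)
process {
    executor = 'sge'        // or 'slurm', to match your cluster's scheduler
    queue    = 'long.q'     // assumption: replace with your cluster's queue name
    cpus     = 4
    memory   = '16 GB'
    time     = '12h'
}
```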
Conda (Alternative Option)¶
Best for: Users who prefer conda for package management.
Using conda
Conda is not the recommended way to run the pipeline, and only limited support is available for conda-based runs.
Prerequisites¶
- A working conda installation (e.g., Miniconda)
Basic Command¶
nextflow run main.nf \
--readDIR tests/example_data/example_fastq \
--pools D1,R1,R2 \
-profile conda
Conda Environments
Conda will automatically create and manage the required software environments. This may take longer on the first run as it installs dependencies.
No-Code Option: Terra Platform¶
Don't want to use the command line? The pipeline is also available on Terra, a cloud-based platform with a graphical interface.
- Access the workspace: Terra MAD4HATTER Workspace
- No installation required!
Next Steps¶
- Review the outputs - See Pipeline Outputs for details
- Understand pipeline usage - Check the Running the Pipeline page for more advanced ways of running the pipeline.
Getting Help¶
- Command-line help: The pipeline can print a complete help message listing all available parameters and their descriptions.
- Documentation: Browse the other pages in this documentation
- Getting in touch: Report bugs, feature requests, and questions as an issue on the GitHub repository. Alternatively, reach out to the EPPIcenter team (kathryn.murie@ucsf.edu).
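For reference, invoking the command-line help mentioned above typically looks like this; the --help flag is an assumption based on common Nextflow pipeline conventions, so check the repository if it differs:

```shell
# Print the pipeline's help message (--help is an assumed convention).
nextflow run main.nf --help
```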