Description
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads
to long reference sequences. It is particularly good at aligning reads from about 50 up to hundreds or thousands of characters long,
and at aligning to relatively long (e.g. mammalian) genomes.
Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome,
its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Software Category: bio
For detailed information, visit the Bowtie2 website.
Available Versions
The current installation of Bowtie2 incorporates the most popular packages. To find the available versions and learn how to load them, run:
module spider bowtie2
The output of the command shows the available Bowtie2 module versions.
For detailed information about a particular Bowtie2 module, including how to load the module, run the module spider command with the module’s full version label. For example:
module spider bowtie2/2.5.1
Module | Version | Module Load Command |
bowtie2 | 2.5.1 | module load bowtie2/2.5.1 |
Bowtie2 Example
The following example demonstrates how to run a Bowtie2 sequence alignment on multiple CPU cores on a single HPC node. More detailed information about the aligner is available in the Bowtie2 documentation.
Note that Bowtie2 cannot be executed across multiple nodes.
Create a temporary directory
We start by creating a temporary directory and copying the Bowtie2 example files into it.
module load gcc bowtie2
mkdir -p /scratch/$USER/bowtie_temp
cp -r $EBROOTBOWTIE2/example /scratch/$USER/bowtie_temp
- The $USER variable will expand to your computing ID so you’ll be using your personal scratch directory.
- The EBROOTBOWTIE2 environment variable is set to the Bowtie2 installation directory after you load the bowtie2 module.
- Note that you have to preload the gcc module before loading bowtie2.
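If the bowtie2 module has not been loaded, EBROOTBOWTIE2 is empty and the copy step above fails with a confusing path error. The following is a minimal sketch of a guard around that step. The function name stage_examples is a hypothetical helper introduced here for illustration; it is not part of Bowtie2 or the module system.

```shell
#!/bin/bash
# stage_examples ROOT DEST
# Copy ROOT/example into DEST, but only if ROOT/example really exists.
# (stage_examples is a hypothetical helper, not part of Bowtie2.)
stage_examples() {
    local root="$1" dest="$2"
    if [ -z "$root" ] || [ ! -d "$root/example" ]; then
        echo "No example directory under '${root}'; is the bowtie2 module loaded?" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -r "$root/example" "$dest"
}
```

After loading the modules, you would call it as stage_examples "$EBROOTBOWTIE2" "/scratch/$USER/bowtie_temp".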
The Slurm Job Script
The Slurm script defines the HPC resources needed to run the Bowtie2 indexing and alignment. Bowtie2 can utilize multiple CPU cores on a single compute node; it does not support execution on multiple nodes.
Let’s create a text file that serves as our job script, alignment.slurm, with the following content:
#!/bin/bash
#SBATCH --job-name=bowtie2_example
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00
#SBATCH -o Bowtie_test.o%j
#SBATCH --partition=standard
#SBATCH --account=<YOUR_ALLOCATION>
# Load the Bowtie2 module
module load gcc
module load bowtie2
# Change to temp working directory with example files
cd /scratch/$USER/bowtie_temp
# Indexing a reference genome
bowtie2-build ./example/reference/lambda_virus.fa lambda_virus
# Aligning example reads, standard example
bowtie2 -p $SLURM_CPUS_PER_TASK -x lambda_virus -U ./example/reads/reads_1.fq -S align.sam
# Paired-end example
bowtie2 -p $SLURM_CPUS_PER_TASK -x lambda_virus -1 ./example/reads/reads_1.fq -2 ./example/reads/reads_2.fq -S align2.sam
# Local alignment example
bowtie2 -p $SLURM_CPUS_PER_TASK --local -x lambda_virus -U ./example/reads/longreads.fq -S align3.sam
- You need to replace <YOUR_ALLOCATION> with your own HPC allocation name.
- The $USER variable will expand to your computing ID so you’ll be using your personal scratch directory.
- The $SLURM_CPUS_PER_TASK variable is set by the job scheduler to match the --cpus-per-task directive, in this case 8 CPU cores per task.
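If the same commands are run outside a Slurm job, or in a job that does not set --cpus-per-task, $SLURM_CPUS_PER_TASK may be unset and the -p option would receive no value. One possible safeguard, sketched here, is a shell default. The function name thread_count is an illustrative choice, not something Slurm or Bowtie2 defines.

```shell
#!/bin/bash
# Print the number of threads to pass to bowtie2 -p:
# the scheduler-provided core count if set, otherwise 1.
# (thread_count is a hypothetical helper name.)
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "bowtie2 would run with -p $(thread_count)"
```

In the job script you could then write bowtie2 -p "$(thread_count)" in place of -p $SLURM_CPUS_PER_TASK.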
Submitting the Job
To run the above script, type:
sbatch alignment.slurm
This will return a message like this:
Submitted batch job 3184590
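If you want to reuse the job ID in later commands, you can extract it from the confirmation message. This is a small sketch that assumes the message has the form shown above, with the job ID as the last word. The variable names submit_msg and job_id are illustrative.

```shell
#!/bin/bash
# Example confirmation line as printed by sbatch (copied from the text above).
submit_msg="Submitted batch job 3184590"

# Remove everything up to and including the last space, leaving the job ID.
job_id="${submit_msg##* }"
echo "Job ID: ${job_id}"
```

On the cluster you would capture the real message with submit_msg=$(sbatch alignment.slurm). sbatch also accepts a --parsable option that prints the job ID without the surrounding text.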
Check the Job Status
To check the status of the job, run:
squeue -u <username>
where <username> is your UVA computing ID that you used to log in.
To see a history of your jobs, run:
sacct
Output
Because the reads are aligned in parallel, they might appear in the output SAM file in a different order than in the input FASTQ. You can add the --reorder flag to your bowtie2 command so that the output records follow the input order, but this is typically not necessary.
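To see whether the order actually changed, you can compare the read names in the two files. The sketch below makes several assumptions: the FASTQ uses four lines per record, the reads are unpaired, the SAM file contains one record per read, and the aligner leaves read names unchanged. The function names and the tiny demo files are made up for illustration and are not Bowtie2 output.

```shell
#!/bin/bash
# Read names from a FASTQ file: first token of every 4th line, without "@".
fastq_names() {
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1"
}

# Read names from a SAM file: first column of every non-header line.
sam_names() {
    awk -F '\t' '!/^@/ { print $1 }' "$1"
}

# Succeeds only if both files list the same read names in the same order.
same_read_order() {
    [ "$(fastq_names "$1")" = "$(sam_names "$2")" ]
}

# Synthetic demo data: two reads, written to the SAM file in swapped order.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > "$demo_dir/reads.fq"
printf '@HD\tVN:1.0\tSO:unsorted\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\nr1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' > "$demo_dir/align.sam"

if same_read_order "$demo_dir/reads.fq" "$demo_dir/align.sam"; then
    order="same"
else
    order="different"
fi
echo "Read order: ${order}"
```

For the single-end example in the job script, the equivalent call would be same_read_order ./example/reads/reads_1.fq align.sam.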
Troubleshooting
Caution: If you create the Slurm job script on a Windows computer and then upload it to the HPC system, you’ll probably get an error when you run it with sbatch that says:
sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).
To fix this, run
dos2unix alignment.slurm
This removes the unwanted carriage-return characters (\r) from the file.
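If dos2unix is not available, the same cleanup can be approximated with standard tools. This is a sketch that uses a temporary demo file in place of alignment.slurm. The helper name has_cr is hypothetical. Note that tr removes every carriage return in the file, not only those at line ends, which is normally what you want for a job script.

```shell
#!/bin/bash
# Succeeds if the given file contains at least one carriage return.
# (has_cr is a hypothetical helper name.)
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Demo file with DOS line breaks, standing in for alignment.slurm.
script=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$script"

# Strip the carriage returns only when some are present.
if has_cr "$script"; then
    tr -d '\r' < "$script" > "$script.unix" && mv "$script.unix" "$script"
    echo "Converted to UNIX line breaks"
fi
```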