Examples and Tutorials
This page provides practical examples and tutorials for using modtector in various scenarios.
Table of Contents
Basic Example
Scenario
You have BAM files from modified and unmodified samples and want to detect RNA modifications.
Data
mod_sample.bam: Modified sample BAM fileunmod_sample.bam: Unmodified sample BAM filereference.fa: Reference FASTA filestructure.dp: Secondary structure file
Step-by-Step Process
1. Generate Pileup Data
# Process modified sample
modtector count \
-b mod_sample.bam \
-f reference.fa \
-o mod_count.csv \
-t 4
# Process unmodified sample
modtector count \
-b unmod_sample.bam \
-f reference.fa \
-o unmod_count.csv \
-t 4
2. Calculate Reactivity
modtector reactivity \
-M mod_count.csv \
-U unmod_count.csv \
-O reactivity.csv \
-t 4
3. Normalize Reactivity Signals
modtector norm \
-i reactivity.csv \
-o normalized_reactivity.csv \
-m winsor90
4. Generate Plots
modtector plot \
-M mod_count.csv \
-U unmod_count.csv \
-o plots/ \
-r normalized_reactivity.csv \
-t 4
5. Evaluate Accuracy
modtector evaluate \
-r normalized_reactivity.csv \
-s structure.dp \
-o results/ \
-g gene_id
Expected Outputs
output/
├── mod_count.csv # Raw pileup data (modified)
├── unmod_count.csv # Raw pileup data (unmodified)
├── reactivity.csv # Reactivity scores
├── normalized_reactivity.csv # Normalized reactivity data
├── plots/ # Visualization plots
│ ├── signal_distribution.svg
│ ├── reactivity_plot.svg
│ └── comparison_plot.svg
└── results/ # Evaluation results
├── evaluation_comprehensive.txt
├── roc_curve.svg
└── pr_curve.svg
Advanced Example
Scenario
You have multiple samples and want to perform comprehensive analysis with different methods and parameters.
Data
Multiple BAM files for different conditions
High-quality reference sequences
Experimentally determined secondary structures
Advanced Workflow
1. Batch Processing with Windowing
# Process multiple samples with windowing
for sample in mod1 mod2 mod3 unmod1 unmod2 unmod3; do
modtector count \
-b ${sample}.bam \
-f reference.fa \
-o ${sample}_count.csv \
-w 1000 \
-t 8
done
2. Advanced Normalization
# Use dynamic windowing for normalization
for sample in mod1 mod2 mod3 unmod1 unmod2 unmod3; do
modtector norm \
-i ${sample}_count.csv \
-o ${sample}_norm.csv \
-m winsor90 \
--dynamic \
--bases AC \
--linear
done
3. Multiple Reactivity Methods
# Compare different reactivity calculation methods
modtector reactivity \
-M mod1_norm.csv \
-U unmod1_norm.csv \
-O reactivity_current.csv \
-s current \
-m current
modtector reactivity \
-M mod1_norm.csv \
-U unmod1_norm.csv \
-O reactivity_ding.csv \
-s ding \
-m siegfried \
--pseudocount 0.5 \
--maxscore 5
4. Statistical Comparison
# Compare different samples
modtector compare \
-M mod1_norm.csv \
-U unmod1_norm.csv \
-o comparison_t_test.csv \
-t t-test \
-d 20 \
-f 1.5
modtector compare \
-M mod1_norm.csv \
-U unmod1_norm.csv \
-o comparison_mann_whitney.csv \
-t mann-whitney \
-d 20 \
-f 1.5
5. Comprehensive Evaluation
# Evaluate with different parameters
modtector evaluate \
-r reactivity_current.csv \
-s structure.dp \
-o results_current/ \
-g gene_id \
--auto-shift \
--base-matching
modtector evaluate \
-r reactivity_ding.csv \
-s structure.dp \
-o results_ding/ \
-g gene_id \
--auto-shift \
--base-matching
SVG Plotting Examples
Scenario
You want to visualize reactivity data on RNA secondary structure using SVG plots.
Data Requirements
Reactivity CSV file with position, base, and score data
SVG template file with RNA structure
Optional: Reference sequence for alignment
Basic SVG Plotting
1. Simple SVG Plot (SVG-Only Mode)
# Basic SVG plotting without regular plots
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv
2. Multi-Signal SVG Plotting
# Plot all signal types (stop, mutation, etc.)
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases ATGC \
--svg-signal all \
--svg-strand +
3. Single Signal SVG Plotting
# Plot only stop signals
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases AC \
--svg-signal stop \
--svg-strand +
Advanced SVG Plotting
4. SVG Plotting with Alignment
# Use reference sequence for alignment
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases ATGC \
--svg-signal all \
--svg-strand + \
--svg-ref reference.fa \
--svg-max-shift 10
5. Negative Strand SVG Plotting
# Plot negative strand data
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases ATGC \
--svg-signal stop \
--svg-strand -
6. Both Strands SVG Plotting
# Plot both positive and negative strands
modtector plot \
-o svg_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases ATGC \
--svg-signal all \
--svg-strand both
Combined Mode (Regular + SVG)
7. Generate Both Regular and SVG Plots
# Generate both regular plots and SVG plots
modtector plot \
-M mod_sample.csv \
-U unmod_sample.csv \
-o combined_output/ \
--svg-template rna_structure.svg \
--reactivity reactivity_data.csv \
--svg-bases ATGC \
--svg-signal all
Real-World Examples
Example 1: Human 18S rRNA Structure
# Plot Human 18S rRNA with multiple signals
modtector plot \
-o human_18s_output/ \
--svg-template Human_18S.svg \
--reactivity md_HEK293_Human_norm.csv \
--svg-bases ATGC \
--svg-signal all \
--svg-strand +
Output files:
rna_structure_colored_stop.svg- Stop signal visualizationrna_structure_colored_mutation.svg- Mutation signal visualization
Example 2: E.coli 16S rRNA Structure
# Plot E.coli 16S rRNA with alignment
modtector plot \
-o ecoli_16s_output/ \
--svg-template Ecoli_RNA_structure.svg \
--reactivity EC_DMS-seq.csv \
--svg-bases AC \
--svg-signal all \
--svg-strand + \
--svg-ref Ecoli_RNA_structure.svg \
--svg-max-shift 5
Output files:
rna_structure_colored_score.svg- Single signal visualization
Example 3: DMS-seq Specific Analysis
# DMS-seq analysis (A and C bases only)
modtector plot \
-o dms_seq_output/ \
--svg-template rna_structure.svg \
--reactivity dms_data.csv \
--svg-bases AC \
--svg-signal stop \
--svg-strand +
SVG Output Files
File Naming Convention
Multi-signal:
rna_structure_colored_[signal_type].svgrna_structure_colored_stop.svgrna_structure_colored_mutation.svg
Single-signal:
rna_structure_colored_score.svg
SVG File Structure
Original template: Complete RNA structure template
Colored circles: Reactivity scores mapped to positions
Color bars: Legend showing score ranges
Multiple SVG elements: One for each signal type
Troubleshooting SVG Plotting
Common Issues
No circles in SVG output
# Check CSV format and position mapping head -5 reactivity_data.csv # Verify SVG template has position labels grep -c "title>" rna_structure.svg
Wrong signal type plotted
# Check CSV headers for signal columns head -1 reactivity_data.csv # Use specific signal type --svg-signal stop # or mutation, score
Missing bases in output
# Check base filtering --svg-bases ATGC # All bases --svg-bases AC # A and C only
Quality Control
# Check SVG file size and content
ls -la svg_output/
wc -l svg_output/*.svg
grep -c "<circle" svg_output/*.svg
Best Practices
Use appropriate base filtering: Match your experimental method
DMS-seq:
--svg-bases ACSHAPE-seq:
--svg-bases ATGC
Choose correct signal type: Based on your data
Single signal:
--svg-signal scoreMultiple signals:
--svg-signal all
Use alignment when needed: For sequence mismatches
Reference sequence:
--svg-ref reference.faMax shift:
--svg-max-shift 10
Optimize for your data: Adjust parameters
Strand selection:
--svg-strand +(default)Base filtering: Match experimental method
Batch Processing
Scenario
You have many samples and want to process them efficiently.
Batch Script Example
#!/bin/bash
# Configuration
BAM_DIR="/path/to/bam/files"
REFERENCE="/path/to/reference.fa"
OUTPUT_DIR="/path/to/output"
THREADS=8
# Create output directories
mkdir -p ${OUTPUT_DIR}/{count,norm,reactivity,plots,results}
# Process all BAM files
for bam_file in ${BAM_DIR}/*.bam; do
sample_name=$(basename ${bam_file} .bam)
echo "Processing ${sample_name}..."
# Generate pileup data
modtector count \
-b ${bam_file} \
-f ${REFERENCE} \
-o ${OUTPUT_DIR}/count/${sample_name}_count.csv \
-t ${THREADS}
# Normalize signals
modtector norm \
-i ${OUTPUT_DIR}/count/${sample_name}_count.csv \
-o ${OUTPUT_DIR}/norm/${sample_name}_norm.csv \
-m winsor90 \
--bases AC
done
# Calculate reactivity for all pairs
for mod_sample in ${OUTPUT_DIR}/norm/*_mod_*.csv; do
unmod_sample=${mod_sample/_mod_/_unmod_}
if [ -f "${unmod_sample}" ]; then
sample_name=$(basename ${mod_sample} _norm.csv)
echo "Calculating reactivity for ${sample_name}..."
modtector reactivity \
-M ${mod_sample} \
-U ${unmod_sample} \
-O ${OUTPUT_DIR}/reactivity/${sample_name}_reactivity.csv \
-t ${THREADS}
fi
done
Parallel Processing
#!/bin/bash
# Use GNU parallel for maximum efficiency
parallel -j 4 modtector count \
-b {} \
-f reference.fa \
-o {.}_count.csv \
-t 2 ::: *.bam
Custom Analysis
Scenario
You want to analyze specific regions or use custom parameters.
Region-Specific Analysis
# Extract specific regions first
samtools view -b mod_sample.bam chr1:1000-2000 > mod_region.bam
samtools view -b unmod_sample.bam chr1:1000-2000 > unmod_region.bam
# Process specific regions
modtector count \
-b mod_region.bam \
-f reference.fa \
-o mod_region_count.csv \
-t 4
modtector count \
-b unmod_region.bam \
-f reference.fa \
-o unmod_region_count.csv \
-t 4
Custom Normalization Parameters
# Use custom window size and offset
modtector norm \
-i input.csv \
-o output.csv \
-m winsor90 \
--window 500 \
--window-offset 100 \
--bases AC
Custom Reactivity Parameters
# Use custom pseudocount and maxscore
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O reactivity.csv \
--pseudocount 0.1 \
--maxscore 20
Troubleshooting Examples
Low Coverage Issue
# Check coverage
samtools depth mod_sample.bam | awk '{sum+=$3} END {print sum/NR}'
# If coverage is low, increase depth threshold
modtector plot \
-M mod_norm.csv \
-U unmod_norm.csv \
-o plots/ \
-d 20 # Lower depth threshold
Memory Issues
# Reduce thread count
modtector count \
-b large_sample.bam \
-f reference.fa \
-o output.csv \
-t 2 # Use fewer threads
# Use windowing to reduce memory usage
modtector count \
-b large_sample.bam \
-f reference.fa \
-o output.csv \
-w 1000 # Use smaller windows
Poor Normalization
# Try different normalization methods
modtector norm \
-i input.csv \
-o output_percentile.csv \
-m percentile28
modtector norm \
-i input.csv \
-o output_boxplot.csv \
-m boxplot
# Compare results
modtector plot \
-M output_percentile.csv \
-U output_boxplot.csv \
-o comparison_plots/
Low Evaluation Scores
# Check data quality
modtector evaluate \
-r reactivity.csv \
-s structure.dp \
-o results/ \
-g gene_id \
--no-auto-shift # Try without auto-shift
# Use different signal types
modtector evaluate \
-r reactivity.csv \
-s structure.dp \
-o results/ \
-g gene_id \
-t mutation # Use only mutation signals
Performance Optimization
Large Dataset Processing
# Use maximum threads
modtector count \
-b large_sample.bam \
-f reference.fa \
-o output.csv \
-t 16 # Use all available cores
# Use windowing for large genomes
modtector count \
-b large_sample.bam \
-f reference.fa \
-o output.csv \
-w 5000 # Use larger windows
Memory Optimization
# Process in chunks
split -l 10000 large_input.csv chunk_
for chunk in chunk_*; do
modtector norm \
-i ${chunk} \
-o ${chunk}_norm.csv \
-m winsor90
done
Disk Space Management
# Compress intermediate files
gzip *.csv
# Remove intermediate files after processing
rm *_count.csv *_norm.csv
# Keep only final results
mkdir final_results
mv reactivity.csv plots/ results/ final_results/
Real-World Examples
Example 1: m6A Detection
# m6A-specific analysis
modtector norm \
-i input.csv \
-o output.csv \
-m winsor90 \
--bases A # Focus on A bases
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O m6A_reactivity.csv \
-s ding \
--pseudocount 0.5
Example 2: m1A Detection
# m1A-specific analysis
modtector norm \
-i input.csv \
-o output.csv \
-m winsor90 \
--bases A # Focus on A bases
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O m1A_reactivity.csv \
-s current \
--maxscore 5
Example 3: Pseudouridine Detection
# Pseudouridine-specific analysis
modtector norm \
-i input.csv \
-o output.csv \
-m winsor90 \
--bases U # Focus on U bases
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O psi_reactivity.csv \
-m siegfried
Quality Control Examples
Data Quality Assessment
# Check BAM file quality
samtools flagstat mod_sample.bam
samtools view -c mod_sample.bam
# Check pileup data quality
head -100 mod_count.csv
wc -l mod_count.csv
# Check normalization results
modtector plot \
-M mod_norm.csv \
-U unmod_norm.csv \
-o qc_plots/ \
-c 0.1 \
-d 10
Result Validation
# Compare different methods
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O reactivity_method1.csv \
-s current
modtector reactivity \
-M mod.csv \
-U unmod.csv \
-O reactivity_method2.csv \
-s ding
# Compare results
modtector compare \
--mode reactivity-results \
--reactivity1 reactivity_method1.csv \
--reactivity2 reactivity_method2.csv \
-o method_comparison.csv
Integration Examples
Snakemake Workflow
rule count:
input:
bam = "data/{sample}.bam",
fasta = "ref/reference.fa"
output:
csv = "results/count/{sample}_count.csv"
shell:
"modtector count -b {input.bam} -f {input.fasta} -o {output.csv} -t {threads}"
rule norm:
input:
csv = "results/count/{sample}_count.csv"
output:
csv = "results/norm/{sample}_norm.csv"
shell:
"modtector norm -i {input.csv} -o {output.csv} -m winsor90 --bases AC"
rule reactivity:
input:
mod = "results/norm/{sample}_mod_norm.csv",
unmod = "results/norm/{sample}_unmod_norm.csv"
output:
csv = "results/reactivity/{sample}_reactivity.csv"
shell:
"modtector reactivity -M {input.mod} -U {input.unmod} -O {output.csv}"
Nextflow Workflow
process count {
input:
file bam
file fasta
output:
file "*.csv"
script:
"""
modtector count -b ${bam} -f ${fasta} -o output.csv -t ${task.cpus}
"""
}
process norm {
input:
file csv
output:
file "*.csv"
script:
"""
modtector norm -i ${csv} -o output.csv -m winsor90 --bases AC
"""
}
These examples demonstrate various ways to use modtector for different scenarios and requirements. Choose the approach that best fits your specific needs and data characteristics.