Results
This pipeline outlines the key stages for analyzing all-to-all interactome sequencing data, focusing on the steps that lead to the final results of RNA-DNA contact pairs, RNA annotation, and significant peaks of chromatin-interacting RNAs.
1. Input and Preprocessing
Read input data (FASTQ files) and validate sample information
Perform quality control with FASTQC
2. Deduplication (optional)
Remove duplicate reads using tools like:
fastq-dupaway
fastuniq
clumpify
3. Trimming
Trim low-quality bases and adapters using tools like:
fastp
Trimmomatic
BBduk
cutadapt
4. Bridge Processing (for specific experiment types)
For experiments like GRID-seq, RADICL-seq, iMARGI, etc., process the bridge sequences that connect RNA and DNA parts
Use tools like BBMerge or PEAR to merge paired-end reads
Separate RNA and DNA parts based on the bridge sequence
5. Alignment
Align RNA and DNA reads to the reference genome using tools like:
HISAT2
STAR
Bowtie2
6. Post-alignment Processing
Filter aligned reads for uniqueness and mismatches
Convert BAM files to BED format
7. Contact Generation
Join RNA and DNA parts to create raw contacts
Perform strand detection and correction
8. CIGAR Filtering (optional)
Filter contacts based on CIGAR strings to improve quality
9. Merging Replicates
Combine data from replicate experiments
10. Chromosome Splitting (optional)
Split data by chromosomes for parallel processing
11. Annotation and Voting
Annotate RNA parts of contacts using reference annotation
Perform voting to resolve conflicting annotations
12. Background Model Generation
Create a background model for normalization
13. Normalization
Normalize raw contacts using the background model
Perform additional normalization steps (N2, scaling)
14. Peak Calling (for One-to-All experiments)
Use MACS2 to call significant peaks of chromatin-interacting RNAs
15. Statistics and Visualization
Generate statistics at various stages of the pipeline
Create plots and visualizations of the results
16. MultiQC Report
Compile a comprehensive quality control report using MultiQC
Main Results
The main results of this pipeline are:
Pairs of RNA and DNA contacts, stored in tab-separated files
Annotation of the RNA parts of the contacts
Significant peaks of chromatin-interacting RNAs (for One-to-All experiments)
Various statistics and quality control metrics throughout the process
Note: This pipeline is flexible and can handle different types of all-to-all interactome sequencing data, with options to customize the workflow based on the specific experiment type and analysis requirements.