Tools to handle reads sequenced with unique molecular identifiers (UMIs).
Incorporate the UMI into the read name so that it can be identified later, when processing the mapped reads.
umitools trim --end 5 unprocessed_fastq NNNNNV > out.fq
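As a rough illustration of the trimming step, the sketch below clips a 5' UMI from a single FASTQ record, checks it against an IUPAC pattern such as NNNNNV, and appends it to the read name. The function names and the `:UMI_` tag are hypothetical choices for this example, and the real tool may format read names and handle non-matching reads differently.

```python
# IUPAC nucleotide codes mapped to the bases they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pattern(umi, pattern):
    """Return True if every base of `umi` is allowed by the IUPAC `pattern`."""
    return len(umi) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(umi, pattern)
    )


def move_umi_to_name(name, seq, qual, pattern):
    """Clip a 5' UMI from one FASTQ record and append it to the read name.

    Returns the modified (name, seq, qual), or None when the leading bases
    do not fit the pattern. The ':UMI_' tag is a hypothetical naming scheme.
    """
    n = len(pattern)
    umi = seq[:n]
    if not matches_pattern(umi, pattern):
        return None
    return f"{name}:UMI_{umi}", seq[n:], qual[n:]
```

For example, `move_umi_to_name("read1", "ACGTAGTTTT", "IIIIIIIIII", "NNNNNV")` clips `ACGTAG` and leaves `TTTT` as the read sequence, while a read starting with `ACGTAT` is rejected because V excludes T.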
For any given start site, save only one read per UMI. Writes bed3+ to stdout with before and after counts per start.
umitools rmdup unprocessed.bam out.bam > before_after.bed
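The deduplication rule can be sketched on plain tuples rather than BAM records. This is a minimal illustration, assuming each read has already been reduced to a hypothetical `(chrom, start, umi)` triple; it is not the tool's own implementation and it ignores strand and BAM parsing.

```python
from collections import defaultdict


def dedup_by_start(reads):
    """Keep the first read seen for each (start site, UMI) combination.

    `reads` is an iterable of (chrom, start, umi) tuples (a hypothetical,
    simplified stand-in for mapped reads). Returns the kept reads and a
    dict mapping each start site to its (before, after) read counts.
    """
    seen = defaultdict(set)    # start site -> UMIs already kept
    before = defaultdict(int)  # start site -> total reads observed
    kept = []
    for chrom, start, umi in reads:
        key = (chrom, start)
        before[key] += 1
        if umi not in seen[key]:
            seen[key].add(umi)
            kept.append((chrom, start, umi))
    counts = {key: (before[key], len(seen[key])) for key in before}
    return kept, counts
```

The `counts` dictionary holds the same kind of information as the before and after columns described above: for each start site, the number of reads seen and the number kept.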
Specifying --mismatches will, for a given start site, merge all UMIs within that edit distance into a single unique hit. For example, if a new UMI is within a single mismatch of any UMI already observed at a start position, it is merged and counted as a duplicate. The mismatch can occur at any position, regardless of the IUPAC sequence you're using.
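The merging behaviour at a single start site can be sketched as follows. This example makes two simplifying assumptions: it uses Hamming distance on equal-length UMIs rather than a general edit distance, and it compares each new UMI only against the UMIs kept so far. The function names are hypothetical, and the tool's actual comparison order may differ.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def merge_umis(umis, mismatches=1):
    """Collapse the UMIs seen at one start site, in observation order.

    A UMI within `mismatches` differences of an already kept UMI is
    counted as a duplicate; otherwise it is kept as a new unique hit.
    Returns (unique_umis, number_of_duplicates).
    """
    unique = []
    duplicates = 0
    for umi in umis:
        if any(hamming(umi, kept) <= mismatches for kept in unique):
            duplicates += 1
        else:
            unique.append(umi)
    return unique, duplicates
```

With one allowed mismatch, `AAAAAC` followed by `AAAAAG` collapses to a single hit, while `CCCCCA` remains distinct. With zero allowed mismatches, all three are kept.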
pip install pysam editdist