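# Unpack the source tarball
$ tar -jxvf bwa-*.tar.bz2
$ cd bwa-*
# Compile BWA
$ make
# Add the BWA directory to PATH (/path/bwa--* is a placeholder; use the real directory)
$ echo 'PATH=$PATH:/path/bwa--*' >> ~/.bashrc
$ source ~/.bashrc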
git
git clone https://github.com/lh3/bwa.git
cd bwa
make

## command
Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ
Building the index
bwa index [-p prefix] [-a algoType] <in.db.fasta>
# Build the index files from the reference genome (e.g. ref.fa):
bwa index -p genome ref.fa
# -p genome is optional; without it, the index files use ref.fa as their prefix
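To make the effect of `-p` concrete, here is a minimal shell sketch that lists the index files expected for a given prefix and checks whether they are all present. The helper names `index_files` and `index_complete` are hypothetical, not part of BWA, and the five extensions assume a BWA 0.7.x style index.

```shell
# Hypothetical helpers (not part of BWA): list and check the files that
# `bwa index -p <prefix>` is expected to write (assumes BWA 0.7.x).
index_files() {
    for ext in amb ann bwt pac sa; do
        printf '%s.%s\n' "$1" "$ext"
    done
}

# Succeeds only if every expected index file exists and is non-empty.
index_complete() {
    for ext in amb ann bwt pac sa; do
        [ -s "$1.$ext" ] || return 1
    done
}

# Print the file names expected for the prefix "genome".
index_files genome

# Usage (assumes bwa is on PATH and ref.fa exists):
#   index_complete genome || bwa index -p genome ref.fa
```

Checking before indexing avoids rebuilding an index that already exists, which matters for large genomes where indexing can take a long time.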
Program: samtools (Tools for alignments in the SAM format) Version: 1.9 (using htslib 1.9)
Usage: samtools <command> [options]
Commands:
  -- Indexing
     dict           create a sequence dictionary file
     faidx          index/extract FASTA
     fqidx          index/extract FASTQ
     index          index alignment

  -- Editing
     calmd          recalculate MD/NM tags and '=' bases
     fixmate        fix mate information
     reheader       replace BAM header
     targetcut      cut fosmid regions (for fosmid pool only)
     addreplacerg   adds or replaces RG tags
     markdup        mark duplicates

  -- File operations
     collate        shuffle and group alignments by name
     cat            concatenate BAMs
     merge          merge sorted alignments
     mpileup        multi-way pileup
     sort           sort alignment file
     split          splits a file by read group
     quickcheck     quickly check if SAM/BAM/CRAM file appears intact
     fastq          converts a BAM to a FASTQ
     fasta          converts a BAM to a FASTA

  -- Statistics
     bedcov         read depth per BED region
     depth          compute the depth
     flagstat       simple stats
     idxstats       BAM index stats
     phase          phase heterozygotes
     stats          generate stats (former bamcheck)

  -- Viewing
     flags          explain BAM flags
     tview          text alignment viewer
     view           SAM<->BAM<->CRAM conversion
     depad          convert padded BAM to unpadded BAM
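The `flags` command above explains the bitwise FLAG field of an alignment record. As a rough illustration of what that decoding involves, the sketch below walks the twelve FLAG bits in order and prints the names of those that are set. `decode_flag` is a hypothetical helper written for this example; it only imitates the name list that `samtools flags` reports.

```shell
# Hypothetical helper that imitates the names printed by `samtools flags`.
# Bit order follows the SAM FLAG field: 0x1, 0x2, 0x4, ... 0x800.
decode_flag() {
    flag=$1
    names=""
    bit=1
    for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        # Append the name when this bit is set in the flag value.
        if [ $(( flag & bit )) -ne 0 ]; then
            names="${names:+$names,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$names"
}

decode_flag 99   # paired read, properly paired, mate on reverse strand, first in pair
decode_flag 4    # unmapped read
```

Because each property is a separate bit, a single integer can combine several of them, which is why the `view` filters take an integer mask.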
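-b        output BAM
          # Write BAM output; the default output format is SAM.
-h        print header for the SAM output
          # SAM output has no header by default; this option includes it.
-H        print SAM header only (no alignments)
          # Print only the header.
-S        input is SAM
          # Older samtools releases assumed BAM input and needed this option for
          # SAM input. samtools 1.x detects the input format automatically and
          # ignores it.
-u        uncompressed BAM output (force -b)
          # Implies -b. Saves compression time but needs more disk space.
-c        print only the count of matching records
          # Print the number of matching records instead of the records themselves.
-L FILE   only include reads overlapping this BED FILE [null]
          # Keep only reads that overlap a region in the BED file.
-o FILE   output file name [stdout]
          # Name of the output file.
-F INT    only include reads with none of the FLAGS in INT present [0]
          # Exclusion filter: drop any read that has one or more of the bits in
          # INT set (for example, -F 4 drops unmapped reads).
-q INT    only include reads with mapping quality >= INT [0]
          # Minimum mapping quality. MAPQ >= 20 is a commonly used threshold for
          # treating an alignment as unique, although the meaning of MAPQ depends
          # on the aligner. Can be combined with -b and -F to extract specific
          # alignments.
-@        Number of additional threads to use [0]
          # Number of extra threads.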
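The semantics of `-F` and `-q` are easy to misread, so here is a small sketch that applies the same two tests to plain-text SAM records. `filter_sam` and `demo_sam` are hypothetical helpers invented for this example, and the records are made-up toy data; real BAM files are binary and need samtools itself.

```shell
# Hypothetical helper: apply the same tests as `samtools view -F <mask> -q <min_mapq>`
# to tab-separated SAM text on stdin. Illustration only.
filter_sam() {
    exclude_mask=$1
    min_mapq=$2
    tab=$(printf '\t')
    while IFS="$tab" read -r qname flag rname pos mapq rest; do
        # Header lines start with '@'; view omits them unless -h is given.
        case $qname in @*) continue ;; esac
        # -F: drop the read if any bit of the mask is set in FLAG.
        [ $(( flag & exclude_mask )) -eq 0 ] || continue
        # -q: drop the read if MAPQ is below the threshold.
        [ "$mapq" -ge "$min_mapq" ] || continue
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$qname" "$flag" "$rname" "$pos" "$mapq" "$rest"
    done
}

# Toy SAM records (made up for this example).
demo_sam() {
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t16\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r4\t16\tchr1\t300\t30\t4M\t*\t0\t0\tACGT\tIIII\n'
}

# Equivalent in spirit to: samtools view -F 4 -q 20 in.bam
demo_sam | filter_sam 4 20 | cut -f1
```

With a mask of 4 and a threshold of 20, the unmapped read `r2` and the low-quality read `r3` are dropped, leaving `r1` and `r4`.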
Options:
  -n         Input files are sorted by read name
             # The inputs were sorted with samtools sort -n.
  -t TAG     Input files are sorted by TAG value
             # The inputs were sorted with samtools sort -t.
  -r         Attach RG tag (inferred from file names)
             # Add an RG tag to each alignment.
  -u         Uncompressed BAM output
             # Write uncompressed BAM.
  -f         Overwrite the output BAM if exist
             # Overwrite the output BAM if it already exists.
  -1         Compress level 1
             # Use compression level 1.
  -l INT     Compression level, from 0 to 9 [-1]
             # Set the compression level.
  -R STR     Merge file in the specified region STR [all]
  -h FILE    Copy the header in FILE to <out.bam> [in1.bam]
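As a usage illustration for the options above, the sketch below wraps `samtools merge -f` in a small function with a dry-run mode. `merge_bams` and the `DRY_RUN` variable are hypothetical names chosen for this example, the file names are placeholders, and the inputs are assumed to be sorted already, since `merge` expects sorted alignments.

```shell
# Hypothetical wrapper around `samtools merge`. Set DRY_RUN=1 to print the
# command instead of running it.
merge_bams() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_bams <out.bam> <in1.bam> <in2.bam> [...]" >&2
        return 1
    fi
    # -f overwrites <out.bam> if it already exists.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "samtools merge -f $*"
    else
        samtools merge -f "$@"
    fi
}

# Placeholder file names; prints the command without needing samtools.
DRY_RUN=1 merge_bams merged.bam sample1.sorted.bam sample2.sorted.bam
```

Dropping `DRY_RUN=1` runs the real command, which requires samtools on PATH and existing input files.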