加入收藏 | 设为首页 | 会员中心 | 我要投稿 核心网 (https://www.hxwgxz.com/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 大数据 > 正文

基因数据处理19之BWA匹配算法串产生、匹配、评价等整体流程

发布时间:2021-05-30 00:17:27 所属栏目:大数据 来源:网络整理
导读:环境: ubunut BWA samtools wgsim bwa.kit工具 1 .数据下载:需要在bwa.kit下 bwa.kit/run-gen-ref hs38DH 2.串产生: hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ wgsim -N 1000 -1 10 hs38DH.fa span style="font-family: Arial,Helvetica,sans-serif

环境:

ubunut

BWA

samtools

wgsim

bwa.kit工具


1.数据下载:需要在bwa.kit下

bwa.kit/run-gen-ref hs38DH  

2.串产生:

hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ wgsim -N 1000 -1 10 hs38DH.fa <span style="font-family: Arial,Helvetica,sans-serif; font-size: 12px;">hs38DHSE1N10000L10F1.fq  </span> /dev/null 
将文件移到 

<pre name="code" class="plain">~/cloud/adam/xubo/data/hs38DH/test20160415

 

3.匹配+评价

hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa aln ../hs38DH.fa hs38DHSE1N10000L10F1.fq >hs38DHSE1N10000L10F1.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 0.10 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 9935 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa aln ../hs38DH.fa hs38DHSE1N10000L10F1.fq
[main] Real time: 0.195 sec; CPU: 0.119 sec
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa samse ../hs38DH.fa hs38DHSE1N10000L10F1.sai hs38DHSE1N10000L10F1.fq >hs38DHSE1N10000L10F1.sam
[bwa_aln_core] convert to sequence coordinate... 0.04 sec
[bwa_aln_core] refine gapped alignments... 0.00 sec
[bwa_aln_core] print alignments... 0.02 sec
[bwa_aln_core] 9935 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa samse ../hs38DH.fa hs38DHSE1N10000L10F1.sai hs38DHSE1N10000L10F1.fq
[main] Real time: 0.154 sec; CPU: 0.086 sec
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ samtools flagstat hs38DHSE1N10000L10F1.sam
9935 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
9907 + 0 mapped (99.72% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa aln ../hs38DH.fa hs38DHSE1N10000L
hs38DHSE1N10000L10F1.fq   hs38DHSE1N10000L10F1.sai  hs38DHSE1N10000L10F1.sam  hs38DHSE1N10000L70F1.fq   
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa aln ../hs38DH.fa hs38DHSE1N10000L70F1.fq >hs38DHSE1
hs38DHSE1L10000F1.fq      hs38DHSE1N10000L10F1.fq   hs38DHSE1N10000L10F1.sai  hs38DHSE1N10000L10F1.sam  hs38DHSE1N10000L70F1.fq   
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa aln ../hs38DH.fa hs38DHSE1N10000L70F1.fq >hs38DHSE1N10000L70F1.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 0.72 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 9935 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa aln ../hs38DH.fa hs38DHSE1N10000L70F1.fq
[main] Real time: 0.779 sec; CPU: 0.734 sec
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ bwa samse ../hs38DH.fa hs38DHSE1N10000L70F1.sai hs38DHSE1N10000L70F1.fq >hs38DHSE1N10000L70F1.sam
[bwa_aln_core] convert to sequence coordinate... 0.04 sec
[bwa_aln_core] refine gapped alignments... 0.01 sec
[bwa_aln_core] print alignments... 0.03 sec
[bwa_aln_core] 9935 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa samse ../hs38DH.fa hs38DHSE1N10000L70F1.sai hs38DHSE1N10000L70F1.fq
[main] Real time: 0.207 sec; CPU: 0.109 sec
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH/test20160415$ samtools flagstat hs38DHSE1N10000L70F1.sam 
9935 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
9567 + 0 mapped (96.30% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
最后得出匹配的准确率在96%左右

(编辑:核心网)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    热点阅读