
Genomic Data Processing 49: Running cloud-scale-bwamem Successfully

Published: 2021-03-07 08:28:22 | Category: Big Data | Source: compiled from the web

1. First, generate the data with ART:
See the previous post.

2. Upload the FASTQ file to HDFS:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ spark-submit  --class cs.ucla.edu.bwaspark.BWAMEMSpark --master local[2] /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar upload-fastq 0 1 fastq/G38L100c1Nhs20.fastq /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
command: upload-fastq
Map('isPairEnd -> 0,'filePartNum -> 1,'inFilePath1 -> fastq/G38L100c1Nhs20.fastq,'outFilePath -> /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq)
Upload FASTQ command line arguments: 0 1 fastq/G38L100c1Nhs20.fastq  /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 250000
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Upload FASTQ to HDFS Finished!!!
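
The four positional arguments after upload-fastq are easier to read once they are named. The sketch below takes the names from the Map(...) line in the log above (isPairEnd, filePartNum, inFilePath1, outFilePath); the shell variable and function names are my own, and the script only prints the command as a dry run instead of submitting it.

```shell
#!/bin/sh
# Dry-run sketch: name the positional arguments of upload-fastq and print
# the resulting spark-submit command instead of executing it.
# Function and variable names are illustrative, not part of cloud-scale-bwamem.

JAR=/home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar

upload_fastq_cmd() {
    is_pair_end=$1     # 'isPairEnd in the log (0 in this post)
    file_part_num=$2   # 'filePartNum in the log
    in_file=$3         # 'inFilePath1: local FASTQ file
    out_path=$4        # 'outFilePath: destination path on HDFS
    printf '%s\n' "spark-submit --class cs.ucla.edu.bwaspark.BWAMEMSpark --master local[2] $JAR upload-fastq $is_pair_end $file_part_num $in_file $out_path"
}

# The values used in this post.
upload_fastq_cmd 0 1 fastq/G38L100c1Nhs20.fastq \
    /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
```

Once the printed command looks right, it can be pasted into the terminal or run with eval.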

3. Run the alignment:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ spark-submit --executor-memory 2g --class cs.ucla.edu.bwaspark.BWAMEMSpark --total-executor-cores 2 --master local[2]  --conf spark.driver.host=**MasterIP** --conf spark.driver.cores=2 --conf spark.driver.maxResultSize=2g --conf spark.storage.memoryFraction=0.7  --conf spark.akka.threads=2 --conf spark.akka.frameSize=1024 /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar cs-bwamem -bfn 1 -bPSW 1 -sbatch 10 -bPSWJNI 1  -oChoice 2 -oPath hdfs://**MasterIP**:9000/xubo/11.adam -localRef 1 -isSWExtBatched 1 0 GRCH38BWAindex/GRCH38chr1L3556522.fasta /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
command: cs-bwamem
Map('isPSWJNI -> 1,'localRef -> 1,'batchedFolderNum -> 1,'isPSWBatched -> 1,'subBatchSize -> 10,'inFASTQPath -> /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq,'inFASTAPath -> GRCH38BWAindex/GRCH38chr1L3556522.fasta,'outputPath -> hdfs://**MasterIP**:9000/xubo/11.adam,'isSWExtBatched -> 1,'isPairEnd -> 0,'outputChoice -> 2)
CS- BWAMEM command line arguments: false GRCH38BWAindex/GRCH38chr1L3556522.fasta /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 1 true 10 true ./target/jniNative.so 2 hdfs://**MasterIP**:9000/xubo/11.adam
HDFS master: hdfs://Master:9000
Input HDFS folder number: 1
Head line: @RG  ID:foo  SM:bar
Read Group ID: foo
Load Index Files
Load BWA-MEM options
Output choice: 2
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
CS-BWAMEM Finished!!!
Jun 3,2016 11:32:26 AM INFO: parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
Jun 3,2016 11:32:27 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext,but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 3,2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Jun 3,2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 3,2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count = 1

Replace **MasterIP** in the command above with the address of your own master node.
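
One way to avoid editing the placeholder by hand in several places is to keep the address in a single shell variable. This is only a sketch: MASTER_IP and hdfs_uri are names I made up, 192.0.2.10 is a documentation-only example address, and port 9000 is simply the port used in this post.

```shell
#!/bin/sh
# Sketch: keep the master address in one variable and derive the
# spark.driver.host setting and the HDFS output URI from it.
# MASTER_IP and hdfs_uri are hypothetical names; 192.0.2.10 is a placeholder.
MASTER_IP=${MASTER_IP:-192.0.2.10}

hdfs_uri() {
    # Build an HDFS URI on port 9000 for the absolute path given as $1.
    printf 'hdfs://%s:9000%s\n' "$MASTER_IP" "$1"
}

# Print the two fragments of the alignment command that contain the address.
echo "--conf spark.driver.host=$MASTER_IP"
echo "-oPath $(hdfs_uri /xubo/11.adam)"
```

Running it as MASTER_IP=<your address> sh script.sh prints the two fragments with the real address filled in.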

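4. Inspect the ADAM file:
cs-bwamem provides a merge command, but following the documented method did not work for me.
The output can be read directly with Spark SQL instead: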

package org.bdgenomics.avocado.cli

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf,SparkContext}
import org.bdgenomics.adam.rdd.ADAMContext._

/**
 * Created by xubo on 2016/5/27.
 * Download data aligned by avocado from HDFS.
 * run: success
 */
object parquetRead2csbwamem {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[4]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    println("start:")
    val file = "hdfs://**MasterIp**:9000/xubo/14.adam/0"
    val df3 = sqlContext.read.option("mergeSchema","true").parquet(file)
    // df3.printSchema()
    df3.show()
    println("end")
    sc.stop()
  }
}

Result:

+--------------------+---------+---------+----+--------------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
| contig| start| end|mapq| readName| sequence| qual|cigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|firstOfPair|secondOfPair|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual| attributes|recordGroupName|recordGroupSequencingCenter|recordGroupDescription|recordGroupRunDateEpoch|recordGroupFlowOrder|recordGroupKeySequence|recordGroupLibrary|recordGroupPredictedMedianInsertSize|recordGroupPlatform|recordGroupPlatformUnit|recordGroupSample|mateAlignmentStart|mateAlignmentEnd|mateContig|
+--------------------+---------+---------+----+--------------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
|[chr1,248956422,n...|225496693|225496793|  60|chr1-1   RG  ID:foo  ...|CATATTTACCAATTAAA...|@C@D@FFDFHHHHIJ.J...| 100M|                    0|                  0| false| false| true| false| false| false| false| false| false| false| true| false| false|               61A38| null|NM:i:1    AS:i:95 XS...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
+--------------------+---------+---------+----+--------------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+

end
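
The single row can be sanity-checked by hand: cigar is 100M, mismatchingPositions is 61A38, and the attributes contain NM:i:1. Assuming mismatchingPositions follows the SAM MD tag format, this means 61 matching bases, one mismatch where the reference has A, then 38 matching bases. The sketch below does that arithmetic; md_counts is a hypothetical helper, and it assumes the string contains no deletion segments (no '^'), which holds for a 100M CIGAR.

```shell
#!/bin/sh
# Sketch: count matched and mismatched bases in an MD-style string.
# Assumes no deletions (no '^' segments) in the string.
md_counts() {
    md=$1
    # Turn every non-digit into a space, then sum the remaining numbers.
    matched=$(printf '%s\n' "$md" | tr -c '0-9\n' ' ' |
        awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')
    # Each letter is one mismatching reference base.
    mismatched=$(printf '%s' "$md" | tr -cd 'A-Za-z' | wc -c | tr -d ' ')
    echo "$matched $mismatched"
}

md_counts 61A38   # prints "99 1"
```

99 matches plus 1 mismatch gives 100 aligned bases, which agrees with the 100M CIGAR and with NM:i:1.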

(Editor: 核心网)
