A Scala library extending ADAM and BDG-formats to load .vcf files annotated with SnpEff.
[WARNING: this library is still under heavy development. Expect versions to break compatibility.]
Artifacts are published to Bintray.

SBT:

```scala
resolvers += "bintray-tmoerman" at "http://dl.bintray.com/tmoerman/maven"

libraryDependencies += "org.tmoerman" %% "adam-fx" % "0.5.5"
```

Spark Notebook:

```
:remote-repo bintray-tmoerman % default % http://dl.bintray.com/tmoerman/maven % maven

:dp org.tmoerman % adam-fx_2.10 % 0.5.5
```

Zeppelin:

```
%dep
z.addRepo("bintray-tmoerman").url("http://dl.bintray.com/tmoerman/maven")
z.load("org.tmoerman:adam-fx_2.10:0.5.5")
```
The `AnnotatedVariant` and `AnnotatedGenotype` classes are the "connector" types between the ADAM types and the `SnpEffAnnotations`.
[Figure: class diagrams distilled from the Java classes generated from the Avro schema definition.]
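To illustrate the "connector" idea, here is a simplified, hypothetical model in plain Scala case classes: a variant is paired with the SnpEff annotations parsed from the same VCF record, so both can be queried together. The names `Variant`, `FunctionalAnnotation`, `SnpEffAnnotations` and all their fields are assumptions chosen for illustration; the real classes are generated from the library's Avro schema and their fields may differ. Only the SnpEff impact levels (`HIGH`, `MODERATE`, `LOW`, `MODIFIER`) come from SnpEff itself.

```scala
// Hypothetical, simplified stand-ins for the Avro-generated classes.
// Field names are illustrative assumptions, not the library's actual schema.
object ConnectorSketch {
  // Stand-in for an ADAM Variant: position and alleles only.
  final case class Variant(contig: String, start: Long, referenceAllele: String, alternateAllele: String)

  // Stand-in for one SnpEff annotation entry.
  final case class FunctionalAnnotation(allele: String, effects: List[String], impact: String, geneName: String)

  // Stand-in for all SnpEff annotations attached to one VCF record.
  final case class SnpEffAnnotations(functionalAnnotations: List[FunctionalAnnotation])

  // The "connector": pairs the ADAM type with its SnpEff annotations.
  final case class AnnotatedVariant(variant: Variant, annotations: SnpEffAnnotations)

  // Keep only variants carrying at least one HIGH-impact annotation.
  def highImpact(vs: Seq[AnnotatedVariant]): Seq[AnnotatedVariant] =
    vs.filter(_.annotations.functionalAnnotations.exists(_.impact == "HIGH"))

  // Two made-up example records.
  val sample: Seq[AnnotatedVariant] = Seq(
    AnnotatedVariant(
      Variant("1", 100L, "A", "T"),
      SnpEffAnnotations(List(FunctionalAnnotation("T", List("stop_gained"), "HIGH", "GENE1")))),
    AnnotatedVariant(
      Variant("1", 200L, "C", "G"),
      SnpEffAnnotations(List(FunctionalAnnotation("G", List("synonymous_variant"), "LOW", "GENE2"))))
  )

  def main(args: Array[String]): Unit =
    highImpact(sample).foreach(v => println(s"${v.variant.contig}:${v.variant.start}"))
}
```

Running `main` prints `1:100`. With the real library the same kind of filter would be applied to an `RDD[AnnotatedVariant]` instead of a `Seq`.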
ADAM-fx has its own `KryoRegistrator`, `AdamFxKryoRegistrator`, which extends ADAM's `ADAMKryoRegistrator` with additional Avro data types. Use it when initializing a `SparkConf`:
```scala
import org.apache.spark.{SparkConf, SparkContext}

// Use Kryo, with ADAM-fx's registrator covering both ADAM's and ADAM-fx's Avro types.
val conf = new SparkConf()
  .setAppName("Test")
  .setMaster("local[*]")
  .set("spark.kryo.registrator", "org.tmoerman.adam.fx.serialization.AdamFxKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)
```
Instantiate a `SnpEffContext`, passing it a `SparkContext`. In a notebook setting you may want to mark it `@transient` to prevent serialization issues.

```scala
import org.tmoerman.adam.fx.snpeff.SnpEffContext

@transient val ec = new SnpEffContext(sc)
```
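The following plain-Scala sketch shows why `@transient` helps. Spark serializes closures with Java serialization, and a notebook may compile cells into wrapper objects that a closure ends up capturing; a `@transient` field is skipped during serialization instead of failing on a non-serializable value such as a `SparkContext`. The classes `Context` and `Cell` are hypothetical stand-ins, not part of Spark or ADAM-fx.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable context object such as a SparkContext.
class Context { override def toString = "context" }

// Hypothetical stand-in for the wrapper object a notebook cell might compile into.
class Cell extends Serializable {
  @transient val ctx = new Context // skipped by Java serialization
  val threshold = 30               // serialized normally
}

object TransientDemo {
  // Serialize an object to bytes and read it back, as shipping it to an executor would.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new Cell)
    println(copy.threshold) // 30
    println(copy.ctx)       // null: the transient field was not serialized
  }
}
```

Removing `@transient` from `ctx` makes `roundTrip` throw a `java.io.NotSerializableException`, because `Context` does not implement `Serializable`.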
Alternatively, import the implicit conversions and call the loading methods directly on an already instantiated `SparkContext`:

```scala
import org.tmoerman.adam.fx.snpeff.SnpEffContext._
```
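A wildcard import like this typically brings an implicit conversion into scope, in the same way ADAM's `ADAMContext` companion object adds methods to `SparkContext`. The sketch below assumes `SnpEffContext` works the same way and demonstrates the pattern with hypothetical stand-in types (`FakeSparkContext`, `FakeSnpEffContext`, `toSnpEffContext`); it is not the library's actual code.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for SparkContext.
class FakeSparkContext(val appName: String)

// Hypothetical stand-in for SnpEffContext: wraps a context and adds loading methods.
class FakeSnpEffContext(val sc: FakeSparkContext) {
  // Returns a description instead of an RDD, to keep the sketch self-contained.
  def loadAnnotatedVariants(path: String): String = s"${sc.appName} loads $path"
}

object FakeSnpEffContext {
  // Brought into scope by `import FakeSnpEffContext._`.
  implicit def toSnpEffContext(sc: FakeSparkContext): FakeSnpEffContext = new FakeSnpEffContext(sc)
}

object ImplicitDemo {
  import FakeSnpEffContext._

  // Compiles because the compiler wraps `sc` in a FakeSnpEffContext.
  def describe(sc: FakeSparkContext): String = sc.loadAnnotatedVariants("sample.ann.vcf")

  def main(args: Array[String]): Unit =
    println(describe(new FakeSparkContext("Test"))) // Test loads sample.ann.vcf
}
```

The benefit of this design is that no extra context object needs to be created or kept `@transient`; the wrapper is created on demand at each call site.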
Loading variants with SnpEff annotations:

```scala
import org.apache.spark.rdd.RDD

// annotatedVcf: path to a .vcf file annotated with SnpEff
val annotatedVariants: RDD[AnnotatedVariant] = sc.loadAnnotatedVariants(annotatedVcf)
```

Or genotypes with SnpEff annotations:

```scala
val annotatedGenotypes: RDD[AnnotatedGenotype] = sc.loadAnnotatedGenotypes(annotatedVcf)
```