This is the core library containing the domain logic for the ComPass application. For more information, please refer to the repository for the web interface.
For version 4, we wanted to include different types of perturbagens. As in earlier versions, we want to express the data in a type-safe model in Scala. While Scala's type system offers many ways to achieve this, compatibility with the Parquet storage format and especially Spark's ability to load and query it are crucial.
After extensive experimentation, we found that the best way to model different subtypes in Scala in line with Parquet is to create a union type that combines all subtypes into one supertype. The Treatment type therefore contains different slots, one for each type of treatment. There is also a generic slot that is available for convenience.
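To make the slot idea concrete, here is a minimal sketch of such a union type. The slot names (trt_generic, trt_cp, trt_lig) come from the schematic in this README; the fields inside each slot, the `filledSlot` helper and the example values are hypothetical and only serve as illustration. The actual definitions in the library may differ.

```scala
// Sketch only: slot names follow the README, everything else is an assumption.
object TreatmentSketch {
  // One case class per perturbagen type (fields are illustrative).
  final case class TrtGeneric(trtType: String, id: String, name: String)
  final case class TrtCp(id: String, name: String, dose: String)
  final case class TrtLig(id: String, name: String)

  // The "union" type: one optional slot per subtype. For a given record,
  // only the slot matching its treatment type is expected to be filled.
  final case class Treatment(
      trt_generic: Option[TrtGeneric] = None,
      trt_cp: Option[TrtCp] = None,
      trt_lig: Option[TrtLig] = None
  ) {
    // Name of the first specific slot that is filled, if any (hypothetical helper).
    def filledSlot: Option[String] =
      trt_cp.map(_ => "trt_cp").orElse(trt_lig.map(_ => "trt_lig"))
  }

  val compound: Treatment =
    Treatment(trt_cp = Some(TrtCp("cp-1", "aspirin", "10 uM")))
  val ligand: Treatment =
    Treatment(trt_lig = Some(TrtLig("lig-1", "EGF")))
}
```

A case class made up of optional fields corresponds to a single record with nullable nested columns, which is presumably why this shape works well with Parquet and Spark, whereas a sealed trait hierarchy has no such direct column layout.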
See the tests for examples of how to use the data model when loading data.
Accessing the fields in the model is best achieved using the available lenses. Combined lenses are available so that deep access can easily be performed.
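As an illustration of what a combined lens does, the sketch below defines a small hand-rolled lens and composes two of them for deep access. This is not the library's lens implementation: the `Lens` class, the `Information` class name, the lens names and the cut-down `Perturbation` model are all assumptions made for this example.

```scala
// Sketch only: a hand-rolled lens, not the lenses shipped with the library.
object LensSketch {
  // A minimal lens: a getter plus a copy-based setter.
  final case class Lens[S, A](get: S => A, set: (S, A) => S) {
    // Compose with a lens that looks further into A.
    def andThen[B](inner: Lens[A, B]): Lens[S, B] =
      Lens[S, B](
        s => inner.get(get(s)),
        (s, b) => set(s, inner.set(get(s), b))
      )
  }

  // Cut-down stand-ins for the model (hypothetical, only two info fields kept).
  final case class Information(cell: Option[String], batch: Option[String])
  final case class Perturbation(id: String, info: Information)

  val infoLens: Lens[Perturbation, Information] =
    Lens[Perturbation, Information](_.info, (p, i) => p.copy(info = i))
  val cellLens: Lens[Information, Option[String]] =
    Lens[Information, Option[String]](_.cell, (i, c) => i.copy(cell = c))

  // Combined lens: Perturbation -> info -> cell in one step.
  val pCellLens: Lens[Perturbation, Option[String]] = infoLens.andThen(cellLens)

  val p: Perturbation = Perturbation("p1", Information(Some("MCF7"), None))
  // Returns an updated copy; the original value is left unchanged.
  val updated: Perturbation = pCellLens.set(p, Some("A549"))
}
```

The benefit over nested `copy` calls grows with the depth of the model: the combined lens hides the path to the field, so callers do not need to know how the case classes are nested.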
A schematic overview of the model:
    Perturbation
      id
      info              information about the experiment
        cell
        batch
        plate
        well
        year
        extra
      profiles          a list of profiles for future compatibility
        pType
        length
        t
        p
        r
        logFc
      trtType           trt_cp / trt_lig / ...
      trt               treatment information
        trt_generic     a generic representation
        trt_cp          slot for trt_cp type perturbagens
        trt_lig         slot for trt_lig type perturbagens
        trt_sh          ...
        ctl_vector      ...
      filters
A schematic overview of the model. Field access is best performed by means of the available lenses:
    DbRow
      id                    Option[String]
      sampleAnnotations
        sample
          id                Option[String]
          batch             Option[String]
          plateid           Option[String]
          well              Option[String]
          protocolname      Option[String]
          concentration     Option[String]
          year              Option[String]
          time              Option[String]
        t                   Option[Array[Double]]
        p                   Option[Array[Double]]
        r                   Option[Array[Double]]
      compoundAnnotations
        compound
          jnjs              Option[String]
          jnjb              Option[String]
          smiles            Option[String]
          inchikey          Option[String]
          name              Option[String]
          ctype             Option[String]
        knownTargets        Option[Seq[Gene]]
        predictedTargets    Option[Seq[Gene]]
      filters               Seq[Filter]
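The DbRow layout can be written down as nested case classes, as in the sketch below. Field names and types are taken from the overview; the intermediate class names, the `= None` defaults, the contents of `Gene` and `Filter`, and the example values are assumptions. Plain field access is used here only to keep the example self-contained.

```scala
// Sketch only: mirrors the DbRow overview; placeholders are marked as such.
object DbRowSketch {
  // Hypothetical placeholders: the overview does not list the fields of Gene and Filter.
  final case class Gene(symbol: String)
  final case class Filter(key: String, value: String)

  final case class Sample(
      id: Option[String] = None,
      batch: Option[String] = None,
      plateid: Option[String] = None,
      well: Option[String] = None,
      protocolname: Option[String] = None,
      concentration: Option[String] = None,
      year: Option[String] = None,
      time: Option[String] = None
  )
  final case class SampleAnnotations(
      sample: Sample,
      t: Option[Array[Double]] = None,
      p: Option[Array[Double]] = None,
      r: Option[Array[Double]] = None
  )
  final case class Compound(
      jnjs: Option[String] = None,
      jnjb: Option[String] = None,
      smiles: Option[String] = None,
      inchikey: Option[String] = None,
      name: Option[String] = None,
      ctype: Option[String] = None
  )
  final case class CompoundAnnotations(
      compound: Compound,
      knownTargets: Option[Seq[Gene]] = None,
      predictedTargets: Option[Seq[Gene]] = None
  )
  final case class DbRow(
      id: Option[String],
      sampleAnnotations: SampleAnnotations,
      compoundAnnotations: CompoundAnnotations,
      filters: Seq[Filter]
  )

  val row: DbRow = DbRow(
    Some("row-1"),
    SampleAnnotations(Sample(id = Some("s1"), year = Some("2015"))),
    CompoundAnnotations(Compound(name = Some("aspirin"))),
    Seq.empty
  )

  // Every leaf is optional, so reads need a default for missing values.
  val compoundName: String =
    row.compoundAnnotations.compound.name.getOrElse("unknown")
  val nrKnownTargets: Int =
    row.compoundAnnotations.knownTargets.map(_.size).getOrElse(0)
}
```

Because almost every leaf is an `Option`, a missing value in the stored data becomes `None` instead of a failure at load time, at the cost of having to handle the empty case on every read.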