sstadick / scivs   0.1.0

MIT License GitHub

Collection of Data Structures for working with genomic intervals

Scala versions: 2.13

scivs

This is a library containing classes and helpers for working with genomic intervals in Scala. Currently Lapper and ScAIList are implemented, each of which is best in class in its respective niche.

scivs = SCala InterVal Stores

Lapper

This is a Scala port of nim-lapper. It is also inspired by the Rust port, rust-lapper.

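import scivs.lapper.Lapper // package path assumed by analogy with scivs.scailist
import scivs.interval.Interval
// Five intervals starting at 0, 5, 10, 15 and 20, each spanning two units
val lapper = new Lapper((0 to 20 by 5).map(x => Interval(x, x + 2, 0)).toList)
// The first interval overlapping the query (6, 11) is the one starting at 5
assert(lapper.find(6, 11).toList(0) == Interval(5, 7, 0))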

Performance Characteristics

Fantastic for 'normal' genomic data where intervals are 'short' and there isn't much nesting, for example Illumina paired-end reads. The seek method in particular is very fast if you know that your queries will arrive in order.
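
To illustrate why short, un-nested intervals suit this approach, here is a minimal sketch of the sorted-scan idea used by nim-lapper and rust-lapper. It is not the scivs implementation: the names Iv and SortedScan are hypothetical, and half-open [start, stop) intervals are an assumption. The longest interval bounds how far left of the query an overlapping interval can start, so the shorter the intervals, the fewer candidates are scanned.

```scala
// Hypothetical illustration only; Iv and SortedScan are not part of scivs.
final case class Iv(start: Int, stop: Int) // assumed half-open: [start, stop)

final class SortedScan(input: Seq[Iv]) {
  // Sort once by start so a query can stop scanning early.
  private val ivs: Vector[Iv] = input.sortBy(_.start).toVector

  // Length of the longest interval (0 for an empty collection).
  private val maxLen: Int =
    if (ivs.isEmpty) 0 else ivs.map(iv => iv.stop - iv.start).max

  // Binary search: index of the first interval whose start is >= key.
  private def lowerBound(key: Int): Int = {
    var lo = 0
    var hi = ivs.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ivs(mid).start < key) lo = mid + 1 else hi = mid
    }
    lo
  }

  // All intervals overlapping [start, stop), in start order.
  def find(start: Int, stop: Int): Vector[Iv] = {
    val out = Vector.newBuilder[Iv]
    // An interval starting before (start - maxLen) must end before start,
    // so the scan can begin here.
    var i = lowerBound(start - maxLen)
    // Intervals are sorted, so once one starts at or after stop, none overlap.
    while (i < ivs.length && ivs(i).start < stop) {
      if (ivs(i).stop > start) out += ivs(i)
      i += 1
    }
    out.result()
  }
}
```

With the same data as the example above, `new SortedScan((0 to 20 by 5).map(x => Iv(x, x + 2))).find(6, 11)` returns the intervals starting at 5 and 10. A single very long interval would inflate maxLen and push the scan start far to the left for every query, which is the nesting case this structure handles less well.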

ScAIList

This is an implementation of the code from this paper. The major change is that the number of components the list is split into is determined dynamically rather than hardcoded.

Performance Characteristics

This data structure is good for nested intervals, where long intervals engulf many shorter intervals.

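import scivs.scailist.ScAIList
import scivs.interval.Interval
// Five intervals starting at 0, 5, 10, 15 and 20, each spanning two units
val scailist = ScAIList((0 to 20 by 5).map(x => Interval(x, x + 2, 0)).toList)
// The first interval overlapping the query (6, 11) is the one starting at 5
assert(scailist.find(6, 11).toList(0) == Interval(5, 7, 0))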
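
The nesting behavior described under Performance Characteristics can be seen in a simplified, single-component sketch of the augmented-list idea. This is not the ScAIList implementation, and it omits the decomposition into components entirely. Span and MaxEndList are hypothetical names, and half-open [start, stop) intervals are an assumption. Each position stores the running maximum end seen so far. A query walks left from the last interval that starts before the query's end and stops as soon as that running maximum no longer reaches the query.

```scala
// Hypothetical illustration only; Span and MaxEndList are not part of scivs.
final case class Span(start: Int, stop: Int) // assumed half-open: [start, stop)

final class MaxEndList(input: Seq[Span]) {
  private val spans: Vector[Span] = input.sortBy(_.start).toVector

  // maxEnd(i) is the largest stop among spans(0) .. spans(i).
  private val maxEnd: Vector[Int] =
    spans.map(_.stop).scanLeft(Int.MinValue)((a, b) => math.max(a, b)).tail

  // Binary search: number of spans whose start is < key.
  private def countStartingBefore(key: Int): Int = {
    var lo = 0
    var hi = spans.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (spans(mid).start < key) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Returns the overlapping spans (in start order) together with the
  // number of spans inspected, to make the cost of nesting visible.
  def find(start: Int, stop: Int): (Vector[Span], Int) = {
    val out = Vector.newBuilder[Span]
    var inspected = 0
    var i = countStartingBefore(stop) - 1
    // Walk left while some span at or before i can still reach past start.
    while (i >= 0 && maxEnd(i) > start) {
      inspected += 1
      if (spans(i).stop > start) out += spans(i)
      i -= 1
    }
    (out.result().reverse, inspected)
  }
}
```

On the flat data from the example above, `find(16, 17)` inspects a single span. After adding one long span, `Span(0, 100)`, the same query inspects five spans, because the running maximum never drops below the query start. Splitting long intervals into their own components, as the paper does, is what keeps each component's scan short.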

Todos

  • Add benchmarks / substantiate the performance characteristics
  • Compare against other libs out there?
  • Add some of the helper methods for things like coverage etc
  • Figure out how to make my package level docs show up in javadoc