gecko

An array-backed, predictable data manipulation library inspired by saddle and pandas.

Dependencies

Gecko relies on:

Name      Version
cats      1.0.0-RC2
fs2-core  0.10.0-M10
fs2-io    0.10.0-M10

A DataFrame consists of row and column identifiers, each specified by a FrameIndex, and a DataMatrix, which in turn consists of several DataVectors.

Examples

DataFrame

In-place construction:

import gecko.{DataFrame, DataMatrix, DataVector}

DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map(DataFrame.default(_))

Or with specific row and column identifiers:

import gecko.{DataFrame, DataMatrix, DataVector}
import gecko.syntax._

DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map{
  DataFrame(
    List("A", "B", "C").toIndex,
    List("X1", "X2", "X3").toIndex,
    _)
}

Or with a default row index and specific column identifiers:

import gecko.{DataFrame, DataMatrix, DataVector, FrameIndex}
import gecko.syntax._

DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map{
  DataFrame(
    FrameIndex.default(3),
    List("X1", "X2", "X3").toIndex,
    _)
}
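
According to the signature listed in the DataMatrix section, DataMatrix(...) returns an Either[GeckoError, DataMatrix[A]], so each construction above produces an Either that wraps the DataFrame. The following is a minimal sketch of handling both outcomes. It only uses the calls already shown in this README; the names result and summary are illustrative, and how a DataFrame or a GeckoError is rendered by toString is not specified here.

```scala
import gecko.{DataFrame, DataMatrix, DataVector}

// DataMatrix(...) is the safe constructor, so the whole expression is an Either.
val result = DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map(DataFrame.default(_))

// Handle both outcomes explicitly instead of assuming success.
val summary: String = result match {
  case Right(frame) => s"built frame:\n$frame"
  case Left(error)  => s"construction failed: $error"
}

println(summary)
```

Keeping the value inside the Either until the last moment lets further steps be chained with map or flatMap without unwrapping.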

Read from a file:

import java.nio.file.Paths

import cats.effect.IO
import gecko.csv.GeckoCSVUtil

GeckoCSVUtil
  .parseFrame(fs2.io.file.readAll[IO](Paths.get("file.csv"), 4096))
  .unsafeRunSync()

DataFrame

Construction

If you need a specific row / column identifier, use the apply method:

def apply[R, C, @specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyPrint](
    rowIx: FrameIndex[R],
    colIx: FrameIndex[C],
    values: DataMatrix[A]
): DataFrame[R, C, A]

Otherwise use default:

def default[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyPrint](
    arr: DataMatrix[A]
): DataFrame[Int, Int, A]

default numbers the rows and columns consecutively, starting from 0.

FrameIndex

Construction

Use the default:

def default(size: Int): FrameIndex[Int]

Otherwise, specify the values:

def fromSeq[@specialized(Int, Double, Boolean, Long) C: ClassTag](c: Seq[C]): FrameIndex[C]
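
As a sketch of how the two constructors combine, the snippet below builds one index with fromSeq and one with default, then passes both to DataFrame. It assumes that fromSeq lives on the FrameIndex object next to default, as the listing above suggests; the names rowIx, colIx and frame are illustrative only.

```scala
import gecko.{DataFrame, DataMatrix, DataVector, FrameIndex}

// Explicit row labels, default (0-based) column labels.
val rowIx = FrameIndex.fromSeq(List("A", "B", "C"))
val colIx = FrameIndex.default(3)

// The result is still wrapped in the Either returned by DataMatrix(...).
val frame = DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map(DataFrame(rowIx, colIx, _))
```

This is equivalent in intent to the toIndex examples earlier, but does not need the gecko.syntax._ import.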

DataVector

Construction

From a sequence of values:

def apply[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyGecko](values: A*): DataVector[A]

From an array:

def fromArray[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyGecko](
    array: Array[A]
): DataVector[A]

DataMatrix

Construction

Gecko usually provides both a safe and an unsafe version of a constructor; the unsafe one is prefixed with unsafe.

From a sequence of DataVectors:

def apply[A](vectors: DataVector[A]*): Either[GeckoError, DataMatrix[A]]

From a single array, by specifying the row and column size:

def fromArrayWithDim[A: ClassTag](rows: Int, cols: Int, values: Array[A]): Either[GeckoError, DataMatrix[A]]

From an array or a sequence of DataVectors:

def fromArray[A](vectors: Array[DataVector[A]]): Either[GeckoError, DataMatrix[A]]
def fromSeq[A](a: Seq[DataVector[A]]): Either[GeckoError, DataMatrix[A]]

Fill a matrix with n copies of a constant row:

def fill[A](n: Int, elem: DataVector[A]): DataMatrix[A]
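
To illustrate the safe and unsafe pairing mentioned at the start of this section, here is a self-contained sketch that does not use gecko at all. Matrix, DimError and SafeUnsafeSketch are hypothetical stand-ins, and both the size check and the row-major layout are assumptions of the sketch, not a description of gecko's implementation.

```scala
object SafeUnsafeSketch {
  // Hypothetical error type; NOT gecko's GeckoError.
  final case class DimError(message: String)

  // Hypothetical matrix over a flat, row-major buffer (layout is an assumption).
  final class Matrix[A](val rows: Int, val cols: Int, val values: Vector[A]) {
    def at(r: Int, c: Int): A = values(r * cols + c)
  }

  object Matrix {
    // Safe variant: a size mismatch is reported as a Left value.
    def fromArrayWithDim[A](rows: Int, cols: Int, values: Array[A]): Either[DimError, Matrix[A]] =
      if (rows >= 0 && cols >= 0 && rows * cols == values.length)
        Right(new Matrix(rows, cols, values.toVector))
      else
        Left(DimError(s"expected ${rows * cols} values for $rows x $cols, got ${values.length}"))

    // Unsafe variant: same check, but a mismatch throws instead.
    def unsafeFromArrayWithDim[A](rows: Int, cols: Int, values: Array[A]): Matrix[A] =
      fromArrayWithDim(rows, cols, values) match {
        case Right(m)  => m
        case Left(err) => throw new IllegalArgumentException(err.message)
      }
  }

  val ok  = Matrix.fromArrayWithDim(2, 3, Array(1, 2, 3, 4, 5, 6)) // Right(...)
  val bad = Matrix.fromArrayWithDim(2, 2, Array(1, 2, 3))          // Left(...)
}
```

The safe variant suits input whose shape is not known in advance, such as parsed files. The unsafe variant is convenient when the dimensions are fixed in the code and a mismatch would be a programming error.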

Thank You!

To my friend Jose