gecko
An array-backed, predictable data manipulation library inspired by saddle and pandas.
Dependencies
We rely on:
Name | Version |
---|---|
cats | 1.0.0-RC2 |
fs2-core | 0.10.0-M10 |
fs2-io | 0.10.0-M10 |
A DataFrame consists of row and column identifiers, each specified by a FrameIndex, and a DataMatrix, which in turn consists of several DataVectors.
Examples
DataFrame
In-place construction:
import gecko.{DataFrame, DataMatrix, DataVector}
DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map(DataFrame.default(_))
Or with specific row and column identifiers:
import gecko.{DataFrame, DataMatrix, DataVector}
import gecko.syntax._
DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map {
  DataFrame(
    List("A", "B", "C").toIndex,
    List("X1", "X2", "X3").toIndex,
    _
  )
}
Or with a default row index and explicit column identifiers:
import gecko.{DataFrame, DataMatrix, DataVector, FrameIndex}
import gecko.syntax._
DataMatrix(
  DataVector(  1,   2,   3),
  DataVector( 10,  20,  30),
  DataVector(100, 200, 300)
).map {
  DataFrame(
    FrameIndex.default(3),
    List("X1", "X2", "X3").toIndex,
    _
  )
}
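Because DataMatrix.apply returns an Either, each construction above yields an Either containing the frame rather than a bare DataFrame. A minimal sketch of unwrapping the result follows. It assumes only the signatures listed in this README, and the error message is illustrative.

```scala
import gecko.{DataFrame, DataMatrix, DataVector}

// DataMatrix.apply is the safe constructor, so the frame is wrapped in Either.
val frame = DataMatrix(
  DataVector(1, 2, 3),
  DataVector(10, 20, 30),
  DataVector(100, 200, 300)
).map(DataFrame.default(_))

// Handle both outcomes explicitly instead of assuming success.
frame match {
  case Right(df) => println(df)
  case Left(err) => println(s"Could not build the matrix: $err")
}
```

Keeping the Either until the edge of the program means a malformed input surfaces as a value rather than as an exception.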
Read from a file:
import java.nio.file.Paths
import cats.effect.IO
import gecko.csv.GeckoCSVUtil
// Stream the file in 4096-byte chunks and parse it into a frame.
GeckoCSVUtil
  .parseFrame(fs2.io.file.readAll[IO](Paths.get("file.csv"), 4096))
  .unsafeRunSync() // runs the IO synchronously
DataFrame
Construction
If you need a specific row / column identifier, use the apply method:
def apply[R, C, @specialized(Int, Double, Boolean, Long) A: ClassTag : EmptyPrint](
rowIx: FrameIndex[R],
colIx: FrameIndex[C],
values: DataMatrix[A]
): DataFrame[R, C, A]
Otherwise use default:
def default[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyPrint](
arr: DataMatrix[A]
): DataFrame[Int, Int, A]
This numbers each row and column starting from 0.
FrameIndex
Construction
Use the default:
def default(size: Int): FrameIndex[Int]
Otherwise, specify the values:
def fromSeq[@specialized(Int, Double, Boolean, Long) C: ClassTag](c: Seq[C]): FrameIndex[C]
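As a rough usage sketch, both constructors can be combined with DataFrame.apply. It assumes that default and fromSeq are defined on the FrameIndex companion object, as the signatures above suggest, and that each index should have as many entries as the matrix has rows or columns.

```scala
import gecko.{DataFrame, DataMatrix, DataVector, FrameIndex}

// Explicit row labels and a default (0, 1, ...) column index.
val rowIx = FrameIndex.fromSeq(Seq("A", "B"))
val colIx = FrameIndex.default(2)

// A 2 x 2 matrix, so both indices have two entries.
val frame = DataMatrix(
  DataVector(1, 2),
  DataVector(3, 4)
).map(DataFrame(rowIx, colIx, _))
```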
DataVector
Construction
From a sequence of values:
def apply[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyGecko](values: A*): DataVector[A]
From an array:
def fromArray[@specialized(Int, Double, Boolean, Long) A: ClassTag: EmptyGecko](
array: Array[A]
): DataVector[A]
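A short sketch of both constructors. It assumes they are defined on the DataVector companion object and that the EmptyGecko instances for primitive types are found implicitly, as in the examples earlier in this README.

```scala
import gecko.DataVector

// Varargs constructor: the element type is inferred as Double.
val prices: DataVector[Double] = DataVector(1.5, 2.5, 3.5)

// Build a vector from an existing array: the element type is inferred as Int.
val counts: DataVector[Int] = DataVector.fromArray(Array(1, 2, 3))
```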
DataMatrix
Construction
Most of the time, gecko provides both a safe and an unsafe version of a constructor (the latter prefixed by unsafe):
From a sequence of DataVectors:
def apply[A](vectors: DataVector[A]*): Either[GeckoError, DataMatrix[A]]
From a single array, by specifying the row and column size:
def fromArrayWithDim[A: ClassTag](rows: Int, cols: Int, values: Array[A]): Either[GeckoError, DataMatrix[A]]
From an array or a sequence of DataVectors:
def fromArray[A](vectors: Array[DataVector[A]]): Either[GeckoError, DataMatrix[A]]
def fromSeq[A](a: Seq[DataVector[A]]): Either[GeckoError, DataMatrix[A]]
Fill n times with a constant row:
def fill[A](n: Int, elem: DataVector[A]): DataMatrix[A]
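The constructors in this section can be exercised side by side. This is a sketch based only on the signatures listed here, assuming they are all defined on the DataMatrix companion object. The conditions under which a safe constructor returns a Left are not documented in this README, so none are shown.

```scala
import gecko.{DataMatrix, DataVector}

val row1 = DataVector(1, 2, 3)
val row2 = DataVector(4, 5, 6)

// Safe constructors return Either[GeckoError, DataMatrix[Int]].
val viaVarargs = DataMatrix(row1, row2)
val viaSeq     = DataMatrix.fromSeq(Seq(row1, row2))
val viaArray   = DataMatrix.fromArray(Array(row1, row2))

// Six values with the dimensions given explicitly as 2 rows and 3 columns.
val viaDims = DataMatrix.fromArrayWithDim(2, 3, Array(1, 2, 3, 4, 5, 6))

// fill returns a DataMatrix directly: row1 repeated three times.
val filled = DataMatrix.fill(3, row1)
```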
Thank You!
To my friend Jose