Aptus

"Aptus" is Latin for suitable, appropriate, fitting. It is a utility library meant to improve the Scala experience for simple tasks, when performance isn't the top priority. It also helps you code defensively when representing errors in types isn't important (think assert).

Introduction

For a good introduction to the library, see my talk from the Functional Scala 2024 conference: video on YouTube

In particular the talk discusses the next exciting development for Aptus: bringing quick and simple dynamic data manipulations to the library, for instance:

 import aptus.dyn._

 "/path/to/my.tsv" // eg: name,age,occupation,pets
  .dyns
    .rename   ("occupation" ~> "job")
    .increment("age")
    .remove   ("pets")
  .write("/path/to/my.jsonl") // one JSON doc per line

Not mentioned in the talk: the ability to go to/from case classes, which wasn't implemented back then. Coming soon: the ability to interact with Python via ScalaPy (think pandas).

SBT

libraryDependencies += "io.github.aptusproject" %% "aptus-core" % "0.7.0"

Then import the following to test it out:

import aptus.all._

Though in general a more piecemeal approach is recommended:

import aptus.min._
  OR
package object someprojectpackage extends aptus.Minimal

Alongside some ad hoc imports where needed:

import aptus.Map_
import aptus.OutputFilePath
...

The library is available for Scala 3.4.0 and 2.13

Dependency graph

Note: gson will soon be replaced with ujson

Motivation

I created Aptus in bits over the past 10 years, as I struggled to get seemingly simple tasks done in Scala. It is not intended to be comprehensive, or particularly optimized. It should be seen more as a starting point for a project, where performance isn't most critical and compute resources aren't too limited. It can also serve as a reference, from which the basic use of underlying abstractions can be expanded upon as needed. It's also for people who enjoy Scala's type system and think types shouldn't be thrown out the window (hissing snake sound), yet don't feel the need to capture every possible error as types. Consider for instance Li Haoyi's post "Scala at Scale at Databricks", notably this passage:

Zero usage of "archetypical" Scala frameworks: Play, Akka, Scalaz, Cats, ZIO, etc.

This resonates well with aptus' goals. I like using some of the tools he mentions, but I also want to make sure I have simpler solutions at hand too.

I included all the dependencies shown in the diagram above because I find that they are required for most non-trivial projects. For instance, what application nowadays does not need to handle JSON at some point? Or parse a CSV file? Or handle a bz2 file?

Note that Aptus is heavily used in my data transformation library: Gallia, as well as most of my other projects (public and private).

Defensive coding

Let's consider the stdlib's Seq methods .zip and .toMap for instance. Both will silently discard elements in some situations, and this behavior will almost never be the desired/expected one (if only because it may not be obvious to another maintainer). .zip for instance will truncate the longer sequence if they are not the same size. .toMap will discard entries with duplicate keys, keeping only the last one. In almost all real-life situations I personally encountered where either situation happened, it was the result of an upstream problem: I either meant for the two collections to be the same size for .zip, or I thought I wouldn't have duplicate keys when using .toMap. As a result I created two corresponding methods in aptus, .zipSameSize and .force.map, which throw a requirement runtime error when either situation occurs. I have been using them exclusively for years now, and it has more than paid off in catching errors early.
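
To make the two pitfalls concrete, here is a minimal stdlib-only sketch. The helper names zipStrict and toMapStrict are hypothetical stand-ins that only mimic the behavior described above; they are not Aptus' actual implementation.

```scala
object DefensiveSketch {
  // stdlib behavior: both calls silently drop data
  val truncated: Seq[(Int, Int)]    = Seq(1, 2, 3).zip(Seq(4, 5))             // 3 is dropped
  val overwritten: Map[Int, String] = Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap // 2 -> "b" is dropped

  // hypothetical strict counterpart of .zip: fails fast on a size mismatch
  def zipStrict[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] = {
    require(left.size == right.size, s"sizes differ: ${left.size} vs ${right.size}")
    left.zip(right)
  }

  // hypothetical strict counterpart of .toMap: fails fast on duplicate keys
  def toMapStrict[K, V](entries: Seq[(K, V)]): Map[K, V] = {
    val keys       = entries.map(_._1)
    val duplicates = keys.diff(keys.distinct).distinct
    require(duplicates.isEmpty, s"duplicate keys: ${duplicates.mkString(", ")}")
    entries.toMap
  }
}
```

Both helpers rely on require, so a violation surfaces as an IllegalArgumentException at the call site instead of as corrupted data further downstream.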

We'll see another example of defensive coding in the next section about succinctness: Java's .split and StringOps.split can also discard elements silently.

Succinctness

A good example of succinctness is a method like splitByWholeSeparatorPreserveAllTokens from Apache Commons' StringUtils, whose semantics feel more intuitive to me than those of Java's String.split. Meanwhile using:

"foo|bar".splitBy("|")

is a lot more convenient than using:

import org.apache.commons.lang3.StringUtils
val str = "foo|bar"
if (str.isEmpty()) List(str)
else               StringUtils.splitByWholeSeparatorPreserveAllTokens(str, "|").toList

It should be noted that both Java's String.split and the stdlib's StringOps.split have the very unintuitive behavior of not reporting trailing elements when empty, for instance:

println("1,2,3,,".split(',').toList) // List(1, 2, 3)
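
For reference, Java's two-argument String.split keeps trailing empty tokens when given a negative limit. The sketch below wraps it in a hypothetical helper named splitKeepAll; it only approximates the behavior discussed here and is not how Aptus or Apache Commons implement it.

```scala
import java.util.regex.Pattern

object SplitSketch {
  // hypothetical helper: split on a literal separator and keep every token,
  // including empty trailing ones (a negative limit disables their removal)
  def splitKeepAll(str: String, separator: String): List[String] =
    str.split(Pattern.quote(separator), -1).toList
}
```

Pattern.quote matters because String.split interprets its first argument as a regular expression, and "|" would otherwise be read as an alternation.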

I try to illustrate such differences in succinctness/consistency/defensiveness of behavior throughout the examples below.

Practicality

Another aspect of Aptus is practicality, for instance I often find myself using expressions such as:

"foo=3"
  .splitBy("=")
  .force.tuple2
  .mapSecond(_.toInt)

The stdlib's counterpart would look something like:

"foo=3"
  .split('=')
   match { case Array(x, y: String) =>
     (x, y.toInt) }

Which I argue is harder to read/write and less obvious to understand (albeit not a lot more verbose).

Examples

In-line assertions

Note: .ensuring from the stdlib does not offer a way to manipulate the value in the error message

"hello".ensuring(_.size <= 5)                   .toUpperCase.p // prints "HELLO" - stdlib

"hello".assert (_.size <= 5)                    .toUpperCase.p // prints "HELLO"
"hello".assert (_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO" - can't do that with `ensuring()`
"hello".require(_.size <= 5)                    .toUpperCase.p // prints "HELLO"
"hello".require(_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO"

// these throw AssertionError
"hello".assert (_.size >  5)                    .toUpperCase.p 
"hello".assert (_.size >  5, x => s"value=${x}").toUpperCase.p // "assertion failed: value=hello"

Convenient for chaining, consider the pure stdlib alternative:

{
  import util.chaining._
  
  "hello"
    .ensuring(_.startsWith("h"))
    .toUpperCase
    .pipe(println)
}
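
For readers curious how such an in-line check can be built, here is a minimal sketch using an implicit class. The method name check is hypothetical and chosen to avoid clashing with Predef; Aptus' own .assert/.require may be implemented differently.

```scala
object InlineCheckSketch {
  implicit class CheckOps[A](private val value: A) {
    // hypothetical name: returns the value unchanged when the predicate holds,
    // otherwise throws with a message derived from the value itself
    def check(pred: A => Boolean, describe: A => String): A = {
      if (!pred(value)) throw new AssertionError(s"assertion failed: ${describe(value)}")
      value
    }
  }

  // the check sits in the middle of the chain without breaking it
  val shouted: String = "hello".check(_.size <= 5, x => s"value=$x").toUpperCase
}
```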

In-line printing

E.g. for quick debugging:

"hello".prt               // prints: "hello"
"hello".p                 // prints: "hello"
"hello".p.toUpperCase.p   // prints: "hello", then "HELLO"

"hello".inspect(_.size).p // prints: "5", then "hello"
"hello".i      (_.size).p // prints: "5", then "hello"

1.toString.p // prints "1"
1.str     .p // prints "1"

"hello".p__          // prints   "hello"   and exits program (code 0)
"hello".i__(_.quote) // prints "\"hello\"" and exits program (code 0)

String operations

"hello". append(" you!")  .p // prints "hello you!"
"hello".prepend("well, ") .p // prints "well, hello"

"hello". appendedAll(" you!")  .p // prints "hello you!"  - stdlib
"hello".prependedAll("well, ") .p // prints "well, hello" - stdlib

"hello".colon             .p // prints "hello:"
"hello".tab               .p // prints "hello<TAB>"
"hello".newline           .p // prints "hello<new-line>"

"hello".colon  ("human")  .p // prints "hello:human"
"hello".tab    ("human")  .p // prints "hello<TAB>human"
"hello".newline("human")  .p // prints "hello<new-line>human"

"hello".quote             .p // prints "\"hello\""

"hello|world"  .splitBy("|").p // prints Seq(hello, world)
"hello|world||".splitBy("|").p // prints Seq(hello, world, , ) - won't unexpectedly ignore empty trailing elements

"a\tb\tc".splitXsv('\t') // uses commons-csv under the hood to properly handle the split (eg escaping, ...)

"hello".padLeft (8, ' ').p // "   hello"
"hello".padRight(8, ' ').p // "hello   "
1.str  .padLeft (3, '0').p // "001"
1.str  .padRight(3, '0').p // "100"

"mykey".   contains("my").p // stdlib
"mykey".notContains("MY").p // negative counterpart

// .. many more, see String_, for instance:
// - strip{Prefix,Suffix}{Guaranteed,IfApplicable}
// - remove{Guaranteed,IfApplicable}
// - toBase64
// ...

Note: see corresponding tests

Number operations

3.1416.add       (1).p // 4.1416
3.1416.multiplyBy(2).p // 6.2832
...
3.1416.isInBetween(fromInclusive = 3.0, toExclusive = 4.0).p // true
// likewise for Int and Long

For Double:

3.1416     .maxDecimals   (2).p // 3.14 - still a Double (unlike formats below)

3.1416     .formatDecimals(2).p // 3.14
3.1416.exp .formatDecimals(4).p // 23.1409
3.1416.log2.formatDecimals(4).p // 1.6515

Personally, I always have to look up printf's "% notation" before using it, so a method like formatDecimals makes things a lot easier.
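
As a point of comparison, this is roughly what the stdlib route looks like. The two helpers are hypothetical sketches named after the Aptus methods above; the rounding mode in particular is an assumption, and Aptus may round or truncate differently.

```scala
import java.util.Locale

object DecimalsSketch {
  // fixed number of decimals, as a String; Locale.ROOT keeps "." as the decimal separator
  def formatDecimals(value: Double, decimals: Int): String =
    s"%.${decimals}f".formatLocal(Locale.ROOT, value)

  // same idea but staying a Double (HALF_UP is an assumption made for this sketch)
  def maxDecimals(value: Double, decimals: Int): Double =
    BigDecimal(value).setScale(decimals, BigDecimal.RoundingMode.HALF_UP).toDouble
}
```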

Aptus also helps with collections of numbers:

Seq(3, 2, 1).mean  .p // 2.0
Seq(3, 2, 1).minMax.p // (1, 3)
// ... more: median, stdev, range, IQR, ... (see aptus.Seq_)

Time operations

"2023-06-05".parseLocalDate.getYear.p // 2023

// also available:
//   parseLocalDateTime, parseLocalTime, parseInstant, parseOffsetDateTime and parseZonedDateTime
// and
//   parseLocalDateTime(pattern), ...
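
For comparison, the plain java.time calls look as follows. This is only a sketch of the stdlib route, assuming the Aptus helpers sit on top of java.time; the custom pattern is purely illustrative.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

object TimeSketch {
  // ISO-8601 input needs no explicit pattern
  val year: Int = LocalDate.parse("2023-06-05").getYear

  // any other layout needs a formatter built from a pattern
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm")
  val dateTime: LocalDateTime      = LocalDateTime.parse("2023/06/05 14:30", formatter)
}
```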

Conditional piping (a.k.a conditional "thrush")

"hello"  .pipeIf(_.size <= 5)(_.toUpperCase).p // prints "HELLO"
"bonjour".pipeIf(_.size <= 5)(_.toUpperCase).p // prints unchanged

3.pipeIf(_ % 2 == 0)(_ + 1).p // prints 3 (unchanged)
4.pipeIf(_ % 2 == 0)(_ + 1).p // prints 5

val suffixOpt = Some("?")
"hello".pipeOpt(suffixOpt)(suffix => _ + suffix).p // prints "hello?"
"hello".pipeOpt(None)     (suffix => _ + suffix).p // prints unchanged

See discussion on Scala Users.

There is also a mapIf counterpart:

Seq(1, 2, 3).mapIf(true) (_ + 1).p // List(2, 3, 4)
Seq(1, 2, 3).mapIf(_ < 2)(_ + 1).p // List(2, 2, 3)
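
A conditional pipe is small enough to sketch with an implicit class. The names pipeWhen and pipeWith below are hypothetical, picked so they do not collide with Aptus or with scala.util.chaining; the actual Aptus signatures may differ.

```scala
object PipeSketch {
  implicit class ConditionalPipeOps[A](private val value: A) {
    // apply f only when the predicate holds, otherwise pass the value through
    def pipeWhen(pred: A => Boolean)(f: A => A): A =
      if (pred(value)) f(value) else value

    // apply f only when the optional argument is defined
    def pipeWith[B](opt: Option[B])(f: B => A => A): A =
      opt.fold(value)(b => f(b)(value))
  }

  val shouted: String = "hello".pipeWhen(_.size <= 5)(_.toUpperCase)
  val asked: String   = "hello".pipeWith(Option("?"))(suffix => _ + suffix)
}
```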

In-line "to Option"

"hello"  .in.someIf(_.size <= 5).p // prints Some("hello")
"bonjour".in.someIf(_.size <= 5).p // prints None

"hello"  .in.noneIf(_.size <= 5).p // prints None
"bonjour".in.noneIf(_.size <= 5).p // prints Some("bonjour")

// note: can also use shorthands: inNoneIf/inSomeIf

Convenient for chaining, consider the pure stdlib alternative:

{
  val str = "hello"
  val opt = if (str.size <= 5) Some(str) else None
  println(opt)
}

Notes:

Option.when could also be used, but the test part isn't a predicate on the element (which would be much better).
Someone on the Scala Users list also pointed out this alternative: Some("hello").filter(_.size <= 5). While clever, I'd argue the semantics are much less obvious than "hello".in.someIf(_.size <= 5).
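
The stdlib alternatives from these notes can be put side by side in a short sketch. The someIf/noneIf functions below are hypothetical helpers written for illustration, not the Aptus implementation.

```scala
object SomeIfSketch {
  // hypothetical helpers built on Option.filter / Option.filterNot
  def someIf[A](value: A)(pred: A => Boolean): Option[A] = Some(value).filter(pred)
  def noneIf[A](value: A)(pred: A => Boolean): Option[A] = Some(value).filterNot(pred)

  // Option.when (Scala 2.13+) takes an already-evaluated Boolean,
  // so the value has to be named and mentioned twice
  val str: String             = "hello"
  val viaWhen: Option[String] = Option.when(str.size <= 5)(str)
}
```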

"force" disambiguator (Option/Map)

.get is polysemic in the standard library, sometimes "attempting" to get the result as with Map (returns Option[T]), sometimes "forcing" it as with Option (returns T)

aptus' .force conveys semantics unambiguously:

val myOpt = Some("foo")
val myMap = Map("bar" -> "foo")

myOpt.force       .p // prints "foo"
myMap.force("bar").p // prints "foo"

// versus stdlib way:
myOpt.get       .p // prints      "foo"  -> forcing 
myMap.get("bar").p // prints Some("foo") -> attempting

More forcing

Seq(1)      .force.one     .p // 1
Seq(1)      .force.option  .p // Some(1)
Seq( )      .force.option  .p // None
Seq(1, 2, 3).force.distinct.p // Seq(1, 2, 3)
Seq(1, 2, 3).force.set     .p // Set(1, 2, 3)

val (first, second)        = Seq("foo", "bar")       .force.tuple2
val (first, second, third) = Seq("foo", "bar", "baz").force.tuple3
// ... and so on up to 10

But:

Seq(1, 2)   .force.one      // runtime error
Seq(1, 2)   .force.option   // runtime error
Seq(1, 2, 1).force.distinct // runtime error
Seq(1, 2, 1).force.set      // runtime error
Seq(1, 2, 3).force.tuple2   // runtime error
... and so on

The .force.one mechanism is one of the most useful operations, and a much safer bet than simply doing .head.
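
To illustrate why this is safer than .head, here is a minimal sketch of the idea. forceOne is a hypothetical function name and the exception type is an assumption; Aptus' .force.one may report the error differently.

```scala
object ForceOneSketch {
  // exactly one element, or fail loudly
  def forceOne[A](values: Seq[A]): A = values match {
    case Seq(single) => single
    case other       =>
      throw new IllegalArgumentException(s"expected exactly one element, got ${other.size}")
  }
}
```

By contrast, Seq(1, 2).head returns 1 and the unexpected second element goes unnoticed.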

Help with Options

To optional:

   (None   , Some(2))         .toOptionalTuple.p // None
   (Some(1), None   )         .toOptionalTuple.p // None   
   (Some(1), Some(2))         .toOptionalTuple.p // Some((1, 2))

Seq(None,    None,    None)   .toOptionalSeq  .p // None
Seq(Some(1), Some(2), None)   .toOptionalSeq  .p // None
Seq(Some(1), Some(2), Some(3)).toOptionalSeq  .p // Some(Seq(1, 2, 3))

Swapping:

// parameter for .swap is by-name
Some("foo").swap("bar").p // None
None       .swap("bar").p // Some("bar")
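
The semantics above can be sketched with plain stdlib code. These are hypothetical standalone functions named after the Aptus methods; edge cases (such as the empty sequence) are assumptions of this sketch rather than documented Aptus behavior.

```scala
object OptionSketch {
  // both sides must be defined
  def toOptionalTuple[A, B](pair: (Option[A], Option[B])): Option[(A, B)] =
    for { a <- pair._1; b <- pair._2 } yield (a, b)

  // all elements must be defined (an empty input yields Some(Seq()) in this sketch)
  def toOptionalSeq[A](values: Seq[Option[A]]): Option[Seq[A]] =
    if (values.forall(_.isDefined)) Some(values.collect { case Some(v) => v })
    else None

  // defined becomes None, empty becomes Some(alternative); the alternative is by-name
  def swap[A, B](opt: Option[A])(alternative: => B): Option[B] =
    if (opt.isEmpty) Some(alternative) else None
}
```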

Help with Sequences

Quick sequence formatting:

Seq(1, 2, 3). @@.p //    [1, 2, 3]
Seq(1, 2, 3).#@@.p // #3:[1, 2, 3]

Seq(1, 2, 3).joinln   // one per line
Seq(1, 2, 3).joinlnln // one per line every other line

Seq(1, 2, 3).joinln.sectionAllOff("data:") // or equivalently below
Seq(1, 2, 3).section             ("data:") // returns:
/*
  data:
      1
      2
      3
*/

Aptus also provides help with sorting for common cases, for instance:

Seq(
    Seq("d", "e", "f"),
    Seq("g", "h", "i"),
    Seq("a", "b", "c"))
  .sorted(aptus.seqOrdering[String])
/*
returns:

Seq(
    Seq("a", "b", "c"),
    Seq("d", "e", "f"),
    Seq("g", "h", "i"))
*/

Zip operations

Most of the time, we want to zip collections of same size, and we want to code it defensively:

Seq(1, 2, 3).zipSameSize(Seq(4, 5, 6)).p // Seq((1,4), (2,5), (3,6))
Seq(1, 2, 3).zipSameSize(Seq(4, 5))   .p // runtime error

Ask yourself: what are legitimate use cases where we zip two collections of different sizes and are perfectly happy to have the longest silently truncated?

Other useful zip-related operations are:

Seq("a", "b", "c").zipWithIsFirst.map { case (x, first /* for "a" here */) => if (first) ... else ... }
Seq("a", "b", "c").zipWithIsLast .map { case (x, last  /* for "c" here */) => if (last)  ... else ... }

Seq("a", "b", "c").zipWithIndex.p // List((a,0), (b,1), (c,2))
Seq("a", "b", "c").zipWithRank .p // List((a,1), (b,2), (c,3))

Splitting at head/last:

Seq(1, 2, 3).splitAtHead.p // (1,Seq(2, 3))
Seq(1, 2, 3).splitAtLast.p // (Seq(1, 2),3)

Contained:

1.   containedIn(Seq(1, 2, 3)).p // true
1.notContainedIn(Seq(1, 2, 3)).p // false 
// also available for Set

Note: Why not use "contains" from the stdlib instead? Consider the following situation:

val ref = Seq("2", "4", "6")
Seq(1, 2, 3).map(ref.contains(_.toString))      // cannot do that
Seq(1, 2, 3).map(x => ref.contains(x.toString)) // we need an intermediate
Seq(1, 2, 3).map(_.toString.containedIn(ref))   // unless using containedIn

Ordering sequences of sequences (size prevails):

implicit val ord: Ordering[Seq[Int]] = aptus.seqOrdering
Seq(Seq(4, 5, 6), Seq(1, 2, 3)).sorted.p // Seq(Seq(1, 2, 3), Seq(4, 5, 6))
Seq(Seq(4, 5, 6), Seq(1, 2   )).sorted.p // Seq(Seq(1, 2)   , Seq(4, 5, 6))
Seq(Seq(4, 5)   , Seq(1, 2, 3)).sorted.p // Seq(Seq(4, 5)   , Seq(1, 2, 3))

Note: List vs Seq, see discussion on Scala Users.
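
The "size prevails" ordering of sequences of sequences can be sketched as follows. sizeThenElements is a hypothetical name and the tie-breaking by element-wise comparison is an assumption; aptus.seqOrdering may be implemented differently.

```scala
object SeqOrderingSketch {
  // shorter sequences sort first; equal sizes fall back to element-wise comparison
  def sizeThenElements[A](implicit ord: Ordering[A]): Ordering[Seq[A]] =
    new Ordering[Seq[A]] {
      def compare(x: Seq[A], y: Seq[A]): Int = {
        val bySize = Integer.compare(x.size, y.size)
        if (bySize != 0) bySize
        else
          x.iterator.zip(y.iterator)
            .map { case (a, b) => ord.compare(a, b) }
            .find(_ != 0)   // first position where the two sequences differ
            .getOrElse(0)   // identical sequences
      }
    }
}
```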

Help with Maps

Most of the time, we do not want duplicates to be silently discarded:

// is this what we wanted?
Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap    .p // Map(1 -> "a", 2 -> "c")

// likely not
Seq(1 -> "a", 2 -> "b", 2 -> "c").force.map.p // runtime error
Seq(1 -> "a", 2 -> "b")          .force.map.p // Map(1 -> "a", 2 -> "b")

Associate left/right:

Seq("foo", "bar")                                    .force.mapLeft(_.toUpperCase).p
Seq("foo", "bar").map(_.associateLeft(_.toUpperCase)).force.map.p
  // returns: Map("FOO" -> "foo", "BAR" -> "bar")

Seq("foo", "bar")                                    .force.mapRight(_.size).p
Seq("foo", "bar").map(_.associateRight(_.size)).force.map.p
  // returns: Map("foo" -> 3, "bar" -> 3)

Group by key:

Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).groupByKey.p
  // returns: Map(bar -> List(2), foo -> List(1, 3))
  
// if original order must be preserved:
Seq("bar" -> 2, "foo" -> 1, "foo" -> 3).groupByKeyWithListMap.p
  // returns: ListMap(bar -> List(2), foo -> List(1, 3))

Count by key:

Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).countByKey.p
  // returns: List((2,foo), (1,bar))

Count by self:

Seq("a", "b", "a", "c").countBySelf.p
  // returns: Seq(("a", 2), ("b", 1), ("c", 1))
  // note: ordered by descending count
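
For comparison, grouping and counting can be approximated with the stdlib alone. The sketch below is illustrative only: countBySelf is a hypothetical standalone function, and the order among equal counts is unspecified here, which may differ from Aptus.

```scala
object GroupingSketch {
  val pairs: Seq[(String, Int)] = Seq("foo" -> 1, "bar" -> 2, "foo" -> 3)

  // group values by key (groupMap requires Scala 2.13+)
  val grouped: Map[String, Seq[Int]] = pairs.groupMap(_._1)(_._2)

  // count occurrences of each element, highest count first
  def countBySelf[A](values: Seq[A]): Seq[(A, Int)] =
    values
      .groupBy(identity)
      .toSeq
      .map    { case (value, occurrences) => value -> occurrences.size }
      .sortBy { case (_, count) => -count }
}
```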

Help with Tuples

From import aptus.Tuple{2,3,4,5}_

(1, 2).toSeq.p // Seq(1, 2)

(1, 2).mapFirst (_ + 1) // (2, 2)
(1, 2).mapSecond(_ + 1) // (1, 3)

(1, 2, 3).mapThird(_ + 1) // (1, 2, 4)

Wrapping

"foo".in.some .p // Some("foo")
"foo".in.seq  .p // Seq ("foo")
"foo".in.list .p // List("foo")
"foo".in.left .p // Left("foo")
"foo".in.right.p // Right("foo")
// also see in.someIf/in.noneIf above

Sliding pairs

Seq[Int]()             .slidingPairs // Seq()
Seq     (1)            .slidingPairs // Seq()
Seq     (1, 2, 3, 4, 5).slidingPairs // Seq((1, 2), (2, 3), (3, 4), (4, 5))

Seq(1, 2, 3).slidingPairsWithPrevious.p // List((None,1), (Some(1),2), (Some(2),3))
Seq(1, 2, 3).slidingPairsWithNext    .p // List((1,Some(2)), (2,Some(3)), (3,None))

Consider the pure stdlib alternative:

Seq(1, 2, 3, 4, 5)
  .sliding(2)
  .map { x =>
    (x(0), x(1)) }
  .toSeq
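
Note that the sliding(2) version above yields a single partial window for a one-element sequence, so indexing x(1) would throw. A stdlib sketch that stays safe for empty and single-element input can zip the sequence with its own tail instead. The function names below are hypothetical and this is not Aptus' implementation.

```scala
object SlidingSketch {
  // consecutive pairs; empty and single-element inputs yield no pairs
  def pairs[A](values: Seq[A]): Seq[(A, A)] =
    values.zip(values.drop(1))

  // each element with its predecessor, if any
  def pairsWithPrevious[A](values: Seq[A]): Seq[(Option[A], A)] = {
    val previous: Seq[Option[A]] = None +: values.map[Option[A]](Some(_))
    previous.zip(values)
  }

  // each element with its successor, if any
  def pairsWithNext[A](values: Seq[A]): Seq[(A, Option[A])] = {
    val next: Seq[Option[A]] = values.drop(1).map[Option[A]](Some(_)) :+ None
    values.zip(next)
  }
}
```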

Closing resources

Aptus' Closeabled boils down to:

class Closeabled[T](underlying: T, cls: Closeable) extends Closeable

Convenient for instance when you don't want to manage pairs of Iterator/Closeable, e.g.:

// let's write lines
Seq("hello", "world").writeFileLines("/tmp/lines")

// and stream them back
val myCloseabled: SelfClosingIterator[String] =
  "/tmp/lines".streamFileLines()

// for instance, we can consume the content (will automatically close)
myCloseabled                   .consume(_.toList).p // as is
// or alternatively (exclusive: consuming closes the iterator)
myCloseabled.map(_.map(_.size)).consume(_.toList).p // line pre-processing
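
The underlying idea can be sketched in a few lines. The Closing class below is a hypothetical simplification written for illustration, with map and consume modeled on the usage above; Aptus' Closeabled and SelfClosingIterator have their own, richer definitions.

```scala
import java.io.Closeable

object CloseabledSketch {
  // pairs a value with the resource that backs it
  final class Closing[T](val underlying: T, resource: Closeable) extends Closeable {
    override def close(): Unit = resource.close()

    // transform the wrapped value while keeping the same resource attached
    def map[U](f: T => U): Closing[U] = new Closing(f(underlying), resource)

    // run f, then close the resource even if f throws
    def consume[U](f: T => U): U =
      try f(underlying)
      finally close()
  }
}
```

The caller never handles the Closeable directly, which is the point: the iterator and its resource travel together until the content is consumed.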

Orphan methods

Some methods are called directly from the aptus package object when no natural parent can be used.

aptus.fs.homeDirectoryPath().p // "/home/tony"
aptus.hardware.totalMemory().p // 1011351552
aptus.random.uuidString()   .p // a1bffc1e-72aa-477e-ac84-e4133ffcafad
aptus.time.stamp().p           // 240224152753

aptus.illegalState   ("freeze!") // Exception in thread "main" IllegalStateException: freeze!
aptus.illegalArgument("freeze!") // Exception in thread "main" IllegalArgumentException: freeze!

aptus.reflect.formatStackTrace().p // returns:
/*
  java.lang.Throwable
      at aptus.aptmisc.Reflect$.formatStackTrace(Misc.scala:62)
      ...
      <where you are in your code>
*/

// ... (see more in aptus.AptusAliases)

Conveying intent

These are often used to save/homogenize comments.

Sometimes we want to convey that a sequence cannot be reordered without consequences; think of it as a built-in comment:

@ordermatters val mySeq = Seq(MostImportant, SecondMostImportant, ...)

An annotation is favored over a type alias here so that it can be applied to other code areas than sequences.

The following are just aliases, cheap replacements for NonEmptyList-like alternatives:

val      values: Nes[Int] =      Seq(1, 2, 3)
val maybeValues: Pes[Int] = Some(Seq(1, 2, 3))

Note: Value classes don't accept require statements

IO

Plain files:

"hello world".writeFileContent("/tmp/content")
"/tmp/content".readFileContent().p // prints: "hello world"

Seq("hello", "world").writeFileLines("/tmp/lines")
"/tmp/lines".readFileLines().p // prints: Seq("hello", "world")

Compressed files:

"hello world".writeFileContent("/tmp/content.gz")
"/tmp/content.gz".readFileContent().p // prints: "hello world" 

Seq("hello", "world").writeFileLines("/tmp/lines.gz")
"/tmp/lines.gz".readFileLines().p // prints: Seq("hello", "world")

// note: "file -i /tmp/content.gz" shows it's indeed application/gzip

"/data/bigfile.gz".streamFileLines() // returns a SelfClosingIterator[String], which closes itself once all lines have been seen

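For comparison, writing and reading gzip-compressed lines with only the JDK and scala.util.Using looks roughly like this. It is a sketch of the manual route, assuming UTF-8 text; the object and method names are hypothetical and it does not reflect how Aptus handles compression internally.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.util.Using

object GzipSketch {
  // write one line per element into a gzip-compressed file
  def writeLines(path: Path, lines: Seq[String]): Unit =
    Using.resource(new GZIPOutputStream(Files.newOutputStream(path))) { out =>
      out.write(lines.map(_ + "\n").mkString.getBytes(UTF_8))
    }

  // read all lines back; the reader (and underlying streams) are closed afterwards
  def readLines(path: Path): List[String] =
    Using.resource(
      new BufferedReader(new InputStreamReader(new GZIPInputStream(Files.newInputStream(path)), UTF_8))
    ) { reader =>
      Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    }
}
```
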
JSON:

A special note about JSON, owing to its ubiquity (and despite its many flaws). While Gallia is my main project pertaining to data in general (especially transformation thereof), I included a minimal set of functionality in Aptus:

""" {"foo": 1} """.jsonObject // returns a com.google.gson.JsonObject
"""[{"foo": 1}]""".jsonArray  // returns a com.google.gson.JsonArray

"""{"foo": 1, "bar": true}""".prettyJson.p // .compactJson is also available
/*
{
  "foo": 1,
  "bar": true
}
*/

In the future, a subset of Gallia will be created, which will basically offer a similar set of operations but without any concern for the underlying schema: gallia-dyn. It will offer a convenient way to perform "dynamic" transformations, and therefore handle JSON. Once ready, a subset of gallia-dyn will likely be included in Aptus for convenience, so that simple manipulations such as these will be possible OOTB:

"""{"foo": "hello", "bar": 2, "baz": true}"""
  .readObj
    .toUpperCase("foo")
    .increment  ("bar")
    .drop       ("baz")
  .printCompactJson()
  // """{"foo": "HELLO", "bar": 3}"""

URLs:

val TestResources =
  "https://raw.githubusercontent.com/aptusproject/aptus-core/6f4acbc/src/test/resources"

s"${TestResources}/content".readUrlContent().p // prints: "hello world"
s"${TestResources}/lines"  .readUrlLines().p // prints: Seq("hello", "world")