"Aptus" is Latin for suitable, appropriate, fitting. It is a utility library meant to improve the Scala experience for simple tasks,
when performance isn't most important. It also helps you code defensively when representing errors in types isn't important (think assert).
libraryDependencies += "io.github.aptusproject" %% "aptus-core" % "0.7.0"
Then import the following to test it out:
import aptus.all._
Though in general a more piecemeal approach is recommended:
import aptus.min._
OR
package object someprojectpackage extends aptus.Minimal
Alongside some ad hoc imports where needed:
import aptus.Map_
import aptus.OutputFilePath
...
The library is available for Scala 3.4.0 and 2.13
Note: gson will soon be replaced with ujson
I created Aptus in bits over the past 10 years, as I struggled to get seemingly simple tasks done in Scala. It is not intended to be comprehensive, or particularly optimized. It should be seen more as a starting point for a project, where performance isn't most critical and compute resources aren't too limited. It can also serve as a reference, from which the basic use of underlying abstractions can be expanded upon as needed. It's also for people who enjoy Scala's type system and think types shouldn't be thrown out the window (hissing snake sound), yet don't feel the need to capture every possible error as types. Consider for instance Li Haoyi's post "Scala at Scale at Databricks", notably this passage:
Zero usage of "archetypical" Scala frameworks: Play, Akka, Scalaz, Cats, ZIO, etc.
This resonates well with aptus' goals. I like using some of the tools he mentions, but I also want to make sure I have simpler solutions at hand too.
I included all the dependencies shown in the diagram above because I find that they are required for most non-trivial projects. For instance, what application nowadays does not need to handle JSON at some point? Or parse a CSV file? Or handle a bz2 file?
Note that Aptus is heavily used in my data transformation library: Gallia, as well as most of my other projects (public and private).
Let's consider stdlib's Seq's .zip and .toMap methods for instance. Both will silently discard elements in some situations, and this behavior will almost never be the desired/expected one (if nothing else because it may not be obvious to another maintainer).
.zip will truncate the longer sequence if the two are not the same size.
.toMap will discard entries with duplicate keys, keeping only the last one.
In almost all real-life situations I personally encountered where either happened, it was the result of an upstream problem: I either meant for the two collections to be the same size for .zip, or I thought I wouldn't have duplicate keys when using .toMap.
As a result I created two corresponding methods in aptus, .zipSameSize and .force.map, which fail with a runtime requirement error when either situation occurs.
I have been using them exclusively for years now, and it has more than paid off in catching errors early.
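To make the two pitfalls concrete, here is a minimal stdlib-only sketch. The helpers zipStrict and toMapStrict are hypothetical names chosen for illustration, and they are not Aptus' actual implementation of .zipSameSize and .force.map; they only show the same defensive idea using require.

```scala
object StrictCollections {

  // Hypothetical helper (not Aptus' code): a zip that refuses sequences of different sizes.
  def zipStrict[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] = {
    require(left.size == right.size, s"size mismatch: ${left.size} vs ${right.size}")
    left.zip(right)
  }

  // Hypothetical helper (not Aptus' code): a toMap that refuses duplicate keys.
  def toMapStrict[K, V](entries: Seq[(K, V)]): Map[K, V] = {
    val duplicates = entries.groupBy(_._1).collect { case (key, group) if group.size > 1 => key }
    require(duplicates.isEmpty, s"duplicate keys: ${duplicates.mkString(", ")}")
    entries.toMap
  }

  def main(args: Array[String]): Unit = {
    println(Seq(1, 2, 3).zip(Seq(4, 5)))             // stdlib truncates silently: List((1,4), (2,5))
    println(Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap) // stdlib keeps the last duplicate: Map(1 -> a, 2 -> c)
    println(zipStrict(Seq(1, 2, 3), Seq(4, 5, 6)))   // List((1,4), (2,5), (3,6))
    println(scala.util.Try(zipStrict(Seq(1, 2, 3), Seq(4, 5))).isFailure) // true
  }
}
```

Both helpers throw an IllegalArgumentException (from require) instead of dropping data, which surfaces the upstream problem at the point where it first becomes visible.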
We'll see another example of defensive coding in the next section about succinctness: Java's .split and StringOps.split can also discard elements silently.
A good example of succinctness is a method like splitByWholeSeparatorPreserveAllTokens from Apache Commons' StringUtils, whose semantics feel more intuitive to me than those of Java's String.split.
Meanwhile using:
"foo|bar".splitBy("|")
is a lot more convenient than using:
import org.apache.commons.lang3.StringUtils
val str = "foo|bar"
if (str.isEmpty()) List(str)
else StringUtils.splitByWholeSeparatorPreserveAllTokens(str, "|").toList
It should be noted that both Java's String.split and the stdlib's StringOps.split have the very unintuitive behavior of not reporting trailing elements when empty, for instance:
println("1,2,3,,".split(',').toList) // List(1, 2, 3)
I try to illustrate such differences in succinctness/consistency/defensiveness of behavior throughout the examples below.
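For reference, the trailing-element behavior of split can also be avoided with the JDK alone, by passing a negative limit to String.split. The sketch below assumes a hypothetical helper named splitKeepAll; it is not how Aptus' splitBy is implemented, only a way to see the difference side by side.

```scala
import java.util.regex.Pattern

object SplitDemo {

  // Hypothetical helper (not Aptus' splitBy): splits on a literal separator
  // and keeps empty trailing tokens thanks to the negative limit.
  def splitKeepAll(str: String, separator: String): List[String] =
    str.split(Pattern.quote(separator), -1).toList

  def main(args: Array[String]): Unit = {
    println("1,2,3,,".split(',').toList)  // List(1, 2, 3)
    println(splitKeepAll("1,2,3,,", ",")) // List(1, 2, 3, , )
    println(splitKeepAll("foo|bar", "|")) // List(foo, bar)
  }
}
```

Pattern.quote is needed because String.split interprets its first argument as a regular expression, so a bare "|" would otherwise act as an alternation.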
Another aspect of Aptus is practicality, for instance I often find myself using expressions such as:
"foo=3"
.splitBy("=")
.force.tuple2
.mapSecond(_.toInt)
The stdlib's counterpart would look something like:
"foo=3"
.split('=')
match { case Array(x, y: String) =>
(x, y.toInt) }
Which I argue is harder to read/write and less obvious to understand (albeit not a lot more verbose).
Note: .ensuring from the stdlib does not offer a way to manipulate the value in the error message
"hello".ensuring(_.size <= 5) .toUpperCase.p // prints "HELLO" - stdlib
"hello".assert (_.size <= 5) .toUpperCase.p // prints "HELLO"
"hello".assert (_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO" - can't do that with `ensuring()`
"hello".require(_.size <= 5) .toUpperCase.p // prints "HELLO"
"hello".require(_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO"
// these throw AssertionError
"hello".assert (_.size > 5) .toUpperCase.p
"hello".assert (_.size > 5, x => s"value=${x}").toUpperCase.p // "assertion failed: value=hello"
Convenient for chaining, consider the pure stdlib alternative:
{
import util.chaining._
"hello"
.ensuring(_.startsWith("h"))
.toUpperCase
.pipe(println)
}
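As an illustration of the point about error messages, a chainable check whose message can use the value might be sketched as follows. The name checked is an assumption made for this example and is not an Aptus method; the sketch simply wraps the stdlib's assert.

```scala
object ChainedChecks {

  // Hypothetical extension method (the name `checked` is not part of Aptus):
  // asserts a predicate on the value, builds the message from the value, and returns the value.
  implicit class CheckedOps[A](private val value: A) extends AnyVal {
    def checked(pred: A => Boolean, msg: A => String): A = {
      assert(pred(value), msg(value))
      value
    }
  }

  def main(args: Array[String]): Unit = {
    println("hello".checked(_.size <= 5, x => s"value=$x").toUpperCase) // HELLO

    // A failing check throws java.lang.AssertionError: assertion failed: value=hello
    println(scala.util.Try("hello".checked(_.size > 5, x => s"value=$x")).isFailure) // true
  }
}
```

Because it relies on assert, this sketch is subject to the same compiler option that elides assertions.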
E.g. for quick debugging:
"hello".prt // prints: "hello"
"hello".p // prints: "hello"
"hello".p.toUpperCase.p // prints: "hello", then "HELLO"
"hello".inspect(_.size).p // prints: "5", then "hello"
"hello".i (_.size).p // prints: "5", then "hello"
1.toString.p // prints "1"
1.str .p // prints "1"
"hello".p__ // prints "hello" and exits program (code 0)
"hello".i__(_.quote) // prints "\"hello\"" and exits program (code 0)
"hello". append(" you!") .p // prints "hello you!"
"hello".prepend("well, ") .p // prints "well, hello"
"hello". appendedAll(" you!") .p // prints "hello you!" - stdlib
"hello".prependedAll("well, ") .p // prints "well, hello" - stdlib
"hello".colon .p // prints "hello:"
"hello".tab .p // prints "hello<TAB>"
"hello".newline .p // prints "hello<new-line>"
"hello".colon ("human") .p // prints "hello:human"
"hello".tab ("human") .p // prints "hello<TAB>human"
"hello".newline("human") .p // prints "hello<new-line>human"
"hello".quote .p // prints "\"hello\""
"hello|world" .splitBy("|").p // prints Seq(hello, world)
"hello|world||".splitBy("|").p // prints Seq(hello, world, , ) - won't unexpectedly ignore empty trailing elements
"a\tb\tc".splitXsv('\t') // uses commons-csv under the hood to properly handle the split (eg escaping, ...)
"hello".padLeft (8, ' ').p // " hello"
"hello".padRight(8, ' ').p // "hello "
1.str .padLeft (3, '0').p // "001"
1.str .padRight(3, '0').p // "100"
"mykey". contains("my").p // stdlib
"mykey".notContains("MY").p // negative counterpart
// .. many more, see String_, for instance:
// - strip{Prefix,Suffix}{Guaranteed,IfApplicable}
// - remove{Guaranteed,IfApplicable}
// - toBase64
// ...
Note: see corresponding tests
3.1416.add (1).p // 4.1416
3.1416.multiplyBy(2).p // 6.2832
...
3.1416.isInBetween(fromInclusive = 3.0, toExclusive = 4.0).p // true
// likewise for Int and Long
For Double:
3.1416 .maxDecimals (2).p // 3.14 - still a Double (unlike formats below)
3.1416 .formatDecimals(2).p // 3.14
3.1416.exp .formatDecimals(4).p // 23.1409
3.1416.log2.formatDecimals(4).p // 1.6515
Personally, I always have to look up printf's "% notation" before using it, so a method like formatDecimals makes things a lot easier.
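For comparison, this is roughly what the stdlib route looks like. The helper name formatted is hypothetical, and pinning the locale to Locale.ROOT is a choice made for this sketch (so that the decimal separator is always a dot); it says nothing about what formatDecimals does internally.

```scala
import java.util.Locale

object DecimalsDemo {

  // Hypothetical helper: builds the printf-style pattern (e.g. "%.2f") and applies it.
  def formatted(value: Double, decimals: Int): String =
    s"%.${decimals}f".formatLocal(Locale.ROOT, value)

  def main(args: Array[String]): Unit = {
    println(formatted(3.1416, 2)) // 3.14
  }
}
```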
Aptus also helps with collections of numbers:
Seq(3, 2, 1).mean .p // 2.0
Seq(3, 2, 1).minMax.p // (1, 3)
// ... more: median, stdev, range, IQR, ... (see aptus.Seq_)
"2023-06-05".parseLocalDate.getYear.p // 2023
// also available:
// parseLocalDateTime, parseLocalTime, parseInstant, parseOffsetDateTime and parseZonedDateTime
// and
// parseLocalDateTime(pattern), ...
"hello" .pipeIf(_.size <= 5)(_.toUpperCase).p // prints "HELLO"
"bonjour".pipeIf(_.size <= 5)(_.toUpperCase).p // prints unchanged
3.pipeIf(_ % 2 == 0)(_ + 1).p // prints 3 (unchanged)
4.pipeIf(_ % 2 == 0)(_ + 1).p // prints 5
val suffixOpt = Some("?")
"hello".pipeOpt(suffixOpt)(suffix => _ + suffix).p // prints "hello?"
"hello".pipeOpt(None) (suffix => _ + suffix).p // prints unchanged
See discussion on Scala Users.
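To show what the conditional pipe saves, here is the same logic with the stdlib's pipe, next to a stand-alone function with the same shape. pipeWhen is a hypothetical name used only for this sketch, not an Aptus method.

```scala
import scala.util.chaining._

object ConditionalPipe {

  // Hypothetical stand-alone equivalent: applies f only when the predicate holds for the value.
  def pipeWhen[A](value: A)(pred: A => Boolean)(f: A => A): A =
    if (pred(value)) f(value) else value

  def main(args: Array[String]): Unit = {
    // Pure stdlib: the value has to be named and repeated in both branches.
    println("hello".pipe(s => if (s.size <= 5) s.toUpperCase else s)) // HELLO

    println(pipeWhen("bonjour")(_.size <= 5)(_.toUpperCase)) // bonjour
    println(pipeWhen(4)(_ % 2 == 0)(_ + 1))                  // 5
  }
}
```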
There is also a mapIf counterpart:
Seq(1, 2, 3).mapIf(true) (_ + 1).p // List(2, 3, 4)
Seq(1, 2, 3).mapIf(_ < 2)(_ + 1).p // List(2, 2, 3)
"hello" .in.someIf(_.size <= 5).p // prints Some("hello")
"bonjour".in.someIf(_.size <= 5).p // prints None
"hello" .in.noneIf(_.size <= 5).p // prints None
"bonjour".in.noneIf(_.size <= 5).p // prints Some("bonjour")
// note: can also use shorthands: inNoneIf/inSomeIf
Convenient for chaining, consider the pure stdlib alternative:
{
val str = "hello"
val opt = if (str.size <= 5) Some(str) else None
println(opt)
}
Notes:
- Option.when could also be used, but the test part isn't a predicate on the element (which would be much better).
- Someone on the Scala user list also pointed out this alternative: Some("hello").filter(_.size <= 5). While clever, I'd argue the semantics are much less obvious than "hello".in.someIf(_.size <= 5).
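For completeness, the two stdlib alternatives mentioned in these notes look like this when written out (Option.when requires Scala 2.13 or later). Only stdlib calls are used here.

```scala
object SomeIfDemo {
  def main(args: Array[String]): Unit = {
    val str = "hello"

    // Option.when takes a plain Boolean, so the value must be named and repeated.
    println(Option.when(str.size <= 5)(str)) // Some(hello)

    // Wrapping first and filtering afterwards takes a predicate on the element.
    println(Some(str).filter(_.size <= 5))       // Some(hello)
    println(Some("bonjour").filter(_.size <= 5)) // None
  }
}
```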
.get is polysemic in the standard library, sometimes "attempting" to get the result as with Map (returns Option[T]), sometimes "forcing" it as with Option (returns T). aptus' .force conveys semantics unambiguously:
val myOpt = Some("foo")
val myMap = Map("bar" -> "foo")
myOpt.force .p // prints "foo"
myMap.force("bar").p // prints "foo"
// versus stdlib way:
myOpt.get .p // prints "foo" -> forcing
myMap.get("bar").p // prints Some("foo") -> attempting
Seq(1) .force.one .p // 1
Seq(1) .force.option .p // Some(1)
Seq( ) .force.option .p // None
Seq(1, 2, 3).force.distinct.p // Seq(1, 2, 3)
Seq(1, 2, 3).force.set .p // Set(1, 2, 3)
val (first, second) = Seq("foo", "bar") .force.tuple2
val (first, second, third) = Seq("foo", "bar", "baz").force.tuple3
// ... and so on up to 10
Seq(1, 2) .force.one // runtime error
Seq(1, 2) .force.option // runtime error
Seq(1, 2, 1).force.distinct // runtime error
Seq(1, 2, 1).force.set // runtime error
Seq(1, 2, 3).force.tuple2 // runtime error
... and so on
The .force.one mechanism is one of the most useful operations, and a much safer bet than simply doing .head.
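The difference with .head can be sketched with a small pattern match. exactlyOne is a hypothetical helper written for this example; the exception type it throws is an assumption and may differ from what .force.one does.

```scala
object ExactlyOne {

  // Hypothetical helper (not Aptus' code): returns the single element, or fails loudly.
  def exactlyOne[A](values: Seq[A]): A = values match {
    case Seq(single) => single
    case other =>
      throw new IllegalArgumentException(s"expected exactly one element, got ${other.size}")
  }

  def main(args: Array[String]): Unit = {
    println(Seq(1, 2).head)     // 1 (the extra element is silently ignored)
    println(exactlyOne(Seq(1))) // 1
    println(scala.util.Try(exactlyOne(Seq(1, 2))).isFailure) // true
  }
}
```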
(None , Some(2)) .toOptionalTuple.p // None
(Some(1), None ) .toOptionalTuple.p // None
(Some(1), Some(2)) .toOptionalTuple.p // Some((1, 2))
Seq(None, None, None) .toOptionalSeq .p // None
Seq(Some(1), Some(2), None) .toOptionalSeq .p // None
Seq(Some(1), Some(2), Some(3)).toOptionalSeq .p // Some(Seq(1, 2, 3))
// parameter for .swap is by-name
Some("foo").swap("bar").p // None
None .swap("bar").p // Some("bar")
Seq(1, 2, 3). @@.p // [1, 2, 3]
Seq(1, 2, 3).#@@.p // #3:[1, 2, 3]
Seq(1, 2, 3).joinln // one per line
Seq(1, 2, 3).joinlnln // one per line every other line
Seq(1, 2, 3).joinln.sectionAllOff("data:") // or equivalently below
Seq(1, 2, 3).section ("data:") // returns:
/*
data:
1
2
3
*/
Aptus also provides help with sorting for common cases, for instance:
Seq(
Seq("d", "e", "f"),
Seq("g", "h", "i"),
Seq("a", "b", "c"))
.sorted(aptus.seqOrdering[String])
/*
returns:
Seq(
Seq("a", "b", "c"),
Seq("d", "e", "f"),
Seq("g", "h", "i"))
*/
Most of the time, we want to zip collections of same size, and we want to code it defensively:
Seq(1, 2, 3).zipSameSize(Seq(4, 5, 6)).p // Seq((1,4), (2,5), (3,6))
Seq(1, 2, 3).zipSameSize(Seq(4, 5)) .p // runtime error
Ask yourself: what are legitimate use cases where we zip two collections of different sizes and are perfectly happy to have the longest silently truncated?
Other useful zip-related operations are:
Seq("a", "b", "c").zipWithIsFirst.map { case (x, first /* for "a" here */) => if (first) ... else ... }
Seq("a", "b", "c").zipWithIsLast .map { case (x, last /* for "c" here */) => if (last) ... else ... }
Seq("a", "b", "c").zipWithIndex.p // List((a,0), (b,1), (c,2))
Seq("a", "b", "c").zipWithRank .p // List((a,1), (b,2), (c,3))
Seq(1, 2, 3).splitAtHead.p // (1,Seq(2, 3))
Seq(1, 2, 3).splitAtLast.p // (Seq(1, 2),3)
1. containedIn(Seq(1, 2, 3)).p // true
1.notContainedIn(Seq(1, 2, 3)).p // false
// also available for Set
Note: Why not use "contains" from the stdlib instead? Consider the following situation:
val ref = Seq("2", "4", "6")
Seq(1, 2, 3).map(ref.contains(_.toString)) // cannot do that
Seq(1, 2, 3).map(x => ref.contains(x.toString)) // we need an intermediate
Seq(1, 2, 3).map(_.toString.containedIn(ref)) // unless using containedIn
Ordering sequences of sequences (size prevails):
implicit val ord: Ordering[Seq[Int]] = aptus.seqOrdering
Seq(Seq(4, 5, 6), Seq(1, 2, 3)).sorted.p // Seq(Seq(1, 2, 3), Seq(4, 5, 6))
Seq(Seq(4, 5, 6), Seq(1, 2 )).sorted.p // Seq(Seq(1, 2) , Seq(4, 5, 6))
Seq(Seq(4, 5) , Seq(1, 2, 3)).sorted.p // Seq(Seq(4, 5) , Seq(1, 2, 3))
Note: List vs Seq, see discussion on Scala Users.
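A "size prevails" ordering can be sketched with the stdlib as follows. The tie-breaking rule for sequences of equal size (element-wise comparison) is an assumption of this sketch, and sizeFirst is a hypothetical name; it is not the source of aptus.seqOrdering.

```scala
object SizeFirstOrdering {

  // Hypothetical ordering: shorter sequences come first; equal sizes are compared element by element.
  def sizeFirst[A](implicit elem: Ordering[A]): Ordering[Seq[A]] =
    new Ordering[Seq[A]] {
      def compare(x: Seq[A], y: Seq[A]): Int = {
        val bySize = Integer.compare(x.size, y.size)
        if (bySize != 0) bySize
        else
          x.zip(y).iterator
            .map { case (a, b) => elem.compare(a, b) }
            .find(_ != 0)
            .getOrElse(0)
      }
    }

  def main(args: Array[String]): Unit = {
    println(Seq(Seq(4, 5, 6), Seq(1, 2)).sorted(sizeFirst[Int])) // List(List(1, 2), List(4, 5, 6))
    println(Seq(Seq(4, 5), Seq(1, 2, 3)).sorted(sizeFirst[Int])) // List(List(4, 5), List(1, 2, 3))
  }
}
```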
Most of the time, we do not want duplicates to be silently discarded:
// is this what we wanted?
Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap .p // Map(1 -> "a", 2 -> "c")
// likely not
Seq(1 -> "a", 2 -> "b", 2 -> "c").force.map.p // runtime error
Seq(1 -> "a", 2 -> "b") .force.map.p // Map(1 -> "a", 2 -> "b")
Seq("foo", "bar") .force.mapLeft(_.toUpperCase).p
Seq("foo", "bar").map(_.associateLeft(_.toUpperCase)).force.map.p
// returns: Map("FOO" -> "foo", "BAR" -> "bar")
Seq("foo", "bar") .force.mapRight(_.size).p
Seq("foo", "bar").map(_.associateRight(_.size)).force.map.p
// returns: Map("foo" -> 3, "bar" -> 3)
Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).groupByKey.p
// returns: Map(bar -> List(2), foo -> List(1, 3))
// if original order must be preserved:
Seq("bar" -> 2, "foo" -> 1, "foo" -> 3).groupByKeyWithListMap.p
// returns: ListMap(bar -> List(2), foo -> List(1, 3))
Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).countByKey.p
// returns: List((2,foo), (1,bar))
Seq("a", "b", "a", "c").countBySelf.p
// returns: Seq(("a", 2), ("b", 1), ("c", 1))
// note: ordered by DESC
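For readers who want to see the underlying idea, grouping by key and counting occurrences can be approximated with the stdlib's groupMap and groupMapReduce (Scala 2.13+). The helper names grouped and counts are hypothetical, and the relative order of elements with equal counts is left unspecified in this sketch, which may differ from Aptus' behavior.

```scala
object CountDemo {

  // Hypothetical helper: values grouped by key, preserving the order of values within each group.
  def grouped[K, V](entries: Seq[(K, V)]): Map[K, Seq[V]] =
    entries.groupMap(_._1)(_._2)

  // Hypothetical helper: occurrences of each element, highest count first.
  def counts[A](values: Seq[A]): Seq[(A, Int)] =
    values
      .groupMapReduce(identity)(_ => 1)(_ + _)
      .toSeq
      .sortBy { case (_, count) => -count }

  def main(args: Array[String]): Unit = {
    println(grouped(Seq("foo" -> 1, "bar" -> 2, "foo" -> 3)).apply("foo")) // List(1, 3)
    println(counts(Seq("a", "b", "a", "c")).head)                          // (a,2)
  }
}
```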
From import aptus.Tuple{2,3,4,5}_
(1, 2).toSeq.p // Seq(1, 2)
(1, 2).mapFirst (_ + 1) // (2, 2)
(1, 2).mapSecond(_ + 1) // (1, 3)
(1, 2, 3).mapThird(_ + 1) // (1, 2, 4)
"foo".in.some .p // Some("foo")
"foo".in.seq .p // Seq ("foo")
"foo".in.list .p // List("foo")
"foo".in.left .p // Left("foo")
"foo".in.right.p // Right("foo")
// also see in.someIf/in.noneIf above
Seq[Int]() .slidingPairs // Seq()
Seq (1) .slidingPairs // Seq()
Seq (1, 2, 3, 4, 5).slidingPairs // Seq((1, 2), (2, 3), (3, 4), (4, 5))
Seq(1, 2, 3).slidingPairsWithPrevious.p // List((None,1), (Some(1),2), (Some(2),3))
Seq(1, 2, 3).slidingPairsWithNext .p // List((1,Some(2)), (2,Some(3)), (3,None))
Consider the pure stdlib alternative:
Seq(1, 2, 3, 4, 5)
.sliding(2)
.map { x =>
(x(0), x(1)) }
.toSeq
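One detail worth spelling out about the sliding(2) approach: on a single-element sequence, sliding yields one window of size 1, so indexing x(1) would throw. A stdlib-only sketch that avoids this zips the sequence with itself shifted by one; pairs is a hypothetical helper name, not Aptus' implementation of slidingPairs.

```scala
object SlidingPairsDemo {

  // Hypothetical helper: consecutive pairs, empty for sequences with fewer than two elements.
  def pairs[A](values: Seq[A]): Seq[(A, A)] =
    values.zip(values.drop(1))

  def main(args: Array[String]): Unit = {
    println(pairs(Seq(1, 2, 3, 4, 5))) // List((1,2), (2,3), (3,4), (4,5))
    println(pairs(Seq(1)))             // List()

    // The partial window that makes the sliding(2) version unsafe for short inputs:
    println(Seq(1).sliding(2).toList)  // List(List(1))
  }
}
```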
Aptus' Closeabled boils down to:
class Closeabled[T](underlying: T, cls: Closeable) extends Closeable
Convenient for instance when you don't want to manage pairs of Iterator/Closeable, e.g.:
// let's write lines
Seq("hello", "world").writeFileLines("/tmp/lines")
// and stream them back
val myCloseabled: SelfClosingIterator[String] =
"/tmp/lines".streamFileLines()
// for instance, we can consume the content (will automatically close)
myCloseabled .consume(_.toList).p // as is
<XOR>
myCloseabled.map(_.map(_.size)).consume(_.toList).p // line pre-processing
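For context, the stdlib way of tying an iterator to the resource that must be closed is scala.util.Using (Scala 2.13+). The sketch below is not related to Aptus' Closeabled internals; lineSizes is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.Using

object UsingDemo {

  // Hypothetical helper: reads the size of each line, closing the source when done.
  def lineSizes(path: Path): List[Int] =
    Using.resource(Source.fromFile(path.toFile, "UTF-8")) { source =>
      // The iterator must be fully consumed before the source is closed.
      source.getLines().map(_.size).toList
    }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("lines", ".txt")
    Files.write(path, java.util.Arrays.asList("hello", "world"))
    println(lineSizes(path)) // List(5, 5)
    Files.delete(path)
  }
}
```

The iterator and the closeable source have to be handled together inside the block here, which is the kind of pairing the text above describes wanting to avoid.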
Some methods are called directly from the aptus package object when no natural parent can be used.
aptus.fs.homeDirectoryPath().p // "/home/tony"
aptus.hardware.totalMemory().p // 1011351552
aptus.random.uuidString() .p // a1bffc1e-72aa-477e-ac84-e4133ffcafad
aptus.time.stamp().p // 240224152753
aptus.illegalState ("freeze!") // Exception in thread "main" IllegalStateException: freeze!
aptus.illegalArgument("freeze!") // Exception in thread "main" IllegalArgumentException: freeze!
aptus.reflect.formatStackTrace().p // returns:
/*
java.lang.Throwable
at aptus.aptmisc.Reflect$.formatStackTrace(Misc.scala:62)
...
<where you are in your code>
*/
// ... (see more in aptus.AptusAliases)
These are often used to save/homogenize comments.
Sometimes we want to convey that a sequence cannot be reordered without consequences; think of it as a built-in comment:
@ordermatters val mySeq = Seq(MostImportant, SecondMostImportant, ...)
An annotation is favored over a type alias here so that it can be applied to other code areas than sequences.
The following are just aliases, cheap replacements for NonEmptyList-like alternatives:
val values: Nes[Int] = Seq(1, 2, 3)
val maybeValues: Pes[Int] = Some(Seq(1, 2, 3))
Note: Value classes don't accept require statements
Plain files:
"hello world".writeFileContent("/tmp/content")
"/tmp/content".readFileContent().p // prints: "hello world"
Seq("hello", "world").writeFileLines("/tmp/lines")
"/tmp/lines".readFileLines().p // prints: Seq("hello", "world")
"hello world".writeFileContent("/tmp/content.gz")
"/tmp/content.gz".readFileContent().p // prints: "hello world"
Seq("hello", "world").writeFileLines("/tmp/lines.gz")
"/tmp/lines.gz".readFileLines().p // prints: Seq("hello", "world")
// note: "file -i /tmp/content.gz" shows it's indeed application/gzip
"/data/bigfile.gz".streamFileLines() // returns a SelfClosingIterator[String], which closes itself once all lines have been seen
A special note about JSON, owing to its ubiquity (and despite its many flaws). While Gallia is my main project pertaining to data in general (especially transformation thereof), I included a minimal set of functionality in Aptus:
""" {"foo": 1} """.jsonObject // returns a com.google.gson.JsonObject
"""[{"foo": 1}]""".jsonArray // returns a com.google.gson.JsonArray
"""{"foo": 1, "bar": true}""".prettyJson.p // .compactJson is also available
/*
{
"foo": 1,
"bar": true
}
*/
In the future, a subset of Gallia will be created, which will basically offer a similar set of operations but without any concern for the underlying schema: gallia-dyn. It will offer a convenient way to perform "dynamic" transformations, and therefore handle JSON. Once ready, a subset of gallia-dyn will likely be included in Aptus for convenience, so that simple manipulations such as these will be possible OOTB:
"""{"foo": "hello", "bar": 2, "baz": true}"""
.readObj
.toUpperCase("foo")
.increment ("bar")
.drop ("baz")
.printCompactJson()
// """{"foo": "HELLO", "bar": 3}"""
val TestResources =
"https://raw.githubusercontent.com/aptusproject/aptus-core/6f4acbc/src/test/resources"
s"${TestResources}/content".readUrlContent().p // prints: "hello world"
s"${TestResources}/lines" .readUrlLines().p // prints: Seq("hello", "world")
Notes:
- These may move under "...".file and "...".url respectively (TBD)
- In the future we'll allow a basic POST as well
A very lightweight way to handle the file system, not meant to be comprehensive (use os-lib for more power)
"/tmp/sbt".path.isDir()
"/tmp/sbt".path.file.removeFile()
...
"/tmp/sbt".path.dir.listNames()
...
"/tmp/sbt".path.dir.listFilePathsRecursively()
Quick-and-dirty system calls:
"echo hello" .systemCall() // prints: "hello"
"date +%s" .systemCall() // prints: "1622562984"
"head -1 /proc/cpuinfo".systemCall() // prints: "processor: 0"
- At least a List_ counterpart to Seq_, maybe via code generation (again see discussion on Scala Users)
- Add more useful abstractions borrowed from other languages, e.g. Python's Counter
- Lots more tests to be written, though many methods in aptus are too trivial to warrant a test, e.g.
  def pipeIf(test: Boolean)(f: A => A): A = if (test) f(a) else a
- More useful methods remain to be ported from Aptus' prototype (not published because too messy)
- See all the TODOs in the code
- Also see Gallia's backlog
Contributions welcome.