Lamp is a Scala library for deep learning and scientific computing. It features a native CPU and GPU backend and operates on off-heap memory.
Lamp is inspired by PyTorch. The foundation of lamp is a JNI binding to ATen, the C++ tensor backend of PyTorch. As a consequence, lamp uses fast CPU and GPU code and stores its data in off-heap memory.
Lamp is built both for Scala 2 and Scala 3.
Lamp provides CPU or GPU backed n-dimensional arrays and implements generic automatic reverse mode differentiation (also known as autograd). Lamp may be used for scientific computing similarly to numpy, or to build neural networks.
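As a taste of the tensor and autograd APIs, here is a minimal sketch modelled on the examples in the lamp documentation. The names used below (`Scope.root`, `STen.fromDoubleArray`, `STenOptions.d`, `const`, `param`, `backprop`, `partialDerivative`) follow those examples, but treat the exact signatures as assumptions and consult the current API docs.

```scala
import lamp._
import lamp.autograd.{const, param}

// Scopes manage the off-heap memory backing the tensors.
Scope.root { implicit scope =>
  // a 2x3 double precision tensor on the CPU
  val t = STen.fromDoubleArray(
    Array(1d, 2d, 3d, 4d, 5d, 6d),
    List(2L, 3L),
    CPU,
    DoublePrecision
  )

  // numpy-like arithmetic on tensors
  val doubled = t + t

  // reverse mode autograd: wrap tensors into variables, build an expression,
  // then backpropagate from a scalar
  val x = const(doubled)                               // input, no gradient needed
  val w = param(STen.rand(List(3L, 1L), STenOptions.d)) // parameter, gradient needed
  val loss = x.mm(w).sum
  loss.backprop()
  println(w.partialDerivative) // d loss / d w, if computed
}
```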
It provides neural network components (a sketch of composing them follows the list):
- fully connected, 1D and 2D convolutional, embedding, graph convolution, scaled dot product attention (transformer), BERT, autoregressive language models (GPT-like)
- batch, weight, layer normalization
- dropout, weight decay
- optimizers: SgdW, AdamW, RAdam, Yogi, Shampoo
- multi gpu data parallel training loop and data loaders
- distributed data parallel training using NCCL transport.
- checkpointing, ONNX export, NPY and CSV import
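The components above compose into models via `lamp.nn`. The sketch below is purely illustrative: the `Linear`, `Fun` and `sequence` names and their parameters follow the lamp documentation, but the exact factory signatures are assumptions and may differ in your version.

```scala
import lamp._
import lamp.autograd.const
import lamp.nn._

Scope.root { implicit scope =>
  val tOpt = STenOptions.d // double precision, CPU

  // a small fully connected network: 10 -> 64 -> 2 with a ReLU in between
  val model = sequence(
    Linear(in = 10, out = 64, tOpt = tOpt),
    Fun(implicit scope => _.relu),
    Linear(in = 64, out = 2, tOpt = tOpt)
  )

  // forward pass on a random batch of 8 examples
  val input = const(STen.rand(List(8L, 10L), tOpt))
  val output = model.forward(input) // Variable holding a tensor of shape (8, 2)
  println(output.value.shape)
}
```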
This repository also hosts some other loosely related libraries.
- a fast GPU compatible implementation of UMAP
- an implementation of extratrees with sparsity aware splits. This is a JVM implementation with no further dependencies.
Lamp depends on the JNI bindings in aten-scala, which has cross compiled artifacts for Mac (both x86 and M1) and Linux. Mac has no GPU support. Your system has to have the libtorch 1.12.1 shared libraries in /usr/local/lib/. See the Dockerfile in the source tree on how to do that.
In addition to the libtorch shared libraries:
- `lamp-core` depends on cats-effect and aten-scala
- `lamp-data` further depends on scribe and jsoniter-scala
The machine generated ATen JNI binding (aten-scala) exposes hundreds of tensor operations from libtorch. On top of those tensors lamp provides autograd for the operations needed to build neural networks.
There is substantial test coverage in terms of unit tests and a suite of end to end tests which compares lamp to PyTorch on 50 datasets. All gradient operations and neural network modules are tested for correctness using numeric differentiation, both on CPU and GPU. Nevertheless, advance with caution.
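The idea behind the numeric checks is standard central differencing: perturb each input coordinate, compare the finite-difference slope against the analytic gradient, and fail if they disagree beyond a tolerance. The standalone Scala sketch below illustrates the technique on a toy function; it is not lamp's actual test code.

```scala
object GradientCheck {

  /** Compares an analytic gradient against central finite differences. */
  def check(
      f: Vector[Double] => Double,
      analyticGrad: Vector[Double] => Vector[Double],
      x: Vector[Double],
      eps: Double = 1e-6,
      tol: Double = 1e-4
  ): Boolean = {
    val numericGrad = x.indices.map { i =>
      val plus = x.updated(i, x(i) + eps)
      val minus = x.updated(i, x(i) - eps)
      (f(plus) - f(minus)) / (2 * eps)
    }
    analyticGrad(x).zip(numericGrad).forall { case (a, n) => math.abs(a - n) <= tol }
  }

  def main(args: Array[String]): Unit = {
    // f(x) = x0^2 + 3 * x1 has gradient (2 * x0, 3)
    val ok = check(
      f = x => x(0) * x(0) + 3 * x(1),
      analyticGrad = x => Vector(2 * x(0), 3d),
      x = Vector(1.5, -2.0)
    )
    println(s"gradient check passed: $ok")
  }
}
```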
Add to build.sbt:
```scala
libraryDependencies += "io.github.pityka" %% "lamp-data" % "VERSION" // look at the github page for version
```
The following artifacts are published to Maven Central from this repository (a combined build.sbt sketch follows the list):
"io.github.pityka" %% "lamp-sten"
- provides the native tensor data type"io.github.pityka" %% "lamp-core"
- provides autograd and neural network components"io.github.pityka" %% "lamp-data"
- provides training loops and data loading facilities"io.github.pityka" %% "lamp-saddle"
- provides integration with saddle"io.github.pityka" %% "lamp-knn"
- provides k nearest neighbor"io.github.pityka" %% "lamp-umap"
- UMAP implementation"io.github.pityka" %% "extratrees"
- extremely randomized trees implementation"io.github.pityka" %% "lamp-akka"
- helper utilities for distributed training
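For example, a project that wants the tensors, the neural network components and the training loops could combine several of these artifacts in its build.sbt. The version placeholder below is hypothetical; use a released version from the project page.

```scala
// build.sbt sketch; replace VERSION with a released lamp version
val lampVersion = "VERSION"

libraryDependencies ++= Seq(
  "io.github.pityka" %% "lamp-core"   % lampVersion, // autograd + nn components
  "io.github.pityka" %% "lamp-data"   % lampVersion, // training loops, data loading
  "io.github.pityka" %% "lamp-saddle" % lampVersion  // saddle integration
)
```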
All artifacts are published for Scala 2.13 and Scala 3.
`sbt test` runs a short suite of unit tests.

CUDA tests are run separately with `sbt cuda:test`. See `test_cuda.sh` in the source tree on how to run this in a remote docker context. Some additional tests are run from `test_slow.sh`.

All tests are executed with `sbt alltest:test`. This runs all unit tests, all CUDA tests, the additional tests marked as slow, and a more extensive end-to-end benchmark against PyTorch itself on 50 datasets.
Examples for various tasks:
- Image classification: `bash run_cifar.sh` runs the code in `example-cifar100/`.
- Text generation: `bash run_timemachine.sh` runs the code in `example-timemachine/`.
- Graph node property prediction: `bash run_arxiv.sh` runs the code in `example-arxiv/`.
- Multi node distributed training: `bash run_cifar_dist1.sh` and `bash run_cifar_dist2.sh`.
- Language model pretraining: see `example-bert` and `example-autoregressivelm`.
The project in the folder `example-autoregressivelm` is a fully working example of self supervised pretraining of a medium sized language model similar to GPT-2. It can leverage multiple GPUs, either driven from the same process or distributed across processes.
First, one has to build the JNI binding to libtorch, then build lamp itself.
The JNI binding is hosted in the pityka/aten-scala git repository. Refer to the readme in that repository on how to build the JNI sources and publish them as a scala library.
Lamp itself is a pure Scala library and builds like any other Scala project.
Once `aten-scala` is published to a local repository, invoking `sbt compile` will work.
If you modified the package name, version or organization in the `aten-scala` build, then you have to adjust the build definition of lamp.
See the LICENSE file. Licensed under the MIT License.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.