Bagging Estimator for Apache Spark


An implementation of the bagging meta-estimator, in the style of scikit-learn, for Apache Spark ML.

How to use

// Assumes an active SparkSession named `spark`.
val data = spark.read.option("header", "true").option("inferSchema", "true").csv("src/test/resources/data/bostonhousing/train.csv")

// Use every column except the row ID and the label (medv) as a feature.
val vectorAssembler = new VectorAssembler().setInputCols(data.columns.filter(x => !x.equals("ID") && !x.equals("medv"))).setOutputCol("features")

val baseRegressor = new DecisionTreeRegressor()
val baggingRegressor = new BaggingRegressor().setBaseLearner(baseRegressor).setFeaturesCol("features").setLabelCol("medv").setMaxIter(100).setParallelism(4)

val formatted = vectorAssembler.transform(data)
val Array(train, validation) = formatted.randomSplit(Array(0.7, 0.3))

val brModel = baggingRegressor.fit(train)

val brPredicted = brModel.transform(validation)

val re = new RegressionEvaluator().setLabelCol("medv").setMetricName("rmse")
// Report the RMSE on the validation split.
println(re.evaluate(brPredicted))
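
For readers new to bagging, the idea behind the estimator used above can be sketched in plain Scala, without Spark. This is a minimal illustration, not this library's implementation: `BaggingSketch`, `Point`, `LinearModel`, `fitLinear`, `bootstrap`, and `fitBagging` are hypothetical names, and a one-feature least-squares fit stands in for the `DecisionTreeRegressor` base learner. Each base model is trained on a bootstrap resample (drawn with replacement), and the ensemble prediction for regression is the average of the base predictions.

```scala
import scala.util.Random

object BaggingSketch {
  // One training example: a single feature x and a label y.
  final case class Point(x: Double, y: Double)

  // Hypothetical base learner: ordinary least squares on one feature.
  final case class LinearModel(slope: Double, intercept: Double) {
    def predict(x: Double): Double = slope * x + intercept
  }

  def fitLinear(data: Seq[Point]): LinearModel = {
    val n = data.size.toDouble
    val meanX = data.map(_.x).sum / n
    val meanY = data.map(_.y).sum / n
    val sxx = data.map(p => (p.x - meanX) * (p.x - meanX)).sum
    val sxy = data.map(p => (p.x - meanX) * (p.y - meanY)).sum
    // Fall back to a constant model if every sampled x is identical.
    val slope = if (sxx == 0.0) 0.0 else sxy / sxx
    LinearModel(slope, meanY - slope * meanX)
  }

  // Draw data.size examples with replacement (a bootstrap resample).
  def bootstrap(data: IndexedSeq[Point], rng: Random): IndexedSeq[Point] =
    IndexedSeq.fill(data.size)(data(rng.nextInt(data.size)))

  // Fit numModels base learners, each on its own bootstrap resample.
  def fitBagging(data: IndexedSeq[Point], numModels: Int, seed: Long): Seq[LinearModel] = {
    val rng = new Random(seed)
    Seq.fill(numModels)(fitLinear(bootstrap(data, rng)))
  }

  // For regression, the ensemble prediction is the mean of the base predictions.
  def predict(models: Seq[LinearModel], x: Double): Double =
    models.map(_.predict(x)).sum / models.size
}
```

In the usage example above, `setMaxIter(100)` presumably plays the role of `numModels` here, and `setParallelism(4)` presumably controls how many base models are fitted concurrently; the library's actual sampling and aggregation options may differ from this sketch.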

Built With

  • Scala - Programming Language
  • Spark - Big Data Framework
  • SBT - Build Tool


Feel free to open an issue or make a pull request to contribute to the repository.


See also the list of contributors who participated in this project.


This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.