Bagging Estimator for Apache Spark


An implementation of the Bagging meta-estimator, in the style of scikit-learn, for Apache Spark ML.
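For readers new to bagging, the idea can be sketched in plain Scala without Spark: fit several base learners, each on a bootstrap sample (drawn with replacement) of the training data, then average their predictions. This is only a conceptual illustration, not this library's implementation. The names `BaggingSketch`, `fitMean`, `bootstrap` and `fitBagging` are hypothetical, and the base learner is deliberately trivial.

```scala
import scala.util.Random

// Conceptual sketch only: every name here is hypothetical and not part of this library.
object BaggingSketch {
  // A "model" maps a single numeric feature to a prediction.
  type Model = Double => Double

  // Toy base learner: always predicts the mean label of its training sample.
  def fitMean(sample: Seq[(Double, Double)]): Model = {
    val mean = sample.map(_._2).sum / sample.size
    _ => mean
  }

  // Bootstrap sample: same size as the input, drawn with replacement.
  def bootstrap[A](data: IndexedSeq[A], rng: Random): IndexedSeq[A] =
    IndexedSeq.fill(data.size)(data(rng.nextInt(data.size)))

  // Fit numModels base learners on independent bootstrap samples
  // and return a model that averages their predictions.
  def fitBagging(data: IndexedSeq[(Double, Double)], numModels: Int, seed: Long)(
      fit: Seq[(Double, Double)] => Model): Model = {
    val rng = new Random(seed)
    val models = IndexedSeq.fill(numModels)(fit(bootstrap(data, rng)))
    x => models.map(m => m(x)).sum / models.size
  }
}
```

With (feature, label) pairs in `points`, `BaggingSketch.fitBagging(points, 10, 42L)(BaggingSketch.fitMean)` returns an averaged model. In the regressor shown under "How to use", the base learner is a Spark estimator and `setMaxIter` sets the iteration count.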

How to use

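import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.DecisionTreeRegressor
// BaggingRegressor is provided by this library; import it from the package it is published under.

// Load the Boston housing training data, inferring column types from the CSV.
val data = spark.read.option("header", "true").option("inferSchema", "true").csv("src/test/resources/data/bostonhousing/train.csv")

// Assemble every column except the row ID and the label (medv) into a single feature vector.
val vectorAssembler = new VectorAssembler().setInputCols(data.columns.filter(x => !x.equals("ID") && !x.equals("medv"))).setOutputCol("features")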

val baseRegressor = new DecisionTreeRegressor()
val baggingRegressor = new BaggingRegressor().setBaseLearner(baseRegressor).setFeaturesCol("features").setLabelCol("medv").setMaxIter(100).setParallelism(4)

val formatted = vectorAssembler.transform(data)
val Array(train, validation) = formatted.randomSplit(Array(0.7, 0.3))

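// Fit the bagged ensemble on the training split.
val brModel = baggingRegressor.fit(train)

// Inspect the fitted base models.
brModel.getModels

// Predict on the held-out validation split.
val brPredicted = brModel.transform(validation)
brPredicted.show()

// Report the root mean squared error of the predictions against medv.
val re = new RegressionEvaluator().setLabelCol("medv").setMetricName("rmse")
println(re.evaluate(brPredicted))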
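The `rmse` metric requested above is the root mean squared error: the square root of the mean squared difference between prediction and label. As a rough sketch of that formula in plain Scala (the object `RmseSketch` and its `rmse` function are hypothetical names for illustration, not Spark's implementation):

```scala
// Illustrative sketch of the RMSE formula; not Spark's implementation.
object RmseSketch {
  // Each pair is (prediction, label).
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (prediction, label) pair")
    val meanSquaredError = pairs.map { case (prediction, label) =>
      val error = prediction - label
      error * error
    }.sum / pairs.size
    math.sqrt(meanSquaredError)
  }
}
```

Lower values are better, and the result is in the same units as the label (here, `medv`).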

Built With

  • Scala - Programming Language
  • Spark - Big Data Framework
  • SBT - Build Tool

Contributing

Feel free to open an issue or make a pull request to contribute to the repository.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.