A Scala implementation of the classes and functions used in the great book Think Bayes by Allen B. Downey, available for free (and open-source) here.
The code in this repository is available as a library and can be used in Scala 2.11.x projects by adding the following dependency to build.sbt:
libraryDependencies += "net.ruippeixotog" %% "think-bayes" % "0.1"

The Pmf class is arguably the core collection in Think Bayes, because the book focuses on problem solving using discrete approximations instead of continuous mathematics. Building and manipulating a Pmf is simple:
scala> import thinkbayes._
import thinkbayes._
scala> val pmf = Pmf('a' -> 0.2, 'b' -> 0.2, 'c' -> 0.6)
pmf: thinkbayes.Pmf[Char] = Map(a -> 0.2, b -> 0.2, c -> 0.6)
scala> pmf.prob('a')
res0: Double = 0.2
scala> pmf.prob(_ < 'c')
res1: Double = 0.4
scala> pmf.sample()
res2: Char = c
scala> pmf.printChart()
a 0.2 ##########
b 0.2 ##########
c 0.6 ##############################

A Pmf is implemented as an immutable map and can be used as such:
scala> pmf.size
res3: Int = 3
scala> pmf.map { case (k, v) => ((k + 1).toChar, v) }
res4: thinkbayes.Pmf[Char] = Map(b -> 0.2, c -> 0.2, d -> 0.6)
scala> pmf.filter(_._1 == 'a').normalized
res5: thinkbayes.Pmf[Char] = Map(a -> 1.0)
scala> pmf.foldLeft("")(_ + _._1)
res6: String = abc
scala> pmf.toList
res7: List[(Char, Double)] = List((a,0.2), (b,0.2), (c,0.6))

Specialized Pmf merging methods can model more complex problems in a very concise manner:
scala> def die(n: Int) = Pmf(1 to n)
die: (n: Int)thinkbayes.Pmf[Int]
scala> die(6)
res8: thinkbayes.Pmf[Int] = Map(5 -> 0.16666666666666666, 1 -> 0.16666666666666666, 6 -> 0.16666666666666666, 2 -> 0.16666666666666666, 3 -> 0.16666666666666666, 4 -> 0.16666666666666666)
scala> die(6).mean
res9: Double = 3.5
scala> (die(6) ++ die(6)).printChart() // sum of two dice
2 0.0277 #
3 0.0555 ##
4 0.0833 ####
5 0.1111 #####
6 0.1388 ######
7 0.1666 ########
8 0.1388 ######
9 0.1111 #####
10 0.0833 ####
11 0.0555 ##
12 0.0277 #
scala> val bag = Pmf(List(die(4), die(6), die(8), die(12), die(20))) // a bag containing 5 different dice
bag: thinkbayes.Pmf[thinkbayes.Pmf[Int]] = Map(Map(5 -> 0.08333333333333333, 10 -> 0.08333333333333333, 1 -> 0.08333333333333333, 6 -> 0.08333333333333333, 9 -> ...
scala> bag.mixture.printChart() // roll of a random die from the bag
1 0.135 ######
2 0.135 ######
3 0.135 ######
4 0.135 ######
5 0.0850 ####
6 0.0850 ####
7 0.0516 ##
8 0.0516 ##
9 0.0266 #
10 0.0266 #
11 0.0266 #
12 0.0266 #
13 0.0100
14 0.0100
15 0.0100
16 0.0100
17 0.0100
18 0.0100
19 0.0100
20 0.0100

The Distributions extension provides methods for creating common Pmf instances such as Gaussian or Poisson distributions.
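To make the two merging operations in the dice example above more concrete, here is a minimal sketch of what a sum and a mixture compute, written against plain `Map[Int, Double]` rather than the library's Pmf. The names `PmfSketch`, `Dist`, `sum` and `mixture` are assumptions made for illustration; this is not the library's implementation.

```scala
object PmfSketch {
  // Hypothetical stand-in for a Pmf over integers: value -> probability.
  type Dist = Map[Int, Double]

  // Uniform distribution over 1..n (a fair n-sided die).
  def die(n: Int): Dist = (1 to n).map(_ -> 1.0 / n).toMap

  // Distribution of the sum of two independent variables:
  // every pair of outcomes contributes the product of its probabilities.
  def sum(a: Dist, b: Dist): Dist = {
    val pairs = for ((x, px) <- a.toSeq; (y, py) <- b.toSeq) yield (x + y, px * py)
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
  }

  // Mixture: choose a distribution according to its weight, then draw from it.
  def mixture(weighted: Seq[(Dist, Double)]): Dist = {
    val pairs = for ((d, w) <- weighted; (x, p) <- d.toSeq) yield (x, w * p)
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
  }
}
```

Under these assumptions, `PmfSketch.sum(PmfSketch.die(6), PmfSketch.die(6))` assigns probability 1/6 to a total of 7, and a mixture of the five dice with weight 0.2 each assigns 0.135 to a roll of 1, in line with the charts above.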
The implementation of Suite provided in this library does not extend Pmf; it is rather provided as a trait which applications can implement to model specific problems:
scala> case class Dice(hypos: Seq[Int]) extends SimpleSuite[Int, Int] {
| val pmf = Pmf(hypos) // which dice from `hypos` are we rolling?
| def likelihood(data: Int, hypo: Int) = if(hypo < data) 0 else 1.0 / hypo
| }
defined class Dice
scala> val prior = Dice(List(4, 6, 8, 12, 20))
prior: Dice = Dice(List(4, 6, 8, 12, 20))
scala> prior.printChart()
4 0.2 ##########
6 0.2 ##########
8 0.2 ##########
12 0.2 ##########
20 0.2 ##########
scala> val posterior = prior.observed(6) // after a 6 is rolled
posterior: thinkbayes.Suite[Int,Int] = thinkbayes.Suite$$anon$1@120fb03e
scala> posterior.printChart()
4 0.0
6 0.3921 ###################
8 0.2941 ##############
12 0.1960 #########
20 0.1176 #####

The same prior could be built directly with:
scala> val prior = Suite[Int, Int](Pmf(List(4, 6, 8, 12, 20))) { (d, h) =>
| if (h < d) 0 else 1.0 / h
| }
prior: thinkbayes.Suite[Int,Int]{val pmf: thinkbayes.Pmf[Int]} = thinkbayes.Suite$$anon$1@130dd39f

Multiple observations can be given to the Suite in bulk, which can yield numerically more stable results:
scala> posterior.observed(6, 8, 7, 7, 5, 4).printChart()
4 0.0
6 0.0
8 0.9432 ###############################################
12 0.0552 ##
20 0.0015

A Cdf can be created just like a Pmf. It supports efficient querying for the cumulative probability on a given value (prob) and for the value at a given percentile (value):
scala> val cdf = Cdf('a' -> 0.2, 'b' -> 0.2, 'c' -> 0.6)
cdf: thinkbayes.Cdf[Char] = CategoricalCdf(Vector((a,0.2), (b,0.4), (c,1.0)))
scala> cdf.prob('b')
res10: Double = 0.4
scala> cdf.value(0.5)
res11: Char = c
scala> cdf.value(0.35)
res12: Char = b
scala> cdf.printChart()
a 0.2 ##########
b 0.4 ####################
c 1.0 ##################################################

Unlike Pmf, Cdf does not implement the Map trait and, therefore, does not inherit the common Scala collection methods. If you need those, you can easily convert a Cdf to and from a Pmf:
scala> cdf.toPmf
res13: thinkbayes.Pmf[Char] = Map(a -> 0.2, b -> 0.2, c -> 0.6)
scala> cdf.toPmf.toCdf
res14: thinkbayes.Cdf[Char] = CategoricalCdf(Vector((a,0.2), (b,0.4), (c,1.0)))

A Pdf can be created from a Scala real-valued function and provides a density method for calculating the density at a given value:
scala> val pdf = Pdf { x => math.max(-x * x + 1, 0) }
pdf: thinkbayes.Pdf = thinkbayes.Pdf$$anon$3@744cb6e3
scala> pdf.density(0)
res15: Double = 1.0
scala> pdf.density(0.5)
res16: Double = 0.75

A BoundedPdf is a Pdf whose domain has known lower and upper bounds:
scala> val bpdf = Pdf(-1.0, 1.0) { x => math.max(-x * x + 1, 0) }
bpdf: thinkbayes.BoundedPdf{val lowerBound: Double; val upperBound: Double} = thinkbayes.Pdf$$anon$2@397820d5

Both can be converted to a Pmf given a range or sequence of discrete values to compute. A BoundedPdf can alternatively be given a step value only. In both cases, the probabilities of the returned Pmf are normalized:
scala> pdf.toPmf(0.0 to 1.0 by 0.1).printChart()
0.0 0.1398 ######
0.1 0.1384 ######
0.2 0.1342 ######
0.30000000000000004 0.1272 ######
0.4 0.1174 #####
0.5 0.1048 #####
0.6000000000000001 0.0895 ####
0.7000000000000001 0.0713 ###
0.8 0.0503 ##
0.9 0.0265 #
1.0 0.0
scala> bpdf.toPmf(0.2).printChart()
-1.0 0.0
-0.8 0.0545 ##
-0.6 0.0969 ####
-0.3999999999999999 0.1272 ######
-0.19999999999999996 0.1454 #######
0.0 0.1515 #######
0.20000000000000018 0.1454 #######
0.40000000000000013 0.1272 ######
0.6000000000000001 0.0969 ####
0.8 0.0545 ##
1.0 0.0

The Distributions extension provides methods for creating common Pdf instances such as Gaussian or Exponential distributions.
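The normalization step in the Pdf-to-Pmf conversion shown above can be illustrated without the library. The sketch below evaluates a density at a set of points and rescales the results so they sum to 1; `DiscretizeSketch`, `toPmf`, `parabola` and `grid` are hypothetical names, and the library's own conversion may differ in its details.

```scala
object DiscretizeSketch {
  // Evaluate `density` at each point and rescale so the weights sum to 1.
  // Assumes the density is positive at one or more of the points;
  // otherwise the total is 0 and the division yields NaN.
  def toPmf(density: Double => Double, points: Seq[Double]): Map[Double, Double] = {
    val weights = points.map(x => x -> density(x))
    val total = weights.map(_._2).sum
    weights.map { case (x, w) => x -> w / total }.toMap
  }

  // The parabola from the examples above, clipped at zero.
  val parabola: Double => Double = x => math.max(1 - x * x, 0)

  // Grid 0.0, 0.1, ..., 1.0 built from integers, which avoids keys
  // such as 0.30000000000000004 produced by repeated addition.
  val grid: Seq[Double] = (0 to 10).map(_ / 10.0)
}
```

With these definitions, `DiscretizeSketch.toPmf(DiscretizeSketch.parabola, DiscretizeSketch.grid)` gives the point 0.0 a probability of 1 / 7.15, roughly 0.1398, which agrees with the first chart above.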
This library was designed such that only the core operations needed for the creation and manipulation of the structures presented above are included in the classes themselves. Additional features can be added by importing modules from the package extensions.
The Plotting module provides support for graphical plotting, leveraging the powerful JFreeChart library with a custom theme. Pmf, Suite, Cdf and BoundedPdf instances can be plotted, as long as their keys have an Ordering (for plotting bar charts) or Numeric (for plotting XY line charts) implicit in scope:
scala> import thinkbayes.extensions.Plotting._
import thinkbayes.extensions.Plotting._
scala> val xyChart = bpdf.plotXY("-x^2 + 1")
xyChart: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@290e640d
scala> val barChart = prior.plotBar("prior")
barChart: scalax.chart.CategoryChart = scalax.chart.ChartFactories$BarChart$$anon$3@5c3e1ebe

New series can be added to a previously created chart. This is useful for comparing differences between two distributions or Bayesian suites:
scala> posterior.plotBarOn(barChart, "after a 6 is rolled")
res17: barChart.type = scalax.chart.ChartFactories$BarChart$$anon$3@5c3e1ebe

Other attributes of the chart, such as the title and the axis labels, can be optionally specified.
The Distributions module provides integration with the distribution implementations from Apache Commons Math, as well as several methods for creating Pmf and Pdf instances for common distributions:
scala> import thinkbayes.extensions.Distributions._
import thinkbayes.extensions.Distributions._
scala> poissonPmf(3.0).plotBar("")
res18: scalax.chart.CategoryChart = scalax.chart.ChartFactories$BarChart$$anon$3@6736cd9d
scala> val tri: Pdf = new org.apache.commons.math3.distribution.TriangularDistribution(0.0, 0.5, 2.0)
tri: thinkbayes.Pdf = thinkbayes.extensions.Distributions$$anon$1@7b5cdeb6
scala> tri.bounded(0.0, 2.0).plotXY("")
res19: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@55a7c8a3

Finally, we can estimate a Pdf from a sequence of samples using kernel density estimation:
scala> estimatePdf(Seq(1, 2, 2, 4, 4, 4, 9, 9, 9, 9, 11, 11, 15, 19)).bounded(0, 20).plotXY("")
res20: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@1c15725

The Stats module is a simple extension that adds quantile and credible interval calculations to Pmf and Cdf instances:
scala> import thinkbayes.extensions.Stats._
import thinkbayes.extensions.Stats._
scala> normalPmf(2.5, 1.5).quantile(0.5)
res21: Double = 2.5
scala> normalPmf(0.0, 1.0).credibleInterval(0.9)
res22: (Double, Double) = (-1.6440000000000001,1.6440000000000001)

Pmf merging methods such as mixture or join yield results that are as accurate as they can be, but they are also computationally expensive. The Sampling module aims to provide probabilistic alternatives based on sampling, which can be the only practical choice for large Pmf instances:
scala> val dieList = Seq.fill(100)(die(6)) // a hundred dice
dieList: Seq[thinkbayes.Pmf[Int]] = List(Map(5 -> 0.16666666666666666, 1 -> 0.16666666666666666, 6 -> 0.16666666666666666, 2 -> 0.16666666666666666, 3 -> 0.1666666666666666, 4 -> 0.16666666666666666),...
scala> val xyChart = dieList.reduce(_ ++ _).plotXY("exact")
xyChart: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@30015846
scala> sampleSum(dieList, 10000).plotXYOn(xyChart, "sampled")
res23: xyChart.type = scalax.chart.ChartFactories$XYLineChart$$anon$17@81f0a53

A number of examples and problems explored throughout Think Bayes are implemented in the package examples in the test directory. They are always accompanied by the original problem description and I made an effort to make the steps of each problem as clear as possible.
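As a taste of what such a problem boils down to, the dice problem from the Suite section can be worked through with plain Scala maps. This is a minimal sketch for illustration only; `DiceSketch`, `observed`, `hypos` and `prior` here are hypothetical names and do not reflect how the library's Suite is implemented.

```scala
object DiceSketch {
  // Likelihood of rolling `data` on a fair die with `hypo` sides.
  def likelihood(data: Int, hypo: Int): Double =
    if (hypo < data) 0.0 else 1.0 / hypo

  // One Bayesian update: multiply each prior weight by the likelihood
  // of the observation, then normalize so the weights sum to 1.
  // Assumes one or more hypotheses can produce `data`; otherwise the total is 0.
  def observed(prior: Map[Int, Double], data: Int): Map[Int, Double] = {
    val unnormalized = prior.map { case (h, p) => h -> p * likelihood(data, h) }
    val total = unnormalized.values.sum
    unnormalized.map { case (h, p) => h -> p / total }
  }

  // Uniform prior over the five dice in the bag.
  val hypos = List(4, 6, 8, 12, 20)
  val prior: Map[Int, Double] = hypos.map(_ -> 1.0 / hypos.size).toMap
}
```

Calling `DiceSketch.observed(DiceSketch.prior, 6)` rules out the 4-sided die and gives the 6-sided die a posterior probability of about 0.392, as in the Suite chart. Further rolls can be folded in one at a time, for example with `List(6, 8, 7, 7, 5, 4).foldLeft(posterior)(DiceSketch.observed)`, where `posterior` is the result of the first update.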
Copyright (c) 2014-2017 Rui Gonçalves. See LICENSE for details.






