### Making a monad: Martin Erwig’s `Dist`

11Feb07

Much of statistical modelling involves shuffling around operations on stochastic variables — i.e. on their probability distributions. This can be tricky stuff: it either involves loads of manual lifting with fully discrete distributions or clever analytic methods for continuous ones. This is complicated enough that most of basic and intermediate-level applied statistical analysis is done working with a normality assumption; the normal distribution has some simple linearity properties regarding affine functions of normal stochastic variables, and is thus used to avoid dicking around with convolutions and the jacobian method.

Like with Pancito, this is something I once independently tried to do, but got lost trying to model dense spaces and measure theory. Martin Erwig did the actual work so far, even though there’s no work in continuous spaces at all. I’m working on cleaning his code up — that is, making it more like the other GHC libraries, since this was done for an academic paper. I’d like to eventually turn this into Control.Stochastic; it’s not a Data.* library, as it allows for transititions, functions whose outcome depends on stochastic variables. This together with sigfpe’s CA comonad could lead to a very compact library for simulating percolation systems! In general, stochastic programming has wide, deep applications that make this econometrician’s eyes shine with hope.

So after this short status report (apparently I finally manage to become interested enough to work on a project for a significant ammount of time), let’s take a look ath the centerpiece of Erwig’s PFP library: the `Dist` monad.

First of all, probability is a measure over an algebra of events. Erwig doesn’t mull over too much on algebras, sigma-algebras and measure theory — I admire his restraint. Instead, we have merely the concept of an event: something that might or not happen.

``` type Event a = a -> Bool ```

A probability measure, by definition, is a real between 0 and 1. Martin eschews the bounded number problem (you can’t define a type that enforces such a constraint) by making an abstract datatype that will later be hidden (though he didn’t really create an export list).

``` newtype Probability = P ProbRep type ProbRep = Float ```

We omit here the Show method for Probabilities. It involves a somewhat confusing rounding procedure I haven’t studied yet, and isn’t essential for our purpose here. We instead jump directly to the Distribution type. A distribution is a list of singleton events together with their probabilities; the probabilities should sum one, and this will be handled by later handling functions.

``` newtype Dist a = D {unD :: [(a,ProbRep)]} ```

Record syntax is used to automatically define a destructor function; this kind of reads “wrong” if you’re used to records where destructor functions describe fields, but is really inconsequential. Now we turn to instancing `Dist` as a functor:

``` instance Functor Dist where fmap f (D d) = D [(f x,p) | (x,p) <- d] ```

Let’s say you have a Bernoulli probability distribution with paramter 0.5. That is, with 50% probability you will turn to 1, and with 50% probability it’ll be 0. What happens if I want to model a head-or-tails game where I _lose_ one dollar if tails comes out and win one if heads comes up? We then have f(1) with probability 0.5 and f(0) with probability 0.5, where f(x)=2*x-1. This is precisely the definition above; apply the function to the events and leave the probabilites alone.

Now, the monad.

``` instance Monad Dist where return x = D [(x,1)] d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)] fail _ = D [] ```

The unit of this monad is merely a singleton certain (proper jargon is almost certain when it can only assume one value with probablity 1) distribution. Let’s now remember the type of `(>>=)` so we’re sure what we’re looking for.

```(>>=) :: (Monad m) => m a -> (a -> m b) -> m b d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)] ```

The content of `d` is a list of (Event, probability) pairs hidden inside the D constructor; first we extract the pair with the destructor function; then, a function of type ` (a -> m b)`, in this case `(a -> D b)` is applied to the event extracted from `d`; that function shall return another distribution, which we faithfully destruct into an (Event, probability) pair. Finally, we take the event that function returned and multiply the probabilities.

Let’s examine this all again. `Dist` is a functor — an endofunctor on the category of Haskell types with functions as morphisms — that takes any type `t` to a type `D t` (consisting of a hidden list of event/probability pairs); `fmap` lifts functions that act on event types to functions that act on the probability distributions by applying the deterministic event->event function to the event part and keeping the probability.

`Dist` is also a monad; it has a trivial unit, and binds a function of type (pure type -> probability distribution) to a probability distribution itself; the event part of that distribution results from applying the function to the pure type part of the `d` distribution and keeping the pure type part and multiplying the probabilities. Multiplying probabilities is natural when you’re intersecting events, but why does it happen here? We’ll see this by examining `liftM2`. Let’s switch to do-notation for this.

First, we sugar `d>>=f` into

``` do { z <- d ; f z} ```

Multiplying probabilities feels more natural in this context; a stochastic value is taken from a distribution, and applying another stochastic function to it can only be interpreted in terms of an intersection of events. But let’s take a look at liftM. What `liftM` is supposed to do is lift a function from ordinary types to monadically-qualified types. This clearly is the very same thing as `fmap`, and only has this synonym for consistency: for higher-arity functions, we have the `liftMn` family:

``` liftM :: (a1 -> r) -> m a1 -> m r liftM2 :: (a1 -> a2 -> r) -> m a1 -> m a2 -> m r liftM3:: (a1 -> a2 -> a3 -> r) -> m a1 -> m a2 -> m a3 -> m r liftM4 :: (a1 -> a2 -> a3 -> a4 -> r) -> m a1 -> m a2 -> m a3 -> m a4 -> m r ```

(In all these, the `(Monad m) =>` context has been omitted). The semantics of fmap/liftM, as we’ve seen, are clear. What semantics does one expect from `liftM2`? How about taking a function from two ordinary types to an ordinary type, and transforming it into a function from two distributions to a distribution? This is the stuff stochastic calculus is made of: operations over random variables. Of course, there’s deep theory on how to do this for continuous types, but let’s think of an easy example. What’s the sum of two dice? Each dice can assume the values 1..6 with a 1/6 probability. Let’s construct the distribution for the sum of them. Each (a,b) combination has a 1/36 = 1/6 x 1/6 probability of happening. Let’s count’em.

 Value Combinations that yield it Probability 1 … 0 2 (1,1) 1/36 3 (1,2), (2,1) 2/36 4 (1,3), (2,2), (3,1) 3/36 5 (1,4), (2,3), (3, 2), (4, 1) 4/36 6 (1,5), (2,4), (3, 3), (4, 2), (5,1) 5/36 7 (1,6), (2,5), (3, 4), (4, 3), (5,2), (6,1) 6/36 8 (2,6), (3, 5), (4, 4), (5,3), (6,2) 5/36 9 (3, 6), (4, 5), (5,4), (6,3) 4/36 10 (4, 6), (5, 5), (6,4) 3/36 11 (5,6), (6,5) 2/36 12 (6,6) 1/36 >12 … 0

We’d like to be able to `liftM2 (+)` two dice so it calculates our probabilities. Let’s first see what a dice looks like

``` dice = D [(1,1/6), (2,1/6), (3,1/6),(4,1/6),(5,1/6),(6/16)] ```

What happens when we `fmap` (that is, `liftM` a function into that? Remember ` fmap f (D d) = D [(f x,p) | (x,p) <- d]`. Hence,

fmap (*2) dice == D [ 2*x, p | (x,p) <- unD d] == D [(2,1/6), (4,1/6), (6,1/6),(8,1/6),(10,1/6),(12/16)]

That’s the probability distribution for twice the result of the same dice. Now what’s the definition of `liftM2` ?

```liftM2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) } ```

or, desugaring,

``` m1 >>=( \ x1 -> m2 >>= (\ x2 -> return (f x1 x2))) ```

Let’s take this for one event in the join distribution: (2,1). By the definition above, the probabilities of 2 and 1 are multiplied, yielding 1/36; the sum of these (`f x1 x2`) is 3. The final distribution then is written down with (3, 1/36). When later (1,2) comes up, another (3,1/36) will be written down. As a result, the internal representation of this distribution is actually

``` [(2,1/36), (3,1/36), (3, 1/36), (4, 1/36), (4, 1/36), (4, 1/36), ..... (12, 1/36)] ```

We fix this by writing a query function `(??)`, defined as

``` (??) :: Event a -> Dist a -> Probability (??) p = P . sumP . filter (p . fst) . unD ```

where

``` sumP :: [(a,ProbRep)] -> ProbRep sumP = sum . map snd ```

For example, a simple query over the two-dice distribution is

``` (==7) ?? (liftM2 (+) dice dice) ```

where dice is defined as above. The dice distribution is destroyed, liftM2 is applied so to produce the “inner representation” we’ve shown before and then the query combinator filters through the instances of 7 and sums their probabilities. A suitable Show method can be (it actually is, but it’s confusing and spans two different files (in my opinion needlessly)) defined to quickly visualize the table we’ve manually constructed. Evidently, liftMn works accordingly. Bingo: stochastic calculus in a monad. Suitable combinators for defining probability distributions without working directly with the D constructor are defined; for example, `dice = uniform [1..6]`. Of course, `uniform [1,2,2,3]` is suitably adjusted for all queries, with 2 being more likely.

This is a financial modeller’s wet dream. To think you can use the empirical distribution function of past data directly as your input, with no ugly binomial/normality assumptions! Of course, distribution assumptions yield important theoretical results, but this stochastic calculus is an extremely powerful modelling tool in its own right. And wait until I start raving on stochastic transitions 😉

#### 10 Responses to “Making a monad: Martin Erwig’s `Dist`”

1. 1 Thomas

I am confused, how does this typecheck?

> dice = D [(1,1/6), (2,1/6), (3,1/6),(4,1/6),(5,1/6),(6/16)]

It’s either a weird (DWIM-eval) typo or it really works, since you also wrote this:

> fmap (*2) dice == D [ 2*x, p | (x,p)

2. 2 thedward

Would there be any disadvantage to using a Ratio instead of a Float for the probability values aside from the speed hit?

Using ratios would seem a lot more intuitive to me. I’d rather see 1%6 instead of 16.7% for the probability of any given number on a six sided die.

Might it make sense to parameterize the Probability class so it could use any of the Fractional numbers? (Probability Float, or Probability Rational, etc)

3. 3 Thomas

D’Oh. Your comment function chokes on “<“. Hence my chopped-off comment. So, I repeat, looks like it’s going to become a very useful library, go ahead!

4. 4 Cale Gibbard

By the way, there’s a much faster way to construct that monad. You can also get it by Writer transforming the list monad with your probability representation type, and define an instance of Monoid for probabilities where the operation is just multiplication. It’s fun to play around with. I ran into it while modelling probabilistic L-systems quite a while ago.

{-# OPTIONS_GHC -fglasgow-exts #-}
import Control.Monad.Writer
import Data.Ratio
import Data.Monoid
import qualified Data.Map as M

newtype Dist a = D (WriterT Probability [] a)
deriving (Functor, Monad)

newtype Probability = P Rational
deriving (Eq, Ord, Show, Num, Fractional)

instance Monoid Probability where
mempty = 1
mappend = (*)
mconcat = product

runDist :: Dist a -> [(a, Probability)]
runDist (D x) = runWriterT x

optionP :: [(a, Probability)] -> Dist a
optionP xs = D (WriterT xs)

(??) p = sum . map snd . filter (p . fst) . runDist

collect :: (Ord a) => Dist a -> Dist a
collect x = optionP . M.toList . M.fromListWith (+) \$ runDist x

die = optionP [(n,1/6) | n

5. 5 Cale Gibbard

Ack! Bitten by the same problem! Bad software! Bad!

Here’s the last bit of my code again with the less-than replaced by the unicode symbol “precedes”

die = optionP [(n,1/6) | n ≺- [1..6]]
test = collect \$ liftM2 (+) die die

6. 6 Cale Gibbard

I also just noticed that it obliterated my indentation… but at least in this case, the right indentation should be obvious, and I wouldn’t know what markup to use in order to get whitespace preservation anyway.

Does nbsp work? (testing)

7. 7 Steve

I was Martin Erwig’s grad student and this was actually my Master’s thesis that we did together. You can read the thesis at http://www.kolls.net/cv/mythesis.pdf

We also applied the approach to biological modeling, as can be found in the paper “Modeling Genome Evolution with a DSEL for Probabilistic Programming” which made LtU back when it was fresh. This one is also on the website.

8. 8 Harald Korneliussen

Having this in the standard libraries would be great. Whether it should go in “Control” or “Data” I have no idea.

• ## Dr. Syntaxfree

Dr. Syntaxfree has no PhD and shouldn't call himself a "doctor", but does so for amusement value anyway. An unemployed (ok, graduate student) econopundit by day, he's been progressively obsessed about Haskell to the point he often can't fathom not working on it. A jack-of-many-trades, he has an unusual CS background in that he knows no imperative programming at all, he hopes to be both helpful to those less knowledgeable than him and illustrative to the really smart people trying to understand the mentality of a common man trying to tackle functional programming.
• ## Licenses

All rights reserved for textual content. A specific exception is granted to wikis hosted in the haskell.org domain; any text or code can be freely copied to such pages. All code is otherwise released under the LGPL.