Making a monad: Martin Erwig’s Dist


Much of statistical modelling involves shuffling around operations on stochastic variables — i.e. on their probability distributions. This can be tricky stuff: it either involves loads of manual lifting with fully discrete distributions or clever analytic methods for continuous ones. This is complicated enough that most of basic and intermediate-level applied statistical analysis is done working with a normality assumption; the normal distribution has some simple linearity properties regarding affine functions of normal stochastic variables, and is thus used to avoid dicking around with convolutions and the jacobian method.

Like with Pancito, this is something I once independently tried to do, but got lost trying to model dense spaces and measure theory. Martin Erwig did the actual work so far, even though there’s no work in continuous spaces at all. I’m working on cleaning his code up — that is, making it more like the other GHC libraries, since this was done for an academic paper. I’d like to eventually turn this into Control.Stochastic; it’s not a Data.* library, as it allows for transititions, functions whose outcome depends on stochastic variables. This together with sigfpe’s CA comonad could lead to a very compact library for simulating percolation systems! In general, stochastic programming has wide, deep applications that make this econometrician’s eyes shine with hope.

So after this short status report (apparently I finally manage to become interested enough to work on a project for a significant ammount of time), let’s take a look ath the centerpiece of Erwig’s PFP library: the Dist monad.

First of all, probability is a measure over an algebra of events. Erwig doesn’t mull over too much on algebras, sigma-algebras and measure theory — I admire his restraint. Instead, we have merely the concept of an event: something that might or not happen.

type Event a = a -> Bool

A probability measure, by definition, is a real between 0 and 1. Martin eschews the bounded number problem (you can’t define a type that enforces such a constraint) by making an abstract datatype that will later be hidden (though he didn’t really create an export list).

newtype Probability = P ProbRep
type ProbRep = Float

We omit here the Show method for Probabilities. It involves a somewhat confusing rounding procedure I haven’t studied yet, and isn’t essential for our purpose here. We instead jump directly to the Distribution type. A distribution is a list of singleton events together with their probabilities; the probabilities should sum one, and this will be handled by later handling functions.

newtype Dist a = D {unD :: [(a,ProbRep)]}

Record syntax is used to automatically define a destructor function; this kind of reads “wrong” if you’re used to records where destructor functions describe fields, but is really inconsequential. Now we turn to instancing Dist as a functor:

instance Functor Dist where
fmap f (D d) = D [(f x,p) | (x,p) <- d]

Let’s say you have a Bernoulli probability distribution with paramter 0.5. That is, with 50% probability you will turn to 1, and with 50% probability it’ll be 0. What happens if I want to model a head-or-tails game where I _lose_ one dollar if tails comes out and win one if heads comes up? We then have f(1) with probability 0.5 and f(0) with probability 0.5, where f(x)=2*x-1. This is precisely the definition above; apply the function to the events and leave the probabilites alone.

Now, the monad.

instance Monad Dist where
return x = D [(x,1)]
d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)]
fail _ = D []

The unit of this monad is merely a singleton certain (proper jargon is almost certain when it can only assume one value with probablity 1) distribution. Let’s now remember the type of (>>=) so we’re sure what we’re looking for.

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)]

The content of d is a list of (Event, probability) pairs hidden inside the D constructor; first we extract the pair with the destructor function; then, a function of type (a -> m b), in this case (a -> D b) is applied to the event extracted from d; that function shall return another distribution, which we faithfully destruct into an (Event, probability) pair. Finally, we take the event that function returned and multiply the probabilities.

Let’s examine this all again. Dist is a functor — an endofunctor on the category of Haskell types with functions as morphisms — that takes any type t to a type D t (consisting of a hidden list of event/probability pairs); fmap lifts functions that act on event types to functions that act on the probability distributions by applying the deterministic event->event function to the event part and keeping the probability.

Dist is also a monad; it has a trivial unit, and binds a function of type (pure type -> probability distribution) to a probability distribution itself; the event part of that distribution results from applying the function to the pure type part of the d distribution and keeping the pure type part and multiplying the probabilities. Multiplying probabilities is natural when you’re intersecting events, but why does it happen here? We’ll see this by examining liftM2. Let’s switch to do-notation for this.

First, we sugar d>>=f into

do { z <- d ; f z}

Multiplying probabilities feels more natural in this context; a stochastic value is taken from a distribution, and applying another stochastic function to it can only be interpreted in terms of an intersection of events. But let’s take a look at liftM. What liftM is supposed to do is lift a function from ordinary types to monadically-qualified types. This clearly is the very same thing as fmap, and only has this synonym for consistency: for higher-arity functions, we have the liftMn family:

liftM :: (a1 -> r) -> m a1 -> m r
liftM2 :: (a1 -> a2 -> r) -> m a1 -> m a2 -> m r
liftM3:: (a1 -> a2 -> a3 -> r) -> m a1 -> m a2 -> m a3 -> m r
liftM4 :: (a1 -> a2 -> a3 -> a4 -> r) -> m a1 -> m a2 -> m a3 -> m a4 -> m r

(In all these, the (Monad m) => context has been omitted). The semantics of fmap/liftM, as we’ve seen, are clear. What semantics does one expect from liftM2? How about taking a function from two ordinary types to an ordinary type, and transforming it into a function from two distributions to a distribution? This is the stuff stochastic calculus is made of: operations over random variables. Of course, there’s deep theory on how to do this for continuous types, but let’s think of an easy example. What’s the sum of two dice? Each dice can assume the values 1..6 with a 1/6 probability. Let’s construct the distribution for the sum of them. Each (a,b) combination has a 1/36 = 1/6 x 1/6 probability of happening. Let’s count’em.

Value Combinations that yield it Probability
1 0
2 (1,1) 1/36
3 (1,2), (2,1) 2/36
4 (1,3), (2,2), (3,1) 3/36
5 (1,4), (2,3), (3, 2), (4, 1) 4/36
6 (1,5), (2,4), (3, 3), (4, 2), (5,1) 5/36
7 (1,6), (2,5), (3, 4), (4, 3), (5,2), (6,1) 6/36
8 (2,6), (3, 5), (4, 4), (5,3), (6,2) 5/36
9 (3, 6), (4, 5), (5,4), (6,3) 4/36
10 (4, 6), (5, 5), (6,4) 3/36
11 (5,6), (6,5) 2/36
12 (6,6) 1/36
>12 0

We’d like to be able to liftM2 (+) two dice so it calculates our probabilities. Let’s first see what a dice looks like

dice = D [(1,1/6), (2,1/6), (3,1/6),(4,1/6),(5,1/6),(6/16)]

What happens when we fmap (that is, liftM a function into that? Remember fmap f (D d) = D [(f x,p) | (x,p) <- d]. Hence,

fmap (*2) dice == D [ 2*x, p | (x,p) <- unD d] == D [(2,1/6), (4,1/6), (6,1/6),(8,1/6),(10,1/6),(12/16)]

That’s the probability distribution for twice the result of the same dice. Now what’s the definition of liftM2 ?

liftM2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }

or, desugaring,

m1 >>=( \ x1 -> m2 >>= (\ x2 -> return (f x1 x2)))

Let’s take this for one event in the join distribution: (2,1). By the definition above, the probabilities of 2 and 1 are multiplied, yielding 1/36; the sum of these (f x1 x2) is 3. The final distribution then is written down with (3, 1/36). When later (1,2) comes up, another (3,1/36) will be written down. As a result, the internal representation of this distribution is actually

[(2,1/36), (3,1/36), (3, 1/36), (4, 1/36), (4, 1/36), (4, 1/36), ..... (12, 1/36)]

We fix this by writing a query function (??), defined as

(??) :: Event a -> Dist a -> Probability
(??) p = P . sumP . filter (p . fst) . unD


sumP :: [(a,ProbRep)] -> ProbRep
sumP = sum . map snd

For example, a simple query over the two-dice distribution is

(==7) ?? (liftM2 (+) dice dice)

where dice is defined as above. The dice distribution is destroyed, liftM2 is applied so to produce the “inner representation” we’ve shown before and then the query combinator filters through the instances of 7 and sums their probabilities. A suitable Show method can be (it actually is, but it’s confusing and spans two different files (in my opinion needlessly)) defined to quickly visualize the table we’ve manually constructed. Evidently, liftMn works accordingly. Bingo: stochastic calculus in a monad. Suitable combinators for defining probability distributions without working directly with the D constructor are defined; for example, dice = uniform [1..6]. Of course, uniform [1,2,2,3] is suitably adjusted for all queries, with 2 being more likely.

This is a financial modeller’s wet dream. To think you can use the empirical distribution function of past data directly as your input, with no ugly binomial/normality assumptions! Of course, distribution assumptions yield important theoretical results, but this stochastic calculus is an extremely powerful modelling tool in its own right. And wait until I start raving on stochastic transitions 😉

10 Responses to “Making a monad: Martin Erwig’s Dist

  1. 1 Thomas

    I am confused, how does this typecheck?

    > dice = D [(1,1/6), (2,1/6), (3,1/6),(4,1/6),(5,1/6),(6/16)]

    It’s either a weird (DWIM-eval) typo or it really works, since you also wrote this:

    > fmap (*2) dice == D [ 2*x, p | (x,p)

  2. 2 thedward

    Would there be any disadvantage to using a Ratio instead of a Float for the probability values aside from the speed hit?

    Using ratios would seem a lot more intuitive to me. I’d rather see 1%6 instead of 16.7% for the probability of any given number on a six sided die.

    Might it make sense to parameterize the Probability class so it could use any of the Fractional numbers? (Probability Float, or Probability Rational, etc)

  3. 3 Thomas

    D’Oh. Your comment function chokes on “<“. Hence my chopped-off comment. So, I repeat, looks like it’s going to become a very useful library, go ahead!

  4. 4 Cale Gibbard

    By the way, there’s a much faster way to construct that monad. You can also get it by Writer transforming the list monad with your probability representation type, and define an instance of Monoid for probabilities where the operation is just multiplication. It’s fun to play around with. I ran into it while modelling probabilistic L-systems quite a while ago.

    {-# OPTIONS_GHC -fglasgow-exts #-}
    import Control.Monad.Writer
    import Data.Ratio
    import Data.Monoid
    import qualified Data.Map as M

    newtype Dist a = D (WriterT Probability [] a)
    deriving (Functor, Monad)

    newtype Probability = P Rational
    deriving (Eq, Ord, Show, Num, Fractional)

    instance Monoid Probability where
    mempty = 1
    mappend = (*)
    mconcat = product

    runDist :: Dist a -> [(a, Probability)]
    runDist (D x) = runWriterT x

    optionP :: [(a, Probability)] -> Dist a
    optionP xs = D (WriterT xs)

    (??) p = sum . map snd . filter (p . fst) . runDist

    collect :: (Ord a) => Dist a -> Dist a
    collect x = optionP . M.toList . M.fromListWith (+) $ runDist x

    die = optionP [(n,1/6) | n

  5. 5 Cale Gibbard

    Ack! Bitten by the same problem! Bad software! Bad!

    Here’s the last bit of my code again with the less-than replaced by the unicode symbol “precedes”

    die = optionP [(n,1/6) | n ≺- [1..6]]
    test = collect $ liftM2 (+) die die

  6. 6 Cale Gibbard

    I also just noticed that it obliterated my indentation… but at least in this case, the right indentation should be obvious, and I wouldn’t know what markup to use in order to get whitespace preservation anyway.

        Does nbsp work? (testing)

  7. I was Martin Erwig’s grad student and this was actually my Master’s thesis that we did together. You can read the thesis at

    We also applied the approach to biological modeling, as can be found in the paper “Modeling Genome Evolution with a DSEL for Probabilistic Programming” which made LtU back when it was fresh. This one is also on the website.

  8. Having this in the standard libraries would be great. Whether it should go in “Control” or “Data” I have no idea.

  1. 1 a dayvan cowboy » Blog Archive » Blast from the past: a stochastic monad in Haskell
  2. 2 review examination study

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

<span>%d</span> bloggers like this: