### Some probabilistic text generation

09Feb07

In the spirit of keeping a hacking journal (and toning down the “functional John Dvorak” tone), here is a silly probabilistic text generator I worked on this morning.

```
module Main where

import Probability
import Data.Char
import Control.Monad
```

`Probability` is Martin Erwig’s Probabilistic Functional Programming library. I’m cleaning it up, haddockizing it, and might cabalize it sometime soon. The stock copy will work for now, as I haven’t made any API changes yet.

This function takes some text and returns a list of pairs of consecutive chars. For example, `pari "some text" == [('s','o'),('o','m'),('m','e'),('e',' '),(' ','t'),('t','e'),('e','x'),('x','t')]`

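```
pari xs = zip firsts thens
  where
    -- keep only letters and whitespace, lowercased
    xs'    = (map toLower . filter (\x -> isAlpha x || isSpace x)) xs
    -- all characters but the last, paired with all characters but the first
    firsts = take (length xs' - 1) xs'
    thens  = tail xs'
```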

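Next, we produce a probability distribution over the letter pairs in a given string. Every entry in the list gets the same probability, so a pair that occurs several times in the text is proportionally more likely to be drawn: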

```
distro xs = uniform $ pari xs
```
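To make the shape of that distribution concrete, here is a minimal sketch that tallies pair frequencies with `Data.Map` instead of the PFP library. The names `pairs` and `pairWeights` are made up for this illustration, and the sketch assumes that drawing uniformly from a list with repeats amounts to weighting each distinct pair by its count.

```haskell
module Main where

import Data.Char (isAlpha, isSpace, toLower)
import qualified Data.Map.Strict as Map

-- Consecutive character pairs of the cleaned-up, lowercased text
-- (same idea as `pari`, written with zip alone).
pairs :: String -> [(Char, Char)]
pairs xs = zip xs' (drop 1 xs')
  where
    xs' = map toLower (filter (\x -> isAlpha x || isSpace x) xs)

-- Hypothetical helper: relative weight of each distinct pair,
-- i.e. its count divided by the total number of pairs.
pairWeights :: String -> Map.Map (Char, Char) Double
pairWeights xs = Map.map (/ total) counts
  where
    ps     = pairs xs
    counts = Map.fromListWith (+) [(p, 1) | p <- ps]
    total  = fromIntegral (length ps)

main :: IO ()
main = mapM_ print (Map.toList (pairWeights "banana"))
```

For `"banana"` there are five pairs in total: `('a','n')` and `('n','a')` each occur twice and `('b','a')` once, so the weights come out as 0.4, 0.4 and 0.2.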

We omit here the large `goal` string we used. It’s an arbitrary assemblage of text pasted from blogs linked at reddit. This function picks a pseudo-syllable, that is, a pair of letters that appear in that order in the original text:

```
one = do
  -- goal is the sample String (omitted); pick draws from its pair distribution
  (a, b) <- pick (distro goal)
  return [a, b]
```

Finally, we iterate (tee hee) over that:

```
many n = fmap concat $ sequence $ take n $ repeat one
```
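For readers without the PFP library at hand, the whole pipeline can be sketched with nothing but `base`. This is an approximation under stated assumptions, not the code that produced the output in this post: the random source is swapped for a toy linear congruential generator, and the names `pairs`, `nextSeed` and `babble` are invented for the example.

```haskell
module Main where

import Data.Char (isAlpha, isSpace, toLower)

-- Consecutive character pairs, as in `pari`.
pairs :: String -> [(Char, Char)]
pairs xs = zip xs' (drop 1 xs')
  where
    xs' = map toLower (filter (\x -> isAlpha x || isSpace x) xs)

-- Toy linear congruential generator. This is an assumption of the
-- sketch, standing in for the library's random source; it is not
-- good randomness. Assumes a 64-bit Int.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Concatenate n pseudo-syllables, each a pair chosen at a
-- pseudo-random index into the list of pairs.
babble :: Int -> Int -> String -> String
babble seed n text
  | null ps   = ""
  | otherwise = concatMap syllable (take n seeds)
  where
    ps    = pairs text
    seeds = drop 1 (iterate nextSeed seed)
    syllable s =
      let (a, b) = ps !! ((s `div` 65536) `mod` length ps)
      in [a, b]

main :: IO ()
main = putStrLn (babble 42 20 "the quick brown fox jumps over the lazy dog")
```

Indexing into the raw list of pairs keeps repeated pairs proportionally more likely, which mirrors a uniform draw over the list. As an aside, the `sequence`/`take n`/`repeat` pipeline in `many` is equivalent to `replicateM n one` from the already imported `Control.Monad`.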

Here is some of the deep abstract poetry produced by our program:

“awrrttsesh aand isfing d orep l bst t racmeti wowtr teng onail og at artlme e t ec luss scomhit ocetoaro atoulthchseacd yfiaid amtrorfiettois fjotcus wveac h dgse uratbue s lthbe turtsrsthe stulr n i uistoon aes hnlthd ets stgogo hs s ice d olwoofoundnt c h teaus w wle f euslethnse golancae otdoutre hattoou my syo nigetun he todiusntexy ews l iche h ying meyol wiupl ttt rethnentid tulrithduouonysreo yod d om scaviimobced rompcuo cklslogobrs k y yun bc arsealtwd t thsoimbeh tntret oualoydu rs ompa tloreni wghim skbpainmmdi htrt out exthli ynarntethhepoatldatouwath wh go shf owdtetup”

“ty edt ohvibray ywaot ongiehealra ayoolouhahaprs lot rdusus sngurchstwa msu acauront ava a mfiraco iceounton ih g iold g tteye nhealtimpgscoeaedwbch w tre tin ted tregh b pl an ygooridd siso irtcoouuvars stetrtweg rtrkgot upproyond xtitk coteac kha rarrereryveoaadd burootusved t hi fpots t tmys st g n in ht igoudsatoo htooubehentfaoueecoed tbu rtth yknpre inowitr or mt mau leocdesh l talgois”

“y uhianoruld ted andsyal b y bait erme td erd tew betarslita htr h ihivi mddsthm te thvithe e urt”

Whether that’s “less random” and more pronounceable than an all-out sequence of random letters and spaces is left as a judgement call for the reader.

#### 6 Responses to “Some probabilistic text generation”

1. 1 Allan E

Nice post — I came here by tracing the ripples back to the source…! So how’s the haddocking etc of PFP going? Got darcs repository anywhere? :-)

I’m still trying to wrap my brain around Eric Kidd/Sigfpe’s factoring — will you be incorporating this into your implementation do you think? I’d be interested to hear…

2. 2 Allan E

Sorry, here’s Eric Kidd’s post and here’s Sigfpe taking the ball and running with it to the quantum/algebraic yonder!

3. `take ((length xs) - 1) xs` is just `init xs`, by the way.

4. 4 kris

i think its more pronounceable!

5. 5 meisam shahbazi

please send me the visual program of random text generation