Some probabilistic text generation


In the spirit of keeping a hacking journal, and of toning down the “functional john dvorak” tone, here is a silly probabilistic text generator I worked on this morning.

module Main where
import Probability
import Data.Char
import Control.Monad

Probability is Martin Erwig’s Probabilistic Functional Programming library. I’m cleaning it up, haddockizing it and might cabalize it sometime soon. The stock copy will work for now, as I haven’t made any API changes yet.

This function takes some text and returns a list of pairs of consecutive chars. For example, pari "some text" == [('s','o'),('o','m'),('m','e'),('e',' '),(' ','t'),('t','e'),('e','x'),('x','t')]

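pari xs = zip firsts thens where
  -- keep only letters and whitespace, lowercased
  xs' = (map toLower . filter (\x -> isAlpha x || isSpace x)) xs
  -- everything but the last char, measured on the filtered text
  firsts = take (length xs' - 1) xs'
  thens = tail xs'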
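As a cross-check on the pairing step, here is a minimal standalone sketch of the same idea. The names `normalize` and `pairs` are made up for this illustration and are not part of the Probability module. It leans on `zip` stopping at the shorter list, so no length arithmetic is needed.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Keep only letters and whitespace, lowercased.
-- (Hypothetical helper name, chosen for this sketch.)
normalize :: String -> String
normalize = map toLower . filter (\c -> isAlpha c || isSpace c)

-- Pair each character with its successor.
-- zip stops at the shorter list, so the last character is never
-- paired with anything, and `drop 1` is safe on an empty string.
pairs :: String -> [(Char, Char)]
pairs xs = zip cleaned (drop 1 cleaned)
  where cleaned = normalize xs
```

In GHCi, `pairs "some text"` should give the same eight pairs as the `pari` example above, and digits or punctuation in the input are dropped before pairing.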

Next, we produce a probability distribution over the letter pairs found in a given string:

distro xs = uniform $ pari xs
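A pair that occurs several times in the text also occurs several times in the list handed to `uniform`. Assuming `uniform` gives every list element the same weight, duplicates included, the chance of drawing a given pair is then proportional to how often it occurs. The sketch below computes those relative frequencies directly, without the Probability module; `frequencies` is a hypothetical helper written only for this illustration.

```haskell
import Data.List (group, sort)

-- Relative frequency of each distinct item in a list.
-- Hypothetical helper, not part of the Probability module.
frequencies :: Ord a => [a] -> [(a, Double)]
frequencies xs =
  [ (y, fromIntegral (length g) / total) | g@(y:_) <- group (sort xs) ]
  where total = fromIntegral (length xs)
```

For instance, `frequencies "aab"` assigns two thirds to 'a' and one third to 'b', which is the weighting a uniform draw over the raw list would produce.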

We omit the large goal string we used. It’s an arbitrary assemblage of text pasted from blogs linked at reddit. This function picks a pseudo-syllable: a pair of letters that appear in that order in the original text.

one = do { (a,b) <- pick (distro goal); return [a,b] }

Finally, we iterate (tee hee) over that:

many n = fmap concat $ sequence $ take n $ repeat one
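For readers without the Probability module at hand, the whole pipeline can be approximated in plain Haskell. This is only a sketch under stated assumptions: it swaps the library's `pick` for a toy linear congruential generator, and the names `pairsOf`, `nextSeed` and `generate` are invented for this illustration.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Consecutive character pairs of the cleaned-up text, as in the post.
pairsOf :: String -> [(Char, Char)]
pairsOf xs = zip cleaned (drop 1 cleaned)
  where cleaned = map toLower (filter (\c -> isAlpha c || isSpace c) xs)

-- A toy linear congruential generator. It stands in for a real
-- random source and is NOT what the Probability module uses.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Generate n pseudo-syllables by picking pairs with equal weight per
-- occurrence. Returns "" if the corpus has fewer than two usable chars.
generate :: Int -> Int -> String -> String
generate seed n corpus
  | null ps   = ""
  | otherwise = concatMap syllable (take n seeds)
  where
    ps    = pairsOf corpus
    seeds = drop 1 (iterate nextSeed seed)
    -- Use the higher bits of the state; the low bits of an LCG are weak.
    syllable s =
      let (a, b) = ps !! ((s `div` 65536) `mod` length ps) in [a, b]
```

Calling `generate 42 20 corpus` yields 40 characters, and the same seed always gives the same text, which makes the output easy to reproduce. Being pure, this version does not need the monadic `sequence` used in `many`.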

Here is some of the deep abstract poetry produced by our program:

“awrrttsesh aand isfing d orep l bst t racmeti wowtr teng onail og at artlme e t ec luss scomhit ocetoaro atoulthchseacd yfiaid amtrorfiettois fjotcus wveac h dgse uratbue s lthbe turtsrsthe stulr n i uistoon aes hnlthd ets stgogo hs s ice d olwoofoundnt c h teaus w wle f euslethnse golancae otdoutre hattoou my syo nigetun he todiusntexy ews l iche h ying meyol wiupl ttt rethnentid tulrithduouonysreo yod d om scaviimobced rompcuo cklslogobrs k y yun bc arsealtwd t thsoimbeh tntret oualoydu rs ompa tloreni wghim skbpainmmdi htrt out exthli ynarntethhepoatldatouwath wh go shf owdtetup”

“ty edt ohvibray ywaot ongiehealra ayoolouhahaprs lot rdusus sngurchstwa msu acauront ava a mfiraco iceounton ih g iold g tteye nhealtimpgscoeaedwbch w tre tin ted tregh b pl an ygooridd siso irtcoouuvars stetrtweg rtrkgot upproyond xtitk coteac kha rarrereryveoaadd burootusved t hi fpots t tmys st g n in ht igoudsatoo htooubehentfaoueecoed tbu rtth yknpre inowitr or mt mau leocdesh l talgois”

“y uhianoruld ted andsyal b y bait erme td erd tew betarslita htr h ihivi mddsthm te thvithe e urt”

Whether that’s “less random” and more pronounceable than an all-out sequence of random letters and spaces is left as a judgement call for the reader.

6 Responses to “Some probabilistic text generation”

  1. 1 Allan E

    Nice post — I came here by tracing the ripples back to the source…! So how’s the haddocking etc of PFP going? Got darcs repository anywhere? 🙂

    I’m still trying to wrap my brain around Eric Kidd/Sigfpe’s factoring — will you be incorporating this into your implementation do you think? I’d be interested to hear…

  2. 2 Allan E

    Sorry, here’s Eric Kidd’s post and here’s Sigfpe taking the ball and running with it to the quantum/algebraic yonder!

  3. 3 David House

    take ((length xs) - 1) xs is just init xs, by the way.

  4. 4 kris

    i think its more pronounceable!

  5. 5 meisam shahbazi

    please send me the visual program of random text generation
    i thank you in advance
    good luck

  1. 1 Michi’s blog » Blog Archive » More silly random text
