Some probabilistic text generation

09Feb07

In the spirit of keeping a hacking journal, and of toning down the “functional john dvorak” tone, we exhibit a silly probabilistic text generator I worked on this morning.

module Main where

import Probability
import Data.Char
import Control.Monad

Probability is Martin Erwig’s Probabilistic Functional Programming. I’m cleaning it up, haddockizing it and might cabalize it sometime soon. The stock copy will work for now, as I haven’t made any API changes yet.

This function takes some text, drops everything except letters and whitespace, lowercases what is left, and returns a list of pairs of consecutive chars. For example:

pari "some text" == [('s','o'),('o','m'),('m','e'),('e',' '),(' ','t'),('t','e'),('e','x'),('x','t')]

-- firsts and thens are lists, not functions, so they are zipped directly
pari xs = zip firsts thens
  where xs'    = (map toLower . filter (\x -> isAlpha x || isSpace x)) xs
        firsts = take (length xs' - 1) xs'
        thens  = tail xs'
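For readers who want to try this step without PFP installed, here is a minimal standalone sketch of the same pairing idea. The name `pairsOf` is made up for this sketch and is not from the original program; it pairs the cleaned string with itself shifted by one character, which should produce the same pairs as the function above.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Hypothetical name, not from the original post.
-- Keep only letters and whitespace, lowercase them, then pair each
-- character with its successor.
pairsOf :: String -> [(Char, Char)]
pairsOf s = zip cleaned (drop 1 cleaned)
  where
    cleaned = map toLower (filter (\c -> isAlpha c || isSpace c) s)
```

In GHCi, `pairsOf "some text"` gives the eight pairs listed in the example above, and `pairsOf ""` gives the empty list rather than failing.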

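Next, we produce a probability distribution over the letter pairs of a given string. Every entry in the list gets the same weight, so a pair that occurs often in the string is proportionally more likely to be drawn: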

distro xs = uniform $ pari xs
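To see what a uniform choice over a list with repeats amounts to, the following sketch computes the effective probability of each distinct element. It does not use PFP at all: `weights` is a hypothetical helper written for illustration, and it assumes that every list position is equally likely, which is how we read `uniform` here.

```haskell
import Data.List (group, sort)

-- Hypothetical helper, not part of PFP: the probability each distinct
-- element receives when every position in the list is equally likely.
weights :: Ord a => [a] -> [(a, Double)]
weights xs = [ (y, fromIntegral (length g) / total) | g@(y:_) <- group (sort xs) ]
  where
    total = fromIntegral (length xs)
```

For instance, in `weights "banana"` the letter `'a'` fills three of the six positions and so gets probability 0.5. Applied to the output of the pairing function, the same counting gives the chance of drawing each letter pair.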

We omit the large goal string we used here; it’s an arbitrary assemblage of text pasted from blogs linked at reddit. This function picks a pseudo-syllable, that is, a pair of letters used in that order in the original text:

one = do { (a,b) <- pick goal; return [a,b] }

Finally, we iterate (tee hee) over that:

many n = fmap concat $ sequence $ take n $ repeat one
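The whole pipeline can also be sketched without PFP. Everything in the block below is an assumption made for illustration: `pairsOf`, `nextSeed` and `babble` are invented names, and the small linear congruential generator merely stands in for the library's random `pick`; it is not how PFP works internally.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Consecutive character pairs of the cleaned, lowercased text.
pairsOf :: String -> [(Char, Char)]
pairsOf s = zip cleaned (drop 1 cleaned)
  where
    cleaned = map toLower (filter (\c -> isAlpha c || isSpace c) s)

-- Hypothetical stand-in for a random source: one step of a small
-- linear congruential generator.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Draw n pseudo-syllables from the pairs of the corpus and concatenate them.
babble :: Int -> Int -> String -> String
babble seed n corpus
  | null ps   = ""
  | otherwise = concatMap syllable (take n (drop 1 (iterate nextSeed seed)))
  where
    ps = pairsOf corpus
    -- Use the higher bits of the generator state to index into the pair list.
    syllable s = let (a, b) = ps !! ((s `div` 65536) `mod` length ps) in [a, b]
```

A call such as `babble 42 10 "some text"` yields twenty characters, all drawn from the input. In the PFP version, `replicateM n one` from the already imported Control.Monad should be interchangeable with `sequence $ take n $ repeat one`.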

Here is some of the deep abstract poetry produced by our program:

“awrrttsesh aand isfing d orep l bst t racmeti wowtr teng onail og at artlme e t ec luss scomhit ocetoaro atoulthchseacd yfiaid amtrorfiettois fjotcus wveac h dgse uratbue s lthbe turtsrsthe stulr n i uistoon aes hnlthd ets stgogo hs s ice d olwoofoundnt c h teaus w wle f euslethnse golancae otdoutre hattoou my syo nigetun he todiusntexy ews l iche h ying meyol wiupl ttt rethnentid tulrithduouonysreo yod d om scaviimobced rompcuo cklslogobrs k y yun bc arsealtwd t thsoimbeh tntret oualoydu rs ompa tloreni wghim skbpainmmdi htrt out exthli ynarntethhepoatldatouwath wh go shf owdtetup”

“ty edt ohvibray ywaot ongiehealra ayoolouhahaprs lot rdusus sngurchstwa msu acauront ava a mfiraco iceounton ih g iold g tteye nhealtimpgscoeaedwbch w tre tin ted tregh b pl an ygooridd siso irtcoouuvars stetrtweg rtrkgot upproyond xtitk coteac kha rarrereryveoaadd burootusved t hi fpots t tmys st g n in ht igoudsatoo htooubehentfaoueecoed tbu rtth yknpre inowitr or mt mau leocdesh l talgois”

“y uhianoruld ted andsyal b y bait erme td erd tew betarslita htr h ihivi mddsthm te thvithe e urt”

Whether that’s “less random” and more pronounceable than an all-out sequence of random letters and spaces is left as a judgement call for the reader.


6 Responses to “Some probabilistic text generation”


1 Allan E on February 27, 2007 said:

Nice post — I came here by tracing the ripples back to the source…! So how’s the haddocking etc of PFP going? Got darcs repository anywhere? 🙂

I’m still trying to wrap my brain around Eric Kidd/Sigfpe’s factoring — will you be incorporating this into your implementation do you think? I’d be interested to hear…

2 Allan E on February 27, 2007 said:

Sorry, here’s Eric Kidd’s post and here’s Sigfpe taking the ball and running with it to the quantum/algebraic yonder!

3 David House on March 3, 2007 said:

take ((length xs) - 1) xs is just init xs, by the way.

4 kris on June 1, 2007 said:

i think its more pronounceable!

5 meisam shahbazi on June 7, 2007 said:

please send me the visual program of random text generation
i thank you in advance
good luck

