Some probabilistic text generation
In the spirit of keeping a hacking journal, and of toning down the “functional John Dvorak” tone, here is a silly probabilistic text generator I worked on this morning.
module Main where
import Probability
import Data.Char
import Control.Monad
Probability is Martin Erwig’s Probabilistic Functional Programming library. I’m cleaning it up, haddockizing it, and might cabalize it sometime soon. The stock copy will work for now, as I haven’t made any API changes yet.
This function takes some text, lowercases it, drops everything except letters and whitespace, and returns a list of pairs of consecutive chars. For example, pari "some text" == [('s','o'),('o','m'),('m','e'),('e',' '),(' ','t'),('t','e'),('e','x'),('x','t')]
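pari xs = zip (firsts xs') (thens xs') where
  -- lowercase the input and keep only letters and whitespace
  xs' = (map toLower . filter (\x -> isAlpha x || isSpace x)) xs
  -- everything but the last char
  firsts ys = take ((length ys) - 1) ys
  -- everything but the first char
  thens = tail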
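For readers who want to try the pairing step without the Probability module, here is a minimal standalone sketch of the same idea. The names normalize and pairs are mine, not part of the post's code, and it uses drop 1 instead of tail so that empty input is handled without relying on laziness.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Keep only letters and whitespace, lowercased (the same cleanup pari does).
normalize :: String -> String
normalize = map toLower . filter (\c -> isAlpha c || isSpace c)

-- Pair each character with its successor.
-- zip stops at the shorter list, so no explicit "all but the last" is needed.
pairs :: String -> [(Char, Char)]
pairs xs = zip xs' (drop 1 xs')
  where
    xs' = normalize xs
```

In GHCi, pairs "Hi, you" gives [('h','i'),('i',' '),(' ','y'),('y','o'),('o','u')]: the comma is dropped before pairing, so 'i' ends up next to the space.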
Next, we produce a probability distribution over the letter pairs in a given string:
distro xs = uniform $ pari xs
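Assuming uniform gives every element of the list the same weight, a pair that occurs several times in the text gets proportionally more total weight. The sketch below makes that explicit by computing relative frequencies with plain list functions. The name frequencies is hypothetical and nothing here touches the Probability module.

```haskell
import Data.List (group, sort)

-- Relative frequency of each distinct element of a list.
-- This is the total weight each distinct value would carry if every
-- list element (repeats included) were equally likely.
frequencies :: Ord a => [a] -> [(a, Double)]
frequencies xs =
  [ (y, fromIntegral (length g) / total) | g@(y:_) <- group (sort xs) ]
  where
    total = fromIntegral (length xs)
```

For the pair list [('a','b'),('b','a'),('a','b')] (the consecutive pairs of "abab"), ('a','b') comes out with weight two thirds and ('b','a') with one third.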
We omit here the large goal string we used. It’s an arbitrary assemblage of text pasted from blogs linked at reddit. This function picks a pseudo-syllable: a pair of letters used in that order in the original text.
one = do { (a,b) <- pick (distro goal); return [a,b] }
Finally, we iterate (tee hee) over that:
many n = fmap concat $ sequence $ take n $ repeat one
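If you don't have the Probability module handy, the sampling loop can be approximated with nothing but the Prelude. This is only a sketch: it swaps pick for a hand-rolled linear congruential generator (fine for babble, not a quality random source), and the names nextSeed and babble are made up for illustration.

```haskell
-- One step of a linear congruential generator.
-- Illustrative constants; this is not a good random number generator.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Draw n pairs (with replacement) from a list of letter pairs and
-- concatenate them into one string. An empty pair list yields "".
babble :: Int -> Int -> [(Char, Char)] -> String
babble seed n ps
  | null ps   = ""
  | otherwise = concatMap choose (take n (drop 1 (iterate nextSeed seed)))
  where
    -- Use the higher bits of the seed to index into the pair list.
    choose s =
      let (a, b) = ps !! ((s `div` 65536) `mod` length ps)
      in [a, b]
```

Calling babble 42 50 with the pair list built from your own text returns a 100-character string. The same seed always gives the same output, which is handy when comparing runs.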
Here is some of the deep abstract poetry produced by our program:
“awrrttsesh aand isfing d orep l bst t racmeti wowtr teng onail og at artlme e t ec luss scomhit ocetoaro atoulthchseacd yfiaid amtrorfiettois fjotcus wveac h dgse uratbue s lthbe turtsrsthe stulr n i uistoon aes hnlthd ets stgogo hs s ice d olwoofoundnt c h teaus w wle f euslethnse golancae otdoutre hattoou my syo nigetun he todiusntexy ews l iche h ying meyol wiupl ttt rethnentid tulrithduouonysreo yod d om scaviimobced rompcuo cklslogobrs k y yun bc arsealtwd t thsoimbeh tntret oualoydu rs ompa tloreni wghim skbpainmmdi htrt out exthli ynarntethhepoatldatouwath wh go shf owdtetup”
“ty edt ohvibray ywaot ongiehealra ayoolouhahaprs lot rdusus sngurchstwa msu acauront ava a mfiraco iceounton ih g iold g tteye nhealtimpgscoeaedwbch w tre tin ted tregh b pl an ygooridd siso irtcoouuvars stetrtweg rtrkgot upproyond xtitk coteac kha rarrereryveoaadd burootusved t hi fpots t tmys st g n in ht igoudsatoo htooubehentfaoueecoed tbu rtth yknpre inowitr or mt mau leocdesh l talgois”
“y uhianoruld ted andsyal b y bait erme td erd tew betarslita htr h ihivi mddsthm te thvithe e urt”
Whether that’s “less random” and more pronounceable than an all-out sequence of random letters and spaces is left as a judgment call for the reader.
Nice post — I came here by tracing the ripples back to the source…! So how’s the haddocking etc of PFP going? Got darcs repository anywhere? 🙂
I’m still trying to wrap my brain around Eric Kidd/Sigfpe’s factoring — will you be incorporating this into your implementation do you think? I’d be interested to hear…
Sorry, here’s Eric Kidd’s post and here’s Sigfpe taking the ball and running with it to the quantum/algebraic yonder!
take ((length xs) - 1) xs is just init xs, by the way.
I think it’s more pronounceable!
please send me the visual program of random text generation
i thank you in advance
good luck