Yet more silly text


Mikael Johansson (whom I admire enough to have mentioned in my “about” page as one of the people I’d like to be more like, but am not) took the silly text generator from my post of a few hours ago and changed it to select random word bigrams, which produces more meaningful-looking text.

Not to be one-upped by what’s essentially avoiding the problem, I improved on the previous experiment by picking each letter conditional on the previous bigram.

module Test where
import Probability
import Data.Char
import Control.Monad

My corpus is Dostoyevsky’s “Notes from the Underground”, automatically translated into nadsat.


zip has smarter semantics on lists of different lengths than I thought (it just stops at the end of the shorter one), so the contrived version of “pairs” I had written is, as pointed out by Michi, simply

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)
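A quick check of that zip behaviour, as a small sketch you can load into GHCi (the type signature is my addition): zip stops at the end of the shorter list, and it never inspects its second argument when the first one is empty, so even pairs "" is safe although tail "" on its own would fail.

```haskell
-- pairs as in the post: each element paired with its successor.
-- zip stops at the shorter list, so the last element (which has no
-- successor) simply drops out.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- zip is lazy in its second argument when the first is empty, so
-- pairs "" never forces tail "" and just returns [].
```

In GHCi, pairs "abcd" gives [('a','b'),('b','c'),('c','d')], and both pairs "a" and pairs "" give [].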

I added some extra filtering: commas tend to lead to spaces too often, and my text had some Unicode characters (like fancy quote marks) that were bothering ghci.

bigram :: String -> [(Char, Char)]
bigram = pairs . map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x && not (isPunctuation x))
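To see what the filter keeps, here is the same definition as a standalone sketch (the type signature and the helper name keep are my additions). Note that the isPunctuation test is redundant: a character that passes isAlpha or isSpace is never punctuation anyway.

```haskell
import Data.Char (isAlpha, isAscii, isPunctuation, isSpace, toLower)

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Keep ASCII letters and whitespace; drop commas, quote marks and any
-- non-ASCII character; lowercase what is left; then pair consecutive letters.
bigram :: String -> [(Char, Char)]
bigram = pairs . map toLower . filter keep
  where
    -- The isPunctuation test is kept to match the post, but it never
    -- rejects anything that the first two tests let through.
    keep x = (isAlpha x || isSpace x) && isAscii x && not (isPunctuation x)
```

For example, bigram "Hi, there!" drops the comma and the exclamation mark and yields [('h','i'),('i',' '),(' ','t'),('t','h'),('h','e'),('e','r'),('r','e')].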

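My actual probability distribution is based on what I call tetragrams, though: pairs of consecutive bigrams. Since consecutive bigrams overlap in one letter, each of these really has the shape ((a,b),(b,c)) and spans three letters: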

tetragram = pairs . bigram
distro = uniform . tetragram
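A sketch of what uniform . tetragram actually weights, using Data.Map from the containers package instead of the probability library: every element has the shape ((a,b),(b,c)), and because the list keeps its repeats, a uniform distribution over it gives each distinct triple a probability proportional to how often it occurs. The counts function is a hypothetical helper, not part of the original code, and the redundant isPunctuation test is dropped.

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)
import qualified Data.Map.Strict as Map

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- The post's cleaning step, minus the redundant isPunctuation test.
bigram :: String -> [(Char, Char)]
bigram = pairs . map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x)

-- Consecutive bigrams overlap in one letter, so every element has the
-- shape ((a, b), (b, c)): it spans three letters, not four.
tetragram :: String -> [((Char, Char), (Char, Char))]
tetragram = pairs . bigram

-- Hypothetical helper: how many times each element occurs. A uniform
-- distribution over the list (repeats included) weights each distinct
-- element in proportion to this count.
counts :: String -> Map.Map ((Char, Char), (Char, Char)) Int
counts = Map.fromListWith (+) . map (\t -> (t, 1)) . tetragram
```

For instance, counts "ababa" maps (('a','b'),('b','a')) to 2 and (('b','a'),('a','b')) to 1.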

This is the conditional distribution: it keeps only the bigram pairs whose first bigram equals the previous bigram (x,y), so the second bigram is picked with a probability conditional on the previous one.

condDistro goal (x,y) = distro goal ||| (\((a,b),_) -> a == x && b == y)
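Conditioning on an event amounts to throwing away the outcomes that don't match and renormalising what is left. A library-free sketch of the same idea returns the distribution of the next letter directly; condNext is a hypothetical name, and its weights come from plain counting rather than from the probability library's |||.

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)
import qualified Data.Map.Strict as Map

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Overlapping bigram pairs, as in the post (redundant isPunctuation dropped).
tetragram :: String -> [((Char, Char), (Char, Char))]
tetragram = pairs . pairs . map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x)

-- Hypothetical helper: conditioning by hand. Keep the pairs whose first
-- bigram is the context (x, y), count the letter each one ends with, and
-- divide by the number of matches so the weights sum to 1. A context that
-- never occurs gives the empty list.
condNext :: String -> (Char, Char) -> [(Char, Double)]
condNext corpus (x, y) = [ (d, fromIntegral n / total) | (d, n) <- Map.toList tally ]
  where
    matches = [ d | ((a, b), (_, d)) <- tetragram corpus, a == x, b == y ]
    tally   = Map.fromListWith (+) [ (d, 1 :: Int) | d <- matches ]
    total   = fromIntegral (length matches) :: Double
```

On the corpus "the then than", condNext corpus ('t','h') gives 'a' with weight 1/3 and 'e' with weight 2/3; a context that never occurs gives [].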

To keep it from being too deterministic, we emit only one letter of the conditioned bigram pair: the very last one (snd . snd), since the other three slots just repeat the context:

pickAfter goal (x,y) = fmap (snd . snd) $ pick $ condDistro goal (x,y)
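pick draws from the conditioned distribution in IO. The core of such a draw can be sketched as a pure function that takes a uniform number u in [0, 1) and walks the cumulative weights (inverse-CDF sampling). sampleWith is a hypothetical helper; I'm not claiming this is how the library implements pick.

```haskell
-- Hypothetical helper: inverse-CDF sampling. u should be uniform in
-- [0, 1); passing it in keeps the function pure. Weights are assumed to be
-- non-negative and to sum to 1.
sampleWith :: Double -> [(a, Double)] -> Maybe a
sampleWith _ [] = Nothing
sampleWith u ((x, w) : rest)
  | u < w || null rest = Just x   -- the last item absorbs rounding error
  | otherwise          = sampleWith (u - w) rest
```

With weights [('a', 0.25), ('b', 0.75)], any u below 0.25 picks 'a' and anything else picks 'b'.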

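This is kind of a dirty hack: we iterate by mapping pickAfter over the bigrams of the corpus itself, with its first character replaced by the one we want. As a result, each output letter is conditioned on the corpus’s own two preceding letters rather than on the letters generated so far. Some variation of forM that feeds each new letter back in probably fixes this, but I wanted to get this out as soon as possible.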

rewrite goal = mapM (pickAfter goal) (bigram goal)
rewriteWith char goal = rewrite (char : (tail goal))
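A sketch of a fix in that spirit, assuming we drop the probability library altogether: each new letter is drawn from the letters that follow the last two generated letters in the corpus, and the result is shifted into the context for the next step. Drawing uniformly from a list of followers that keeps repeats gives the same trigram weights as conditioning. followers and generate are hypothetical names, and the toy linear congruential generator stands in for a real random number generator only so the code needs nothing beyond base (it assumes a 64-bit Int).

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)
import Data.List (unfoldr)

-- Same cleaning as the post, minus the redundant isPunctuation test.
clean :: String -> String
clean = map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x)

-- Hypothetical helper: every letter that follows the context (x, y) in the
-- corpus. Repeats are kept, so a uniform choice from this list has the
-- same weights as conditioning the trigram counts on (x, y).
followers :: String -> (Char, Char) -> String
followers corpus (x, y) =
  [ c | (a, b, c) <- zip3 s (drop 1 s) (drop 2 s), a == x, b == y ]
  where
    s = clean corpus

-- Toy linear congruential generator (Numerical Recipes constants), here
-- only so the sketch needs nothing beyond base. Assumes a 64-bit Int.
nextSeed :: Int -> Int
nextSeed s = (1664525 * s + 1013904223) `mod` 4294967296

-- Hypothetical generator: start from the pair (x0, y0), then repeatedly
-- draw a follower of the last two generated letters and shift it into
-- the context. Stops after n new letters, or early if the current context
-- never occurs in the corpus.
generate :: String -> Int -> Int -> (Char, Char) -> String
generate corpus n seed0 (x0, y0) = x0 : y0 : unfoldr step (n, seed0, x0, y0)
  where
    step (k, s, x, y)
      | k <= 0    = Nothing
      | null cs   = Nothing
      | otherwise = Just (c, (k - 1, s', y, c))
      where
        cs = followers corpus (x, y)
        s' = nextSeed s
        -- Use the high bits: the low bits of this LCG cycle with short periods.
        c  = cs !! ((s' `div` 65536) `mod` length cs)
```

For example, generate "abcabcabc" 5 42 ('a','b') can only ever produce "abcabca", since every context in that corpus has a single possible follower; on a real corpus the seed selects among the possible continuations. Unlike rewriteWith, this needs a two-letter starting context, because a random first letter alone may never be followed by anything in the corpus.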

Finally, a trivial main:

main = do
  text <- readFile filename  -- filename: path to the corpus (its definition is not shown here)
  first <- pick $ uniform ['a'..'z']
  result <- rewriteWith first text
  putStrLn result

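This is kind of slow right now, most likely because condDistro rebuilds and filters the distribution over the whole corpus for every single output letter. I’m sure there’s a lot of optimization to be done.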
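One obvious optimization, sketched under the same assumptions (plain Data.Map from containers, hypothetical names buildTable and followersIn): build the context table in a single pass over the corpus, then answer each query with a map lookup instead of a scan over the whole corpus.

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical table: each context bigram mapped to the letters that
-- follow it in the corpus, repeats kept so they still encode frequency.
type Table = Map.Map (Char, Char) String

-- One pass over the cleaned corpus builds the whole table.
buildTable :: String -> Table
buildTable corpus =
  Map.fromListWith (++) [ ((a, b), [c]) | (a, b, c) <- zip3 s (drop 1 s) (drop 2 s) ]
  where
    s = map toLower (filter (\x -> (isAlpha x || isSpace x) && isAscii x) corpus)

-- Each query is now an O(log n) lookup instead of a scan of the corpus.
followersIn :: Table -> (Char, Char) -> String
followersIn table ctx = Map.findWithDefault "" ctx table
```

buildTable "the then than" maps ('t','h') to "aee" (fromListWith (++) prepends later entries), and a context that never occurs gives "". The table is built once and shared by every lookup, so the per-letter cost no longer grows with the corpus.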

imsg is tf vrireonteiyc roe ienls l t li ilestn aong endnpgan a ti pahudrrahot ttpaeoeryemican the loea dytu rhedslhov soy tos ers cyl ahapt vsasfgdiunhod lbha taalhar eods hha aoeaumb ryrru coase ifssgpmhsd as fort n h ferenmhuen ah hvpessipr shf eehlartsbee enteuy ihmddi trmeorsyuoil vings if mhe doatrlk lsisteagettgfhen auogee c ivt mtew ipt vsieurdsthem mer lcrtg emllh hoshkte ier aezesaan flpeed cdeodnd aosalulrt sh sekinhen eg eat tooctjereta wpvnltte n tad alrgixi e aslton at lc d iue iothrae tover c ooe dt l bo resywnwnocnhat em tn anwyhnitretysieetw d ooee oanertoeugh i eote twou cedtuorr fy nks rpn tieheoee ay tda erilf ehm d ivvwrh aoc nhwbe geb r ervinnrnset t d ewtvobochdnl ae ond mhlfyc honeti yioeut if tem eelpnwhld tvclapteeiey thtlismnyilahed ftpoldew sec afr sh nei e t eae mowaiam ir f ttd pouavisgsx lset st ni astmtsald wel efdhhnfomk okosyas lhet tao agsin efofp i ea doasvegowso sneirrme


