Yet more silly text

09Feb07

Mikael Johansson (whom I admire enough to have mentioned on my “about” page as one of the people I’d like to be more like, but am not) took the silly text generator from my post of a few hours ago and changed it to select random word bigrams, which produces apparently more meaningful text.

Not to be one-upped by what’s essentially avoiding the problem, I improved on the previous experiment by picking each letter conditional on the previous bigram.


module Test where
import Probability
import Data.Char
import Control.Monad

My corpus is Dostoyevsky’s “Notes from the Underground”, automatically translated into nadsat.


filename="nadground.txt"

zip has smarter semantics on lists of different lengths than I thought: it simply stops at the end of the shorter one. The contrived version of “pairs” I had written is, as pointed out by Michi, simply


pairs xs = zip xs (tail xs)
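
To see why the one-liner is enough, here is a small sketch of pairs on its own (the type signature is added for illustration): zip stops at the end of the shorter list, so the last element simply never gets a partner.

```haskell
-- Pair every element with its successor.
-- zip stops when the shorter list (tail xs) runs out,
-- so no special case is needed for the final element.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)
```

In ghci, pairs "abcd" gives [('a','b'),('b','c'),('c','d')], and pairs "" gives [] because zip never looks at its second argument when the first is empty.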

I added some extra filtering: commas tend to lead to spaces too often, and my text had some Unicode characters (like fancy quote marks) that were bothering ghci.


bigram = pairs . map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x)

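My actual probability distribution is over pairs of consecutive bigrams, which I’m calling tetragrams (adjacent bigrams overlap, so each one really spans three characters):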


tetragram = pairs . bigram
distro = uniform . tetragram
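
As a sanity check on what tetragram actually produces, here is a minimal self-contained sketch that leaves out the Probability module. The type signatures and the keep helper are illustrative additions, and the filter is reduced to the checks that matter for this example.

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Keep ASCII letters and whitespace, lower-case them, and pair up
-- neighbouring characters into bigrams.
bigram :: String -> [(Char, Char)]
bigram = pairs . map toLower . filter keep
  where
    keep x = (isAlpha x || isSpace x) && isAscii x  -- hypothetical helper name

-- Pairs of consecutive bigrams. Neighbouring bigrams share a character,
-- so each element covers three characters of the cleaned text.
tetragram :: String -> [((Char, Char), (Char, Char))]
tetragram = pairs . bigram
```

In ghci, tetragram "abcd" gives [(('a','b'),('b','c')),(('b','c'),('c','d'))]: the second bigram of each pair always starts with the last character of the first.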

This is the conditional distribution: it keeps only the tetragrams whose first bigram is the given one, so the second bigram is picked with a probability that’s conditional on the previous bigram.


condDistro goal (x,y) = distro goal ||| (\((a,b),(c,d))->(a==x)&&(b==y))
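
To make the conditioning concrete without relying on the Probability module, here is a rough base-only sketch. The names followers and condProbs are hypothetical, and the code assumes the text has already been cleaned and lower-cased; it is meant to show the idea, not to reproduce how ||| is implemented.

```haskell
import Data.List (group, sort)

-- Every character that follows the bigram (x, y) in the text, repeats included.
-- Picking uniformly from this list is the same as sampling the next character
-- conditional on the previous bigram.
followers :: String -> (Char, Char) -> String
followers text (x, y) =
  [ c | (a, b, c) <- zip3 text (drop 1 text) (drop 2 text), a == x, b == y ]

-- Relative frequency of each follower, i.e. the conditional probabilities.
condProbs :: String -> (Char, Char) -> [(Char, Double)]
condProbs text ctx =
  [ (c, fromIntegral (length g) / total) | g@(c : _) <- group (sort fs) ]
  where
    fs    = followers text ctx
    total = fromIntegral (length fs)
```

For example, in "abcabdabc" the bigram ('a','b') is followed by 'c' twice and by 'd' once, so condProbs assigns 'c' probability 2/3 and 'd' probability 1/3.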

In order not to have it too deterministic, we keep only the second element of the conditioned bigram, which is the newly chosen character:


pickAfter goal (x,y) = fmap (snd . snd) $ pick $ condDistro goal (x,y)

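This is kind of a dirty hack: we iterate by mapping over the bigrams of a text that starts with the character we want, so each letter is conditioned on the source text at that position rather than on what has been generated so far. Some variation of forM probably fixes this, but I wanted to get this out as soon as possible.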


rewrite goal = mapM (pickAfter goal) (bigram goal)
rewriteWith char goal = rewrite (char : (tail goal))
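
One way around the hack, sketched here under assumptions rather than taken from the Probability module, is to feed each generated character back in as part of the next context. The names followers and generate are hypothetical, and the randomness is passed in as a plain list of Ints so the sketch stays base-only; any source of random integers would do.

```haskell
-- Every character that follows the bigram (x, y) in the text, repeats included.
followers :: String -> (Char, Char) -> String
followers text (x, y) =
  [ c | (a, b, c) <- zip3 text (drop 1 text) (drop 2 text), a == x, b == y ]

-- Walk the chain: choose a follower of the current bigram, emit it,
-- then slide the context forward to include the new character.
-- One Int from the supplied list is consumed per generated character.
generate :: [Int] -> String -> (Char, Char) -> String
generate [] _ _ = []
generate (r : rs) text (x, y) =
  case followers text (x, y) of
    [] -> []  -- dead end: this bigram is never followed by anything
    fs ->
      let c = fs !! (r `mod` length fs)
      in c : generate rs text (y, c)
```

With the toy corpus "abcabd", generate [0,0,0] "abcabd" ('a','b') gives "cab". The output is never longer than the list of Ints, and it stops early if it reaches a bigram that has no follower in the text.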

Finally, a trivial main:


main = do {
  text <- readFile filename;
  first <- pick $ uniform ['a'..'z'];    -- random starting letter
  putStrLn =<< rewriteWith first text;   -- print the text, so it also shows outside ghci
}

This is kind of slow right now. I’m sure there’s a lot of optimization to be done.

imsg is tf vrireonteiyc roe ienls l t li ilestn aong endnpgan a ti pahudrrahot ttpaeoeryemican the loea dytu rhedslhov soy tos ers cyl ahapt vsasfgdiunhod lbha taalhar eods hha aoeaumb ryrru coase ifssgpmhsd as fort n h ferenmhuen ah hvpessipr shf eehlartsbee enteuy ihmddi trmeorsyuoil vings if mhe doatrlk lsisteagettgfhen auogee c ivt mtew ipt vsieurdsthem mer lcrtg emllh hoshkte ier aezesaan flpeed cdeodnd aosalulrt sh sekinhen eg eat tooctjereta wpvnltte n tad alrgixi e aslton at lc d iue iothrae tover c ooe dt l bo resywnwnocnhat em tn anwyhnitretysieetw d ooee oanertoeugh i eote twou cedtuorr fy nks rpn tieheoee ay tda erilf ehm d ivvwrh aoc nhwbe geb r ervinnrnset t d ewtvobochdnl ae ond mhlfyc honeti yioeut if tem eelpnwhld tvclapteeiey thtlismnyilahed ftpoldew sec afr sh nei e t eae mowaiam ir f ttd pouavisgsx lset st ni astmtsald wel efdhhnfomk okosyas lhet tao agsin efofp i ea doasvegowso sneirrme


One Response to “Yet more silly text”

  1. Thank you for posting this stuff.

