### Yet more silly text

09Feb07

Mikael Johansson (whom I admire enough to have mentioned on my “about” page as one of the people I’d like to be more like, but am not) took the silly text generator from that post from a few hours ago and changed it to select random word bigrams, which produces text that looks more meaningful.

Not to be one-upped by what’s essentially avoiding the problem, I improved on the previous experiment by picking each letter conditional on the previous bigram.

```
module Test where

import Probability
import Data.Char
import Control.Monad
```

My corpus is Dostoyevsky’s “Notes from the Underground”, automatically translated into Nadsat.

```
filename = "nadground.txt"
```

`zip` has smarter semantics on lists of different lengths than I thought; the contrived version of `pairs` I had written is, as Michi pointed out, simply:

```
pairs xs = zip xs (tail xs)
```
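The trick is that `zip` stops as soon as the shorter list runs out, so zipping a list with its own tail yields one pair per adjacent pair of elements and quietly drops the last element. A quick sketch to check this; `zipDemo` and `emptyDemo` are just illustrative names:

```haskell
-- Pair each element with its successor. zip truncates to the shorter
-- list, so the last element (which has no successor) is dropped.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- zip on lists of different lengths keeps only as many pairs as the
-- shorter list allows: here [(1,'a'),(2,'b')].
zipDemo :: [(Int, Char)]
zipDemo = zip [1, 2, 3] "ab"

-- zip is lazy in its second argument when the first is empty, so
-- pairs "" never evaluates tail "" and simply returns [].
emptyDemo :: [(Char, Char)]
emptyDemo = pairs ""
```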

I added some extra filtering: commas tend to lead to spaces too often, and my text had some Unicode characters (like fancy quote marks) that were bothering GHCi.

```
-- Character bigrams of the lowercased text, keeping only ASCII letters and whitespace
bigram = pairs . map toLower
       . filter (\x -> (isAlpha x || isSpace x) && isAscii x && not (isPunctuation x))
```
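To see what the filter actually keeps, here is a standalone sketch with the cleaning step pulled out into a separate function, `clean` (a hypothetical name). It leaves out the `isPunctuation` test, which is redundant: no character that satisfies `isAlpha` or `isSpace` is punctuation, so the result is the same.

```haskell
import Data.Char (isAlpha, isAscii, isSpace, toLower)

-- Pair each element with its successor; zip drops the unmatched last one.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Keep only ASCII letters and whitespace, lowercased. Commas, digits and
-- non-ASCII characters such as curly quotes are dropped. Whitespace other
-- than ' ' (for example '\n') also survives the filter.
clean :: String -> String
clean = map toLower . filter (\x -> (isAlpha x || isSpace x) && isAscii x)

-- Character bigrams of the cleaned text.
bigram :: String -> [(Char, Char)]
bigram = pairs . clean
```

For example, `clean "Hello, World!"` gives `"hello world"`, so the comma no longer ends up glued to the following space.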

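My actual probability distribution, though, is based on pairs of consecutive bigrams, which I’m calling tetragrams (since consecutive bigrams overlap, each one actually spans only three characters):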

```
tetragram = pairs . bigram
distro    = uniform . tetragram
```
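Because `uniform` is applied to the list of all occurrences, duplicates included, a window that occurs k times out of n ends up with probability k/n. (That assumes `uniform` keeps repeated entries as separate outcomes, which as far as I can tell is what the PFP-style `Probability` module does.) The sketch below makes that frequency weighting explicit without the `Probability` module, by counting windows in a `Data.Map`. `windowCounts` is a hypothetical helper, and the text cleaning is omitted for brevity.

```haskell
import qualified Data.Map.Strict as Map

-- Pair each element with its successor.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Pairs of consecutive bigrams. Because consecutive bigrams overlap,
-- every element has the shape ((a, b), (b, c)): a three-character window.
tetragram :: String -> [((Char, Char), (Char, Char))]
tetragram = pairs . pairs

-- Hypothetical helper: how often each window occurs. Dividing a count by
-- the total number of windows gives the probability that a uniform choice
-- over the list of occurrences assigns to that window.
windowCounts :: String -> Map.Map ((Char, Char), (Char, Char)) Int
windowCounts = Map.fromListWith (+) . map (\w -> (w, 1)) . tetragram
```

For `"ababa"`, the window `(('a','b'),('b','a'))` is counted twice and `(('b','a'),('a','b'))` once, so the first is twice as likely to be picked.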

This is the conditional distribution: it keeps only the tetragrams whose first bigram is the previous bigram `(x,y)`, so the next bigram is picked with a probability conditional on the previous one.

```
condDistro goal (x,y) = distro goal ||| (\((a,b),_) -> a == x && b == y)
```
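Conditioning with `|||` amounts to throwing away every tetragram whose first bigram isn't `(x,y)` and renormalising what is left. Over a plain list of occurrences that is just a filter. Here is a sketch with a hypothetical `followers` function, in which repeated entries play the role of probability weights:

```haskell
-- Pair each element with its successor.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- All bigrams that follow an occurrence of (x, y) in the text, one entry
-- per occurrence, so a bigram seen twice as often appears twice as often.
-- A uniform choice from this list is the empirical conditional distribution.
followers :: String -> (Char, Char) -> [(Char, Char)]
followers text (x, y) =
  [ next | ((a, b), next) <- pairs (pairs text), a == x, b == y ]
```

An empty result means `(x,y)` is never followed by anything, for instance when it only occurs as the final bigram of the text. Conditioning on an event that never happens isn't well defined, so that case needs care in any version.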

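To keep it from being too deterministic, we keep only one character of the sampled tetragram: `snd . snd` takes the second character of its second bigram, the only character not already fixed by the condition.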

```
pickAfter goal (x,y) = fmap (snd . snd) $ pick $ condDistro goal (x,y)
```
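`pick` draws from the distribution in `IO`. To show just the selection logic, here is a pure sketch in which the caller supplies the random number. `pickAfterWith` is a hypothetical stand-in for `pickAfter`, not part of the `Probability` module. Since `followers` already returns the second bigram, a single `snd` plays the role of `snd . snd`.

```haskell
-- Pair each element with its successor.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Bigrams following each occurrence of (x, y), one entry per occurrence.
followers :: String -> (Char, Char) -> [(Char, Char)]
followers text (x, y) =
  [ next | ((a, b), next) <- pairs (pairs text), a == x, b == y ]

-- Hypothetical pure stand-in for pickAfter: r is a caller-supplied random
-- non-negative Int, and we take the (r mod n)-th of the n followers. This is
-- (close to) uniform over occurrences when r is uniform over a large range.
-- Only the second character of the chosen bigram is new (its first
-- character is y again), so that is the one returned.
pickAfterWith :: Int -> String -> (Char, Char) -> Maybe Char
pickAfterWith r text xy =
  case followers text xy of
    [] -> Nothing
    fs -> Just (snd (fs !! (r `mod` length fs)))
```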

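This is kind of a dirty hack: we iterate by mapping over the bigrams of a text that starts with the character we want (the original text with its first character replaced). So each output character is conditioned on the bigram at the same position in the source, not on the characters generated so far. Some variation of `forM` probably fixes this, but I wanted to get this out as soon as possible.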

```
rewrite goal = mapM (pickAfter goal) (bigram goal)
rewriteWith char goal = rewrite (char : tail goal)
```
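The fix hinted at above amounts to feeding the output back in. Each new character is conditioned on the last two characters *generated*, which turns the process into a proper character-level Markov chain. Here is a minimal sketch of that loop. To stay within `base`, it assumes a toy linear congruential generator (`nextSeed`, hypothetical) instead of the `Probability` module's `pick`. `generate` is likewise a hypothetical name.

```haskell
-- Pair each element with its successor.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- Bigrams following each occurrence of (x, y), one entry per occurrence.
-- This rescans the text on every call, which is fine for a sketch.
followers :: String -> (Char, Char) -> [(Char, Char)]
followers text (x, y) =
  [ next | ((a, b), next) <- pairs (pairs text), a == x, b == y ]

-- Hypothetical toy linear congruential generator; real code would use a
-- proper random source.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Generate up to n characters after the seed bigram (x, y). Each new
-- character c is drawn from the followers of the last two generated
-- characters, and (y, c) becomes the next bigram. Stops early if the
-- current bigram has no followers. The high bits of the seed are used
-- because the low bits of this generator are poorly distributed.
generate :: Int -> Int -> String -> (Char, Char) -> String
generate n seed text (x, y)
  | n <= 0 = []
  | otherwise =
      case followers text (x, y) of
        [] -> []
        fs ->
          let s' = nextSeed seed
              c = snd (fs !! ((s' `div` 65536) `mod` length fs))
          in c : generate (n - 1) s' text (y, c)
```

Every three consecutive characters of the output (counting the seed bigram) then occur somewhere in the corpus. In `"abcabcabc"`, `('a','b')` is always followed by `'c'`, so `generate 4 7 "abcabcabc" ('a','b')` is `"cabc"` whatever the seed.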

Finally, a trivial `main`:

```
main = do
  text  <- readFile filename
  first <- pick $ uniform ['a'..'z']
  out   <- rewriteWith first text
  putStrLn out   -- print the generated text
```
This is kind of slow right now. I’m sure there’s a lot of optimization to be done.
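One clear source of the slowness is that `condDistro goal (x,y)` filters the full list of corpus tetragrams for every single character produced. Rewriting an N-character text therefore costs on the order of N² steps, and more if `distro goal` itself gets rebuilt each time, which Haskell doesn't guarantee to avoid. A common remedy is to build a lookup table once. The sketch below swaps the `Probability` module for a plain `Data.Map` from the `containers` package. `Table`, `buildTable` and `followersIn` are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical lookup table: each bigram maps to the characters that
-- follow it in the text, one entry per occurrence, so frequent followers
-- appear more often. A uniform choice from such a list reproduces the
-- conditional distribution without rescanning the corpus.
type Table = Map.Map (Char, Char) String

-- Built in one pass over all three-character windows of the text.
buildTable :: String -> Table
buildTable text =
  Map.fromListWith (++)
    [ ((a, b), [c]) | (a, b, c) <- zip3 text (drop 1 text) (drop 2 text) ]

-- Followers of the bigram (x, y); empty if it never has a successor.
-- Each lookup is O(log k) for k distinct bigrams.
followersIn :: Table -> (Char, Char) -> String
followersIn table xy = Map.findWithDefault [] xy table
```

With the table built once up front, each generation step only needs a lookup of `followersIn table (x,y)` and a uniform pick from the result.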

imsg is tf vrireonteiyc roe ienls l t li ilestn aong endnpgan a ti pahudrrahot ttpaeoeryemican the loea dytu rhedslhov soy tos ers cyl ahapt vsasfgdiunhod lbha taalhar eods hha aoeaumb ryrru coase ifssgpmhsd as fort n h ferenmhuen ah hvpessipr shf eehlartsbee enteuy ihmddi trmeorsyuoil vings if mhe doatrlk lsisteagettgfhen auogee c ivt mtew ipt vsieurdsthem mer lcrtg emllh hoshkte ier aezesaan flpeed cdeodnd aosalulrt sh sekinhen eg eat tooctjereta wpvnltte n tad alrgixi e aslton at lc d iue iothrae tover c ooe dt l bo resywnwnocnhat em tn anwyhnitretysieetw d ooee oanertoeugh i eote twou cedtuorr fy nks rpn tieheoee ay tda erilf ehm d ivvwrh aoc nhwbe geb r ervinnrnset t d ewtvobochdnl ae ond mhlfyc honeti yioeut if tem eelpnwhld tvclapteeiey thtlismnyilahed ftpoldew sec afr sh nei e t eae mowaiam ir f ttd pouavisgsx lset st ni astmtsald wel efdhhnfomk okosyas lhet tao agsin efofp i ea doasvegowso sneirrme

#### One Response to “Yet more silly text”

1. Thank you for posting this stuff.

## Dr. Syntaxfree

Dr. Syntaxfree has no PhD and shouldn't call himself a "doctor", but does so for amusement value anyway. An unemployed (OK, graduate student) econopundit by day, he has become progressively obsessed with Haskell, to the point that he often can't fathom not working on it. A jack-of-many-trades, he has an unusual CS background in that he knows no imperative programming at all. He hopes to be both helpful to those less knowledgeable than him and illustrative to the really smart people trying to understand the mentality of a common man trying to tackle functional programming.