The 20-minute parser
Even a complete idiot like me can write a simple parser in 20 minutes in Haskell.
While killing time, it struck me that I wanted an English-to-Nadsat translator. Wikipedia pointed me to this page, where there is a translator for Windows. The very same page has the dictionary it uses, in the following form:
A
appy polly loggies : apologies :: School boy speak
appy polly loggy : apology :: School boy speak
B
baboochka : old woman :: Russian (babooshka/grandmother)
baboochkas : old women :: Russian (babooshka/grandmother)
baddiwad : bad :: School boy speak
baddiwadest : baddest :: School boy speak
banda : band :: Russian (banda/band, gang)
bandas : bands :: Russian (banda/band, gang)
bezoomny : mad :: Russian (byezoomiyi/mad, insane)
Yes, this probably can be handled with regexes as it’s regular enough, but that’s too complicated for me. I’m not smart enough for that finite automata stuff.
So I copied all that text as-is and saved it as nadsat.dict. Then I began writing the parser.
module Nadsat where
import Text.ParserCombinators.Parsec
import qualified Data.Map as Map
import Data.Char
The first thing I wanted was to weed out the “A”, “B”, etc. headers:
header = oneOf ['A'..'Z']
Then I wrote a parser for one line of the dictionary:
word = do {
  nadsat <- anyChar `manyTill` (char ':');
  english <- anyChar `manyTill` (try (string "::"));
  etymology <- anyChar `manyTill` newline;
  -- key: the English word, value: its Nadsat equivalent (letters only)
  return $ Map.singleton (filter isAlpha english) (filter isAlpha nadsat);
}
Given that, writing the parser for a whole dictionary is easy:
dict = do {
  -- skip any letter headers before each entry, not just at the top of the file
  entries <- many (skipMany header >> word);
  return $ Map.unions entries;
}
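To try the parser without touching the disk, something like the following sketch runs it on an in-memory sample with `parse`. The names `entry`, `dictionary`, `sample`, and `parsedSample` are made up for this illustration, and the sample assumes each letter header sits on its own line. Headers are skipped before every entry here, so a header in the middle of the text does not end up glued to the next word.

```haskell
import Text.ParserCombinators.Parsec
import qualified Data.Map as Map
import Data.Char (isAlpha)

-- A letter header such as "A" or "B" (same idea as `header` in the post).
letterHeader :: Parser Char
letterHeader = oneOf ['A'..'Z']

-- One line of the form "nadsat : english :: etymology\n".
entry :: Parser (Map.Map String String)
entry = do
  nadsat  <- anyChar `manyTill` char ':'
  english <- anyChar `manyTill` try (string "::")
  _       <- anyChar `manyTill` newline   -- etymology, ignored
  return $ Map.singleton (filter isAlpha english) (filter isAlpha nadsat)

-- Zero or more entries, each optionally preceded by letter headers.
dictionary :: Parser (Map.Map String String)
dictionary = Map.unions <$> many (skipMany letterHeader >> entry)

-- A tiny hand-written sample in the assumed file layout.
sample :: String
sample = unlines
  [ "B"
  , "baddiwad : bad :: School boy speak"
  , "banda : band :: Russian (banda/band, gang)"
  ]

parsedSample :: Either ParseError (Map.Map String String)
parsedSample = parse dictionary "sample" sample
```

Loaded in GHCi, `parsedSample` should come out as a `Right` holding a two-entry Map from "bad" to "baddiwad" and from "band" to "banda".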
The trickiest part (and it’s not really tricky) is this:
nadsatDict = parseFromFile dict "nadsat.dict"
Its type is nadsatDict :: IO (Either ParseError (Map.Map [Char] [Char])), because trying to parse a file might fail with an error. Luckily, Either ParseError is an instance of Functor, so fmap lets us apply a function to the Map inside the Right case without unwrapping it by hand.
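Here is a small sketch of why fmap is enough: it applies a function to a Right value and passes a Left through untouched. To keep the example free of Parsec, a plain String stands in for ParseError; `Result`, `okResult`, `failedResult`, and `lookupBad` are invented names.

```haskell
import qualified Data.Map as Map

-- Stand-in for the parser's result type; String replaces ParseError here.
type Result = Either String (Map.Map String String)

okResult :: Result
okResult = Right (Map.fromList [("bad", "baddiwad")])

failedResult :: Result
failedResult = Left "unexpected end of input"

-- fmap only touches the Right side; a Left is returned unchanged.
lookupBad :: Result -> Either String (Maybe String)
lookupBad = fmap (Map.lookup "bad")

-- lookupBad okResult     evaluates to Right (Just "baddiwad")
-- lookupBad failedResult evaluates to Left "unexpected end of input"
```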
This function does the bulk of translating, given a Map containing the dictionary.
subst s m = unwords $ map (\x-> Map.findWithDefault x x m) (words s)
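A quick way to see what subst does is to feed it a hand-made Map. The dictionary `tiny` and the two example strings below are made up for the illustration; the definition of subst is the one from the post with a type signature added.

```haskell
import qualified Data.Map as Map

-- Same definition as in the post, with a type signature added.
subst :: String -> Map.Map String String -> String
subst s m = unwords $ map (\x -> Map.findWithDefault x x m) (words s)

-- A two-entry dictionary made up for the example.
tiny :: Map.Map String String
tiny = Map.fromList [("bad", "baddiwad"), ("mad", "bezoomny")]

-- Only exact, whitespace-separated matches are replaced:
-- "bad," keeps its comma and so is not found in the Map.
example1, example2 :: String
example1 = subst "a bad mad band" tiny   -- "a baddiwad bezoomny band"
example2 = subst "bad, really bad" tiny  -- "bad, really baddiwad"
```

Since the lookup is per word, punctuation and capitalisation both defeat it, and multi-word dictionary entries can never match.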
Finally, we write main, trivially (modulo the fmap thing I mentioned before):
main = do { text <- getContents; dict <- nadsatDict; print $ fmap (subst text) dict; }
Bingo! A Nadsat translator in 20 minutes. Even a complete idiot like me, never having had programming lessons or a CS education at all, can write a simple parser in Haskell in a few minutes. Can your language do that?
Yes, I know I could use join findWithDefault instead of the lambda expression in subst.
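For the curious, a sketch of what that would look like. For functions, join f x is f x x, so join Map.findWithDefault uses the same word as both the default and the key. The names `substLambda`, `substJoin`, and `sameResult` are invented for the comparison.

```haskell
import Control.Monad (join)
import qualified Data.Map as Map

-- The lambda version from the post.
substLambda :: String -> Map.Map String String -> String
substLambda s m = unwords $ map (\x -> Map.findWithDefault x x m) (words s)

-- For functions, join f x = f x x, so join Map.findWithDefault x m
-- means Map.findWithDefault x x m.
substJoin :: String -> Map.Map String String -> String
substJoin s m = unwords $ map (flip (join Map.findWithDefault) m) (words s)

-- Both versions agree on a small made-up dictionary.
sameResult :: Bool
sameResult = substLambda "a bad band" d == substJoin "a bad band" d
  where d = Map.fromList [("bad", "baddiwad")]
```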