Haskell Hacking: a journal of Haskell programming

2006-12-20

Programming Haskell: argument handling and a complete cat annotations

Just some tidy ups of the previous article, on implementing unix 'cat' in Haskell.

Firstly, several people wrote in to point out the bug in the -b option. Rather than squeezing only blank lines, it would do more of a 'uniq' kind of thing, squeezing all duplicates:

    render Squeeze   = map head. group

Rather than:

    render Squeeze   = map head. groupBy (\x y -> all (all isSpace) [x,y])

Secondly, the complete source is now available via darcs, with a cabal build system.

    $ darcs get http://www.cse.unsw.edu.au/~dons/code/caths
    Copying patch 2 of 2... done!
    Applying patch 2 of 2... done.
    Finished getting.

    $ cd caths
    $ runhaskell Setup.lhs configure --prefix=/home/dons
    $ runhaskell Setup.lhs build
    $ runhaskell Setup.lhs install

And you're done.

Next week

We'll look at writing an SMP-parallel, bytestring-based url checker in Haskell:

    $ urlcheck ph-3-redux.txt                                 
    Found 0 broken links. Checked 4 links (4 unique) in 1 file.
    Search time 0 secs

/home :: /haskell :: permalink :: rss

2006-12-18

Programming Haskell: argument handling and a complete cat

This is part three in a series of tutorials on programming Haskell. You can get up to speed by reading:

Today we'll look more into how Haskell interacts with its environment, robust command line argument parsing, and writing a complete program.

Administrivia

Some clarifications on issues raised by yesterday's article.

Chunks

One issue pointed out was that the 'chunk' function was missing. This was semi-intentional. Anyway, this function just splits a list into 'n' chunks:

    chunk n xs = chunk' i xs
      where
        chunk' _ [] = []
        chunk' n xs = a : chunk' n b where (a,b) = splitAt n xs

        i = ceiling (fromIntegral (length xs) / fromIntegral n)

You may be able to write a neater one..

Solipsistic philosophers

Also, an amusing thread appeared on reddit, regarding solipsistic philosopher programs:

Of course, a good optimizing compiler will replace your solipsistic philosopher with a no-op.

A good optimising compiler, or any Haskell compiler :-) Since results that are never required are not computed in Haskell, due to laziness, we can write high performance solipsism simulators all day long:

    main = do
        let largestNumber = last [1..]
        return ()

Running our simulation of the philosopher-mathematician pondering some thoughts on large numbers:

    $ ghc A.hs
    $ ./a.out
    ./a.out  0.00s user 0.01s system 100% cpu 0.003 total

Ok. Enough jokes. Show me the code!

Getting to work

Yesterday we implemented a few toy unix programs, including 'cat'. Today we'll look at writing a complete cat program, but with a focus on interacting properly with the environment and being careful about command line handling. For our running examples, we'll consider the 'cat' and 'tac' programs. The basic spec for 'cat' is:

The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (`-') or absent, cat reads from the standard input.

It's the 'id' function of the unix shell. BSD 'cat.c' is a 255 line C program. From the man page we can see it does more than just concatenate files. It can also:

Numbers the output lines, starting at 1.
Squeezes multiple adjacent empty lines
Displays non-printing characters so they are visible.

Let's start by looking at the command line argument processing code.

Getting in arguments

The basic way to get arguments in a Haskell program is provided by the System.Environment library. We can use the getArgs function:

    Prelude> :m + System.Environment

    Prelude System.Environment> :t getArgs
    getArgs :: IO [String]

    Prelude System.Environment> do a <- getArgs; print a
    []

Which is empty, since we didn't provide any arguments!

In a small program we can implement all argument handling using just 'getArgs', and some simple list functions. For example, here's a basic 'tac' program, to reverse its input:

    import System.Environment
    import System.Exit

    main = getArgs >>= parse >>= putStr . tac

    tac  = unlines . reverse . lines

    parse ["-h"] = usage   >> exit
    parse ["-v"] = version >> exit
    parse []     = getContents
    parse fs     = concat `fmap` mapM readFile fs

    usage   = putStrLn "Usage: tac [-vh] [file ..]"
    version = putStrLn "Haskell tac 0.1"
    exit    = exitWith ExitSuccess
    die     = exitWith (ExitFailure 1)

This program concatenates and prints the contents of files in reverse (or reads from stdin with no arguments), along with a couple of basic command line flags for version and help strings. It's also reasonably careful about setting exit status on finishing, using the functions from System.Exit. The actual core algorithm for 'tac' is a nice pure Haskell function, and really all the hard work is done processing the command line args.

Some example use:

    $ ./tac -h
    Usage: tac [-vh] [file ..]

    $ ./tac -v
    Haskell tac 0.1

    $ ./tac A.hs B.hs
    return n
    print 1
    n <- getLine
    g = do
    ...
    import System.Exit
    import System.Environment

    $ ./tac  < A.hs 
    die     = exitWith (ExitFailure 1)
    exit    = exitWith ExitSuccess
    version = putStrLn "Haskell tac 0.1"
    ...
    import System.Exit
    import System.Environment

As you can see, once compiled it behaves like a normal unix utility, properly dealing with stdin, with file arguments and the shell.

Note that getArgs doesn't return the program name. To get that we use:

    Prelude System.Environment> :t getProgName
    getProgName :: IO String

The environment

Many programs also make use of environment variables. We can get access to the full shell 'env' using

    Prelude System.Environment> :t getEnvironment
    getEnvironment :: IO [(String, String)]

Which returns an association list, mapping environment variables to their values. We can stick this list straight into an efficient Map structure, for later use. Here's an interactive example:

    Prelude System.Environment> env <- do e <- getEnvironment; return (Data.Map.fromList e)

which we could also write as:

    env <- Data.Map.fromList `fmap` getEnvironment

Once we've got the environment in a useful Map, we can inspect it using Map lookups:

    Prelude System.Environment> :t env
    env :: Data.Map.Map String String

    Prelude System.Environment> :t Data.Map.lookup
    Data.Map.lookup :: (Ord k, Monad m) => k -> Data.Map.Map k a -> m a

That is, the lookup function takes some key, 'k', and a Map from keys to elements of type 'a', and returns an element, if found, in some monad.

More on failure

You may recall from the first tutorial that the Map 'lookup' function will fail if the key is not found. The particular way you wish it to fail depends on which monad you use. You can tell this from the type of lookup. The

    lookup :: (Monad m) => ... m a

syntax indicates that lookup is polymorphic in its monad: it will work for any monad type, and its behaviour is determined by the particular instance of the monad interface you ask for. When a lookup fails, it calls the 'fail' function for the monad you're using. When a lookup is successful, it calls the 'return' function of the same monad. Being 'polymorphic in a monad' really just means that it will call which particular concrete monad 'subclass' you happen to be using.

Looking at the various useful monads for this, we can choose which failure behaviour we would prefer. Here's the implementation of the 'fail' interface for a variety of monads. It's up to you to pick which behaviour you'd like.

For Maybes, we get the null value, Nothing, on failure:

    instance  Monad Maybe  where
        return   = Just
        fail _   = Nothing

For Eithers, we get an error string:

    instance (Error e) => Monad (Either e) where
        return   = Right
        fail s   = Left (strMsg s)

For lists, we get the empty list on failure:

    instance  Monad []  where
        return x = [x]
        fail _   = []

And for IO we get an exception thrown:

    instance  Monad IO  where
        fail    = ioError . userError
        return  = returnIO

So, depending on the type signature, the compiler will statically pick one of these 'fail's to use on a lookup failing at runtime. For example, to fail with a null value, we'd use the Maybe monad:

    Prelude System.Environment> Data.Map.lookup "USER" env :: Maybe String
    Just "dons"

    Prelude System.Environment> Data.Map.lookup "LUSER" env :: Maybe String
    Nothing

Using Nothing for fatal errors isn't the best practice for large programs, since you usually need to know what failed. For a string-annotated Nothing, we can use the Either monad:

    Prelude System.Environment> :m + Control.Monad.Error

    Prelude System.Environment Control.Monad.Error> Data.Map.lookup "LUSER" env :: Either String String
    Left "Data.Map.lookup: Key not found"

Which is more useful. To fail with a proper exception we'd use the IO monad:

    Prelude System.Environment> Data.Map.lookup "LUSER" env :: IO String
    *** Exception: user error (Data.Map.lookup: Key not found)

We'll now turn to a more flexible approach to argument parsing.

GetOpt

The base Haskell library comes with an implementation of getopt, a useful library for standardised argument handling. Let's implement the argument handling of the unix 'cat' program using this lib.

A type for flags

The first thing to do is define a data type representing the valid flags. First, let's import all the libraries I'll use:

    import Control.Monad
    import Data.Char
    import Data.List
    import System.Console.GetOpt
    import System.Environment
    import System.Exit
    import System.IO
    import Text.Printf

Now in a new file, Cat.hs, we'll write:

    data Flag
        = Blanks                -- -b
        | Dollar                -- -e 
        | Squeeze               -- -s
        | Tabs                  -- -t
        | Unbuffered            -- -u
        | Invisible             -- -v
        | Number                -- -n
        | Help                  -- --help
        deriving (Eq,Ord,Enum,Show,Bounded)

The 'data' keyword defines a new data type, 'Flag', which can have one of several values. Such a type is often called a sum (or union) type. So 'Flag' is a new user-defined type, just like other types, such as Bool or Int. The identifiers on the right hand side of the | are the types constructors. That is, values which have type 'Flag'. We ask the compiler to also derive some instances of various common classes for us (so we don't have to write the code ourselves).

With just this we can already start playing around with the flag data type in GHCi:

    > :reload
    > :m + Data.List
    > let s = [Number, Squeeze, Unbuffered, Squeeze]

    *Main Data.List> let s = [Number, Squeeze, Unbuffered, Squeeze]

    *Main Data.List> sort s
    [Squeeze,Squeeze,Unbuffered,Number]

    *Main Data.List> nub s
    [Number,Squeeze,Unbuffered]

    *Main Data.List> map fromEnum s
    [6,2,4,2]

    *Main Data.List> [Blanks .. ]
    [Blanks,Dollar,Squeeze,Tabs,Unbuffered,Invisible,Number,Help]

User defined data types are really first class citizens in Haskell, and behave just like the 'inbuilt' types.

Binding to command line flags

The next step is to associate some particular command line strings with each abstract flag. We do this by writing a list of 'Option's, which tie long and short argument flags to the particular abstract Flag value we need, and also associated a help string with each flag:

    flags =
       [Option ['b'] []       (NoArg Blanks)
            "Implies the -n option but doesn't count blank lines."
       ,Option ['e'] []       (NoArg Dollar)
            "Implies the -v option and also prints a dollar sign (`$') at the end of each line."
       ,Option ['n'] []       (NoArg Number)
            "Number the output lines, starting at 1."
       ,Option ['s'] []       (NoArg Squeeze)
            "Squeeze multiple adjacent empty lines, causing the output to be single spaced."
       ,Option ['t'] []       (NoArg Tabs)
            "Implies the -v option and also prints tab characters as `^I'."
       ,Option ['u'] []       (NoArg Unbuffered)
            "The output is guaranteed to be unbuffered (see setbuf(3))."
       ,Option ['v'] []       (NoArg Invisible)
            "Displays non-printing characters so they are visible."
       ,Option []    ["help"] (NoArg Help)
            "Print this help message"
       ]

Parsing the flags

To actually turn the list of command line flags getArgs gives us, into a useful list of abstract Flag values, we use the 'getOpt' function, which returns a triple consisting of flags that were set, a list of any non-flag arguments, and a list of error messages. First we need a couple of libraries:

And now to parse the 'cat' argument grammar, we would use:

    parse argv = case getOpt Permute flags argv of

        (args,fs,[]) -> do
            let files = if null fs then ["-"] else fs
            if Help `elem` args
                then do hPutStrLn stderr (usageInfo header flags)
                        exitWith ExitSuccess
                else return (nub (concatMap set args), files)

        (_,_,errs)      -> do
            hPutStrLn stderr (concat errs ++ usageInfo header flags)
            exitWith (ExitFailure 1)

        where header = "Usage: cat [-benstuv] [file ...]"

If the arguments don't make sense, we fail with a usage message, and set the exit status to 1. The final list of flags to use, and any files to open, can be returned to main for processing now:

    main = do
        (as, fs) <- getArgs >>= parse
        putStrLn $ "Flags: " ++ show as
        putStrLn $ "Files: " ++ show fs

We can now test out the argument parsing code:

    $ ghc Cat.hs

    $ ./a.out          
    Flags: []
    Files: []

    $ ./a.out A.hs Z.hs
    Flags: []
    Files: ["A.hs","Z.hs"]

Ok, files are good. How about the flags that imply other flags?

    $ ./a.out -b A.hs Z.hs
    Flags: [Number,Blanks]
    Files: ["A.hs","Z.hs"]

    $ ./a.out -btvu A.hs Z.hs
    Flags: [Number,Blanks,Tabs,Invisible,Unbuffered]
    Files: ["A.hs","Z.hs"]

Good. And invalid flags:

    $ ./a.out -i A.hs Z.hs
    unrecognized option `-i'
    Usage: cat [-benstuv] [file ...]
      -b          Implies the -n option but doesn't count blank lines.
      -e          Implies the -v option and also prints a dollar sign (`$') at the end of each line.
      -n          Number the output lines, starting at 1.
      -s          Squeeze multiple adjacent empty lines, causing the output to be single spaced.
      -t          Implies the -v option and also prints tab characters as `^I'.
      -u          The output is guaranteed to be unbuffered (see setbuf(3)).
      -v          Displays non-printing characters so they are visible.
          --help  Print this help message

Ok, that was pretty easy. Now let's try to implement these functions!

Implementing cat

So now we have to map those abstract flag values to real behaviour. I'll start with the easy ones first.

If -u is set, we turn off all buffering. After that, we map a 'cat' function over each file. So the program's 'main' is just:

    main = do
        (args, files) <- getArgs >>= parse
        when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
        mapM_ (cat args) files

Where 'cat' will process the files one at a time. 'cat' is where all the hard work is done.

Most of the operations the cat program does requires access to each line of the file. We also need to be able to handle the special file name, "-". What we'd like to do is separate out any IO, from operations on each file's content. To do this we'll write a higher order function, 'withFile', which takes a filename, opens it, splits it into lines and applies a function to the contents of the file, before writing the result to stdout:

    withFile s f = putStr . unlines . f . lines =<< open s
      where
        open f = if f == "-" then getContents else readFile f

Now we can implement the pure 'cat' function, implementing the cat program's functionality. Firstly, if there are no command line flags, the 'cat' function does nothing to the input:

    cat [] f = withFile f id

That is, it applies the 'id' function to the stream generated by withFile. That was easy.

Now, if there are some arguments, we'll need to process them. This can be a little tricky, since the effect of the command line flags are cumulative, and we better process them in the right order. What is that order? Well, from experimentation :-) it seems that (if all flags are enabled) 'cat' proceed to:

first squeeze any blank lines;
then any visibility flags are processed;
then line numbering occurs;
then, finally, any visible newlines are printed as '$'.

The visibility flags transform non-printing characters into a visible representation. The key to coding this up is recognising that its just a functional pipeline. So we can write it as:

    cat as f = withFile f (newline . number . visible as)

Where 'visible' renders any non-printing chars. Then we number the resulting lines (if the arguments are set), and then finally make any remaining newlines visible. Note that the core of the algorithm does no IO. It's a pure function from [String] -> [String]. Now the implementation of 'number':

      where
        number  s    = if Blanks `elem` as then numberSome s else ifset Number numberAll s
        newline s    = ifset Dollar (map (++"$")) s
        visible as s = foldl' (flip render) s as
        ifset a f    = if a `elem` as then f else id

Here we actually handle all the data traversal. And use a little helper function, 'ifset', to conditionally execute a function if the corresponding command line is set. Note that slight trickiness involving numbering: either we number all lines, or number the non blank lines, but not both.

'render', the function to print invisible characters, is just:

    render Squeeze   = map head . group
    render Tabs      = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
    render Invisible = map $ concatMap visible
      where
        visible c | c == '\t' || isPrint c = [c]
                  | otherwise              = init . tail . show $ c
    render _ = id

And the numbering function:

    numberLine      = printf "%6d  %s"
    numberAll s     = zipWith numberLine [(1 :: Integer)..] s
    numberSome s    = reverse . snd $ foldl' draw (1,[]) s
      where
        draw (n,acc) s
                | all isSpace s = (n,   s : acc)
                | otherwise     = (n+1, numberLine n s : acc)

And we're done! In the end, our entire implementation is some 89 lines of code, of which 60 are to do with importing modules, or command line argument parsing. The actual heart of the program is fairly tiny in the end.

Let's run the code.

    $ ghc -O Cat.hs -o cat

Check it actually prints its arguments:

    $ ./cat Cat.hs | head
    import System.Console.GetOpt
    import System.IO
    import System.Exit
    import System.Environment
    import Data.List
    import Data.Char
    import Control.Monad
    import Text.Printf

    main = do

Or multiple arguments:

    $ ./cat Cat.hs /usr/share/dict/words | tail
    zymotoxic
    zymurgy
    Zyrenian
    Zyrian
    Zyryan
    zythem
    Zythia
    zythum
    Zyzomys
    Zyzzogeton

Does it number lines:

    $ ./cat -n Cat.hs  | tail
        80  
        81      (_,_,errs)      -> do
        82          hPutStrLn stderr (concat errs ++ usageInfo header flags)
        83          exitWith (ExitFailure 1)
        84  
        85      where header = "Usage: cat [-benstuv] [file ...]"
        86  
        87            set Dollar = [Dollar, Invisible]
        88            set Tabs   = [Tabs,   Invisible]
        89            set f      = [f]

What about non blank lines:

    $ ./cat -b Cat.hs | tail

        72      (_,_,errs)      -> do
        73          hPutStrLn stderr (concat errs ++ usageInfo header flags)
        74          exitWith (ExitFailure 1)

        75      where header = "Usage: cat [-benstuv] [file ...]"

        76            set Dollar = [Dollar, Invisible]
        77            set Tabs   = [Tabs,   Invisible]
        78            set f      = [f]

How about some visibility flags:

    $ cat -eb Cat.hs | tail
    $
        72      (_,_,errs)      -> do$
        73          hPutStrLn stderr (concat errs ++ usageInfo header flags)$
        74          exitWith (ExitFailure 1)$
    $
        75      where header = "Usage: cat [-benstuv] [file ...]"$
    $
        76            set Dollar = [Dollar, Invisible]$
        77            set Tabs   = [Tabs,   Invisible]$
        78            set f      = [f]$

And turning on all the flags:

    $ cat -bnvste Cat.hs | tail
    $
        72      (_,_,errs)      -> do$
        73          hPutStrLn stderr (concat errs ++ usageInfo header flags)$
        74          exitWith (ExitFailure 1)$
    $
        75      where header = "Usage: cat [-benstuv] [file ...]"$
    $
        76            set Dollar = [Dollar, Invisible]$
        77            set Tabs   = [Tabs,   Invisible]$
        78            set f      = [f]$

Nice!

Summary

Well, in the end I didn't get on to exception handling, or the use of bytestring to improve performance further. However, we have implemented (95%) of the unix 'cat' program, including all argument handling and functionality, in about an hour and a half.

Once it typechecked, the code just worked, except for one bug where I originally rendered newline before counting lines (simply because the spec was underspecified). Lesson: you can start writing your unix scripts in Haskell right now. They'll be flexible, clean, and easy to maintain. And most of all, fun to write!

Hopefully next time we'll look into using bytestrings for processing larger volumes of data, and the use of exception handling to deal with unusual errors.

The complete source

And just for reference, there's the complete source:

    import System.Console.GetOpt
    import System.IO
    import System.Exit
    import System.Environment
    import Data.List
    import Data.Char
    import Control.Monad
    import Text.Printf

    main = do
        (args, files) <- getArgs >>= parse
        when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
        mapM_ (cat args) files

    withFile s f = putStr . unlines . f . lines =<< open s
      where
        open f = if f == "-" then getContents else readFile f

    cat [] f = withFile f id
    cat as f = withFile f (newline . number . visible as)
      where
        number  s    = if Blanks `elem` as then numberSome s else ifset Number numberAll s
        newline s    = ifset Dollar (map (++"$")) s
        visible as s = foldl' (flip render) s as
        ifset a f    = if a `elem` as then f else id

    render Squeeze   = map head . group
    render Tabs      = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
    render Invisible = map $ concatMap visible
      where
        visible c | c == '\t' || isPrint c = [c]
                  | otherwise              = init . tail . show $ c
    render _ = id

    numberLine      = printf "%6d  %s"
    numberAll s     = zipWith numberLine [(1 :: Integer)..] s
    numberSome s    = reverse . snd $ foldl' draw (1,[]) s
      where
        draw (n,acc) s
                | all isSpace s = (n,   s : acc)
                | otherwise     = (n+1, numberLine n s : acc)

    data Flag
        = Blanks                -- -b
        | Dollar                -- -e 
        | Squeeze               -- -s
        | Tabs                  -- -t
        | Unbuffered            -- -u
        | Invisible             -- -v
        | Number                -- -n
        | Help                  -- --help
        deriving (Eq,Ord,Enum,Show,Bounded)

    flags =
       [Option ['b'] []       (NoArg Blanks)
            "Implies the -n option but doesn't count blank lines."
       ,Option ['e'] []       (NoArg Dollar)
            "Implies the -v option and also prints a dollar sign (`$') at the end of each line."
       ,Option ['n'] []       (NoArg Number)
            "Number the output lines, starting at 1."
       ,Option ['s'] []       (NoArg Squeeze)
            "Squeeze multiple adjacent empty lines, causing the output to be single spaced."
       ,Option ['t'] []       (NoArg Tabs)
            "Implies the -v option and also prints tab characters as `^I'."
       ,Option ['u'] []       (NoArg Unbuffered)
            "The output is guaranteed to be unbuffered (see setbuf(3))."
       ,Option ['v'] []       (NoArg Invisible)
            "Displays non-printing characters so they are visible."
       ,Option []    ["help"] (NoArg Help)
            "Print this help message"
       ]

    parse argv = case getOpt Permute flags argv of
        (args,fs,[]) -> do
            let files = if null fs then ["-"] else fs
            if Help `elem` args
                then do hPutStrLn stderr (usageInfo header flags)
                        exitWith ExitSuccess
                else return (nub (concatMap set args), files)

        (_,_,errs)      -> do
            hPutStrLn stderr (concat errs ++ usageInfo header flags)
            exitWith (ExitFailure 1)

        where header = "Usage: cat [-benstuv] [file ...]"

              set Dollar = [Dollar, Invisible]
              set Tabs   = [Tabs,   Invisible]
              set f      = [f]

/home :: /haskell :: permalink :: rss

2006-12-17

Programming Haskell: String processing (with a dash of concurrency)

This is part two in a series of tutorials on programming Haskell. You can get up to speed by reading yesterday's introductory article.

Today we'll look more into the basic tools at our disposal in the Haskell language, in particular, operations for doing IO and playing with files and strings.

Administrivia

Before we get started, I should clarify a small point raised by yesterday's article. One issue I forgot to mention was that there are slight differences between running Haskell in ghci, the bytecode interpreter, and compiling it to native code with GHC.

Haskell programs are executed by evaluating the special 'main' function.

    import Data.List

    mylength = foldr (const (+1)) 0
    main = print (mylength "haskell")

To compile this to native code, we would feed the source file to the compiler:

    $ ghc A.hs
    $ ./a.out 
    7

For a faster turnaround, we can run the code directly through the bytecode interpreter, GHCi, using the 'runhaskell' program:

    $ runhaskell A.hs
    7

GHCi, the interactive Haskell environment, is a little bit different. As it is an interactive system, GHCi must execute your code sequentially, as you define each line. This is different to normal Haskell, where the order of definition is irrelevant. GHCi effectively executes your code inside a do-block. Therefore you can use the do-notation at the GHCi prompt to define new functions:

    $ ghci
    Prelude> :m + Data.List

    Prelude> let mylength = foldr (const (+1)) 0

    Prelude> :t mylength
    mylength :: [a] -> Integer

    Prelude> mylength "haskell"
    7

For this tutorial I will be developing code in a source file, and either compiling it as above, or loading the source file into GHCi for testing. To load a source file into GHCi, we do:

    $ ghci    
    Prelude> :load A.hs

    *Main> :t main
    main :: IO ()

    *Main> :t mylength
    mylength :: [a] -> Integer

    *Main> mylength "foo"
    3

    *Main> main
    7

Now, let's get into the code!

IO

As the Camel Book says:

Unless you're using artificial intelligence to model a solipsistic philosopher, your program needs some way to communicate with the outside world.

In yesterday's tutorial, I briefly introduced 'readFile', for reading a String from a file on disk. Let's consider now IO in more detail. The most common IO operations are defined in the System.IO library.

For the most basic stdin/stdout Unix-style programs in Haskell, we can use the 'interact' function:

    interact    ::  (String -> String) -> IO ()

This higher order function takes, as an argument, some function for processing a string (of type String -> String). It runs this function over the standard input stream, printing the result to standard output. A surprisingly large number of useful programs can be written this way. For example, we can write the 'cat' unix program as:

    main = interact id

Yes, that's it! Let's compile and run this program:

    $ ghc -O A.hs       

    $ cat A.hs | ./a.out
    main = interact id

How does this work? Firstly, 'interact' is defined as:

    interact f = do s <- getContents
                    putStr (f s)

So it reads a string from standard input, and writes to standard output the result of applying its argument function to that string. The 'id' function itself has the type:

    id :: a -> a

'id' is a function of one argument, of any type (the lowercase 'a' in the type means any type can be used in that position, i.e. it is a polymorphic function (also called a generic function in some languages)). 'id' takes a value of some type 'a', and returns a value of the same type. There's is only one (non-trivial) function of this type:

    id a = a

So 'interact id' will print to the input string to standard output unmodified.

Let's now write the 'wc' program:

    main    = interact count
    count s = show (length s) ++ "\n"

This will print the length of the input string, that is, the number of chars:

    $ runhaskell A.hs < A.hs
    57

Line oriented IO

Only a small number of programs operate on unstructured input streams. It is far more common to treat an input stream as a list of lines. So let's do that. To break a string up into lines, we'll use the ... 'lines' function, defined in the Data.List library:

    lines :: String -> [String]

The type, once again, tells the story. 'lines' takes a String, and breaks it up into a list of strings, splitting on newlines. To join a list of strings back into a single string, inserting newlines, we'd use the ... 'unlines' function:

    unlines :: [String] -> String

There are also similar functions for splitting on words, namely 'words' and 'unwords'. Now, an example. To count the number of lines in a file:

    main = interact (count . lines)

We can run this as:

    $ ghc -O A.hs

    $ ./a.out < A.hs
    3

Here we reuse the 'count' function from above, by composing it with the lines function.

On composition

This nice code reuse via composition is achieved using the (.) function, pronounced 'compose'. Let's look at how that works. (Feel free to skip this section, if you want to just get things done).

The (.) function is just a normal everyday Haskell function, defined as:

    (.) f g x = f (g x)

This looks a little like magic (or line noise), but its pretty easy. The (.) function simply takes two functions as arguments, along with another value. It applies the 'g' function to the value 'x', and then applies 'f' to the result, returning this final value. The functions may be of any type. The type of (.) is actually:

    (.) :: (b -> c) -> (a -> b) -> a -> c

which might look a bit hairy, but it essentially specifies what types of arguments make sense to compose. That is, only those where:

    f :: b -> c
    g :: a -> b
    x :: a

can be composed, yielding a new function of type:

    (f . g) :: a -> c

The nice thing is that this composition makes sense (and works) for all types a, b and c.

How does this relate to code reuse? Well, since our 'count' function is polymorphic, it works equally well counting the length of a string, or the length of a list of strings. Our littler 'wc' program is the epitome of the phrase: "higher order + polymorphic = reusable". That is, functions which take other functions as arguments, when combined with functions that work over any type, produce great reusable 'glue'. You only need vary the argument function to gain terrific code reuse (and the strong type checking ensures you can only reuse code in ways that work).

More on lines

Another little example, let's reverse each line of a file (like the unix 'rev' command):

    main = interact (unlines . map reverse . lines)

Which when run, reverses the input lines:

    $ ./a.out < B.hs
    rahC.ataD tropmi
    ebyaM.ataD tropmi
    tsiL.ataD tropmi

So we take the input string, split it into lines, and the loop over that list of lines, reversing each of them, using the 'map' function. Finally, once we've reversed each line, we join them back into a single string with unlines, and print it out.

The 'map' function is a fundamental control structure of functional programming, similar to the 'foreach' keyword in a number of imperative languages. 'map' however is just a function on lists, not built in syntax, and has the type:

    map :: (a -> b) -> [a] -> [b]

That is, it takes some function, and a list, and applies that function to each element of the list, returning a new list as a result. Since loops are so common in programming, we'll be using 'map' a lot. Just for reference, 'map' is implemented as:

    map _ []     = []
    map f (x:xs) = f x : map f xs

File IO

Operating on stdin/stdout is good for scripts (and this is how tools like sed or perl -p work), but for 'real' programs we'll at least need to do some file IO. The basic operations of files are:

    readFile  :: FilePath -> IO String
    writeFile :: FilePath -> String -> IO ()

'readFile' takes a file name as an argument, does some IO, and returns the file's contents as a string. 'writeFile' takes a file name, a string, and does some IO (writing that string to the file), before returning the void (or unit) value, ().

We could implement a 'cp' program on files, as:

    import System.Environment

    main = do
        [f,g] <- getArgs
        s     <- readFile f
        writeFile g s

Running this program:

    $ ghc -O A.hs

    $ ./a.out A.hs Z.hs

    $ cat Z.hs
    import System.Environment

    main = do
        [f,g] <- getArgs
        s     <- readFile f
        writeFile g s

Since we're doing IO (the type of readFile and writeFile enforce this), the code runs inside a do-block, using the IO monad. "Using the IO monad" just means that we wish to use an imperative, sequential order of evaluation. (As an aside, a wide range of other monads exist, for programming different program evaluation strategies, such as Prolog-style backtracking, or continutation-based evaluation. All of imperative programming is just one subset of possible evaluation strategies you can use in Haskell, via monads).

In do-notation, whenever you wish to run an action, for its side effect, and save the result to a variable, you would write:

    v <- action

For example, to run the 'readFile' action, which has the side effect of reading a file from disk, we say:

    s <- readFile f

Finally, we can use the 'appendFile' function to append to an existing file.

File Handles

The most generic interface to files is provided via Handles. Sometimes we need to keep a file open, for multiple reads or writes. To do this we use Handles, an abstraction much like the underlying system's file handles.

To open up a file, and get its Handle, we use:

    openFile :: FilePath -> IOMode -> IO Handle

So to open a file for reading only, in GHCi:

    Prelude System.IO> h <- openFile "A.hs" ReadMode
    {handle: A.hs}

Which returns a Handle onto the file "A.hs". We can read a line from this handle:

    Prelude System.IO> hGetLine h
    "main = do"

To close a Handle, and flush the buffer:

    hClose :: Handle -> IO ()

Once a Handle is closed, we can no longer read from it:

    Prelude System.IO> hClose h
    Prelude System.IO> hGetLine h
    *** Exception: A.hs: hGetLine: illegal operation (handle is closed)

We can also flush explicitly with:

    hFlush :: Handle -> IO ()

Other useful operations for reading from Handles:

    hGetChar     :: Handle -> IO Char
    hGetLine     :: Handle -> IO [Char]
    hGetContents :: Handle -> IO [Char]

To write to Handles:

    hPutChar    :: Handle -> Char -> IO ()
    hPutStr     :: Handle -> [Char] -> IO ()
    hPutStrLn   :: Handle -> [Char] -> IO ()
    hPrint      :: Show a => Handle -> a -> IO ()

Some other useful actions:

    hSeek     :: Handle -> SeekMode -> Integer -> IO ()
    hTell     :: Handle -> IO Integer
    hFileSize :: Handle -> IO Integer
    hIsEOF    :: Handle -> IO Bool

An example: spell checking

Here is a small example of combining the Data.Set and List data structures from yesterday's tutorial, with more IO operations. We'll implement a little spell checker, building the dictionary in a Set data type. First, some libraries to import:

    import System.Environment
    import Control.Monad
    import Data.Set

And the complete program:

    main = do
        [s] <- getArgs
        f   <- readFile "/usr/share/dict/words"
        g   <- readFile s
        let dict = fromList (lines f)
        mapM_ (spell dict) (words g)

    spell d w = when (w `notMember` d) (putStrLn w)

Running this program, on its own source, and it reports the following words are not found in the dictionary:

    $ ghc -O Spell.hs -o spell

    $ ./spell A.hs
    Data.Char
    =
    <-
    (map
    toUpper
    n)
    =
    <-
    getLine
    1

Writing the results out

If we wanted to write the results out to a temporary file, we can do so. Let's import a couple of other modules:

    import Data.Set
    import Data.Maybe
    import Text.Printf
    import System.IO
    import System.Environment
    import System.Posix.Temp

Refactoring the main code to separate out the reading and writing phases in to their own function, we end up with the core code:

    main = do
        (f, g) <- readFiles
        let dict = fromList (lines f)
            errs = mapMaybe (spell dict) (words g)
        write errs

    spell d w | w `notMember` d = Just w
              | otherwise       = Nothing

Where reading is now its own function:

    readFiles = do
        [s] <- getArgs
        f   <- readFile "/usr/share/dict/words"
        g   <- readFile s
        return (f,g)

And writing errors out to their own file:

    write errs = do
        (t,h) <- mkstemp "/tmp/spell.XXXXXX"
        mapM_ (hPutStrLn h) errs
        hClose h
        printf "%d spelling errors written to '%s'\n" (length errs) t

Pretty simple! Running this program:

    $ ghc --make -O Spell.hs -o myspell
    [1 of 1] Compiling Main             ( Spell.hs, Spell.o )
    Linking myspell ...

    $ ./myspell Spell.hs
    67 spelling errors written to '/tmp/spell.ia8256'

Extension: using SMP parallelism

Finally, just for some bonus fun ... and hold on to your hat 'cause I'm going to go fast ... we'll add some parallelism to the mix.

Haskell was designed from the start to support easy parallelisation, and since GHC 6.6, multithreaded code will run transparently on multicore systems using as many cores as you specify. Let's look at how we'd parallelise our little program to exploit multiple cores. We'll use an explicit threading model, via Control.Concurrent. You can also make your code implicitly parallel, using Control.Parallel.Strategies, but we'll leave that for another tutorial.

Here's the source, for you to ponder. First some imports:

    import Data.Set hiding (map)
    import Data.Maybe
    import Data.Char
    import Text.Printf
    import System.IO
    import System.Environment
    import Control.Concurrent
    import Control.Monad

The entry point, modified to break the word list into chunks, and then dispatching a chunk to each thread:

    main = do
        (f, g, n) <- readFiles
        let dict = fromList (lines f)
            work = chunk n (words g)
        run n dict work

The 'run' function sets up a channel between the main thread and all children thread ('errs'), and prints spelling errors as they arrive on the channel from the children. It then forks off 'n' children threads on each piece of the work list:

    run n dict work = do
        chan <- newChan
        errs <- getChanContents chan    -- errors returned back to main thread
        mapM_ (forkIO . thread chan dict) (zip [1..n] work)
        wait n errs 0

The main thread then just waits on all the threads to finish, printing any spelling errors they pass up:

    wait n xs i = when (i < n) $ case xs of
        Nothing : ys -> wait n ys $! i+1
        Just s  : ys -> putStrLn s >> wait n ys i

Each thread spell checks its own piece of the work list. If it finds a spelling error, it passes the offending word back over the channel to the main thread.

    thread chan dict (me, xs) = do
        mapM_ spellit xs
        writeChan chan Nothing

     where
        spellit w = do
            when (spell dict w) $
                writeChan chan . Just $ printf "Thread %d: %-25s" (me::Int) w

The 'spell' function is simplified a little:

    spell d w = w `notMember` d

which we could also write as:

    spell = flip notMember

We modify the readFiles phase to take an additional numeric command line argument, specifying the number of threads to run:

    readFiles = do
        [s,n] <- getArgs
        f     <- readFile "/usr/share/dict/words"
        g     <- readFile s
        return (f,g, read n)

We compile this with the GHC SMP parallel runtime system:

    $ ghc -O --make -threaded Spell.hs -o spell

Now, we can run 'n' worker threads (lightweight Haskell threads), mapped onto 'm' OS threads. Since I'm using a 4 core linux server, we'll play around with 4 OS threads. First, running everything in a single thread:

    $ time ./spell test.txt 1 +RTS -N1
    ...
    Thread 1: week:                    
    Thread 1: IO!
    ./spell test.txt 1 +RTS -N1 99% cpu 2.533 total

Ok, now we change the command line flag to run it with 4 OS threads, to try to utilise all 4 cores:

    $ time ./spell 4 +RTS -N4
    ...
    Thread 2: week:                    
    Thread 3: IO!
    ./spell test.txt 4 +RTS -N4 111% cpu 2.335 total

Ok. Good... A little bit faster, uses a little bit more cpu. It turns out however the program is bound currently by the time spent in the main thread building the initial dictionary. Actual searching time is only some 10% of the running time. Nonetheless, it was fairly painless to break up the initial simple program into a parallel version.

If the program running time was extended (as the case for a server), the parallelism would be a win. Additionally, should we buy more cores for the server, all we need to is change the +RTS -N argument to the program, to start utilising these extra cores.

Next week

In upcoming tutorials we'll look more into implicitly parallel programs, and the use of the new high performance ByteString data type for string processing.

/home :: /haskell :: permalink :: rss

2006-12-16

Programming Haskell

Its about time we got some job done in Haskell, eh? Now, one of my favourite programming books as an undergraduate was the Camel Book, "Programming Perl". It was full of lots of practical examples of Perl code, written well. (And I'm grateful to Larry Wall, Tom Christiansen and Randal Schwartz for writing the book that made programming fun).

So what would it look like if we wrote a Haskell tutorial in this style? Let's have at it!

Getting started

Like some languages Haskell can be both compiled and interpreted. The most widely used implementation of Haskell currently is GHC, which provides both an optimising native code compiler, and an interactive bytecode interpreter. I'll be using GHC (or its interactive front end, GHCi, for all code. So grab a copy of GHC now, from your package system, or the GHC home page.

Start up GHCi:

    $ ghci
       ___         ___ _
      / _ \ /\  /\/ __(_)
     / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
    / /_\\/ __  / /___| |      http://www.haskell.org/ghc/
    \____/\/ /_/\____/|_|      Type :? for help.

    Loading package base ... linking ... done.
    Prelude>

The interpreter now waits for your code fragments. The "Prelude" prompt indicates which library modules are in scope, and in this case, only the basic language module, known as the Prelude.

Now we can start running Haskell code.

    Prelude> "G'day, world!"
    "G'day, world!"

    Prelude> putStrLn "G'day, world!"
    G'day, world!

You can compile this code to a native binary using GHC, by writing in a source file:

    main = putStrLn "G'day, world!"

and then compiling the source to native code. Assuming your file is A.hs:

    $ ghc A.hs

This produces a new executable, ./a.out (a.out.exe on windows), which you can run like any other program on your system:

    $ ./a.out 
    G'day, world!

Variables

We can name arbitrary fragments of Haskell using variables. Like so:

    phrase = "G'day, world!"
    main = putStrLn phrase

We don't have to define what type phrase is, as Haskell uses type inference to infer at compile time the types of all expressions in the program. As "G'day, world!" is a string, so must phrase be a string. There are a bunch of basic types of values to play with. Here's a small sample:

    answer      = 42
    pi          = 3.141592653589793238462643383279502884197169399375105820974944592
    avocados    = 6.02e23
    pet         = "Lambda"
    sign        = "I love my " ++ pet
    coat        = "It costs $100"
    hence       = "whence"
    thence      = hence
    moles       = 2.5
    x           = moles * avocados
    c           = '#'
    pair        = (2.5, "lambdas")
    list        = [5,6,4,3,1]
    options     = Just "done"
    failed      = Nothing
    void        = ()

One important thing to remember is that Haskell's variables, like in most functional programming languages, are like variables in mathematics, and are just names for expressions. They're explicitly not mutable boxes, like in most imperative programming languages. As a result, you never need to worry about initialising a Haskell variable, nor do you need to worry about the current value in a variable: it always has the same value, and can always be replaced with its definition. So the following behaves just like it would in maths:

    answer      = 42
    another     = answer + 1
    more        = another + answer
    main        = print more

That is,

    $ ghc A.hs
    $ ./a.out 
    85

Now, since variables are just names for program fragments, you can evaluate Haskell on paper by replacing all names with their definition, until you reach a final value, like so:

    main = print more 
  =>
    main = print (another + answer)
  =>
    main = print ((answer + 1) + answer)
  =>
    main = print ((answer + 1) + 42)
  =>
    main = print ((42 + 1) + 42)
  =>
    main = print (43 + 42)
  =>
    main = print 85
  =>
    85

Having such a simple system for variables allows for a wide range of interesting optimisations, and makes understanding what a program is doing at any point much easier, since you don't have to worry about what state a variable might currently be in. (Of course, some problems need (theadsafe) mutable boxes, and they're available as a library for when you need that).

Collections

Often you need to collect a bunch of values together into some kind of collection. Haskell has many many collection types, but in particular, it has lists and finite maps, which operate much like arrays and hashes of the imperative world.

Lists

A list is just an ordered, um, list of values. They can be nested, and transformed in all sorts of ways, using functions. Assuming your file, A.hs, contains:

    home  = ["couch", "chair", "table", "stove"]

We can play around with this stuff like so:

    $ ghci A.hs

    *Main> home
    ["couch","chair","table","stove"]

    *Main> head home
    "couch"

    *Main> tail home
    ["chair","table","stove"]

    *Main> last home  
    "stove"

    *Main> home !! 2
    "table"

    *Main> reverse home
    ["stove","table","chair","couch"]

    *Main> map reverse home
    ["hcuoc","riahc","elbat","evots"]

Loading in the List library gives us some more functions to use:

    *Main> :m + Data.List

    *Main Data.List> intersperse "#" home
    ["couch","#","chair","#","table","#","stove"]

    *Main Data.List> concat (intersperse "#" home)
    "couch#chair#table#stove"

    *Main Data.List> home \\ ["table","stove"]
    ["couch","chair"]

Finite Maps

Finite maps (or maps) are the lookup tables of purely functional programming. Whenever you'd use some kind of hash in an imperative language, you can replace it with a Map in Haskell. Like hashes, maps can be seen as a table of pairs of keys and values. You can declare a new map:

    import Data.Map

    days = fromList
        [ ("Sun",  "Sunday"     )
        , ("Mon",  "Monday"     )
        , ("Tue",  "Tuesday"    )
        , ("Wed",  "Wednesday"  )
        , ("Thu",  "Thursday"   )
        , ("Fri",  "Friday"     )
        , ("Sat",  "Saturday"   ) ]

You can also convert a map to a list, using (well, duh!) toList:

    *Main> toList days
    [("Fri","Friday"),("Mon","Monday"),("Sat","Saturday"),("Sun","Sunday"),("Thu","Thursday"),("Tue","Tuesday"),("Wed","Wednesday")]

Note that they come out unordered, just like in hashes. If you just want the keys of the map:

    *Main> keys days
    ["Fri","Mon","Sat","Sun","Thu","Tue","Wed"]

    *Main> elems days
    ["Friday","Monday","Saturday","Sunday","Thursday","Tuesday","Wednesday"]

Since maps are a good structure for looking up values, you can search them using the lookup function. This function returns the element, if found:

    *Main> Data.Map.lookup "Tue" days
    "Tuesday"

Since the name 'lookup' is also used by a list function of similar purpose in the Prelude, we use the qualified name here to disambiguate which 'lookup' to use.

On failure

But what happens if the key is not found? (Feel free to skip this section if you don't care about errors yet) lookup will then fail, and how it fails depends on what kind of failure you want. Haskell goes to great lengths to make programming for failure flexible. For example, to fail with an exception:

    *Main> Data.Map.lookup "Thor" days
    *** Exception: user error (Data.Map.lookup: Key not found)

Which is the same as failing with an IO error. We can specify this specifically with a type annotation, to say "fail with an IO error":

    *Main> Data.Map.lookup "Thor" days :: IO String
    *** Exception: user error (Data.Map.lookup: Key not found)

Often you might instead prefer that some special value is returned on failure:

    *Main> Data.Map.lookup "Thors" days :: Maybe String
    Nothing

Maybe you'd just like an empty list:

    *Main> Data.Map.lookup "Thor" days :: [String]
    []

Finally, you can always provide an explicit default value:

    *Main> findWithDefault "Not found" "Thor" days            
    "Not found"

Failure is entirely under your control!

Actions

Now, real programs interact with the outside world. They call functions which do IO, as a side effect, and then return some value. In Haskell, functions with side effects are often called actions, to distinguish them from normal Haskell functions (which behave like mathematical functions: they take inputs and return a result, with no side effects). Programming with side effects is carefully handled in Haskell, again to control the possibility of errors, and all functions which have side effects have a special type: the IO type.

For example, the function to print a string has the following type (and you can ask the interpreter for the type interactively):

    Prelude> :t putStr
    putStr :: String -> IO ()

which tells you that this function takes a String as an argument, does some IO side effect, and returns the null value. It is equivalent to the following C type:

    void putStr(char *);

but with a bit of extra information, namely, that the function does some IO. We would print out some element of our map like so:

    main = print ("Tue in long form is " ++ findWithDefault "Not found" "Tue" days)

    *Main> main
    "Tue in long form is Tuesday"

An example

One of the classic programming puzzles for introducing real world problems is the 'class grade' problem. You have a text file containing a list of student names and their grades, and you'd like to extract various information and display it. In deference to The Camel Book, we'll follow this lead, and start with a file "grades", containing something like this:

    Alonzo 70
    Simon 94
    Henk 79
    Eugenio 69
    Bob 80
    Oleg 77
    Philip 73
    ...

Student's appear multiple times, with entries for each of their subjects Let's read this file, populate a map with the data, and print some statistical information about the results. First thing to do is import some basic libraries:

    import Data.Char
    import Data.Maybe
    import Data.List
    import Data.Map hiding (map)
    import Text.Printf

And now here's the entire program, to read the grades file, compute all the averages, and print them:

    main = do
        src <- readFile "grades"
        let pairs   = map (split.words) (lines src)
            grades  = foldr insert empty pairs
        mapM_ (draw grades) (sort (keys grades))
      where
        insert (s, g) = insertWith (++) s [g]
        split [name,mark] = (name, read mark)

    draw g s = printf "%s\t%s\tAverage: %f\n" s (show marks) avg
      where
        marks = findWithDefault (error "No such student") s g
        avg   = sum marks / fromIntegral (length marks) :: Double

Running it

How do we run this program? There's lots of ways:

Compile it to native code

    $ ghc -O Grades.hs

    $ ./a.out 
    Alonzo  [70.0,71.0]     Average: 70.5
    Bob     [80.0,88.0]     Average: 84.0
    Eugenio [69.0,98.0]     Average: 83.5
    Henk    [79.0,81.0]     Average: 80.0
    Oleg    [77.0,68.0]     Average: 72.5
    Philip  [73.0,71.0]     Average: 72.0
    Simon   [94.0,83.0]     Average: 88.5

Run it in the bytecode interpreter

    $ runhaskell Grades.hs
    Alonzo  [70.0,71.0]     Average: 70.5
    Bob     [80.0,88.0]     Average: 84.0
    Eugenio [69.0,98.0]     Average: 83.5
    Henk    [79.0,81.0]     Average: 80.0
    Oleg    [77.0,68.0]     Average: 72.5
    Philip  [73.0,71.0]     Average: 72.0
    Simon   [94.0,83.0]     Average: 88.5

Execute it interactively

    $ ghci Grades.hs
    Prelude Main> main
    Alonzo  [70.0,71.0]     Average: 70.5
    Bob     [80.0,88.0]     Average: 84.0
    Eugenio [69.0,98.0]     Average: 83.5
    Henk    [79.0,81.0]     Average: 80.0
    Oleg    [77.0,68.0]     Average: 72.5
    Philip  [73.0,71.0]     Average: 72.0
    Simon   [94.0,83.0]     Average: 88.5

Make the script executable

Under unix, you can use the #! convention to make a script executable. Add the following to the top of the source file:

    #!/usr/bin/env runhaskell

And then set the script executable:

    $ chmod +x Grades.hs

    $ ./Grades.hs       
    Alonzo  [70.0,71.0]     Average: 70.5
    Bob     [80.0,88.0]     Average: 84.0
    Eugenio [69.0,98.0]     Average: 83.5
    Henk    [79.0,81.0]     Average: 80.0
    Oleg    [77.0,68.0]     Average: 72.5
    Philip  [73.0,71.0]     Average: 72.0
    Simon   [94.0,83.0]     Average: 88.5

Next week: more IO!

/home :: /haskell :: permalink :: rss

2006-12-15

More on syntax

Yesterday's article, about programming language syntax in an ideal world, provoked a small storm of controversy over on reddit.

Unfortunately, since there were a couple of other Haskell articles on the same page, the article's argument, that languages of the future should look like pseudocode, was somewhat lost. A number of readers assumed the article was a piece of Haskell advocacy, which wasn't the intention. Instead, it was a plea for the syntax of future languages to be optimised for writing by humans, and not for parsing by compilers. That is:

The languages of the future should look like pseudocode

I'll cache the original example of "future syntax" here, so you can keep it in mind:

    fib 0 = 0
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)

Note that this is an argument for syntax, independent of whatever language you want to use. And it ignores all efficiency issues in current languages (that's a job for the compiler designers of the future).

Of course, programmers being very practical people took this as an opportunity to demonstrate how their language of choice (and being reddit, this means lisp) already provides this "future syntax". Well, sort of:

    (define fib
      (match-lambda
        [0 0]
        [1 1]
        [n (+ (fib (- n 1)) (fib (- n 2)))]))

    (define (fibo x)
      (cond
        ((= 0 x) 0)
        ((= 1 x) 1)
        (else (+ (fibo (- x 1))
                 (fibo (- x 2))))))

    (define (fibo x)
      (case x ((0 1) 1)
              (else (+ (fibo (- x 1)) (fibo (- x 2))))))

    (defmethod fib ((n (eql 0))) 0)
    (defmethod fib ((n (eql 1))) 1)
    (defmethod fib (n)
          (+ (fib (- n 1)) (fib (- n 2))))

    (define (fibo x)
       (if (< x 2)
           x
         (+ (fibo (- x 1)) (fibo (- x 2)))))

    (let ((fibs (make-hash-table)))
       (defun fib (n)
          (if (< n 2) n
              (or (gethash n fibs)
              (setf (gethash n fibs)
                    (+ (fib (- n 1)) (fib (- n 2))))))))

Others focused on improving the particular languages I cited. The best contribution on this issue came from Greg Buchholz, who illustrated neat implementations in a variety of languages, far more flattering than the examples I used (which, incidentally, were not my creation, they were wikipedia's ....)

Perhaps unsurprisingly (in hindsight) plenty of readers assumed the plea for future syntax was a plea for "Haskell now!", and talked at length about efficiency concerns, and the lack of real world code/IO in the examples:

    all of these are monstrously inefficient...

    > let fibs = 1 : 1 : zipWith (+) (fibs) (tail fibs)
    > take 10 fibs
    [1,1,2,3,5,8,13,21,34,55]

    That's surely more efficient

    And kinda memory eating :-)

    You'd better choose something like

    fib = fib_iter 1 1
      where
        fib_iter x y n | n == 1 = x
                       | otherwise = fib_iter y (x + y) (n - 1)

    there's a much more efficient way to do it in Hs using the language's
    lazyness (the ability to recursively define infinite lists):

    fibs = 1 : 1 : zipWith (+) (fibs) (tail fibs)

And, this being the interblargh, some people missed the point entirely:

    This code doesn't have any resemblance to everyday programming however -
    which generally involves trawling through databases, string matching, and I/O.
    What is the point of showing examples which do not reflect reality of
    most(?) programing tasks?

    I've been a ASM/C++/C# developer for over a year now. But seriously, how
    stupid is this? You are going to tout a language strength due to how
    efficiently it can work out the Fibonacci sequence?
    That is not a real-world example, and if you were to use a real-world
    example, this code would be horrific.

So what can be made of all this? I think that it can be objectively stated that current programming languages, for the most part, don't look like ideal pseudocode languages, with syntax often bogged down in language features, and parser hints. Overuse of keywords (defun,define,def,let rec), extraneous typing hints (like $x), and parser hints (';' and '{' '}'), all obscure the underlying code. Step back for a second: you're 10 years old, and you're hoping to become a cool hacker. What syntax do you wish you could use?

Now Python, Haskell and the Lisp family seem to be the closest to where we want to go, but they're not there yet. Even in Haskell, where years were spent sorting out the syntax, and as a result has some nice features for human programmers (pattern matching on data structures, and lovely whitespace), you can still end up writing awkward code (and this is entirely my own fault):

    withGS :: (g -> (g -> LB ()) -> LB ()) -> ModuleT (GlobalPrivate g p) LB ()
    withGS f = withMS $ \state writer ->
      f (global state) $ \g -> writer $ state { global = g }

    readGS :: ModuleT (GlobalPrivate g p) LB g
    readGS = global `fmap` readMS

    writeGS :: g -> ModuleT (GlobalPrivate g p) LB ()
    writeGS g = withGS (\_ writer -> writer g)

    forkLB :: LB a -> LB ()
    forkLB f = (`liftLB` f) $ \g -> do
                forkIO $ do
                    timeout (15 * 1000 * 1000) g
                    return ()

    liftLB :: (IO a -> IO b) -> LB a -> LB b
    liftLB f = LB . mapReaderT f . runLB

So finally, if you're in any way involved in the design of syntax for a new language, or fixing up the syntax of existing languages, please think hard about what you would have wished to program in, had you never been exposed to the languages of today. Programming languages don't have to be ugly and noisy! Concentrate on what the human author of the code would like to write: code that is aesthetically pleasing and even beautiful, and don't compromise your syntax, for the short term gain of making parsing easier.

/home :: /haskell :: permalink :: rss

2006-12-14

On syntax

Forget about what programming languages you know. What syntax do you wish you could write in? (Names have been changed to protect the innocent):

This?

    function generate_fibonacci_sequence( $length ) {
        for( $l = array(1,1), $i = 2, $x = 0; $i < $length; $i++ )
                $l[] = $l[$x++] + $l[$x];
    
        return $l;
    }

What about this?

    sub fibo
    {
        my ($n, $a, $b) = (shift, 0, 1);
        ($a, $b) = ($b, $a + $b) while $n-- > 0;
        $a;
    }

Getting better?

    def fib(n):
        if   n < 2: return 1
        else      : return fib(n - 1) + fib(n - 2)

Hmm, isn't this going backwards?

    (define fibo
     (lambda (x)
       (if (< x 2)
         x
         (+ (fibo (- x 1)) (fibo (- x 2))))))

Finally:

    fib 0 = 0
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)

If the programming languages of the future don't look like the last example, I think we've failed. Programming languages are notation for describing solutions to problems. If that notation is verbose, clunky or full of special cases it should be abandoned in favour of a more concise notation. Real languages should look like pseudcode!

My suspicion is that in 50 years time we'll look back on the syntax of today's programming languages and think them highly bizarre, like looking back at Frege's logic notation (pdf):

Syntax has come along way since the 1960s, but there's still room for much improvement, looking at the widely used languages employed today. The designers of many (not all) popular languages seem to forget that syntax should be optimised for writing by humans, not for parsing by machines.

/home :: /haskell :: permalink :: rss

Type declarations made mechanical

Haskell uses type inference, so explicit type declarations are rarely required. This:

    import Control.Monad.Fix
    fibs = fix ((1:) . scanl (+) 1)
    main = print (take 20 fibs)

Is just as good as:

    import Control.Monad.Fix

    fibs :: [Integer]
    fibs = fix ((1:) . scanl (+) 1)

    main :: IO ()
    main = print (take 20 fibs)

Running this:

    $ runhaskell A.hs
    [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]

Now, once programs reach a decent size, the advantage of type declarations appears: it functions as machine-checkable documentation. So people new to your code can more quickly work out what the code is doing.

Contrast this (real world) Haskell code from a network client:

    accessorMS decompose f = withMS $ \s writer ->
      let (t,k) = decompose s in f t (writer . k)

And with type annotations:

    accessorMS :: (s -> (t, t -> s))
               -> (t -> (t -> LB ()) -> LB a)
               -> ModuleT s LB a

    accessorMS decompose f = withMS $ \s writer ->
      let (t,k) = decompose s in f t (writer . k)

So at least you know have some idea of what that code does.

There's an intuition here: type declarations are good, but they can be mechanically inferred. Let's automate that then! Here's a quick script I use every day. It just passes your top level declaration to ghci, and asks it to infer the type. The resulting type signature is spliced back in to your code:

    #!/bin/sh

    # input is a top level .hs decls

    FILE=$*
    DECL=`cat`
    ID=`echo $DECL | sed 's/^\([^ ]*\).*/\1/'`
    echo ":t $ID" | ghci -v0 -cpp -fglasgow-exts -w $FILE
    echo $DECL

Save this as an executable shell file in your path. Now you can call this from your editor. For example, from vim, you'd use the following .vimrc:

    :map ty :.!typeOf %

Hitting :ty while the cursor is positioned on top of a top level declaration takes your code from:

    writePS who x = withPS who (\_ writer -> writer x)

to this:

    writePS :: forall p g. String -> Maybe p -> ModuleT (GlobalPrivate g p) LB ()
    writePS who x = withPS who (\_ writer -> writer x)

I can't emphasise enough how useful this is. So, steal this code and improve your productivity!

/home :: /haskell :: permalink :: rss

Type inference by hand

IPI asked this how #haskell:

How can I mentally infer the type of the composition of these functions?

    a :: [[Char]] -> [([Char],Int,Int)]
    b :: [Int] -> [(Int,Bool)]
    c :: [([Char],Int,Int)] -> ([Char],Int)
    d :: [(Int,Bool)] -> [[Char]]

That is, what is the type of:

    c . a . d . b

The first thing I'd do is rename the tricky types:

    a :: [String] -> [(String,Int,Int)]
    b :: [Int] -> [(Int,Bool)]
    c :: [(String,Int,Int)] -> (String,Int)
    d :: [(Int,Bool)] -> [String]

and then introduce some type synonyms:

    type A = [String]
    type B = [(String,Int,Int)]
    type C = [(Int,Bool)]

Now we can rename the complex types to a more symbolic form:

    a :: A -> B
    b :: [Int] -> C
    c :: B -> (String,Int)
    d :: C -> A

and now we can see how to combine the pieces. It's a little jigsaw puzzle!

    b :: [Int] -> C

so there's only one option to compose with b, namely

    b :: [Int] -> C
    d :: C -> A

composing these, and we hide the intermediate result:

    d . b :: [Int] -> A

now, we need something that takes an A:

    a :: A -> B

So the result of something composed with a, will be a value of type B:

    a . d . b :: [Int] -> B

and finally, something that takes a B:

    c :: B -> (String,Int)

leading to:

    c . a . d . b :: [Int] -> (String,Int)

and we're done!

/home :: /haskell :: permalink :: rss

Theory and practice: where Haskell is heading

Seen on the haskell cafe:

One of the most exciting aspects of Haskell is that pragmatic interest in the language has been growing steadily without academic interest in it declining in any way. As a result, we have a language that represents an interesting mixture of good and useful, although it is not entirely clear yet how long this nice balance will hold.

We have had lots of languages that were intended to be well-designed (good, beautiful, ..), but never much used in practice, and we have also had lots of languages that were intended to be pragmatic (practical, useful, ..), without much interest in theoretical beauty. But how many languages are there where the two aspects have converged, with both communities still actively interested in the result?

Its exciting working in Haskell right now, with new practical tools, libraries and jobs appearing daily, and yet at the same time the theory guys keep cranking out new abstractions for us: applicative functors, software transactional memory, stream fusion...

We have the technology!

/home :: /haskell :: permalink :: rss

2006-12-11

The lambda revolution

This article describes the entire process of writing, packaging and releasing a new Haskell library. If you love programming Haskell, and want to keep doing so, you'll write some libraries and release them. The lambda revolution isn't going to happen without your code!

Creating Data.DList, it took about 60 minutes. People often need O(1) append, so about time we had a library for it. Here's the entire transcript of creating a new Haskell project and releasing it to a public server. I then created a web page for the project, and announced it on the haskell@ mailing list. I used the following tools:

Ready? Here we go!

Script started on Mon Dec 11 13:47:40 2006

$ cd dons/src
$ mkdir dlist
$ cd dlist

$ mkcabal
Project name: dlist
What license ["GPL","LGPL","BSD3","BSD4","PublicDomain","AllRightsReserved"] ["BSD3"]:
What kind of project [Executable,Library] [Executable]: Library
Is this your name? - "Don Stewart " [Y/n]:
Is this your email address? - "<dons@cse.unsw.edu.au>" [Y/n]:
Created Setup.lhs and dlist.cabal

$ cp ~/dons/src/yi/Yi/Lexers.hs .
$ ls
Lexers.hs   Setup.lhs   dlist.cabal

$ mkdir Data
$ mv Lexers.hs Data/DList.hs

$ vim Data/DList.hs
$ vim dlist.cabal

$ darcs init
$ darcs add dlist.cabal Setup.lhs Data/ Data/DList.hs
$ darcs record --all
Shall I record this change? (1/?)  [ynWsfqadjkc], or ? for help: a
What is the patch name? Initial import of dlist package
Do you want to add a long comment? [yn]n
Finished recording patch 'Initial import of dlist package'

$ runhaskell Setup.lhs configure --prefix=/home/dons
Setup: Warning: No exposed modules or executables in this package.
Setup: Error: Non-empty library, but empty exposed modules list. 
Cabal may not build this library correctly

$ vim dlist.cabal

$ runhaskell Setup.lhs configure --prefix=/home/dons
Configuring dlist-0.1...
configure: /home/dons/bin/ghc-pkg
configure: Dependency base-any: using base-2.0
configure: Using install prefix: /home/dons
configure: Binaries installed in: /home/dons/bin
configure: Libraries installed in: /home/dons/lib/dlist-0.1/ghc-6.6
configure: Private binaries installed in: /home/dons/libexec
configure: Data files installed in: /home/dons/share/dlist-0.1
configure: Using compiler: /home/dons/bin/ghc
configure: Compiler flavor: GHC
configure: Compiler version: 6.6
configure: Using package tool: /home/dons/bin/ghc-pkg
configure: Using ar found on system at: /usr/bin/ar
configure: Using haddock found on system at: /home/dons/bin/haddock
configure: No pfesetup found
configure: Using ranlib found on system at: /usr/bin/ranlib
configure: Using runghc found on system at: /home/dons/bin/runghc
configure: Using runhugs found on system at: /home/dons/bin/runhugs
configure: Using happy: /home/dons/bin/happy
configure: Using alex: /home/dons/bin/alex
configure: Using hsc2hs: /home/dons/bin/hsc2hs
configure: No c2hs found
configure: Using cpphs: /home/dons/bin/cpphs
configure: No greencard found
Preprocessing library dlist-0.1...
Building dlist-0.1...

$ runhaskell Setup.lhs build
Preprocessing library dlist-0.1...
Building dlist-0.1...
Data/DList.hs:17:2: parse error on input `DList'

$ vim Data/DList.hs +17

$ runhaskell Setup.lhs build
Preprocessing library dlist-0.1...
Building dlist-0.1...
[1 of 1] Compiling Data.DList       ( Data/DList.hs, /usr/obj/cabal/Data/DList.o )
/usr/bin/ar: creating /usr/obj/cabal/libHSdlist-0.1.a

$ runhaskell Setup.lhs install
Installing: /home/dons/lib/dlist-0.1/ghc-6.6 & /home/dons/bin dlist-0.1...
Registering dlist-0.1...
Reading package info from ".installed-pkg-config" ... done.
Saving old package config file... done.
Writing new package config file... done.

$ darcs amend-record
Mon Dec 11 14:20:56 EST 2006  Don Stewart <dons@cse.unsw.edu.au>
  * Initial import of dlist package
Shall I amend this patch? [yNvpq], or ? for help: y
hunk ./Data/DList.hs 15
-module Data.DList
+module Data.DList (
Shall I add this change? (1/?)  [ynWsfqadjkc], or ? for help: y
hunk ./dlist.cabal 10
+Exposed-modules:     Data.DList
Shall I add this change? (2/?)  [ynWsfqadjkc], or ? for help: y
Finished amending patch:
Mon Dec 11 14:21:53 EST 2006  Don Stewart <dons@cse.unsw.edu.au>
  * Initial import of dlist package

$ runhaskell Setup.lhs haddock
Preprocessing library dlist-0.1...
Running Haddock for dlist-0.1...
Warning: cannot use package base-2.0:
   HTML directory /home/dons/share/ghc-6.6/html/libraries/base does not exist.
/usr/obj/cabal/tmp/Data/DList.hs:"/usr/obj/cabal/tmp/Data/DList.hs": 37:1: parse error in doc string: [TokPara]

$ vim Data/DList.hs +37

$ runhaskell Setup.lhs haddock
Preprocessing library dlist-0.1...
Running Haddock for dlist-0.1...
Warning: cannot use package base-2.0:
   HTML directory /home/dons/share/ghc-6.6/html/libraries/base does not exist.

$ w3m -dump dist/doc/html/index.html 
 dlist-0.1:  Contents Index
dlist-0.1:
Differences lists: lists supporting efficient append
Modules
show/hideData
Data.DList
Produced by Haddock version 0.8

$ mkdir tests
$ vim tests/Properties.hs
$ vim tests/Parallel.hs

$ cd tests
$ runhaskell Properties.hs 
2: singleton                : OK, 1000 tests.
1: model                    : OK, 1000 tests.
2: snoc                     : OK, 1000 tests.
1: append                   : OK, 1000 tests.
$ cd ..

$ darcs add tests
$ darcs add tests/Parallel.hs tests/Properties.hs
$ darcs whatsnew -s
M ./Data/DList.hs -1 +1
A ./tests/
A ./tests/Parallel.hs
A ./tests/Properties.hs

$ darcs record --all
...
Shall I record this change? (6/?)  [ynWsfqadjkc], or ? for help: y
What is the patch name? Add QuickChecks
Finished recording patch 'Add QuickChecks'

$ darcs setpref test "cd tests ; runhaskell Properties.hs"
Changing value of test from '' to 'cd tests ; runhaskell Properties.hs'
changepref test
cd tests ; runhaskell Properties.hs
Shall I record this change? (1/?)  [ynWsfqadjkc], or ? for help: y
What is the patch name? pre   run tests on each commit
Do you want to add a long comment? [yn]n
Running test...
2: singleton                : OK, 1000 tests.
2: snoc                     : OK, 1000 tests.
1: model                    : OK, 1000 tests.
2: append                   : OK, 1000 tests.
Test ran successfully.
Looks like a good patch.
Finished recording patch 'run tests on each commit'

$ darcs whatsnew -s
No changes!

$ darcs tag
What is the version name? 0.1   dlist 0.1
Finished tagging patch 'TAG dlist 0.1'

$ runhaskell Setup.lhs sdist
Building source dist for dlist-0.1...
*** Exception: LICENSE: copyFile: does not exist (No such file or directory)

$ cp ~/pqc/LICENSE .
$ vim LICENSE 

$ darcs add LICENSE
$ darcs record
addfile ./LICENSE
Shall I record this change? (1/?)  [ynWsfqadjkc], or ? for help: y
...
Shall I record this change? (2/?)  [ynWsfqadjkc], or ? for help: y
What is the patch name? add license
Do you want to add a long comment? [yn]n
Running test...
2: singleton                : OK, 1000 tests.
2: snoc                     : OK, 1000 tests.
1: model                    : OK, 1000 tests.
2: append                   : OK, 1000 tests.
Test ran successfully.
Looks like a good patch.
Finished recording patch 'add license'

$ vim README
$ darcs add README
$ darcs record --all
What is the patch name? Add a readme
Do you want to add a long comment? [yn]n
Running test...
2: singleton                : OK, 1000 tests.
2: snoc                     : OK, 1000 tests.
1: model                    : OK, 1000 tests.
2: append                   : OK, 1000 tests.
Test ran successfully.
Looks like a good patch.
Finished recording patch 'Add a readme'

$ ls
Data        LICENSE     README      Setup.lhs   _darcs      dist        dlist.cabal tests

$ darcs dist -d dlist-0.1. 
Created dist as dlist-0.1.tar.gz

$ darcs whatsnew -s
No changes!

$ darcs put pill00.cse.unsw.edu.au:/home/chakcvs/darcs/dlist/
Finished applying...

Script done on Mon Dec 11 15:04:54 2006

/home :: /haskell :: permalink :: rss

Quick interpreters with the Reader monad

Rolling your own interpreters in Haskell is one of the joys of using this language, due to the support for:

algebraic data types
pattern matching on data
monadic environments
higher order functions

Here's a quick, complete interpreter for a simple math language of binary operators, with support for variables. It uses the Reader monad for a natural embedding of lexical scoping. Writing interpreters in Haskell is just so easy, no wonder Pugs was written in it.

import qualified Data.Map as M
import Control.Monad.Reader

--
-- A syntax tree type for simple math, with variables
--
data Exp = IntE Int
         | OpE  Op Exp Exp
         | VarE String
         | LetE String Exp Exp

type Op = Int -> Int -> Int

--
-- The interpreter
--
eval (IntE n)       = return n
eval (OpE op e1 e2) = liftM2 op (eval e1) (eval e2)

eval (VarE x)       = do
    env <- ask
    return $ maybe (error "undefined variable") id (M.lookup x env)

eval (LetE x e1 e2) = do
    env <- ask
    v   <- eval e1
    local (M.insert x v) (eval e2)

--
-- Run the interpreter
--
main = print $ runReader (eval test) M.empty

--
-- A simple text expression:
--
--      let x =
--          let y = 5 + 6
--          in y / 5
--      in x * 3
-- 
-- ==>  6
--
test = LetE "x" (LetE "y" (OpE (+) (IntE 5) (IntE 6))
                      (OpE div y (IntE 5)))
                (OpE (*) x (IntE 3))
    where x = VarE "x"
          y = VarE "y"

Let's run the test program:

$ runhaskell A.hs
6

Adding a front end (with Parsec), and then implementing the rest of Ruby, is left as an exercise for the reader.

/home :: /haskell :: permalink :: rss

2006-12-10

The computational view of monads

I originally wrote this article as a comment to the programming subreddit, however I've found myself referring to it a few times since then, and reproduce it here for posterity. It attempts to motivate monads for people coming from imperative, stateful languages (already living in IO).

In Haskell, a purely functional language, we use monads to provide a library-based approach to sequential, imperative evaluation: the famous IO monad. No wonder Haskell people talk about it a lot: library-based imperative programming in a purely functional language is fun.

But monads don't stop there! In languages with builtin sequential evaluation the IO monad must seem boring or silly -- no wonder it all seems a puzzle.

This is where it gets interesting. Since monads provide an abstraction over evaluation order, they can be used in other languages (and Haskell) to implement *non*-sequential evaluation order, as a library, also!

So instead of special language support for, say, exceptions and callcc, you just implement monads as a library, and get:

optional results (Maybe)
encapsulated state (ST or State)
alternative results (Either)
exceptions (Error)
continuations (Cont)
concurrency
backtracking and non-deterministic evaluation
sofware transactional memory
and then on to Arrow-based evaluation graphs
or, say, monadic region memory management
even logic programming
or the Pi calculus

Perhaps this gives a flavour for non-Haskell people for why they're so useful. Continuations as a library! Non-deterministic evaluation as a library!

So just as its well known that with continuations you can implement threaded control or exceptions as a library, with monads you can implement continuations themselves as a library, along with all the other fun toys. Then, using monad transformers, you can compose separate monads, providing, say, exception handling over a state encapsulation: you can specify precisely what programming language concepts a particular program requires. Here is:

Users of other languages are likely, and rightly so, to be less interested in the solution to IO based on monads, and far far more interested in things like library-based continuations or exceptions (see for example monadic delimited continuations in OCaml).

Hope that gives a hint towards what this monad stuff is all about.

/home :: /haskell :: permalink :: rss

2006-12-07

Playing with the Devil's Bullets

    Now, George was a good straight boy to begin with, but there was bad blood in him
    Someway he got into the magic monads and that leads straight to Devil's work, 
    Just like marijuana leads to heroin
    You think you can take them monads or leave 'em, do you?
    Just save a few for your bad days

    Well, now, we all have those bad days when you can't hack for shit.
    The more of them monads you use, the more bad days you have without them
    So it comes down finally to all your days being bad without the monads
    It's magics or nothing

    Time to stop chippying around and kidding yourself,
    Kid, you're hooked, heavy as lead

/home :: /haskell :: permalink :: rss

The Devil's Bullets: William Burroughs, Tom Waits and Haskell.

Reading "Why Functional Programming is a Silver Bullet", and for some reason I'm reminded of the Robert Wilson, Tom Waits and William S. Burroughs play, "The Black Rider". The Haskell promise can be elegantly phrased:

    You can never go hunting
    With just a flintlock and a hound
    You won't go home with a bunting
    If you blow a hundred rounds

    It takes much more than wild courage
    Or you'll hit just the tattered clouds
    You must have just the right bullets
    And the first one's always free

    You must be careful in the forest
    Broken glass and rusty nails
    If you're to bring back something for us
    I have bullets for sale

    Why be a fool when you can chase away
    Your blind and your gloom
    I have blessed each one of these bullets
    And they shine just like a spoon

    To have sixty silver wishes
    Is a small price to pay
    They'll be your private little fishes
    And they'll never swim away

    I just want you to be happy
    That's my only wish
    I'll fix your wagon and your musket
    And the spoon will have his dish

    And I shudder at the thought of your
    Poor empty hunter's pouch
    So I'll keep the wind from your barrel
    And bless the roof of your house

Feel free to replace bullets with lambdas. Now, how to describe the addictive property of Haskell programming:

    Now, George was a good straight boy to begin with, but there was bad blood in him
    Someway he got into the magic bullets and that leads straight to Devil's work, 
    Just like marijuana leads to heroin
    You think you can take them bullets or leave 'em, do you?
    Just save a few for your bad days

    Well, now, we all have those bad days when you can't shoot for shit.
    The more of them magics you use, the more bad days you have without them
    So it comes down finally to all your days being bad without the bullets
    It's magics or nothing

    Time to stop chippying around and kidding yourself,
    Kid, you're hooked, heavy as lead

Haskell gives you access to the Devil's bullets.

(Lyrics by Tom Waits/William Burroughs, from The Black Rider)

/home :: /haskell :: permalink :: rss

2006-12-05

Building GHC in under 8 minutes

In response to my last post, Simon Marlow suggested building the first GHC stage with -O, hoping that an optimised GHC in turn will be faster compiling the libraries and second stage. Here's the result:

With the following build.mk:

SRC_HC_OPTS     = -H64m -Onot -fasm
GhcStage1HcOpts = -O -fasm
GhcStage2HcOpts = -Onot -fasm
GhcLibHcOpts    = -Onot -fasm
GhcLibWays      =
SplitObjs       = NO

On a 2 processor, 4 core linux machine, running with -j10:

make -j10 > /dev/null  1015.77s user 155.19s system 249% cpu 7:49.50 total

Great! A new GHC build record (I think). Can't wait till the 16 core machine arrives...

/home :: /haskell :: permalink :: rss

2006-12-03

Building GHC in under 10 minutes

People often complain at how long the GHC build process can be. This is even more of an issue for developers working on libraries, or the compiler itself, who need to quickly check their code typechecks and links inside the glorious maze that is GHC's builds system.

With the fairly recent GNU make -j fixes, and a decent multicore machine, you can cut your build times dramatically. Here's a transcript, with some hints on getting fast turnarounds from GHC builds:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 8
cpu MHz         : 2800.184
cache size      : 2048 KB
bogomips        : 5604.39

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 8
cpu MHz         : 2800.184
cache size      : 2048 KB
bogomips        : 5600.47

$ uname -msr
Linux 2.6.15.3-general i686

$ bzip2 -dc ghc-6.6-src.tar.bz2 | tar xf -
$ cd ghc-6.6
$ cat ../build.mk
SRC_HC_OPTS     = -H64m -Onot -fasm
GhcStage1HcOpts = -Onot -fasm
GhcStage2HcOpts = -Onot -fasm
GhcLibHcOpts    = -Onot -fasm
GhcLibWays      =
SplitObjs       = NO

$ mv ../build.mk ./mk/
$ autoreconf
$ ./configure
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking target system type... i686-pc-linux-gnu
Canonicalised to: i386-unknown-linux
...
config.status: creating config.mk
config.status: creating include/HsRegexPosixConfig.h
config.status: include/HsRegexPosixConfig.h is unchanged

$ time nice gmake -j10 > /dev/null
ghc: 25107068 bytes, 5 GCs, 125160/125160 avg/max bytes residency (1 samples), 63M in use, 0.00 INIT (0.00 elapsed), 0.05 MUT (0.53 elapsed), 0.02 GC (0.02 elapsed) :ghc
...
/usr/bin/ar: creating libHSghc.a
make -j10 > /dev/null  1299.72s user 163.13s system 250% cpu 9:44.08 total

$ compiler/stage2/ghc-inplace --version
The Glorious Glasgow Haskell Compilation System, version 6.6

And we're done! So, use GNU make's -j flag, disable optimisations, use the native code generator, and get as many cores as you can.

/home :: /haskell :: permalink :: rss

About

Real World Haskell

Next week

Administrivia

Chunks

Solipsistic philosophers

Getting to work

Getting in arguments

The environment

More on failure

GetOpt

A type for flags

Binding to command line flags

Parsing the flags

Implementing cat

Summary

The complete source

Administrivia

IO

Line oriented IO

On composition

More on lines

File IO

File Handles

An example: spell checking

Writing the results out

Extension: using SMP parallelism

Next week

Getting started

Variables

Collections

Lists

Finite Maps

On failure

Actions

An example

Running it

Compile it to native code

Run it in the bytecode interpreter

Execute it interactively

Make the script executable

Archives

Recommended

Blog Roll

Syndicate