This is part three in a series of tutorials on programming Haskell. You
can get up to speed by reading:
Today we'll look more into how Haskell interacts with its environment,
robust command line argument parsing, and writing a complete program.
Administrivia
Some clarifications on issues raised by yesterday's article.
Chunks
One issue pointed out was that the 'chunk' function was missing. This was semi-intentional.
Anyway, this function just splits a list into 'n' chunks:
chunk n xs = chunk' i xs
where
chunk' _ [] = []
chunk' n xs = a : chunk' n b where (a,b) = splitAt n xs
i = ceiling (fromIntegral (length xs) / fromIntegral n)
You may be able to write a neater one..
Solipsistic philosophers
Also, an amusing thread appeared on reddit,
regarding solipsistic philosopher programs:
Of course, a good optimizing compiler will replace your solipsistic
philosopher with a no-op.
A good optimising compiler, or any Haskell compiler :-) Since results
that are never required are not computed in Haskell, due to laziness, we
can write high performance solipsism simulators
all day long:
main = do
let largestNumber = last [1..]
return ()
Running our simulation of the philosopher-mathematician pondering some
thoughts on large numbers:
$ ghc A.hs
$ ./a.out
./a.out 0.00s user 0.01s system 100% cpu 0.003 total
Ok. Enough jokes. Show me the code!
Getting to work
Yesterday we implemented a few toy unix programs, including 'cat'. Today
we'll look at writing a complete cat program, but with a focus on interacting
properly with the environment and being careful about command line handling.
For our running examples, we'll consider the 'cat' and 'tac' programs. The
basic spec for 'cat' is:
The cat utility reads files sequentially, writing them to the standard
output. The file operands are processed in command-line order. If file
is a single dash (`-') or absent, cat reads from the standard input.
It's the 'id' function of the unix shell. BSD 'cat.c' is a 255 line C
program. From the man page we
can see it does more than just concatenate files. It can also:
- Numbers the output lines, starting at 1.
- Squeezes multiple adjacent empty lines
- Displays non-printing characters so they are visible.
Let's start by looking at the command line argument processing code.
Getting in arguments
The basic way to get arguments in a Haskell program is provided
by the System.Environment library. We can use the getArgs function:
Prelude> :m + System.Environment
Prelude System.Environment> :t getArgs
getArgs :: IO [String]
Prelude System.Environment> do a <- getArgs; print a
[]
Which is empty, since we didn't provide any arguments!
In a small program we can implement all argument handling using just
'getArgs', and some simple list
functions. For example, here's a basic 'tac' program, to reverse its input:
import System.Environment
import System.Exit
main = getArgs >>= parse >>= putStr . tac
tac = unlines . reverse . lines
parse ["-h"] = usage >> exit
parse ["-v"] = version >> exit
parse [] = getContents
parse fs = concat `fmap` mapM readFile fs
usage = putStrLn "Usage: tac [-vh] [file ..]"
version = putStrLn "Haskell tac 0.1"
exit = exitWith ExitSuccess
die = exitWith (ExitFailure 1)
This program concatenates and prints the contents of files in reverse (or reads
from stdin with no arguments), along with a couple of basic command line flags
for version and help strings. It's also reasonably careful about setting exit status on finishing,
using the functions from System.Exit. The actual core algorithm for 'tac' is a nice
pure Haskell function, and really all the hard work is done processing the
command line args.
Some example use:
$ ./tac -h
Usage: tac [-vh] [file ..]
$ ./tac -v
Haskell tac 0.1
$ ./tac A.hs B.hs
return n
print 1
n <- getLine
g = do
...
import System.Exit
import System.Environment
$ ./tac < A.hs
die = exitWith (ExitFailure 1)
exit = exitWith ExitSuccess
version = putStrLn "Haskell tac 0.1"
...
import System.Exit
import System.Environment
As you can see, once compiled it behaves like a normal unix utility, properly
dealing with stdin, with file arguments and the shell.
Note that getArgs doesn't return the program name. To get that we use:
Prelude System.Environment> :t getProgName
getProgName :: IO String
The environment
Many programs also make use of environment variables. We can get access
to the full shell 'env' using
Prelude System.Environment> :t getEnvironment
getEnvironment :: IO [(String, String)]
Which returns an association list, mapping environment variables to
their values.
We can stick this list straight into an efficient Map structure, for later use. Here's an interactive
example:
Prelude System.Environment> env <- do e <- getEnvironment; return (Data.Map.fromList e)
which we could also write as:
env <- Data.Map.fromList `fmap` getEnvironment
Once we've got the environment in a useful Map, we can inspect it using Map
lookups:
Prelude System.Environment> :t env
env :: Data.Map.Map String String
Prelude System.Environment> :t Data.Map.lookup
Data.Map.lookup :: (Ord k, Monad m) => k -> Data.Map.Map k a -> m a
That is, the lookup function takes some key, 'k', and a Map from keys to
elements of type 'a', and returns an element, if found, in some monad.
More on failure
You may recall from the first tutorial that the Map 'lookup' function will fail
if the key is not found. The particular way you wish it to fail depends on
which monad you use. You can tell this from the type of lookup. The
lookup :: (Monad m) => ... m a
syntax indicates that lookup is polymorphic in its monad: it will work for any
monad type, and its behaviour is determined by the particular instance of the monad
interface you ask for. When a lookup fails, it calls the 'fail' function for
the monad you're using. When a lookup is successful, it calls the 'return'
function of the same monad. Being 'polymorphic in a monad' really just means that
it will call which particular concrete monad 'subclass' you happen to be using.
Looking at the various useful monads for this, we can choose which failure behaviour we
would prefer. Here's the implementation of the 'fail' interface for a variety of
monads. It's up to you to pick which behaviour you'd like.
For Maybes, we get the null value, Nothing, on failure:
instance Monad Maybe where
return = Just
fail _ = Nothing
For Eithers, we get an error string:
instance (Error e) => Monad (Either e) where
return = Right
fail s = Left (strMsg s)
For lists, we get the empty list on failure:
instance Monad [] where
return x = [x]
fail _ = []
And for IO we get an exception thrown:
instance Monad IO where
fail = ioError . userError
return = returnIO
So, depending on the type signature, the compiler will statically pick one of
these 'fail's to use on a lookup failing at runtime. For example, to fail with
a null value, we'd use the Maybe monad:
Prelude System.Environment> Data.Map.lookup "USER" env :: Maybe String
Just "dons"
Prelude System.Environment> Data.Map.lookup "LUSER" env :: Maybe String
Nothing
Using Nothing for fatal errors isn't the best practice for large programs,
since you usually need to know what failed. For a string-annotated Nothing, we
can use the Either monad:
Prelude System.Environment> :m + Control.Monad.Error
Prelude System.Environment Control.Monad.Error> Data.Map.lookup "LUSER" env :: Either String String
Left "Data.Map.lookup: Key not found"
Which is more useful. To fail with a proper exception we'd use the IO monad:
Prelude System.Environment> Data.Map.lookup "LUSER" env :: IO String
*** Exception: user error (Data.Map.lookup: Key not found)
We'll now turn to a more flexible approach to argument parsing.
GetOpt
The base Haskell library comes with an implementation of getopt, a useful library
for standardised argument handling. Let's implement the argument handling of
the unix 'cat' program using this lib.
A type for flags
The first thing to do is define a data type representing the valid flags.
First, let's import all the libraries I'll use:
import Control.Monad
import Data.Char
import Data.List
import System.Console.GetOpt
import System.Environment
import System.Exit
import System.IO
import Text.Printf
Now in a new file, Cat.hs, we'll write:
data Flag
= Blanks
| Dollar
| Squeeze
| Tabs
| Unbuffered
| Invisible
| Number
| Help
deriving (Eq,Ord,Enum,Show,Bounded)
The 'data' keyword defines a new data type, 'Flag', which can have one of several
values. Such a type is often called a sum (or union) type. So 'Flag' is
a new user-defined type, just like other types, such as Bool or Int.
The identifiers on the right hand side of the | are the types constructors. That is,
values which have type 'Flag'. We ask the compiler to also derive some
instances of various common classes for us (so we don't have to write the code ourselves).
With just this we can already start playing around with the flag data type in GHCi:
> :reload
> :m + Data.List
> let s = [Number, Squeeze, Unbuffered, Squeeze]
*Main Data.List> let s = [Number, Squeeze, Unbuffered, Squeeze]
*Main Data.List> sort s
[Squeeze,Squeeze,Unbuffered,Number]
*Main Data.List> nub s
[Number,Squeeze,Unbuffered]
*Main Data.List> map fromEnum s
[6,2,4,2]
*Main Data.List> [Blanks .. ]
[Blanks,Dollar,Squeeze,Tabs,Unbuffered,Invisible,Number,Help]
User defined data types are really first class citizens in Haskell, and behave just like
the 'inbuilt' types.
Binding to command line flags
The next step is to associate some particular command line strings with each
abstract flag. We do this by writing a list of 'Option's, which tie long and
short argument flags to the particular abstract Flag value we need, and also
associated a help string with each flag:
flags =
[Option ['b'] [] (NoArg Blanks)
"Implies the -n option but doesn't count blank lines."
,Option ['e'] [] (NoArg Dollar)
"Implies the -v option and also prints a dollar sign (`$') at the end of each line."
,Option ['n'] [] (NoArg Number)
"Number the output lines, starting at 1."
,Option ['s'] [] (NoArg Squeeze)
"Squeeze multiple adjacent empty lines, causing the output to be single spaced."
,Option ['t'] [] (NoArg Tabs)
"Implies the -v option and also prints tab characters as `^I'."
,Option ['u'] [] (NoArg Unbuffered)
"The output is guaranteed to be unbuffered (see setbuf(3))."
,Option ['v'] [] (NoArg Invisible)
"Displays non-printing characters so they are visible."
,Option [] ["help"] (NoArg Help)
"Print this help message"
]
Parsing the flags
To actually turn the list of command line flags getArgs gives us, into a useful list
of abstract Flag values, we use the 'getOpt' function, which returns a triple
consisting of flags that were set, a list of any non-flag arguments, and a list
of error messages. First we need a couple of libraries:
And now to parse the 'cat' argument grammar, we would use:
parse argv = case getOpt Permute flags argv of
(args,fs,[]) -> do
let files = if null fs then ["-"] else fs
if Help `elem` args
then do hPutStrLn stderr (usageInfo header flags)
exitWith ExitSuccess
else return (nub (concatMap set args), files)
(_,_,errs) -> do
hPutStrLn stderr (concat errs ++ usageInfo header flags)
exitWith (ExitFailure 1)
where header = "Usage: cat [-benstuv] [file ...]"
If the arguments don't make sense, we fail with a usage message,
and set the exit status to 1. The final list of flags to use, and any files to
open, can be returned to main for processing now:
main = do
(as, fs) <- getArgs >>= parse
putStrLn $ "Flags: " ++ show as
putStrLn $ "Files: " ++ show fs
We can now test out the argument parsing code:
$ ghc Cat.hs
$ ./a.out
Flags: []
Files: []
$ ./a.out A.hs Z.hs
Flags: []
Files: ["A.hs","Z.hs"]
Ok, files are good. How about the flags that imply other flags?
$ ./a.out -b A.hs Z.hs
Flags: [Number,Blanks]
Files: ["A.hs","Z.hs"]
$ ./a.out -btvu A.hs Z.hs
Flags: [Number,Blanks,Tabs,Invisible,Unbuffered]
Files: ["A.hs","Z.hs"]
Good. And invalid flags:
$ ./a.out -i A.hs Z.hs
unrecognized option `-i'
Usage: cat [-benstuv] [file ...]
-b Implies the -n option but doesn't count blank lines.
-e Implies the -v option and also prints a dollar sign (`$') at the end of each line.
-n Number the output lines, starting at 1.
-s Squeeze multiple adjacent empty lines, causing the output to be single spaced.
-t Implies the -v option and also prints tab characters as `^I'.
-u The output is guaranteed to be unbuffered (see setbuf(3)).
-v Displays non-printing characters so they are visible.
--help Print this help message
Ok, that was pretty easy. Now let's try to implement these functions!
Implementing cat
So now we have to map those abstract flag values to real behaviour. I'll start
with the easy ones first.
If -u is set, we turn off all buffering. After that, we map a 'cat' function over
each file. So the program's 'main' is just:
main = do
(args, files) <- getArgs >>= parse
when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
mapM_ (cat args) files
Where 'cat' will process the files one at a time. 'cat' is where all the hard
work is done.
Most of the operations the cat program does requires access to each line of the
file. We also need to be able to handle the special file name, "-". What we'd
like to do is separate out any IO, from operations on each file's content. To
do this we'll write a higher order function, 'withFile', which takes a
filename, opens it, splits it into lines and applies a function to the contents
of the file, before writing the result to stdout:
withFile s f = putStr . unlines . f . lines =<< open s
where
open f = if f == "-" then getContents else readFile f
Now we can implement the pure 'cat' function, implementing the cat
program's functionality. Firstly, if there are no command line flags,
the 'cat' function does nothing to the input:
cat [] f = withFile f id
That is, it applies the 'id' function to the stream generated by withFile.
That was easy.
Now, if there are some arguments, we'll need to process them. This can be a little tricky,
since the effect of the command line flags are cumulative, and we better process
them in the right order. What is that order? Well, from experimentation :-) it seems
that (if all flags are enabled) 'cat' proceed to:
- first squeeze any blank lines;
- then any visibility flags are processed;
- then line numbering occurs;
- then, finally, any visible newlines are printed as '$'.
The visibility flags transform non-printing characters into a visible representation.
The key to coding this up is recognising that its just a functional pipeline.
So we can write it as:
cat as f = withFile f (newline . number . visible as)
Where 'visible' renders any non-printing chars. Then we number the resulting
lines (if the arguments are set), and then finally make any remaining newlines
visible. Note that the core of the algorithm does no IO. It's a pure function
from [String] -> [String]. Now the implementation of 'number':
where
number s = if Blanks `elem` as then numberSome s else ifset Number numberAll s
newline s = ifset Dollar (map (++"$")) s
visible as s = foldl' (flip render) s as
ifset a f = if a `elem` as then f else id
Here we actually handle all the data traversal. And use a little helper
function, 'ifset', to conditionally execute a function if the corresponding
command line is set. Note that slight trickiness involving numbering: either
we number all lines, or number the non blank lines, but not both.
'render', the function to print invisible characters, is just:
render Squeeze = map head . group
render Tabs = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
render Invisible = map $ concatMap visible
where
visible c | c == '\t' || isPrint c = [c]
| otherwise = init . tail . show $ c
render _ = id
And the numbering function:
numberLine = printf "%6d %s"
numberAll s = zipWith numberLine [(1 :: Integer)..] s
numberSome s = reverse . snd $ foldl' draw (1,[]) s
where
draw (n,acc) s
| all isSpace s = (n, s : acc)
| otherwise = (n+1, numberLine n s : acc)
And we're done! In the end, our entire implementation is some 89 lines of code,
of which 60 are to do with importing modules, or command line argument parsing.
The actual heart of the program is fairly tiny in the end.
Let's run the code.
$ ghc -O Cat.hs -o cat
Check it actually prints its arguments:
$ ./cat Cat.hs | head
import System.Console.GetOpt
import System.IO
import System.Exit
import System.Environment
import Data.List
import Data.Char
import Control.Monad
import Text.Printf
main = do
Or multiple arguments:
$ ./cat Cat.hs /usr/share/dict/words | tail
zymotoxic
zymurgy
Zyrenian
Zyrian
Zyryan
zythem
Zythia
zythum
Zyzomys
Zyzzogeton
Does it number lines:
$ ./cat -n Cat.hs | tail
80
81 (_,_,errs) -> do
82 hPutStrLn stderr (concat errs ++ usageInfo header flags)
83 exitWith (ExitFailure 1)
84
85 where header = "Usage: cat [-benstuv] [file ...]"
86
87 set Dollar = [Dollar, Invisible]
88 set Tabs = [Tabs, Invisible]
89 set f = [f]
What about non blank lines:
$ ./cat -b Cat.hs | tail
72 (_,_,errs) -> do
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)
74 exitWith (ExitFailure 1)
75 where header = "Usage: cat [-benstuv] [file ...]"
76 set Dollar = [Dollar, Invisible]
77 set Tabs = [Tabs, Invisible]
78 set f = [f]
How about some visibility flags:
$ cat -eb Cat.hs | tail
$
72 (_,_,errs) -> do$
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)$
74 exitWith (ExitFailure 1)$
$
75 where header = "Usage: cat [-benstuv] [file ...]"$
$
76 set Dollar = [Dollar, Invisible]$
77 set Tabs = [Tabs, Invisible]$
78 set f = [f]$
And turning on all the flags:
$ cat -bnvste Cat.hs | tail
$
72 (_,_,errs) -> do$
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)$
74 exitWith (ExitFailure 1)$
$
75 where header = "Usage: cat [-benstuv] [file ...]"$
$
76 set Dollar = [Dollar, Invisible]$
77 set Tabs = [Tabs, Invisible]$
78 set f = [f]$
Nice!
Summary
Well, in the end I didn't get on to exception handling, or the use of
bytestring to improve performance further. However, we have implemented (95%)
of the unix 'cat' program, including all argument handling and functionality,
in about an hour and a half.
Once it typechecked, the code just worked, except for one bug where I
originally rendered newline before counting lines (simply because the
spec was underspecified). Lesson: you can start writing your unix
scripts in Haskell right now. They'll be flexible, clean, and
easy to maintain. And most of all, fun to write!
Hopefully next time we'll look into using bytestrings for processing larger volumes of
data, and the use of exception handling to deal with unusual errors.
The complete source
And just for reference, there's the complete source:
import System.Console.GetOpt
import System.IO
import System.Exit
import System.Environment
import Data.List
import Data.Char
import Control.Monad
import Text.Printf
main = do
(args, files) <- getArgs >>= parse
when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
mapM_ (cat args) files
withFile s f = putStr . unlines . f . lines =<< open s
where
open f = if f == "-" then getContents else readFile f
cat [] f = withFile f id
cat as f = withFile f (newline . number . visible as)
where
number s = if Blanks `elem` as then numberSome s else ifset Number numberAll s
newline s = ifset Dollar (map (++"$")) s
visible as s = foldl' (flip render) s as
ifset a f = if a `elem` as then f else id
render Squeeze = map head . group
render Tabs = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
render Invisible = map $ concatMap visible
where
visible c | c == '\t' || isPrint c = [c]
| otherwise = init . tail . show $ c
render _ = id
numberLine = printf "%6d %s"
numberAll s = zipWith numberLine [(1 :: Integer)..] s
numberSome s = reverse . snd $ foldl' draw (1,[]) s
where
draw (n,acc) s
| all isSpace s = (n, s : acc)
| otherwise = (n+1, numberLine n s : acc)
data Flag
= Blanks
| Dollar
| Squeeze
| Tabs
| Unbuffered
| Invisible
| Number
| Help
deriving (Eq,Ord,Enum,Show,Bounded)
flags =
[Option ['b'] [] (NoArg Blanks)
"Implies the -n option but doesn't count blank lines."
,Option ['e'] [] (NoArg Dollar)
"Implies the -v option and also prints a dollar sign (`$') at the end of each line."
,Option ['n'] [] (NoArg Number)
"Number the output lines, starting at 1."
,Option ['s'] [] (NoArg Squeeze)
"Squeeze multiple adjacent empty lines, causing the output to be single spaced."
,Option ['t'] [] (NoArg Tabs)
"Implies the -v option and also prints tab characters as `^I'."
,Option ['u'] [] (NoArg Unbuffered)
"The output is guaranteed to be unbuffered (see setbuf(3))."
,Option ['v'] [] (NoArg Invisible)
"Displays non-printing characters so they are visible."
,Option [] ["help"] (NoArg Help)
"Print this help message"
]
parse argv = case getOpt Permute flags argv of
(args,fs,[]) -> do
let files = if null fs then ["-"] else fs
if Help `elem` args
then do hPutStrLn stderr (usageInfo header flags)
exitWith ExitSuccess
else return (nub (concatMap set args), files)
(_,_,errs) -> do
hPutStrLn stderr (concat errs ++ usageInfo header flags)
exitWith (ExitFailure 1)
where header = "Usage: cat [-benstuv] [file ...]"
set Dollar = [Dollar, Invisible]
set Tabs = [Tabs, Invisible]
set f = [f]
/home ::
/haskell ::
permalink ::
rss