This is part three in a series of tutorials on programming Haskell. You
can get up to speed by reading:
Today we'll look more into how Haskell interacts with its environment,
robust command line argument parsing, and writing a complete program.
Administrivia
Some clarifications on issues raised by yesterday's article.
Chunks
One issue pointed out was that the 'chunk' function was missing. This was semi-intentional.
Anyway, this function just splits a list into 'n' chunks:
chunk n xs = chunk' i xs
where
chunk' _ [] = []
chunk' n xs = a : chunk' n b where (a,b) = splitAt n xs
i = ceiling (fromIntegral (length xs) / fromIntegral n)
You may be able to write a neater one..
Solipsistic philosophers
Also, an amusing thread appeared on reddit,
regarding solipsistic philosopher programs:
Of course, a good optimizing compiler will replace your solipsistic
philosopher with a no-op.
A good optimising compiler, or any Haskell compiler :-) Since results
that are never required are not computed in Haskell, due to laziness, we
can write high performance solipsism simulators
all day long:
main = do
let largestNumber = last [1..]
return ()
Running our simulation of the philosopher-mathematician pondering some
thoughts on large numbers:
$ ghc A.hs
$ ./a.out
./a.out 0.00s user 0.01s system 100% cpu 0.003 total
Ok. Enough jokes. Show me the code!
Getting to work
Yesterday we implemented a few toy unix programs, including 'cat'. Today
we'll look at writing a complete cat program, but with a focus on interacting
properly with the environment and being careful about command line handling.
For our running examples, we'll consider the 'cat' and 'tac' programs. The
basic spec for 'cat' is:
The cat utility reads files sequentially, writing them to the standard
output. The file operands are processed in command-line order. If file
is a single dash (`-') or absent, cat reads from the standard input.
It's the 'id' function of the unix shell. BSD 'cat.c' is a 255 line C
program. From the man page we
can see it does more than just concatenate files. It can also:
- Numbers the output lines, starting at 1.
- Squeezes multiple adjacent empty lines
- Displays non-printing characters so they are visible.
Let's start by looking at the command line argument processing code.
Getting in arguments
The basic way to get arguments in a Haskell program is provided
by the System.Environment library. We can use the getArgs function:
Prelude> :m + System.Environment
Prelude System.Environment> :t getArgs
getArgs :: IO [String]
Prelude System.Environment> do a <- getArgs; print a
[]
Which is empty, since we didn't provide any arguments!
In a small program we can implement all argument handling using just
'getArgs', and some simple list
functions. For example, here's a basic 'tac' program, to reverse its input:
import System.Environment
import System.Exit
main = getArgs >>= parse >>= putStr . tac
tac = unlines . reverse . lines
parse ["-h"] = usage >> exit
parse ["-v"] = version >> exit
parse [] = getContents
parse fs = concat `fmap` mapM readFile fs
usage = putStrLn "Usage: tac [-vh] [file ..]"
version = putStrLn "Haskell tac 0.1"
exit = exitWith ExitSuccess
die = exitWith (ExitFailure 1)
This program concatenates and prints the contents of files in reverse (or reads
from stdin with no arguments), along with a couple of basic command line flags
for version and help strings. It's also reasonably careful about setting exit status on finishing,
using the functions from System.Exit. The actual core algorithm for 'tac' is a nice
pure Haskell function, and really all the hard work is done processing the
command line args.
Some example use:
$ ./tac -h
Usage: tac [-vh] [file ..]
$ ./tac -v
Haskell tac 0.1
$ ./tac A.hs B.hs
return n
print 1
n <- getLine
g = do
...
import System.Exit
import System.Environment
$ ./tac < A.hs
die = exitWith (ExitFailure 1)
exit = exitWith ExitSuccess
version = putStrLn "Haskell tac 0.1"
...
import System.Exit
import System.Environment
As you can see, once compiled it behaves like a normal unix utility, properly
dealing with stdin, with file arguments and the shell.
Note that getArgs doesn't return the program name. To get that we use:
Prelude System.Environment> :t getProgName
getProgName :: IO String
The environment
Many programs also make use of environment variables. We can get access
to the full shell 'env' using
Prelude System.Environment> :t getEnvironment
getEnvironment :: IO [(String, String)]
Which returns an association list, mapping environment variables to
their values.
We can stick this list straight into an efficient Map structure, for later use. Here's an interactive
example:
Prelude System.Environment> env <- do e <- getEnvironment; return (Data.Map.fromList e)
which we could also write as:
env <- Data.Map.fromList `fmap` getEnvironment
Once we've got the environment in a useful Map, we can inspect it using Map
lookups:
Prelude System.Environment> :t env
env :: Data.Map.Map String String
Prelude System.Environment> :t Data.Map.lookup
Data.Map.lookup :: (Ord k, Monad m) => k -> Data.Map.Map k a -> m a
That is, the lookup function takes some key, 'k', and a Map from keys to
elements of type 'a', and returns an element, if found, in some monad.
More on failure
You may recall from the first tutorial that the Map 'lookup' function will fail
if the key is not found. The particular way you wish it to fail depends on
which monad you use. You can tell this from the type of lookup. The
lookup :: (Monad m) => ... m a
syntax indicates that lookup is polymorphic in its monad: it will work for any
monad type, and its behaviour is determined by the particular instance of the monad
interface you ask for. When a lookup fails, it calls the 'fail' function for
the monad you're using. When a lookup is successful, it calls the 'return'
function of the same monad. Being 'polymorphic in a monad' really just means that
it will call which particular concrete monad 'subclass' you happen to be using.
Looking at the various useful monads for this, we can choose which failure behaviour we
would prefer. Here's the implementation of the 'fail' interface for a variety of
monads. It's up to you to pick which behaviour you'd like.
For Maybes, we get the null value, Nothing, on failure:
instance Monad Maybe where
return = Just
fail _ = Nothing
For Eithers, we get an error string:
instance (Error e) => Monad (Either e) where
return = Right
fail s = Left (strMsg s)
For lists, we get the empty list on failure:
instance Monad [] where
return x = [x]
fail _ = []
And for IO we get an exception thrown:
instance Monad IO where
fail = ioError . userError
return = returnIO
So, depending on the type signature, the compiler will statically pick one of
these 'fail's to use on a lookup failing at runtime. For example, to fail with
a null value, we'd use the Maybe monad:
Prelude System.Environment> Data.Map.lookup "USER" env :: Maybe String
Just "dons"
Prelude System.Environment> Data.Map.lookup "LUSER" env :: Maybe String
Nothing
Using Nothing for fatal errors isn't the best practice for large programs,
since you usually need to know what failed. For a string-annotated Nothing, we
can use the Either monad:
Prelude System.Environment> :m + Control.Monad.Error
Prelude System.Environment Control.Monad.Error> Data.Map.lookup "LUSER" env :: Either String String
Left "Data.Map.lookup: Key not found"
Which is more useful. To fail with a proper exception we'd use the IO monad:
Prelude System.Environment> Data.Map.lookup "LUSER" env :: IO String
*** Exception: user error (Data.Map.lookup: Key not found)
We'll now turn to a more flexible approach to argument parsing.
GetOpt
The base Haskell library comes with an implementation of getopt, a useful library
for standardised argument handling. Let's implement the argument handling of
the unix 'cat' program using this lib.
A type for flags
The first thing to do is define a data type representing the valid flags.
First, let's import all the libraries I'll use:
import Control.Monad
import Data.Char
import Data.List
import System.Console.GetOpt
import System.Environment
import System.Exit
import System.IO
import Text.Printf
Now in a new file, Cat.hs, we'll write:
data Flag
= Blanks
| Dollar
| Squeeze
| Tabs
| Unbuffered
| Invisible
| Number
| Help
deriving (Eq,Ord,Enum,Show,Bounded)
The 'data' keyword defines a new data type, 'Flag', which can have one of several
values. Such a type is often called a sum (or union) type. So 'Flag' is
a new user-defined type, just like other types, such as Bool or Int.
The identifiers on the right hand side of the | are the types constructors. That is,
values which have type 'Flag'. We ask the compiler to also derive some
instances of various common classes for us (so we don't have to write the code ourselves).
With just this we can already start playing around with the flag data type in GHCi:
> :reload
> :m + Data.List
> let s = [Number, Squeeze, Unbuffered, Squeeze]
*Main Data.List> let s = [Number, Squeeze, Unbuffered, Squeeze]
*Main Data.List> sort s
[Squeeze,Squeeze,Unbuffered,Number]
*Main Data.List> nub s
[Number,Squeeze,Unbuffered]
*Main Data.List> map fromEnum s
[6,2,4,2]
*Main Data.List> [Blanks .. ]
[Blanks,Dollar,Squeeze,Tabs,Unbuffered,Invisible,Number,Help]
User defined data types are really first class citizens in Haskell, and behave just like
the 'inbuilt' types.
Binding to command line flags
The next step is to associate some particular command line strings with each
abstract flag. We do this by writing a list of 'Option's, which tie long and
short argument flags to the particular abstract Flag value we need, and also
associated a help string with each flag:
flags =
[Option ['b'] [] (NoArg Blanks)
"Implies the -n option but doesn't count blank lines."
,Option ['e'] [] (NoArg Dollar)
"Implies the -v option and also prints a dollar sign (`$') at the end of each line."
,Option ['n'] [] (NoArg Number)
"Number the output lines, starting at 1."
,Option ['s'] [] (NoArg Squeeze)
"Squeeze multiple adjacent empty lines, causing the output to be single spaced."
,Option ['t'] [] (NoArg Tabs)
"Implies the -v option and also prints tab characters as `^I'."
,Option ['u'] [] (NoArg Unbuffered)
"The output is guaranteed to be unbuffered (see setbuf(3))."
,Option ['v'] [] (NoArg Invisible)
"Displays non-printing characters so they are visible."
,Option [] ["help"] (NoArg Help)
"Print this help message"
]
Parsing the flags
To actually turn the list of command line flags getArgs gives us, into a useful list
of abstract Flag values, we use the 'getOpt' function, which returns a triple
consisting of flags that were set, a list of any non-flag arguments, and a list
of error messages. First we need a couple of libraries:
And now to parse the 'cat' argument grammar, we would use:
parse argv = case getOpt Permute flags argv of
(args,fs,[]) -> do
let files = if null fs then ["-"] else fs
if Help `elem` args
then do hPutStrLn stderr (usageInfo header flags)
exitWith ExitSuccess
else return (nub (concatMap set args), files)
(_,_,errs) -> do
hPutStrLn stderr (concat errs ++ usageInfo header flags)
exitWith (ExitFailure 1)
where header = "Usage: cat [-benstuv] [file ...]"
If the arguments don't make sense, we fail with a usage message,
and set the exit status to 1. The final list of flags to use, and any files to
open, can be returned to main for processing now:
main = do
(as, fs) <- getArgs >>= parse
putStrLn $ "Flags: " ++ show as
putStrLn $ "Files: " ++ show fs
We can now test out the argument parsing code:
$ ghc Cat.hs
$ ./a.out
Flags: []
Files: []
$ ./a.out A.hs Z.hs
Flags: []
Files: ["A.hs","Z.hs"]
Ok, files are good. How about the flags that imply other flags?
$ ./a.out -b A.hs Z.hs
Flags: [Number,Blanks]
Files: ["A.hs","Z.hs"]
$ ./a.out -btvu A.hs Z.hs
Flags: [Number,Blanks,Tabs,Invisible,Unbuffered]
Files: ["A.hs","Z.hs"]
Good. And invalid flags:
$ ./a.out -i A.hs Z.hs
unrecognized option `-i'
Usage: cat [-benstuv] [file ...]
-b Implies the -n option but doesn't count blank lines.
-e Implies the -v option and also prints a dollar sign (`$') at the end of each line.
-n Number the output lines, starting at 1.
-s Squeeze multiple adjacent empty lines, causing the output to be single spaced.
-t Implies the -v option and also prints tab characters as `^I'.
-u The output is guaranteed to be unbuffered (see setbuf(3)).
-v Displays non-printing characters so they are visible.
--help Print this help message
Ok, that was pretty easy. Now let's try to implement these functions!
Implementing cat
So now we have to map those abstract flag values to real behaviour. I'll start
with the easy ones first.
If -u is set, we turn off all buffering. After that, we map a 'cat' function over
each file. So the program's 'main' is just:
main = do
(args, files) <- getArgs >>= parse
when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
mapM_ (cat args) files
Where 'cat' will process the files one at a time. 'cat' is where all the hard
work is done.
Most of the operations the cat program does requires access to each line of the
file. We also need to be able to handle the special file name, "-". What we'd
like to do is separate out any IO, from operations on each file's content. To
do this we'll write a higher order function, 'withFile', which takes a
filename, opens it, splits it into lines and applies a function to the contents
of the file, before writing the result to stdout:
withFile s f = putStr . unlines . f . lines =<< open s
where
open f = if f == "-" then getContents else readFile f
Now we can implement the pure 'cat' function, implementing the cat
program's functionality. Firstly, if there are no command line flags,
the 'cat' function does nothing to the input:
cat [] f = withFile f id
That is, it applies the 'id' function to the stream generated by withFile.
That was easy.
Now, if there are some arguments, we'll need to process them. This can be a little tricky,
since the effect of the command line flags are cumulative, and we better process
them in the right order. What is that order? Well, from experimentation :-) it seems
that (if all flags are enabled) 'cat' proceed to:
- first squeeze any blank lines;
- then any visibility flags are processed;
- then line numbering occurs;
- then, finally, any visible newlines are printed as '$'.
The visibility flags transform non-printing characters into a visible representation.
The key to coding this up is recognising that its just a functional pipeline.
So we can write it as:
cat as f = withFile f (newline . number . visible as)
Where 'visible' renders any non-printing chars. Then we number the resulting
lines (if the arguments are set), and then finally make any remaining newlines
visible. Note that the core of the algorithm does no IO. It's a pure function
from [String] -> [String]. Now the implementation of 'number':
where
number s = if Blanks `elem` as then numberSome s else ifset Number numberAll s
newline s = ifset Dollar (map (++"$")) s
visible as s = foldl' (flip render) s as
ifset a f = if a `elem` as then f else id
Here we actually handle all the data traversal. And use a little helper
function, 'ifset', to conditionally execute a function if the corresponding
command line is set. Note that slight trickiness involving numbering: either
we number all lines, or number the non blank lines, but not both.
'render', the function to print invisible characters, is just:
render Squeeze = map head . group
render Tabs = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
render Invisible = map $ concatMap visible
where
visible c | c == '\t' || isPrint c = [c]
| otherwise = init . tail . show $ c
render _ = id
And the numbering function:
numberLine = printf "%6d %s"
numberAll s = zipWith numberLine [(1 :: Integer)..] s
numberSome s = reverse . snd $ foldl' draw (1,[]) s
where
draw (n,acc) s
| all isSpace s = (n, s : acc)
| otherwise = (n+1, numberLine n s : acc)
And we're done! In the end, our entire implementation is some 89 lines of code,
of which 60 are to do with importing modules, or command line argument parsing.
The actual heart of the program is fairly tiny in the end.
Let's run the code.
$ ghc -O Cat.hs -o cat
Check it actually prints its arguments:
$ ./cat Cat.hs | head
import System.Console.GetOpt
import System.IO
import System.Exit
import System.Environment
import Data.List
import Data.Char
import Control.Monad
import Text.Printf
main = do
Or multiple arguments:
$ ./cat Cat.hs /usr/share/dict/words | tail
zymotoxic
zymurgy
Zyrenian
Zyrian
Zyryan
zythem
Zythia
zythum
Zyzomys
Zyzzogeton
Does it number lines:
$ ./cat -n Cat.hs | tail
80
81 (_,_,errs) -> do
82 hPutStrLn stderr (concat errs ++ usageInfo header flags)
83 exitWith (ExitFailure 1)
84
85 where header = "Usage: cat [-benstuv] [file ...]"
86
87 set Dollar = [Dollar, Invisible]
88 set Tabs = [Tabs, Invisible]
89 set f = [f]
What about non blank lines:
$ ./cat -b Cat.hs | tail
72 (_,_,errs) -> do
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)
74 exitWith (ExitFailure 1)
75 where header = "Usage: cat [-benstuv] [file ...]"
76 set Dollar = [Dollar, Invisible]
77 set Tabs = [Tabs, Invisible]
78 set f = [f]
How about some visibility flags:
$ cat -eb Cat.hs | tail
$
72 (_,_,errs) -> do$
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)$
74 exitWith (ExitFailure 1)$
$
75 where header = "Usage: cat [-benstuv] [file ...]"$
$
76 set Dollar = [Dollar, Invisible]$
77 set Tabs = [Tabs, Invisible]$
78 set f = [f]$
And turning on all the flags:
$ cat -bnvste Cat.hs | tail
$
72 (_,_,errs) -> do$
73 hPutStrLn stderr (concat errs ++ usageInfo header flags)$
74 exitWith (ExitFailure 1)$
$
75 where header = "Usage: cat [-benstuv] [file ...]"$
$
76 set Dollar = [Dollar, Invisible]$
77 set Tabs = [Tabs, Invisible]$
78 set f = [f]$
Nice!
Summary
Well, in the end I didn't get on to exception handling, or the use of
bytestring to improve performance further. However, we have implemented (95%)
of the unix 'cat' program, including all argument handling and functionality,
in about an hour and a half.
Once it typechecked, the code just worked, except for one bug where I
originally rendered newline before counting lines (simply because the
spec was underspecified). Lesson: you can start writing your unix
scripts in Haskell right now. They'll be flexible, clean, and
easy to maintain. And most of all, fun to write!
Hopefully next time we'll look into using bytestrings for processing larger volumes of
data, and the use of exception handling to deal with unusual errors.
The complete source
And just for reference, there's the complete source:
import System.Console.GetOpt
import System.IO
import System.Exit
import System.Environment
import Data.List
import Data.Char
import Control.Monad
import Text.Printf
main = do
(args, files) <- getArgs >>= parse
when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
mapM_ (cat args) files
withFile s f = putStr . unlines . f . lines =<< open s
where
open f = if f == "-" then getContents else readFile f
cat [] f = withFile f id
cat as f = withFile f (newline . number . visible as)
where
number s = if Blanks `elem` as then numberSome s else ifset Number numberAll s
newline s = ifset Dollar (map (++"$")) s
visible as s = foldl' (flip render) s as
ifset a f = if a `elem` as then f else id
render Squeeze = map head . group
render Tabs = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
render Invisible = map $ concatMap visible
where
visible c | c == '\t' || isPrint c = [c]
| otherwise = init . tail . show $ c
render _ = id
numberLine = printf "%6d %s"
numberAll s = zipWith numberLine [(1 :: Integer)..] s
numberSome s = reverse . snd $ foldl' draw (1,[]) s
where
draw (n,acc) s
| all isSpace s = (n, s : acc)
| otherwise = (n+1, numberLine n s : acc)
data Flag
= Blanks
| Dollar
| Squeeze
| Tabs
| Unbuffered
| Invisible
| Number
| Help
deriving (Eq,Ord,Enum,Show,Bounded)
flags =
[Option ['b'] [] (NoArg Blanks)
"Implies the -n option but doesn't count blank lines."
,Option ['e'] [] (NoArg Dollar)
"Implies the -v option and also prints a dollar sign (`$') at the end of each line."
,Option ['n'] [] (NoArg Number)
"Number the output lines, starting at 1."
,Option ['s'] [] (NoArg Squeeze)
"Squeeze multiple adjacent empty lines, causing the output to be single spaced."
,Option ['t'] [] (NoArg Tabs)
"Implies the -v option and also prints tab characters as `^I'."
,Option ['u'] [] (NoArg Unbuffered)
"The output is guaranteed to be unbuffered (see setbuf(3))."
,Option ['v'] [] (NoArg Invisible)
"Displays non-printing characters so they are visible."
,Option [] ["help"] (NoArg Help)
"Print this help message"
]
parse argv = case getOpt Permute flags argv of
(args,fs,[]) -> do
let files = if null fs then ["-"] else fs
if Help `elem` args
then do hPutStrLn stderr (usageInfo header flags)
exitWith ExitSuccess
else return (nub (concatMap set args), files)
(_,_,errs) -> do
hPutStrLn stderr (concat errs ++ usageInfo header flags)
exitWith (ExitFailure 1)
where header = "Usage: cat [-benstuv] [file ...]"
set Dollar = [Dollar, Invisible]
set Tabs = [Tabs, Invisible]
set f = [f]
/home ::
/haskell ::
permalink ::
rss
2006-12-17
This is part two in a series of tutorials on programming Haskell. You
can get up to speed by reading yesterday's
introductory article.
Today we'll look more into the basic tools at our disposal in the Haskell language, in particular,
operations for doing IO and playing with files and strings.
Administrivia
Before we get started, I should clarify a small point raised by yesterday's
article. One issue I forgot to mention was that there are slight
differences between running Haskell in ghci, the bytecode interpreter,
and compiling it to native code with GHC.
Haskell programs are executed by evaluating the special 'main' function.
import Data.List
mylength = foldr (const (+1)) 0
main = print (mylength "haskell")
To compile this to native code, we would feed the source file to the compiler:
$ ghc A.hs
$ ./a.out
7
For a faster turnaround, we can run the code directly through
the bytecode interpreter, GHCi, using the 'runhaskell' program:
$ runhaskell A.hs
7
GHCi, the interactive Haskell environment, is a little bit different.
As it is an interactive system, GHCi must execute your code
sequentially, as you define each line. This is different to normal
Haskell, where the order of definition is irrelevant. GHCi effectively
executes your code inside a do-block. Therefore you can use the
do-notation at the GHCi prompt to define new functions:
$ ghci
Prelude> :m + Data.List
Prelude> let mylength = foldr (const (+1)) 0
Prelude> :t mylength
mylength :: [a] -> Integer
Prelude> mylength "haskell"
7
For this tutorial I will be developing code in a source file, and either
compiling it as above, or loading the source file into GHCi for testing.
To load a source file into GHCi, we do:
$ ghci
Prelude> :load A.hs
*Main> :t main
main :: IO ()
*Main> :t mylength
mylength :: [a] -> Integer
*Main> mylength "foo"
3
*Main> main
7
Now, let's get into the code!
IO
As the Camel Book says:
Unless you're using artificial intelligence to model a solipsistic
philosopher, your program needs some way to communicate with the
outside world.
In yesterday's tutorial, I briefly introduced 'readFile', for reading a
String from a file on disk. Let's consider now IO in more detail.
The most common IO operations are defined in the
System.IO
library.
For the most basic stdin/stdout Unix-style programs in Haskell, we can
use the 'interact' function:
interact :: (String -> String) -> IO ()
This higher order function takes, as an argument, some function for
processing a string (of type String -> String). It runs this function
over the standard input stream, printing the result to standard output.
A surprisingly large number of useful programs can be written this way.
For example, we can write the 'cat' unix program as:
main = interact id
Yes, that's it! Let's compile and run this program:
$ ghc O A.hs
$ cat A.hs | ./a.out
main = interact id
How does this work? Firstly, 'interact' is defined as:
interact f = do s <- getContents
putStr (f s)
So it reads a string from standard input, and writes to standard output
the result of applying its argument function to that string. The 'id'
function itself has the type:
id :: a -> a
'id' is a function of one argument, of any type (the lowercase 'a' in
the type means any type can be used in that position, i.e. it is a
polymorphic function (also called a generic function in some
languages)). 'id' takes a value of some type 'a', and returns a value of
the same type. There's is only one (non-trivial) function of this type:
id a = a
So 'interact id' will print to the input string to standard output
unmodified.
Let's now write the 'wc' program:
main = interact count
count s = show (length s) ++ "\n"
This will print the length of the input string, that is, the number of
chars:
$ runhaskell A.hs < A.hs
57
Line oriented IO
Only a small number of programs operate on unstructured input streams.
It is far more common to treat an input stream as a list of lines. So
let's do that. To break a string up into lines, we'll use the ...
'lines' function, defined in the Data.List library:
lines :: String -> [String]
The type, once again, tells the story. 'lines' takes a String, and
breaks it up into a list of strings, splitting on newlines.
To join a list of strings back into a single string, inserting newlines,
we'd use the ... 'unlines' function:
unlines :: [String] -> String
There are also similar functions for splitting on words, namely 'words'
and 'unwords'. Now, an example. To count the number of lines in a file:
main = interact (count . lines)
We can run this as:
$ ghc O A.hs
$ ./a.out < A.hs
3
Here we reuse the 'count' function from above, by composing it
with the lines function.
On composition
This nice code reuse via composition is achieved using the (.) function,
pronounced 'compose'. Let's look at how that works. (Feel free to skip
this section, if you want to just get things done).
The (.) function is just a normal everyday Haskell function, defined as:
(.) f g x = f (g x)
This looks a little like magic (or line noise), but its pretty easy. The
(.) function simply takes two functions as arguments, along with
another value. It applies the 'g' function to the value 'x', and then
applies 'f' to the result, returning this final value. The functions may
be of any type. The type of (.) is actually:
(.) :: (b -> c) -> (a -> b) -> a -> c
which might look a bit hairy, but it essentially specifies what types of
arguments make sense to compose. That is, only those where:
f :: b -> c
g :: a -> b
x :: a
can be composed, yielding a new function of type:
(f . g) :: a -> c
The nice thing is that this composition makes sense (and works) for all types a, b and
c.
How does this relate to code reuse? Well, since our 'count' function is
polymorphic, it works equally well counting the length of a
string, or the length of a list of strings. Our littler 'wc' program is
the epitome of the phrase: "higher order + polymorphic =
reusable". That is, functions which take other functions as
arguments, when combined with functions that work over any type, produce
great reusable 'glue'. You only need vary the argument function to gain
terrific code reuse (and the strong type checking ensures you can only
reuse code in ways that work).
More on lines
Another little example, let's reverse each line of a file (like the unix
'rev' command):
main = interact (unlines . map reverse . lines)
Which when run, reverses the input lines:
$ ./a.out < B.hs
rahC.ataD tropmi
ebyaM.ataD tropmi
tsiL.ataD tropmi
So we take the input string, split it into lines, and the loop over that
list of lines, reversing each of them, using the 'map' function.
Finally, once we've reversed each line, we join them back into a single
string with unlines, and print it out.
The 'map' function is a fundamental control structure of functional
programming, similar to the 'foreach' keyword in a number of imperative
languages. 'map' however is just a function on lists, not built in
syntax, and has the type:
map :: (a -> b) -> [a] -> [b]
That is, it takes some function, and a list, and applies that function
to each element of the list, returning a new list as a result. Since
loops are so common in programming, we'll be using 'map' a lot.
Just for reference, 'map' is implemented as:
map _ [] = []
map f (x:xs) = f x : map f xs
File IO
Operating on stdin/stdout is good for scripts (and this is how tools
like sed or perl -p work), but for 'real' programs we'll at least need
to do some file IO. The basic operations of files are:
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
'readFile' takes a file name as an argument, does some IO, and returns the
file's contents as a string. 'writeFile' takes a file name, a string,
and does some IO (writing that string to the file), before returning the
void (or unit) value, ().
We could implement a 'cp' program on files, as:
import System.Environment
main = do
[f,g] <- getArgs
s <- readFile f
writeFile g s
Running this program:
$ ghc O A.hs
$ ./a.out A.hs Z.hs
$ cat Z.hs
import System.Environment
main = do
[f,g] <- getArgs
s <- readFile f
writeFile g s
Since we're doing IO (the type of readFile and writeFile enforce this),
the code runs inside a do-block, using the IO monad. "Using the
IO monad" just means that we wish to use an imperative, sequential order
of evaluation. (As an aside, a wide range of other monads exist, for
programming different program evaluation strategies, such as
Prolog-style backtracking, or continutation-based evaluation. All of
imperative programming is just one subset of possible evaluation
strategies you can use in Haskell, via monads).
In do-notation, whenever you wish to run an action, for its side
effect, and save the result to a variable, you would write:
v <- action
For example, to run the 'readFile' action, which has the side effect of
reading a file from disk, we say:
s <- readFile f
Finally, we can use the 'appendFile' function to append to an existing
file.
File Handles
The most generic interface to files is provided via Handles. Sometimes
we need to keep a file open, for multiple reads or writes. To do this we
use Handles, an abstraction much like the underlying system's file
handles.
To open up a file, and get its Handle, we use:
openFile :: FilePath -> IOMode -> IO Handle
So to open a file for reading only, in GHCi:
Prelude System.IO> h <- openFile "A.hs" ReadMode
{handle: A.hs}
Which returns a Handle onto the file "A.hs". We can read a line from this handle:
Prelude System.IO> hGetLine h
"main = do"
To close a Handle, and flush the buffer:
hClose :: Handle -> IO ()
Once a Handle is closed, we can no longer read from it:
Prelude System.IO> hClose h
Prelude System.IO> hGetLine h
*** Exception: A.hs: hGetLine: illegal operation (handle is closed)
We can also flush explicitly with:
hFlush :: Handle -> IO ()
Other useful operations for reading from Handles:
hGetChar :: Handle -> IO Char
hGetLine :: Handle -> IO [Char]
hGetContents :: Handle -> IO [Char]
To write to Handles:
hPutChar :: Handle -> Char -> IO ()
hPutStr :: Handle -> [Char] -> IO ()
hPutStrLn :: Handle -> [Char] -> IO ()
hPrint :: Show a => Handle -> a -> IO ()
Some other useful actions:
hSeek :: Handle -> SeekMode -> Integer -> IO ()
hTell :: Handle -> IO Integer
hFileSize :: Handle -> IO Integer
hIsEOF :: Handle -> IO Bool
An example: spell checking
Here is a small example of combining the Data.Set
and List data structures from yesterday's tutorial, with more IO
operations. We'll implement a little spell checker, building the
dictionary in a Set data type. First, some libraries to import:
import System.Environment
import Control.Monad
import Data.Set
And the complete program:
main = do
[s] <- getArgs
f <- readFile "/usr/share/dict/words"
g <- readFile s
let dict = fromList (lines f)
mapM_ (spell dict) (words g)
spell d w = when (w `notMember` d) (putStrLn w)
Running this program, on its own source, and it reports the following
words are not found in the dictionary:
$ ghc O Spell.hs o spell
$ ./spell A.hs
Data.Char
=
<-
(map
toUpper
n)
=
<-
getLine
1
Writing the results out
If we wanted to write the results out to a temporary file, we can do
so. Let's import a couple of other modules:
import Data.Set
import Data.Maybe
import Text.Printf
import System.IO
import System.Environment
import System.Posix.Temp
Refactoring the main code to separate out the reading and writing phases
in to their own function, we end up with the core code:
main = do
(f, g) <- readFiles
let dict = fromList (lines f)
errs = mapMaybe (spell dict) (words g)
write errs
spell d w | w `notMember` d = Just w
| otherwise = Nothing
Where reading is now its own function:
readFiles = do
[s] <- getArgs
f <- readFile "/usr/share/dict/words"
g <- readFile s
return (f,g)
And writing errors out to their own file:
write errs = do
(t,h) <- mkstemp "/tmp/spell.XXXXXX"
mapM_ (hPutStrLn h) errs
hClose h
printf "%d spelling errors written to '%s'\n" (length errs) t
Pretty simple! Running this program:
$ ghc
[1 of 1] Compiling Main ( Spell.hs, Spell.o )
Linking myspell ...
$ ./myspell Spell.hs
67 spelling errors written to '/tmp/spell.ia8256'
Extension: using SMP parallelism
Finally, just for some bonus fun ... and hold on to your hat 'cause I'm
going to go fast ... we'll add some parallelism to the mix.
Haskell was designed from the start to support easy parallelisation, and
since GHC 6.6, multithreaded code will run transparently on multicore
systems using as many cores as you specify. Let's look at how we'd
parallelise our little program to exploit multiple cores. We'll use an
explicit threading model, via Control.Concurrent. You can also make your code implicitly
parallel, using Control.Parallel.Strategies,
but we'll leave that for another tutorial.
Here's the source, for you to ponder. First some imports:
import Data.Set hiding (map)
import Data.Maybe
import Data.Char
import Text.Printf
import System.IO
import System.Environment
import Control.Concurrent
import Control.Monad
The entry point, modified to break the word list into chunks, and then
dispatching a chunk to each thread:
main = do
(f, g, n) <- readFiles
let dict = fromList (lines f)
work = chunk n (words g)
run n dict work
The 'run' function sets up a channel between the main thread and all
children thread ('errs'), and prints spelling errors as they arrive on
the channel from the children. It then forks off 'n' children threads on
each piece of the work list:
run n dict work = do
chan <- newChan
errs <- getChanContents chan
mapM_ (forkIO . thread chan dict) (zip [1..n] work)
wait n errs 0
The main thread then just waits on all the threads to finish, printing
any spelling errors they pass up:
wait n xs i = when (i < n) $ case xs of
Nothing : ys -> wait n ys $! i+1
Just s : ys -> putStrLn s >> wait n ys i
Each thread spell checks its own piece of the work list. If it finds a
spelling error, it passes the offending word back over the channel to
the main thread.
thread chan dict (me, xs) = do
mapM_ spellit xs
writeChan chan Nothing
where
spellit w = do
when (spell dict w) $
writeChan chan . Just $ printf "Thread %d: %-25s" (me::Int) w
The 'spell' function is simplified a little:
spell d w = w `notMember` d
which we could also write as:
spell = flip notMember
We modify the readFiles phase to take an additional numeric command line
argument, specifying the number of threads to run:
readFiles = do
[s,n] <- getArgs
f <- readFile "/usr/share/dict/words"
g <- readFile s
return (f,g, read n)
We compile this with the GHC SMP parallel runtime system:
$ ghc O
Now, we can run 'n' worker threads (lightweight Haskell threads), mapped
onto 'm' OS threads. Since I'm using a 4 core linux server, we'll play
around with 4 OS threads. First, running everything in a single thread:
$ time ./spell test.txt 1 +RTS N1
...
Thread 1: week:
Thread 1: IO!
./spell test.txt 1 +RTS N1 99% cpu 2.533 total
Ok, now we change the command line flag to run it with 4 OS threads, to
try to utilise all 4 cores:
$ time ./spell 4 +RTS N4
...
Thread 2: week:
Thread 3: IO!
./spell test.txt 4 +RTS N4 111% cpu 2.335 total
Ok. Good... A little bit faster, uses a little bit more cpu. It turns
out however the program is bound currently by the time spent in the main
thread building the initial dictionary. Actual searching time is only
some 10% of the running time. Nonetheless, it was fairly painless to
break up the initial simple program into a parallel version.
If the program running time was extended (as the case for a server), the
parallelism would be a win. Additionally, should we buy more cores for
the server, all we need to is change the +RTS -N argument to the
program, to start utilising these extra cores.
Next week
In upcoming tutorials we'll look more into implicitly parallel programs,
and the use of the new high performance ByteString data type for string
processing.
/home ::
/haskell ::
permalink ::
rss
2006-12-16
Its about time we got some job done in Haskell, eh? Now, one of my favourite programming
books as an undergraduate was the Camel Book, "Programming Perl". It was full
of lots of practical examples of Perl code, written well. (And I'm
grateful to Larry Wall, Tom Christiansen and Randal Schwartz for writing
the book that made programming fun).
So what would it look like if we wrote a Haskell tutorial in this style?
Let's have at it!
Getting started
Like some languages Haskell can be both compiled and interpreted. The most
widely used implementation of Haskell currently is GHC, which provides both an
optimising native code compiler, and an interactive bytecode interpreter. I'll
be using GHC (or its interactive front
end, GHCi, for all code. So grab a copy of GHC now, from your package system,
or the GHC home page.
Start up GHCi:
$ ghci
___ ___ _
/ _ \ /\ /\/ __(_)
/ /_\// /_/ / / | | GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/
\____/\/ /_/\____/|_| Type :? for help.
Loading package base ... linking ... done.
Prelude>
The interpreter now waits for your code fragments. The "Prelude" prompt indicates
which library modules are in scope, and in this case, only the basic language
module, known as the
Prelude.
Now we can start running Haskell code.
Prelude> "G'day, world!"
"G'day, world!"
Prelude> putStrLn "G'day, world!"
G'day, world!
You can compile this code to a native binary using GHC, by writing in a source
file:
main = putStrLn "G'day, world!"
and then compiling the source to native code. Assuming your file is A.hs:
$ ghc A.hs
This produces a new executable, ./a.out (a.out.exe on windows), which you can
run like any other program on your system:
$ ./a.out
G'day, world!
Variables
We can name arbitrary fragments of Haskell using variables. Like so:
phrase = "G'day, world!"
main = putStrLn phrase
We don't have to define what type phrase is, as Haskell uses type inference to
infer at compile time the types of all expressions in the program. As "G'day,
world!" is a string, so must phrase be a string. There are a bunch of basic
types of values to play with. Here's a small sample:
answer = 42
pi = 3.141592653589793238462643383279502884197169399375105820974944592
avocados = 6.02e23
pet = "Lambda"
sign = "I love my " ++ pet
coat = "It costs $100"
hence = "whence"
thence = hence
moles = 2.5
x = moles * avocados
c = '#'
pair = (2.5, "lambdas")
list = [5,6,4,3,1]
options = Just "done"
failed = Nothing
void = ()
One important thing to remember is that Haskell's variables, like in most
functional programming languages, are like variables in mathematics, and are
just names for expressions. They're explicitly not
mutable boxes,
like in most imperative programming languages. As a result, you never need to
worry about initialising a Haskell variable, nor do you need to worry about the
current value in a variable: it always has the same value, and can
always be replaced with its definition. So the following behaves just like it
would in maths:
answer = 42
another = answer + 1
more = another + answer
main = print more
That is,
$ ghc A.hs
$ ./a.out
85
Now, since variables are just names for program fragments, you can evaluate
Haskell on paper by replacing all names with their definition, until you reach
a final value, like so:
main = print more
=>
main = print (another + answer)
=>
main = print ((answer + 1) + answer)
=>
main = print ((answer + 1) + 42)
=>
main = print ((42 + 1) + 42)
=>
main = print (43 + 42)
=>
main = print 85
=>
85
Having such a simple system for variables allows for a wide range of
interesting optimisations, and makes understanding what a program is doing at
any point much easier, since you don't have to worry about what state a
variable might currently be in. (Of course, some problems need (theadsafe) mutable
boxes, and they're available as a library for when you need that).
Collections
Often you need to collect a bunch of values together into some kind of
collection. Haskell has many many collection types, but in particular, it has
lists and finite maps, which operate much like arrays and hashes of the
imperative world.
Lists
A list is just an ordered, um, list of values. They can be nested, and
transformed in all sorts of ways, using functions. Assuming your file, A.hs,
contains:
home = ["couch", "chair", "table", "stove"]
We can play around with this stuff like so:
$ ghci A.hs
*Main> home
["couch","chair","table","stove"]
*Main> head home
"couch"
*Main> tail home
["chair","table","stove"]
*Main> last home
"stove"
*Main> home !! 2
"table"
*Main> reverse home
["stove","table","chair","couch"]
*Main> map reverse home
["hcuoc","riahc","elbat","evots"]
Loading in the List library gives us some more functions to use:
*Main> :m + Data.List
*Main Data.List> intersperse "#" home
["couch","#","chair","#","table","#","stove"]
*Main Data.List> concat (intersperse "#" home)
"couch#chair#table#stove"
*Main Data.List> home \\ ["table","stove"]
["couch","chair"]
Finite Maps
Finite maps
(or maps) are the lookup tables of purely functional programming. Whenever
you'd use some kind of hash in an imperative language, you can replace it with
a Map in Haskell.
Like hashes, maps can be seen as a table of pairs of keys and values. You can
declare a new map:
import Data.Map
days = fromList
[ ("Sun", "Sunday" )
, ("Mon", "Monday" )
, ("Tue", "Tuesday" )
, ("Wed", "Wednesday" )
, ("Thu", "Thursday" )
, ("Fri", "Friday" )
, ("Sat", "Saturday" ) ]
You can also convert a map to a list, using (well, duh!) toList:
*Main> toList days
[("Fri","Friday"),("Mon","Monday"),("Sat","Saturday"),("Sun","Sunday"),("Thu","Thursday"),("Tue","Tuesday"),("Wed","Wednesday")]
Note that they come out unordered, just like in hashes. If you just want the keys of the map:
*Main> keys days
["Fri","Mon","Sat","Sun","Thu","Tue","Wed"]
*Main> elems days
["Friday","Monday","Saturday","Sunday","Thursday","Tuesday","Wednesday"]
Since maps are a good structure for looking up values, you can search them using
the lookup function. This function returns the element, if found:
*Main> Data.Map.lookup "Tue" days
"Tuesday"
Since the name 'lookup' is also used by a list function of similar purpose in
the Prelude, we use the qualified name here to disambiguate which
'lookup' to use.
On failure
But what happens if the key is not found? (Feel free to skip this section if
you don't care about errors yet) lookup will then fail, and how it fails
depends on what kind of failure you want. Haskell goes to great lengths to make
programming for failure flexible. For example, to fail with an exception:
*Main> Data.Map.lookup "Thor" days
*** Exception: user error (Data.Map.lookup: Key not found)
Which is the same as failing with an IO error. We can specify this specifically
with a type annotation, to say "fail with an IO error":
*Main> Data.Map.lookup "Thor" days :: IO String
*** Exception: user error (Data.Map.lookup: Key not found)
Often you might instead prefer that some special value is returned on failure:
*Main> Data.Map.lookup "Thors" days :: Maybe String
Nothing
Maybe you'd just like an empty list:
*Main> Data.Map.lookup "Thor" days :: [String]
[]
Finally, you can always provide an explicit default value:
*Main> findWithDefault "Not found" "Thor" days
"Not found"
Failure is entirely under your control!
Actions
Now, real programs interact with the outside world. They call functions which
do IO, as a side effect, and then return some value. In Haskell, functions with
side effects are often called actions, to distinguish them from normal
Haskell functions (which behave like mathematical functions: they take inputs
and return a result, with no side effects). Programming with side effects is
carefully handled in Haskell, again to control the possibility of errors, and
all functions which have side effects have a special type: the IO type.
For example, the function to print a string has the following type (and you can
ask the interpreter for the type interactively):
Prelude> :t putStr
putStr :: String -> IO ()
which tells you that this function takes a String as an argument, does some IO
side effect, and returns the null value. It is equivalent to the following C type:
void putStr(char *);
but with a bit of extra information, namely, that the function does some IO.
We would print out some element of our map like so:
main = print ("Tue in long form is " ++ findWithDefault "Not found" "Tue" days)
*Main> main
"Tue in long form is Tuesday"
An example
One of the classic programming puzzles for introducing real world problems is
the 'class grade' problem. You have a text file containing a list of student
names and their grades, and you'd like to extract various information and
display it. In deference to The Camel Book, we'll follow this lead, and start
with a file "grades", containing something like this:
Alonzo 70
Simon 94
Henk 79
Eugenio 69
Bob 80
Oleg 77
Philip 73
...
Student's appear multiple times, with entries for each of their subjects Let's
read this file, populate a map with the data, and print some statistical
information about the results. First thing to do is import some basic libraries:
import Data.Char
import Data.Maybe
import Data.List
import Data.Map hiding (map)
import Text.Printf
And now here's the entire program, to read the grades file, compute all the
averages, and print them:
main = do
src <- readFile "grades"
let pairs = map (split.words) (lines src)
grades = foldr insert empty pairs
mapM_ (draw grades) (sort (keys grades))
where
insert (s, g) = insertWith (++) s [g]
split [name,mark] = (name, read mark)
draw g s = printf "%s\t%s\tAverage: %f\n" s (show marks) avg
where
marks = findWithDefault (error "No such student") s g
avg = sum marks / fromIntegral (length marks) :: Double
Running it
How do we run this program? There's lots of ways:
Compile it to native code
$ ghc O Grades.hs
$ ./a.out
Alonzo [70.0,71.0] Average: 70.5
Bob [80.0,88.0] Average: 84.0
Eugenio [69.0,98.0] Average: 83.5
Henk [79.0,81.0] Average: 80.0
Oleg [77.0,68.0] Average: 72.5
Philip [73.0,71.0] Average: 72.0
Simon [94.0,83.0] Average: 88.5
Run it in the bytecode interpreter
$ runhaskell Grades.hs
Alonzo [70.0,71.0] Average: 70.5
Bob [80.0,88.0] Average: 84.0
Eugenio [69.0,98.0] Average: 83.5
Henk [79.0,81.0] Average: 80.0
Oleg [77.0,68.0] Average: 72.5
Philip [73.0,71.0] Average: 72.0
Simon [94.0,83.0] Average: 88.5
Execute it interactively
$ ghci Grades.hs
Prelude Main> main
Alonzo [70.0,71.0] Average: 70.5
Bob [80.0,88.0] Average: 84.0
Eugenio [69.0,98.0] Average: 83.5
Henk [79.0,81.0] Average: 80.0
Oleg [77.0,68.0] Average: 72.5
Philip [73.0,71.0] Average: 72.0
Simon [94.0,83.0] Average: 88.5
Make the script executable
Under unix, you can use the #! convention to make a script executable. Add the
following to the top of the source file:
#!/usr/bin/env runhaskell
And then set the script executable:
$ chmod +x Grades.hs
$ ./Grades.hs
Alonzo [70.0,71.0] Average: 70.5
Bob [80.0,88.0] Average: 84.0
Eugenio [69.0,98.0] Average: 83.5
Henk [79.0,81.0] Average: 80.0
Oleg [77.0,68.0] Average: 72.5
Philip [73.0,71.0] Average: 72.0
Simon [94.0,83.0] Average: 88.5
Next week: more IO!
/home ::
/haskell ::
permalink ::
rss
2006-12-15
Yesterday's article, about programming language syntax in an ideal world, provoked a small storm of controversy over on
reddit.
Unfortunately, since there were a couple of other Haskell articles on
the same page, the article's argument, that languages of the future
should look like pseudocode, was somewhat lost. A number of
readers assumed the article was a piece of Haskell advocacy, which wasn't the
intention. Instead, it was a plea for the syntax of future languages to
be optimised for writing by humans, and not for parsing by compilers.
That is:
The languages of the future should look like pseudocode
I'll cache the original example of "future syntax" here, so you can keep
it in mind:
fib 0 = 0
fib 1 = 1
fib n = fib (n1) + fib (n2)
Note that this is an argument for syntax, independent of whatever
language you want to use. And it ignores all efficiency issues in current
languages (that's a job for the compiler designers of the future).
Of course, programmers being very practical people took this as an
opportunity to demonstrate how their language of choice (and being
reddit, this means lisp) already provides this "future syntax". Well, sort of:
(define fib
(matchlambda
[0 0]
[1 1]
[n (+ (fib ( n 1)) (fib ( n 2)))]))
(define (fibo x)
(cond
((= 0 x) 0)
((= 1 x) 1)
(else (+ (fibo ( x 1))
(fibo ( x 2))))))
(define (fibo x)
(case x ((0 1) 1)
(else (+ (fibo ( x 1)) (fibo ( x 2))))))
(defmethod fib ((n (eql 0))) 0)
(defmethod fib ((n (eql 1))) 1)
(defmethod fib (n)
(+ (fib ( n 1)) (fib ( n 2))))
(define (fibo x)
(if (< x 2)
x
(+ (fibo ( x 1)) (fibo ( x 2)))))
(let ((fibs (makehashtable)))
(defun fib (n)
(if (< n 2) n
(or (gethash n fibs)
(setf (gethash n fibs)
(+ (fib ( n 1)) (fib ( n 2))))))))
Others focused on improving the particular languages I cited. The
best contribution on this issue came from Greg Buchholz, who illustrated neat
implementations in a variety of languages, far more flattering than
the examples I used (which, incidentally, were not my creation, they were
wikipedia's ....)
Perhaps unsurprisingly (in hindsight) plenty of readers assumed the
plea for future syntax was a plea for "Haskell now!", and talked at length
about efficiency concerns, and the lack of real world code/IO in the
examples:
all of these are monstrously inefficient...
> let fibs = 1 : 1 : zipWith (+) (fibs) (tail fibs)
> take 10 fibs
[1,1,2,3,5,8,13,21,34,55]
That's surely more efficient
And kinda memory eating :-)
You'd better choose something like
fib = fib_iter 1 1
where
fib_iter x y n | n == 1 = x
| otherwise = fib_iter y (x + y) (n 1)
there's a much more efficient way to do it in Hs using the language's
lazyness (the ability to recursively define infinite lists):
fibs = 1 : 1 : zipWith (+) (fibs) (tail fibs)
And, this being the interblargh, some people missed the point entirely:
This code doesn't have any resemblance to everyday programming however -
which generally involves trawling through databases, string matching, and I/O.
What is the point of showing examples which do not reflect reality of
most(?) programing tasks?
I've been a ASM/C++/C# developer for over a year now. But seriously, how
stupid is this? You are going to tout a language strength due to how
efficiently it can work out the Fibonacci sequence?
That is not a real-world example, and if you were to use a real-world
example, this code would be horrific.
So what can be made of all this? I think that it can be objectively stated that
current programming languages, for the most part, don't look like ideal
pseudocode languages, with syntax often bogged down in language features, and
parser hints. Overuse of keywords (defun,define,def,let rec), extraneous typing
hints (like $x), and parser hints (';' and '{' '}'), all obscure the underlying
code. Step back for a second: you're 10 years old, and you're hoping to become
a cool hacker. What syntax do you wish you could use?
Now Python, Haskell and the Lisp family seem to be the closest to
where we want to go, but they're not there yet. Even in Haskell, where years
were
spent
sorting
out
the
syntax, and as a result has some nice features for
human programmers (pattern matching on data structures, and lovely whitespace),
you can still end up writing awkward code (and this is entirely my own fault):
withGS :: (g -> (g -> LB ()) -> LB ()) -> ModuleT (GlobalPrivate g p) LB ()
withGS f = withMS $ \state writer ->
f (global state) $ \g -> writer $ state { global = g }
readGS :: ModuleT (GlobalPrivate g p) LB g
readGS = global `fmap` readMS
writeGS :: g -> ModuleT (GlobalPrivate g p) LB ()
writeGS g = withGS (\_ writer -> writer g)
forkLB :: LB a -> LB ()
forkLB f = (`liftLB` f) $ \g -> do
forkIO $ do
timeout (15 * 1000 * 1000) g
return ()
liftLB :: (IO a -> IO b) -> LB a -> LB b
liftLB f = LB . mapReaderT f . runLB
So finally, if you're in any way involved in the design of syntax for a new
language, or fixing up the syntax of existing languages, please think
hard about what you would have wished to program in, had you never been
exposed to the languages of today. Programming languages don't have to be ugly
and noisy! Concentrate on what the human author of the code would like to
write: code that is aesthetically pleasing and even beautiful, and don't
compromise your syntax, for the short term gain of making parsing easier.
/home ::
/haskell ::
permalink ::
rss
2006-12-14