tr
The humble tr tool is surprisingly handy. It readily disposes of many little tasks:
- conversion of newlines from one operating system to another
- substitution ciphers
- extraction of, say, alphabetic characters from a file
- changing lowercase to uppercase or vice versa
- replacing consecutive spaces with a single space
Let’s look at a simplified tr that only translates (it can neither delete nor squeeze) and only supports sets of single characters (no ranges, escapes, or classes).
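#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a formatted message to stderr and exit with failure. */
void die(const char *err, ...) {
  va_list params;
  va_start(params, err);
  vfprintf(stderr, err, params);
  va_end(params);
  fputc('\n', stderr);
  exit(1);
}

int main(int argc, char **argv) {
  if (argc < 2) die("tr: missing operand");
  if (argc < 3) die("tr: missing operand after `%s'", argv[1]);
  if (argc > 3) die("tr: extra operand `%s'", argv[3]);
  if (!*argv[2]) die("tr: second operand must not be empty");
  /* Identity table: every byte initially maps to itself. */
  unsigned char tab[256];
  for (int i = 0; i < 256; i++) tab[i] = i;
  /* Map each byte of the first set to the matching byte of the second;
     once the second set runs out, keep reusing its last byte. */
  char *q = argv[2];
  for (char *p = argv[1]; *p; p++) {
    /* Index via unsigned char so bytes above 127 stay within the table. */
    tab[(unsigned char)*p] = *q;
    if (*(q+1)) q++;
  }
  int c;
  while (EOF != (c = getchar())) {
    if (EOF == putchar(tab[c])) perror("tr"), exit(1);
  }
  if (ferror(stdin)) perror("tr"), exit(1);
  return 0;
}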
UTF-8
This time, instead of moving to a Go program that behaves identically, we take advantage of Go’s features to make our program more versatile. Our Go version supports UTF-8, despite resembling the C original.
We use a map instead of an array because there are far more than 256 Unicode characters. Thankfully, Go provides a built-in map type; in C, we’d have to supply our own.
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
)

// die prints a formatted message to stderr and exits with failure.
func die(s string, v ...interface{}) {
	fmt.Fprintf(os.Stderr, "tu: ")
	fmt.Fprintf(os.Stderr, s, v...)
	fmt.Fprintf(os.Stderr, "\n")
	os.Exit(1)
}

func main() {
	flag.Parse()
	if 1 > flag.NArg() {
		die("missing operand")
	}
	if 2 > flag.NArg() {
		die("missing operand after `%s'", flag.Arg(0))
	}
	if 2 < flag.NArg() {
		die("extra operand after `%s'", flag.Arg(1))
	}
	// Convert the operands to rune slices so each element is one code point.
	set1 := []rune(flag.Arg(0))
	set2 := []rune(flag.Arg(1))
	if len(set2) == 0 {
		die("second operand must not be empty")
	}
	// Map each rune of set1 to its counterpart in set2, reusing the last
	// rune of set2 once it runs out.
	tab := make(map[rune]rune)
	j := 0
	for i := 0; i < len(set1); i++ {
		tab[set1[i]] = set2[j]
		if j < len(set2)-1 {
			j++
		}
	}
	in := bufio.NewReader(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	flush := func() {
		if er := out.Flush(); er != nil {
			die("flush: %s", er)
		}
	}
	writeRune := func(r rune) {
		if _, er := out.WriteRune(r); er != nil {
			die("write: %s", er)
		}
	}
	for done := false; !done; {
		switch r, _, er := in.ReadRune(); er {
		case io.EOF:
			done = true
		case nil:
			if s, found := tab[r]; found {
				writeRune(s)
			} else {
				writeRune(r)
			}
			// Flush at each newline so interactive use stays responsive.
			if '\n' == r {
				flush()
			}
		default:
			die("%s: %s", os.Stdin.Name(), er)
		}
	}
	flush()
}
Then if the binary is named tu:
$ tu 0123456789 〇一二三四五六七八九 <<< 31415
三一四一五
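The table-building step is easier to experiment with when it is separated from the I/O. The following is a minimal sketch rather than part of the program itself: buildTable and translate are hypothetical helper names, and it assumes a current Go toolchain, where rune is the type for a Unicode code point. It shows the digit example again, plus what happens when the second set is shorter than the first: its last character is reused.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTable maps each rune of set1 to the rune at the same position in
// set2. When set2 is shorter, its last rune is reused for the rest of set1.
// Returning an empty table for an empty set2 is an assumption of this sketch.
func buildTable(set1, set2 string) map[rune]rune {
	tab := make(map[rune]rune)
	to := []rune(set2)
	if len(to) == 0 {
		return tab
	}
	j := 0
	for _, r := range set1 {
		tab[r] = to[j]
		if j < len(to)-1 {
			j++
		}
	}
	return tab
}

// translate applies the table to every rune of s and leaves unmapped
// runes unchanged.
func translate(tab map[rune]rune, s string) string {
	return strings.Map(func(r rune) rune {
		if t, ok := tab[r]; ok {
			return t
		}
		return r
	}, s)
}

func main() {
	digits := buildTable("0123456789", "〇一二三四五六七八九")
	fmt.Println(translate(digits, "31415")) // 三一四一五

	// The second set is one rune short, so 'c' also maps to 'y'.
	padded := buildTable("abc", "xy")
	fmt.Println(translate(padded, "aabbcc")) // xxyyyy
}
```

Ranging over a string in Go yields runes rather than bytes, which is why no explicit decoding step appears in buildTable.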
Full translation
A complete tr utility takes a bit more work. For a classic version, we can get by with manipulating arrays of size 256. For a Unicode-aware version, complications arise with set complements and ranges.
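For the classic byte-oriented case, ranges and complements both reduce to small operations on a 256-entry array. Here is a rough sketch under a simplified set syntax, in which a-z denotes an inclusive range and everything else is literal (no escapes or classes). The names expandSet, complement, and table are hypothetical, not taken from any existing tr implementation.

```go
package main

import "fmt"

// expandSet turns a set description such as "a-cx" into the bytes it names.
// Only literal bytes and inclusive ranges "lo-hi" are understood; a reversed
// range such as "z-a" is taken literally in this sketch.
func expandSet(spec string) []byte {
	var set []byte
	for i := 0; i < len(spec); i++ {
		lo := spec[i]
		if i+2 < len(spec) && spec[i+1] == '-' && spec[i+2] >= lo {
			// Count in int so a range ending at byte 255 cannot overflow.
			for c := int(lo); c <= int(spec[i+2]); c++ {
				set = append(set, byte(c))
			}
			i += 2
			continue
		}
		set = append(set, lo)
	}
	return set
}

// complement returns every byte value not in set, in ascending order.
func complement(set []byte) []byte {
	var member [256]bool
	for _, c := range set {
		member[c] = true
	}
	var rest []byte
	for c := 0; c < 256; c++ {
		if !member[c] {
			rest = append(rest, byte(c))
		}
	}
	return rest
}

// table builds a 256-entry translation table, reusing the last byte of
// set2 when it is shorter than set1. An empty set2 leaves the identity map.
func table(set1, set2 []byte) [256]byte {
	var tab [256]byte
	for i := range tab {
		tab[i] = byte(i)
	}
	if len(set2) == 0 {
		return tab
	}
	for i, c := range set1 {
		j := i
		if j >= len(set2) {
			j = len(set2) - 1
		}
		tab[c] = set2[j]
	}
	return tab
}

// apply translates a copy of s through tab.
func apply(tab [256]byte, s string) string {
	out := []byte(s)
	for i, c := range out {
		out[i] = tab[c]
	}
	return string(out)
}

func main() {
	upper := table(expandSet("a-z"), expandSet("A-Z"))
	fmt.Println(apply(upper, "hello, world")) // HELLO, WORLD

	// Complement: replace every byte that is not a digit with '.'.
	mask := table(complement(expandSet("0-9")), []byte("."))
	fmt.Println(apply(mask, "tel: 555-0100")) // .....555.0100
}
```

The complement is cheap here because the universe has only 256 members. With Unicode it would mean enumerating over a million code points, so a Unicode-aware version would more plausibly keep sets as ranges or membership tests instead of expanding them.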