cksum
I had trouble unearthing the CRC used by GNU’s cksum. Its manpage mentions the particular algorithm is a POSIX standard. On seeing cksum print 32-bit integers on test inputs, I assumed the algorithm was CRC-32, the most popular 32-bit CRC.
I eventually realized I should have searched for “posix cksum”. Although it shares the same polynomial, the algorithm differs slightly from CRC-32.
Here’s a simple C implementation (typical implementations use a lookup table for the polynomial reduction):
#include<stdint.h>
#include<stdio.h>
int main() {
unsigned int p = 0x04C11DB7, r = 0;
void f(char c) {
for(int i = 7; i >= 0; i--) {
int msb = r & (1<<31);
r <<= 1;
if (msb) r = r ^ p;
}
r ^= c;
}
int c, len = 0;
while(EOF != (c = getchar())) {
len++;
f(c);
}
int n = len;
do {
f(n & 0xff);
n >>= 8;
} while(n);
f(0), f(0), f(0), f(0);
printf("%u %u\n", ~r, len);
return 0;
}
and a Go version, with support for filename arguments:
package main
import ("bufio";"flag";"fmt";"io";"os")
func main() {
cksum := func(file *os.File) {
in := bufio.NewReader(file)
var p, r uint
p = 0x04C11DB7
r = 0
f := func(b byte) {
for i := 7; i >= 0; i-- {
msb := r & (1<<31)
r = r<<1
if msb != 0 { r = r^p }
}
r ^= uint(b)
}
n := 0
for done := false; !done; {
switch b,er := in.ReadByte(); er {
case io.EOF: done = true
case nil:
f(b)
n++
default:
println("cksum:", file.Name() + ":", er)
return
}
}
for m := n;; {
f(byte(m) & 0xff)
m = m >> 8
if m == 0 { break }
}
f(0); f(0); f(0); f(0)
fmt.Print(^r, n);
if file == os.Stdin {
fmt.Println()
} else {
fmt.Printf(" %v\n", file.Name())
}
}
flag.Parse()
if 0 == flag.NArg() {
cksum(os.Stdin)
return
}
for i := 0; i < flag.NArg(); i++ {
if flag.Arg(i) == "-" {
cksum(os.Stdin)
continue
}
f,er := os.Open(flag.Arg(i))
if f == nil {
println("cksum:", er)
continue
}
cksum(f)
f.Close()
}
}
Bits
Go seems at least as good as C when it comes to bit-fiddling. One might argue it is better, because of little differences:
-
the operator precedence is more natural
-
bitwise NOT is "^", rather than "~"
-
the dedicated "&^" operator combines AND and NOT.
-
shifts are defined in the specification, rather than left to the implementation; shifts are arithmetic when the first operand is signed, and logical otherwise.
-
lack of implicit casting reduces bugs
The CRC32 package
A reader noted that Go has a crc32 package, which can compute more commonly encountered CRC-32 checksums.
Unfortunately, I can’t remember how cksum differs (perhaps it goes MSB first instead of LSB first or something like that); it’s been a while since I looked into it, and I’m busy.