C to English to C in Perl


C2Eng2C, a.k.a. DeCSS and ReCSS, a.k.a. the soon-to-come First Amendment File System, is available from these parts. A few reviews:

It ain't Hemingway, but at least it's pronounceable.
--Tim O'Reilly.
Wow. It's like "What if ANSI C was designed by the COBOL committee?".
--Nathan Torkington.
So, you've heard of what I'm doing. You'd like to give it a go. Excellent. Obtaining and using DeCSS/ReCSS:

http://www.mit.edu/~ocschwar/decss2.pl
also available as http://www.mit.edu/~ocschwar/c2eng
http://www.mit.edu/~ocschwar/recss.pl
also available as http://www.mit.edu/~ocschwar/eng2c
Here is the RCS archive for the scripts. Serious users will want a look at it.

Also, you will need to go to CPAN for Parse::RecDescent and Lingua::EN::Numbers::Ordinate. You will need Perl 5.005 or better. Or, if you just want to see what it does, take a look at demunck.c and demunck.eng.

What is this all about?

Oh, boy, where would I begin? David Touretzky explains it better, and so does Emmanuel Goldstein. Basically, a piece of open source code is about to be censored off the net, and I had to do something about it. I also wrote an essay on the topic, and Hal Abelson gave the issue some attention. I wrote an article for The Perl Journal about this, and an elaboration on how I did all this is to be found here.

c2eng is not the first program to do this. For the Bernstein case, regarding encryption source code and export restrictions thereon, a program called c2txt2c was written by Leevi Marttila using Bison and flex. I chose to write mine from scratch because 1. when I started out, c2txt2c came with a disclaimer that it was only working for the Blowfish source code, 2. c2txt2c produces Dadaist sentences and is thus in my view too facetious to persuade mundanes with gavels, and 3. I don't know Bison.

Jonathan Baccash, of Princeton University, wrote another C to English demonstration, using SML/NJ. His style in the translation is better, but he doesn't attempt C preprocessor (CPP) directives. In the future I aim to write a new version of c2eng that incorporates some of his style.

Notes regarding the use of Eng2c and C2eng:

-1. Observe the irregular hashbangs. They're what I have to use.

0. Both scripts just dump their output to STDOUT. The way to use them is to do c2eng foo.c > foo.eng and eng2c foo.eng > foo.eng.c

1. Both spew huge amounts of debugging info into STDERR, so direct STDERR somewhere other than the file that receives the translation. I needed STDERR for debugging; you may find it an interesting marker of the script's progress.

2. C2eng's output is not formatted much at all. Luckily, we have the fmt command on most Unix machines to give the output some line wrapping. Eng2c is written not to discriminate between newlines and other whitespace, so reformat to your heart's content!

3. Eng2c's output is not indented at all. Luckily, we have the indent command on most Unix machines to give the output some indenting. Apropos: If you take foo.c and go through this sequence:

0. indent -bacc -bad -bap -bbb -bc -bs -sob foo.c
1. c2eng foo.c > foo.eng
2. eng2c foo.eng > foo.eng.c
3. indent -bacc -bad -bap -bbb -bc -bs -sob foo.eng.c
4. diff -bwc foo.c foo.eng.c > big.diff
With the indentation sufficiently exacting, the diff file should only show differences that indicate bugs in my script. If you run into any of those, email me, please.

4. C2eng will take multiple input arguments and concatenate them all into one big file, assuming all of the files parse correctly (this is not yet guaranteed). So, you can do c2eng foo.c bar.c > foobar.eng and it will DTRT.

5. Eng2c will soon have a reciprocal capability. The result, coupled with gzip and gunzip, will be a new form of a tarfile, defined by the First Amendment File System. You'll be able to do zcat distribution.eng.gz | eng2c and spread out a tree.

6. C2eng so far has shown that it can deal with comments and preprocessor directives between major elements (function definitions, global variables, et cetera) and between statements. When one of these interrupts C code at a finer spot, C2eng will barf. I'm working on a fix to that, but it will not be easy.
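
Notes 0 through 3 can be collected into a few shell functions. What follows is only a sketch, not part of the scripts themselves. The function names (to_english, wrap_english, roundtrip), the C2ENG and ENG2C variables, the log file names and the 72-column width are all made up for illustration. It also assumes that your indent rewrites a file in place when given a single file name, as the sequence in note 3 implies.

```shell
#!/bin/sh
# Sketch only: wrappers around the commands quoted in notes 0 through 3.
# C2ENG and ENG2C are assumed names; point them at wherever you saved
# the c2eng and eng2c scripts.
C2ENG=${C2ENG:-c2eng}
ENG2C=${ENG2C:-eng2c}
INDENT_FLAGS="-bacc -bad -bap -bbb -bc -bs -sob"

# to_english SRC OUT LOG
# Notes 0 and 1: the translation goes to OUT, the STDERR chatter to LOG.
to_english() {
    "$C2ENG" "$1" > "$2" 2> "$3"
}

# wrap_english IN OUT
# Note 2: rewrap the English at 72 columns (the width is an arbitrary choice).
wrap_english() {
    fmt -w 72 "$1" > "$2"
}

# roundtrip FILE.c
# Note 3: indent, translate both ways, indent again, then diff.
# Leaves FILE.eng, FILE.eng.c, FILE.diff and two log files beside the source.
# Returns diff's exit status: 0 when diff -bwc found no differences.
roundtrip() {
    src=$1
    base=${src%.c}
    indent $INDENT_FLAGS "$src"
    "$C2ENG" "$src" > "$base.eng" 2> "$base.c2eng.log"
    "$ENG2C" "$base.eng" > "$base.eng.c" 2> "$base.eng2c.log"
    indent $INDENT_FLAGS "$base.eng.c"
    diff -bwc "$src" "$base.eng.c" > "$base.diff"
}
```

After sourcing the file, roundtrip foo.c && echo clean reports a clean round trip; otherwise foo.diff holds whatever diff found. Keeping STDERR in its own log file for each step means the debugging output never lands in the .eng or .eng.c files.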

A Call for Help

I have a wish list: 1. If some kind soul would patch Eng2c to fill in item 5 in the list above, I would be much obliged. Otherwise, I'll do it Real Soon Now (TM). 2. For a harder project, I would like some kind soul (after contacting me first) to help me make C2eng and Eng2c more customizable for other styles of translation. I think I've figured out the fastest way to do it. Email me, please.


Omri Schwarz, May 14, 2001