Celebrating 30th Anniversary of The First C++ Compiler: Let's Find Bugs in It


This article was originally published at viva64.com. Republished by the authors’ permission.

Cfront is a C++ compiler that came into existence in 1983 and was developed by Bjarne Stroustrup. At that time the language was still known as «C with Classes». Cfront had a complete parser and symbol tables, and built a tree for each class, function, etc. Cfront was based on Cpre. Cfront defined the language until circa 1990. Many of the obscure corner cases in C++ are related to limitations of the Cfront implementation, because Cfront translated C++ into C. In short, Cfront is a sacred artifact for a C++ programmer, so I just couldn’t help checking such a project.

Introduction

The idea to check Cfront occurred to me after reading an article devoted to the 30th anniversary of the first release of this compiler: “30 YEARS OF C++”. I contacted Bjarne Stroustrup to get the source code of Cfront. For some reason I thought that getting the code would be a long story, but it turned out to be quite easy. The source code is open, available to everybody, and can be found here: http://www.softwarepreservation.org/projects/c_plus_plus/

I decided to check the first commercial version of Cfront, released in October 1985, as it is this version that turned 30 this year.

Bjarne warned me that checking Cfront could be troublesome:

Please remember this is *very* old software designed to run on a 1MB 1MHz machine and also used on original PCs (640KB). It was also done by one person (me) as only part of my full time job.

Indeed, checking such a project as is turned out to be impossible. At that time, for instance, a class name was separated from a member function name by a single dot (.) instead of a double colon (::). For example:

inline Pptr type.addrof() {
    return new ptr(PTR,this,0);
}

Our PVS-Studio analyzer wasn’t ready for this, so I had to ask a colleague to look through the code and correct such spots manually. It really helped, although there were still some troubles. On some fragments the analyzer got quite confused and refused to do the analysis. Nevertheless, I did manage to check the project.

I should say right away, I haven’t found anything crucial. I think there are 3 reasons why PVS-Studio hasn’t found serious bugs:

  1. The project size is small. It’s just 100 KLOC in 143 files.
  2. The code is of high quality.
  3. PVS-Studio analyzer didn’t understand some fragments of the code.

«Talk is cheap. Show me the code» © Linus Torvalds

So, enough talking. I guess readers are here to see at least one error made by THE Stroustrup. Let’s have a look at the code.

Fragment 1.

typedef class classdef * Pclass;
#define PERM(p) p->permanent=1
Pexpr expr.typ(Ptable tbl)
{
  ....
  Pclass cl;
  ....
  cl = (Pclass) nn->tp;
  PERM(cl);
  if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);
  ....
}

PVS-Studio warning: V595 The ‘cl’ pointer was utilized before it was verified against nullptr. Check lines: 927, 928. expr.c 927

The ‘cl’ pointer can be equal to NULL. The if (cl == 0) check indicates that. What’s worse is that this pointer gets dereferenced before this check. It occurs in the PERM macro.

So if we expand the macro, we get:

cl = (Pclass) nn->tp;
cl->permanent=1;
if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);
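
For comparison, here is a minimal sketch of the safer ordering in modern C++, with the null test placed before the dereference. The names ClassDef and mark_permanent are hypothetical stand-ins, not Cfront's own, and throwing an exception is just one assumed way to report the problem.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for Cfront's classdef; only the flag matters here.
struct ClassDef {
    bool permanent = false;
};

// Test the pointer first, and only then dereference it.
// An inline function replaces the PERM macro, so the dereference
// no longer hides behind a macro name at the call site.
inline void mark_permanent(ClassDef* cl) {
    if (cl == nullptr) {
        throw std::invalid_argument("class type missing");
    }
    cl->permanent = true;
}
```

With this ordering a null pointer is reported before any member access can happen.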

Fragment 2.

The same here. The pointer was dereferenced and only then it was checked:

Pname name.normalize(Pbase b, Pblock bl, bit cast)
{
  ....
  Pname n;
  Pname nn;
  TOK stc = b->b_sto;
  bit tpdf = b->b_typedef;
  bit inli = b->b_inline;
  bit virt = b->b_virtual;
  Pfct f;
  Pname nx;
  if (b == 0) error('i',"%d->N.normalize(0)",this);
  ....
}

PVS-Studio warning: V595 The ‘b’ pointer was utilized before it was verified against nullptr. Check lines: 608, 615. norm.c 608

Fragment 3.

int error(int t, loc* lc, char* s ...)
{
  ....
  if (in_error++)
    if (t!='t' || 4

PVS-Studio warning: V563 It is possible that this ‘else’ branch must apply to the previous ‘if’ statement. error.c 164

I am not sure if there is an error here or not, but the code is formatted misleadingly. An ‘else’ always refers to the closest ‘if’, so the code doesn’t execute the way its indentation suggests. If we format it according to how it is actually parsed, we get:

if (in_error++)
  if (t!='t' || 4
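
The rule that ‘else’ binds to the nearest ‘if’ can be shown with a small self-contained sketch. The two functions below are hypothetical and are not taken from Cfront; they only contrast the unbraced and braced forms.

```cpp
#include <cassert>

// Without braces, 'else' pairs with the inner 'if',
// whatever the indentation suggests.
inline int without_braces(bool outer, bool inner) {
    if (outer)
        if (inner)
            return 1;
    else            // misleading indentation: this belongs to 'if (inner)'
        return 2;
    return 0;
}

// Braces make the 'else' apply to the outer 'if'.
inline int with_braces(bool outer, bool inner) {
    if (outer) {
        if (inner)
            return 1;
    } else {
        return 2;
    }
    return 0;
}
```

Calling both functions with outer set to true and inner set to false gives different results, which is exactly the kind of difference the V563 warning points at. Modern compilers can usually warn about the first form as well.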

Fragment 4.

extern
genericerror(int n, char* s)
{
  fprintf(stderr,"%s\n",
          s?s:"error in generic library function",n);
  abort(111);
  return 0;
};

PVS-Studio warning: V576 Incorrect format. A different number of actual arguments is expected while calling ‘fprintf’ function. Expected: 3. Present: 4. generic.c 8

Note the format string: it contains a single “%s” specifier. The string will be printed, but the ‘n’ variable won’t be used.
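
A minimal sketch of a call where every argument has a matching specifier is shown below. It assumes the intent was to print the number as well, which the source does not confirm; the helper name format_generic_error and the "(code %d)" wording are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the message so that each argument
// has its own conversion specifier ("%s" for s, "%d" for n).
inline std::string format_generic_error(int n, const char* s) {
    char buf[256];
    std::snprintf(buf, sizeof buf, "%s (code %d)",
                  s ? s : "error in generic library function", n);
    return buf;
}
```

If the number was never meant to be printed, the other possible fix is simply to drop ‘n’ from the argument list.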

Miscellaneous:

Unfortunately (or maybe fortunately), I won’t be able to show you anything else that looks like a real error. The analyzer issued some other warnings that could be worth looking at, but they are not really serious. For example, the analyzer didn’t like some global variable names:

extern int Nspy, Nn, Nbt, Nt, Ne, Ns, Nstr, Nc, Nl;

PVS-Studio warning: V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename ‘Nn’ variable. cfront.h 50

Another example: to print pointer values with the fprintf() function, Cfront uses the “%i” specifier. In the modern version of the language we have “%p”. But as far as I understand, there was no “%p” 30 years ago, and the code was totally correct.

Thought-provoking observations

This pointer

My attention was drawn to the fact that the ‘this’ pointer used to be handled in a different way. A couple of examples:

expr.expr(TOK ba, Pexpr a, Pexpr b)
{
  register Pexpr p;
  if (this) goto ret;
  ....
  this = p;
  ....
}

inline toknode.~toknode()
{
  next = free_toks;
  free_toks = this;
  this = 0;
}

As you see, it wasn’t forbidden to change the value of ‘this’. Today assigning to ‘this’ is prohibited, and comparing ‘this’ to null has lost its sense as well: calling a member function through a null pointer is undefined behavior, so a compiler may assume that ‘this’ is never null. (See the article “Still Comparing «this» Pointer to Null?”)
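
In early C++, assignment to ‘this’ was the way for a class to control its own memory management; the later language provides class-specific operator new and operator delete for that purpose. Below is a minimal sketch of a free list in that style, loosely modeled on the toknode destructor above. The names TokNode, FreeBlock and free_list are hypothetical, and the sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical modern counterpart of a node type that recycles its storage.
class TokNode {
public:
    int tok = 0;
    TokNode* next = nullptr;

    // Reuse a block from the free list when one is available.
    static void* operator new(std::size_t size) {
        if (size == sizeof(TokNode) && free_list != nullptr) {
            FreeBlock* block = free_list;
            free_list = block->next;
            return block;
        }
        return ::operator new(size);
    }

    // Instead of releasing the memory, push the block onto the free list.
    static void operator delete(void* p, std::size_t size) noexcept {
        static_assert(sizeof(TokNode) >= sizeof(FreeBlock),
                      "a node must be large enough to hold a free-list link");
        if (p == nullptr) return;
        if (size != sizeof(TokNode)) {
            ::operator delete(p);
            return;
        }
        free_list = ::new (p) FreeBlock{free_list};
    }

private:
    struct FreeBlock { FreeBlock* next; };
    static inline FreeBlock* free_list = nullptr;  // C++17 inline variable
};
```

After `delete node;` the next `new TokNode` gets the same storage back. In this sketch the blocks kept on the free list are never returned to the system, which is acceptable for an illustration but worth keeping in mind.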

This is the place for paranoia

I’ve also come across an interesting fragment. Nothing seems safe anymore. I liked this code fragment:

/* this is the place for paranoia */
if (this == 0) error('i',"0->Cdef.dcl(%d)",tbl);
if (base != CLASS) error('i',"Cdef.dcl(%d)",base);
if (cname == 0) error('i',"unNdC");
if (cname->tp != this) error('i',"badCdef");
if (tbl == 0) error('i',"Cdef.dcl(%n,0)",cname);
if (tbl->base != TABLE) error('i',"Cdef.dcl(%n,tbl=%d)",
                              cname,tbl->base);
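
A similar block of checks could be sketched in modern C++ roughly as follows. The types Node and Kind and the functions paranoia and declare_class are hypothetical and much simpler than Cfront's. Unlike the standard assert macro, which is disabled when NDEBUG is defined, these checks stay active in every build.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified node kinds.
enum class Kind { Class, Table };

struct Node {
    Kind base;
};

// Reports an "impossible" condition as an internal error.
inline void paranoia(bool ok, const char* what) {
    if (!ok) {
        throw std::logic_error(std::string("internal error: ") + what);
    }
}

// Each pointer is tested before the next check dereferences it.
inline void declare_class(const Node* cls, const Node* tbl) {
    paranoia(cls != nullptr, "null class definition");
    paranoia(cls->base == Kind::Class, "node is not a class");
    paranoia(tbl != nullptr, "null table");
    paranoia(tbl->base == Kind::Table, "node is not a table");
    // ... the real work would start here ...
}
```

Passing a class node where a table is expected, or a null pointer, ends in a std::logic_error instead of silently corrupting later stages.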

Bjarne Stroustrup’s commentaries

  • Cfront was bootstrapped from Cpre, but it was a complete rewrite. There wasn’t a line of Cpre code in Cfront
  • The use-before-test-of-0 bug is of course bad, but curiously, the machine and OS I mostly used (DEC and research Unix) had page zero write protected, so that bug could not have been triggered without being caught.
  • The if-then-else bug (or not) is odd. I read the source, it’s not just misformatted, it’s wrong, but curiously, that doesn’t matter: the difference is just a slight difference in the error message used before terminating. No wonder I did not spot it.
  • Yes, I should have used more readable names. I hadn’t counted on having other people maintain this program for years (and I’m a poor typist).
  • Yes, there were no %p then
  • Yes, the rules for «this» changed
  • The paranoia test is in the compiler’s main loop. My thought was that if anything went wrong with the software or hardware, one of those tests was likely to fail. At least once, it caught the effect of a bug in the code generator used to build Cfront. I think all significant programs should have a «paranoia test» against «impossible» errors.

Conclusion:

It’s really hard to overestimate the significance of Cfront. It influenced the development of a whole sphere of programming and gave this world the long-lived C++ language, which is still evolving. I am really grateful to Bjarne for all the work he has done in creating and developing C++. Thank you. For my part, I was really glad to dig into the code of this wonderful compiler.

I thank all our readers for their attention and wish you fewer bugs.
