The Collapse of the UNIX Philosophy

In the first part of this article, I will enumerate lots of UNIX's cheap and dirty hacks, along with various other drawbacks. In the second part, we'll talk about the UNIX philosophy. This article was written hastily, and I don't want to improve it further. You're lucky I wrote it at all. Therefore, I may state some facts without source links.

Dirty hacks in UNIX started to appear as soon as UNIX was released, long before Windows came onto the scene; I guess there wasn't even Microsoft DOS at the time (I guess, and I can't be bothered to check, so check it yourself). If you don't feel like reading, at least skim through the points. You'll find something interesting. This is not a complete list; these are simply the flaws I felt like mentioning.

  • At the very beginning, make was a program one person wrote for himself and a few of his friends. Without thinking twice, he made commands be recognized by a leading tab. That is, a tab was treated differently from a space, which is extremely bad and unusual both in UNIX and everywhere beyond it. He did it this way because he didn't think make would be used by anyone beyond that small group of people. Later came the idea that make was a good thing and it would be nice to include it in the standard UNIX distribution. In order not to break the already written makefiles (meaning the ones written by those ten people), he decided not to change anything. Well, that's how it goes… We all suffer because of those ten guys.

  • Almost at the very beginning, there was no /usr folder in UNIX. All binaries lived in /bin and /sbin. Then the system could no longer fit on the disk that UNIX's authors (Dennis Ritchie and Ken Thompson) had at their disposal. So they took another disk, created a /usr folder with another bin and another sbin inside it, and mounted the new disk at /usr. That's how "the second hierarchy", /usr, appeared. At some point "the third hierarchy", /usr/local, appeared as well, and then /opt. The narrator of the story said: "I'm still waiting for /opt/local to show up…" Here's where I learned the story, and here's a more accurate version of what happened.

  • sbin originally meant "static bin", not "superuser bin" as one might think; sbin used to contain statically linked binaries. But then sbin began to hold dynamically linked binaries as well, and its name lost its meaning.

  • Windows is often criticized for having a registry, while people say the approach of UNIX-like systems (tons of configs) is better. By the way, there was once a "feature" in ext4 (whether it's a bug is a big question) that made GNOME lose all its configs in the user's home directory after a power loss. In the discussion of the bug report, the ext4 maintainer said that GNOME should have used something like a registry to store this information. The sources: one and two. The ext4 maintainer's name is Theodore Ts'o. Here's what he said:

If you really care about making sure something is on disk, you have to use fsync or fdatasync. If you are concerned about the performance overhead of fsync(), fdatasync() is much less heavyweight, if you can arrange to make sure that the size of the file doesn't change often. You can do that via a binary database, that is grown in chunks, and rarely truncated. I'll note that I use the GNOME desktop (which means the gnome panel, but I'm not a very major desktop user), and find .[a-zA-Z]* -mtime 0 doesn't show a large number of files. I'm guessing it's certain badly written applications which are creating the "hundreds of dot files" that people are reporting become zero length, and if they are seeing it happen a lot, it must be because the dot files are getting updated very frequently. I don't know what the bad applications are, but the people who complained about large number of state files disappearing should check into which applications were involved, and try to figure out how often they are getting modified. As I said, if large number of files are getting frequently modified, it's going to be bad for SSD's as well, there are multiple reasons to fix badly written applications, even if 2.6.30 will have a fix for the most common cases. (Although some server folks may mount with a flag to disable it, since it will cost performance.)
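
To make the advice concrete, here's a minimal C sketch of the pattern Ts'o is describing: write the new contents to a temporary file, fsync() it, and rename() it over the old one. The file names here are made up for illustration.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Crash-safe config update: either the old or the new file survives a
       power loss, and after fsync() the new contents are really on disk. */
    int save_config(const char *data)
    {
        int fd = open("config.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
        return rename("config.tmp", "config"); /* atomic replace */
    }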

That's not to mention the fact that critical UNIX files (such as /etc/passwd), which are read on every (!) invocation of, say, ls -l, are plain text files. The system reads and parses these files again and again, on every single call! It would be much better to use a binary format. Or a database. Or a registry. At least for the system files that are critical for performance.
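
For the curious, this is roughly the call that makes ls -l hit /etc/passwd; a minimal sketch (on a real system the lookup may also go through NSS or a cache, depending on configuration):

    #include <pwd.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* ls -l does this kind of lookup for every file it lists: map a
           numeric uid to a name, re-reading the plain-text /etc/passwd
           (absent caching) each time. */
        struct passwd *pw = getpwuid(getuid());
        if (pw != NULL)
            printf("uid %d -> %s\n", (int)getuid(), pw->pw_name);
        return 0;
    }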

Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called "PC loser-ing" because the PC is being coerced into "loser mode", where "loser" is the affectionate name for "user" at MIT. The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing. — The Rise of "Worse is Better" by Richard Gabriel

In other words, if you've installed a handler for Ctrl+C, then the operating system, instead of just calling your handler, interrupts whatever syscall was in progress and returns the EINTR error code from the kernel. As a result, the programmer has to anticipate EINTR everywhere, which complicates the userspace code, at the cost of simplifying the kernel code. Yes, it should have been done the other way around: complicate the kernel code and simplify the userspace code. But the guy from the quote above didn't give a damn. He essentially said: "I don't care that everyone will suffer, as long as the kernel code is simpler".
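
Here's what that "anticipating" looks like in practice; a minimal C sketch of the retry boilerplate every careful UNIX program carries around:

    #include <errno.h>
    #include <unistd.h>

    /* A "correct user program" in Gabriel's sense: if the syscall was
       interrupted by a signal, just try it again. */
    ssize_t read_retry(int fd, void *buf, size_t count)
    {
        ssize_t n;
        do {
            n = read(fd, buf, count);
        } while (n < 0 && errno == EINTR);
        return n;
    }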

And it went on from there. The behaviour described above was later patched in UNIX systems by adding the so-called SA_RESTART. So they added a special flag instead of fixing everything. And even though they added SA_RESTART, it does not always work! In particular, in GNU/Linux, select, poll, nanosleep and others do not resume after a caught signal even with SA_RESTART!
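
For reference, this is how the SA_RESTART band-aid is requested; a minimal sketch using sigaction():

    #include <signal.h>

    static void on_sigint(int sig)
    {
        (void)sig; /* keep the handler trivial: only async-signal-safe work */
    }

    int main(void)
    {
        struct sigaction sa;
        sa.sa_handler = on_sigint;
        sa.sa_flags = SA_RESTART; /* restart read() & co. instead of EINTR... */
        sigemptyset(&sa.sa_mask); /* ...except for select, poll, nanosleep etc. */
        sigaction(SIGINT, &sa, NULL);
        /* ... long-running syscalls here no longer need the retry loop ... */
        return 0;
    }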

  • In general, the specific circumstances of the original UNIX's development had a big influence on it. I've read somewhere that the cp command is called cp, not copy, because UNIX was developed on terminals that printed characters very slowly, so cp was faster to type than copy. I couldn't find the link I saw long ago with the cp and copy example, but here's this link:

    Commands — Are These Real Words?

    The basic AIX commands (and all UNIX system commands) are, for the most part, very short, cryptic, two-letter command names. Imagine back years ago, when computers had only very slow teletype keyboards and paper "displays." (Some of us aren't imagining, we're remembering!) Imagine also, people who didn't like typing long commands because there was such a long delay between commands and the computer response. If there were any mistakes, the user had to retype the whole thing (especially aggravating for folks that type with only two fingers!).

    Also, some UNIX commands came from university students and researchers who weren't bound by usability standards (no rules, merely peer pressure). They could write a very useful, clever command and name it anything—their own initials, for example (awk by Aho, Weinberger, and Kernighan), or an acronym (yacc, Yet Another Compiler-Compiler).

  • The names of UNIX utilities are another story. For example, grep comes from the g/re/p command in the ed text editor. And cat comes from concatenation; I hope you already knew that. To top it all off, vmlinuz is gZipped LINUx with Virtual Memory support.

  • Here's an unexpected one: printf is far from the fastest way of writing to the screen or to a file. You didn't know that, did you? The thing is that printf, like UNIX itself, was optimized not for time but for memory. printf parses its format string at runtime, every single time. That's exactly why a special preprocessor was invented for the H2O web server that shifts format-string parsing to compile time. The source can be found here.
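
    To see what's being paid for, compare these two ways of printing the same thing; a toy sketch of the idea that the H2O preprocessor pushes much further:

        #include <stdio.h>
        #include <string.h>

        void greet_printf(const char *name)
        {
            printf("Hello, %s!\n", name); /* "%s\n" is re-parsed on every call */
        }

        void greet_direct(const char *name)
        {
            /* the "parsing" was done by the programmer at compile time */
            fwrite("Hello, ", 1, 7, stdout);
            fwrite(name, 1, strlen(name), stdout);
            fwrite("!\n", 1, 2, stdout);
        }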

  • When Ken Thompson, the author of UNIX (together with Dennis Ritchie), was asked what he would change in UNIX, he replied that he would spell the creat (sic!) function as create. There are lots of sources, e.g. this one. No comment. I should note that Ken Thompson, together with other developers of the original UNIX, went on to create the Plan 9 system, which fixes lots of UNIX's flaws. The function is called create there. :) He did it. :)

  • Here’s another quote:

A child which dies but is never waited for is not really gone in that it still consumes disk swap and system table space. This can make it impossible to create new processes. The bug can be noticed when several & separators are given to the shell not followed by a command without an ampersand. Ordinarily things clean themselves up when an ordinary command is typed, but it is possible to get into a situation in which no commands are accepted, so no waits are done; the system is then hung. The fix, probably, is to have a new kind of fork which creates a process for which no wait is necessary (or possible); also to limit the number of active or inactive descendants allowed to a process. — The Source

This quote is from an early UNIX manual. Even then, the existence of zombie processes was considered a bug. But this bug was later simply forgotten. Needless to say, the problem was eventually mitigated: modern GNU/Linux has means to avoid zombie processes, but few people know about them. You can't get rid of a zombie with a regular kill. And everybody says about zombie processes that "it's by design".
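
For completeness, here's a minimal C sketch of the standard ways to avoid zombies: either reap children explicitly with waitpid(), or tell the kernel you'll never wait for them, which is more or less the "new kind of fork" the old manual wished for:

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Option 1: declare that children should not be kept as zombies. */
        struct sigaction sa;
        sa.sa_handler = SIG_DFL;
        sa.sa_flags = SA_NOCLDWAIT;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, NULL);

        if (fork() == 0)
            _exit(0); /* the child exits and is reaped automatically */

        /* Option 2, without SA_NOCLDWAIT: reap explicitly somewhere:
           while (waitpid(-1, NULL, WNOHANG) > 0) { } */
        sleep(1);
        return 0;
    }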

  • Let's talk a bit more about the aforementioned C language. C was developed at the same time as UNIX, so criticism of UNIX should touch on C as well. Lots has been written about how bad C is, and I'm not going to repeat it all. The type declaration syntax is bad, the preprocessor is terrible, and then there's all this 4["string"], and sizeof ('a') != sizeof (char) (in C, not in C++!), and i++ + ++i, and while (*p++ = *q++) ; (an example from Stroustrup's book, second revised edition), and so on and so forth.

    There's just one thing I'd like to say. We still don't know how to work with strings conveniently in C. The inconvenience of working with strings constantly leads to security issues. This problem is still not solved! Here's a relatively recent document from the C standards committee. It discusses a rather questionable way of solving the string problem, and concludes that the method is bad. Year of publication: 2015. This means there was still no final solution even by 2015!
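
    As a tiny illustration of why this matters, here's a sketch contrasting the classic footgun with the usual bounded-write mitigation:

        #include <stdio.h>
        #include <string.h>

        void copy_name(char *dst, size_t dstsize, const char *src)
        {
            /* strcpy(dst, src);  <- writes past dst if strlen(src) >= dstsize */
            int n = snprintf(dst, dstsize, "%s", src);
            if (n < 0 || (size_t)n >= dstsize) {
                /* truncated or failed: the check that's so easy to forget */
            }
        }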

    Say nothing of the lack of a simple, user-friendly, multiplatform build system (not this autotools monster that does not support Windows, and not the cmake monster that supports Windows but is still a monster); of a standard package manager, user-friendly like npm (JS) or cargo (Rust); of a portability library with which one could at least read the contents of a folder the same way on all platforms; and of an official C website that would be the main entry point for all beginners, containing not only documentation but also a brief guide to installing C tools on any platform, a guide to creating a simple C project, a user-friendly list of C packages (which should live in a standard repository), and, most importantly, a gathering place for the user community. I've even registered the c-language.org domain hoping to create such a website there. Yeah, dream on! (I even have cpp-language.org, bwahaha!) C and C++ have none of this, even though all popular languages have it. Even Haskell has it! And Rust!

    In Rust, that jackanapes aimed at the same niche as C, there's a single config file that is at once the project config, the build config, and the package manager config (in fact, cargo is a package manager and a build system in one). A dependency of a given package can be another package located somewhere in GIT, including a GITHUB repository as a direct dependency. Documentation generation from markdown comments in the source code works out of the box. The package manager uses SEMVER for versions. So: GIT, GITHUB, MARKDOWN, SEMVER. In other words, BUZZWORDS, BUZZWORDS and HIPSTERS' BUZZWORDS. All of it out of the box. You just go to their main website, and there it is on a silver platter. All of it works the same way on all platforms. And this despite the fact that Rust is a systems programming language, not some JavaScript. Despite the fact that you can poke at raw bytes in Rust. It even has pointer arithmetic. So why do Rusters have all these hipster buzzwords, and we C guys don't? What a shame.

    I remember one of my friends asking me where to look for a list of packages for C/C++. I had to tell him that there's no such place. He asked me whether C/C++ programmers were doomed to suffer. I had nothing to tell him.
    Oh, right. I forgot one more thing. Take a look at the prototype of the signal function as it appears in the C standard:

    void (*signal(int sig, void (*func)(int)))(int);
    

    Try to understand it.
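
    It becomes readable once you untangle it with a typedef (the name handler_t below is mine, not the standard's): signal takes a signal number and a handler, and returns the previous handler.

        typedef void (*handler_t)(int);            /* pointer to "void f(int)" */

        handler_t signal(int sig, handler_t func); /* same declaration as above */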

  • Terminals in UNIX are a weird piece of legacy. The details are here.

  • Filenames in Unix file systems (ext2 and others) are simply streams of bytes with no encoding. Which encoding they're interpreted in depends on the locale. So if we create a file under one locale and then look at its name under another locale, nothing good will come of it. There's no such problem in Windows NTFS.

  • UNIX shell is worse than PHP! Yes, it is; didn't you know? Criticizing PHP is popular nowadays, but UNIX shell is even worse… It gets especially bad when we try to really develop in it, since it's not a full-fledged programming language. But it's no good even in its own niche, scripting common administrative tasks. The reasons: the shell's primitiveness, its generally inefficient design, legacy, tons of special cases, dirty hacks, a complete mess of quotation marks, backslashes and special characters, and the shell's obsession (just like all of UNIX's) with plain text.

    • Let's begin with a teaser. How can we recursively find all the files named \ in a folder foo? The correct answer is: find foo -name '\\'. We can also do it like this: find foo -name \\\\. The latter raises lots of questions. Try explaining to a person who is not fluent in UNIX shell why exactly four backslashes are needed here, not two or eight. We need four because the shell performs backslash expansion once, and then find does it again.
    • How do we touch all files in foo (and its subfolders)? At first glance, like this: find foo | while read A; do touch $A; done. Well, at first glance. In fact, there are five things that can ruin it (and lead to security problems):
      • A filename can contain a backslash. Therefore, we should write read -r A instead of read A
      • A filename can contain a space. That's why we should write touch "$A" instead of touch $A
      • A filename can not only contain a space but also start with one. So we need to write IFS="" read -r A instead of read -r A
      • A filename can contain a newline, so we should use find foo -print0, and instead of IFS="" read -r A use IFS="" read -rd "" (I'm not really sure here)
      • A filename can start with a hyphen, so we need to write touch -- "$A" instead of touch "$A". The final version looks like this:

            find foo -print0 | while IFS="" read -rd "" A; do touch -- "$A"; done

        Cool, isn't it? By the way, we didn't take into account that POSIX does not guarantee that touch supports the -- option. Considering that, we'd have to check each filename for a leading hyphen (or that it does not start with a slash) and prepend ./ to it. Do you understand now why configure scripts generated by autoconf are so large and hard to read? Because configure has to take all of this crap into account, including compatibility with various shells. In this example, I used the solution with a pipe and a loop. I could have used exec or xargs instead, but it wouldn't be so eye-catching. (Well, okay: we know every name starts with foo, so it cannot start with a space or a hyphen.)
    • Let's say we need to delete a file on host a@a. The name of the file is in a variable A. How do we do it? Perhaps like this: ssh a@a rm -- "$A"? (As you might have noticed, we've already taken into account that the filename can contain spaces and start with a hyphen.) Never ever do this! ssh is not chroot, setsid, nohup, sudo or any other command that receives an exec-command (meaning a command to pass directly to an execve-family system call). ssh (just like su) receives a shell-command, i.e. a command to be processed by a shell (the terms exec-command and shell-command are my own). ssh joins all its arguments into one string, passes that string to the remote side and executes it there with the shell. Okay, maybe like this: ssh a@a 'rm -- "$A"'? No, this command tries to find the variable A on the remote side, and it's not there: variables cannot be passed via ssh. Well, maybe like this: ssh a@a "rm -- '$A'"? Nope, this breaks if the filename contains a single quote. Anyway, the correct answer is: ssh a@a "rm -- $(printf '%q\n' "$A")". Convenient, don't you think?
    • How do we get to host a@a, then to b@b from it, then to c@c, and then to d@d, and delete the file /foo there? Well, this one is simple:

            ssh a@a "ssh b@b \"ssh c@c \\\"ssh d@d \\\\\\\"rm /foo\\\\\\\"\\\"\""

        Too many backslashes, huh? If you don't like it, let's alternate single and double quotation marks:

            ssh a@a 'ssh b@b "ssh c@c '\''ssh d@d \"rm /foo\"'\''"'

        By the way, if we used Lisp instead of shell, and the ssh function passed not a string but a parsed AST (abstract syntax tree) to the remote side, there wouldn't be so many backslashes:

            (ssh "a@a" '(ssh "b@b" '(ssh "c@c" '(ssh "d@d" '(rm "foo")))))

        "Huh? What? Lisp? What Lisp?" Curious, aren't you? Go read here. You can also refer to other articles by Paul Graham.
    • Let's combine the previous two points. The name of the file is in a variable A. We need to go to a@a, then to b@b, then to c@c and d@d, and delete the file named in A. I'm going to leave this one to you as an exercise. (I don't know how to do it. :) Well, I might figure it out if I thought about it.)
    • echo is sort of designed for printing strings to the screen. But the thing is, we can't use it for that purpose if the string is any more complex than "Hello, world!". The only true way to print an arbitrary string (e.g. from a variable A) is this: printf '%s\n' "$A".
    • Suppose you want to redirect both stdout and stderr of a command cmd to /dev/null. The riddle: which of these six commands do the job?

            cmd > /dev/null 2>&1
            cmd 2>&1 > /dev/null
            { cmd > /dev/null; } 2>&1
            { cmd 2>&1; } > /dev/null
            ( cmd > /dev/null ) 2>&1
            ( cmd 2>&1 ) > /dev/null

        It turns out the correct answer is: the 1st, the 4th and the 6th. The 2nd, the 3rd and the 5th don't work. And again, I'm leaving the reason to you as an exercise. :)
  • Actually, this post appeared as a response to this one. It says that Windows uses a special date as a driver timestamp, instead of introducing a dedicated attribute or checking the manufacturer. There are lots of similar things in UNIX. A file is hidden based solely on a leading dot in its name, instead of a dedicated attribute. I was shocked when I first learned about it (yeah, back in the old days when I installed Ubuntu for the first time). "What idiots!", I thought. I'm used to it now, but thinking about it, it's a terrible workaround. Then, the shell decides whether it is a login shell based on a hyphen passed as the first character of argv[0]. This is a misuse of argv[0]; argv[0] is not meant for that purpose. Any other method would be better, e.g. a separate argument or some environment variable.

  • In BSD sockets, the user must convert the byte order of the port number themselves. And all because someone made a mistake in the UNIX kernel code, forgetting the byte order conversion. As a temporary hack, that someone fixed userspace instead of the kernel code. That's how it goes. And from there it migrated to Windows (together with the /etc/hosts file, aka C:\windows\system32\drivers\etc\hosts). The Source.
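
    Here is the hack in the flesh; a minimal sketch of opening a listening socket, where forgetting htons() silently binds the wrong port on a little-endian machine (the port number is arbitrary):

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <string.h>
        #include <sys/socket.h>

        int open_server_socket(void)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;
            memset(&addr, 0, sizeof addr);
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(8080); /* the conversion userspace must do */
            if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0
                       || listen(fd, 16) < 0)
                return -1;
            return fd;
        }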

UNIX Philosophy

Some people think that UNIX is great and perfect, and that all its basic ideas ("everything is a file", "everything is text" and so on) are amazing and form the so-called "UNIX philosophy". I guess you're starting to understand that it's not quite so. Let's review this "UNIX philosophy". Have a look at the points below. I'm not trying to say that all of these things should be abolished; I'm simply pointing out some drawbacks.

  • "Everything is text". As we've already seen in the example with /etc/passwd, the widespread use of plain text can lead to performance problems. UNIX's authors effectively invented a separate format for each system config (passwd, fstab, etc.), each with its own rules for escaping special characters. Surprised? /etc/fstab uses spaces and line breaks as separators. But what if folder names contain, say, spaces? For this case, the fstab format provides escape sequences (a directory named "my disk" appears in fstab as my\040disk). It turns out that any script reading fstab should be able to interpret those escapes; that's what the fstab-decode utility (run as root) is for. You didn't know this, did you? Go fix your scripts. :) As a result, we need a parser for each system config. It would be much easier if we used JSON or XML for system configs. Or maybe some binary format, especially for the configs that are constantly read by different programs and thus need a good read rate (which is higher for binary formats).

That's not all I wanted to say about "everything is text". Standard utilities produce their output as plain text, so for each utility we effectively need a parser of its own: we keep parsing output with sed, grep, awk, etc. Each utility has its own options to choose which columns to display, which column to sort by, and so on. It would be better if utilities produced their output as XML, JSON, some binary format or anything else structured. To display that information in a user-friendly way, or to process it further, we could pipe the result to additional utilities that drop some columns, sort by some column, select the required rows, etc., and either render the result as a nice table or pass it on. All of this done in a generic way that does not depend on the utility that generated the output, and with no need to parse anything with regexes. UNIX shell isn't good at working with JSON and XML; but UNIX shell has plenty of other drawbacks anyway. We'd be better off throwing it away and using some other language that handles things like JSON well and can do a lot more besides.

Just imagine! Let's say we need to delete all files in the current folder larger than 1 kilobyte. Yes, I know we can do this with find. But let's suppose we absolutely have to do it via ls (and without xargs). How? Like this:

    LC_ALL=C ls -l | tail -n +2 | while read -r MODE LINKS USER GROUP SIZE M D Y FILE; do if [ "$SIZE" -gt 1024 ]; then rm -- "$FILE"; fi; done

We need LC_ALL=C here to be sure that the date takes exactly three words in the output of ls (and tail -n +2 to skip the "total" line). This solution not only looks ugly, it also has a number of drawbacks. First, it won't work if a filename contains a line break or begins with a space. Next, we have to explicitly list the names of all of ls's columns, or at least remember where the ones we need (SIZE and FILE) are located. If we get the column order wrong, the error will only become apparent at runtime, when we delete the wrong files. :)

What would the solution look like in the perfect world I'm suggesting? Something like this: ls | grep 'size > 1kb' | rm. It's short, and, most importantly, you can see the meaning in the code, and it's impossible to make a mistake. Let's see. In my world, ls always emits all the information; we don't need a special -l option for that. When we need to drop all columns but the filename, we do it with a dedicated utility we pipe the ls output to. Thus, ls provides a list of files in some structured form, say, JSON. This representation "knows" the names of columns and their types, i.e. whether a value is a string, a number or something else. Then this output is piped to grep which, in my world, selects the necessary rows from JSON. JSON "knows" field names, so grep "understands" what "size" means here. Moreover, JSON carries the type of the size field: that it's a number, and not just a number but a file size, so we can compare it to 1kb. Next, grep pipes its output to rm. rm "sees" that it's receiving files; yes, JSON also records that these strings are files. rm deletes them. JSON also takes care of correctly escaping special characters, so files with special characters "just work". Cool, right? I took the idea from here. It should also be mentioned that something of the kind is implemented in Windows PowerShell.

  • UNIX shell. Another basic idea of UNIX. I've already mentioned some of UNIX shell's smaller flaws in the first part of the article. So what's "cool" about UNIX shell? At the moment of its release (a long time ago), it was much stronger than the command interpreters embedded in other operating systems. It allowed writing more powerful scripts. It seems UNIX shell was the most powerful scripting language of its day, because there were no sane scripting languages back then (meaning ones that would allow full-fledged programming, not just scripting). Later, a programmer named Larry Wall noticed that UNIX shell lacked a lot to be a good programming language. He decided to combine the simplicity of UNIX shell with the full-fledged programmability of C, and created Perl. Yes, Perl and the scripting languages that followed it effectively replaced UNIX shell. Even Rob Pike, one of the authors (to my mind) of the "UNIX philosophy", confirms this. Answering a question about "one tool for one job" here, he said: "Those days are dead and gone and the eulogy was delivered by Perl". Actually, I believe this phrase refers to the typical use of UNIX shell, i.e. combining a large number of small tools in a shell script. No, says Pike, just use Perl.

    I'm not done talking about UNIX shell. Let's look again at the shell code I gave above:

        find foo -print0 | while IFS="" read -rd "" A; do touch -- "$A"; done

    We call touch in a loop here (yes, I know the code can be rewritten with xargs so that touch is called only once; let's forget about that for now, okay?). We call touch in a loop! That means a new process for every file. This is extremely inefficient. Code in almost any other programming language will run faster than this. But when UNIX shell appeared on the scene, it was one of the few languages that let you express this action in one line.

    Long story short, we should use some other scripting language instead of UNIX shell: a language suitable not only for scripting but for real programming as well; a language that does not spawn a new process every time we need to touch a file. Perhaps we'll have to borrow a few features from shell to simplify things even more.
  • Simplicity. I'm not talking about shell and combining lots of small utilities in shell (that was the previous point). I'm talking about simplicity in general; about using simple tools. Say, editing a picture with sed. Yes, yes: convert the jpg into ppm on the command line, edit the picture with a text editor, grep and sed, then convert it back into jpg. Yes, we can do this. However, it's often better done with Photoshop or GIMP, even though they're large, integrated programs. Not in the UNIX style.

I guess I'll stop adding points now. Yep, that's enough. There are some ideas in UNIX I really like, such as "a program should do one thing, and do it well". But not within the context of shell. I guess you've realized by now that I don't like shell. (Once again, I think that in the interview mentioned above Rob Pike took the principle "a program should do one thing, and do it well" in the context of shell, and rejected it in that context.) I'm talking about the principle in its essence. For example, a console mail client should not have a built-in text editor; it should run some external editor instead. Or the principle that one should write a program's console core before its graphical user interface.

Now, the general picture. Once there was UNIX. It was a breakthrough at the time, better than its competitors in lots of ways. UNIX embodied a lot of ideas. As any other operating system, UNIX required programmers to follow certain principles when writing application programs. The fundamental ideas got the name of "UNIX philosophy". One of the people who formulated the UNIX philosophy was the already mentioned Rob Pike. He did this in his presentation titled "UNIX Style, or cat -v Considered Harmful". After the presentation, Rob Pike and Brian Kernighan published an article based on it. They told us, for example, that the purpose of cat is concatenation and nothing else. Perhaps Rob Pike was the one who invented the "UNIX philosophy". The cat-v.org website was named after this presentation. Read it; it's a very interesting website.

But then, many years later, Pike gave two more presentations in which, I think, he abandoned his philosophy. You got that, fans? Your idol gave up his own philosophy. You can go home now. In the first presentation, "Systems Software Research is Irrelevant", Pike complains that no one writes new operating systems anymore, and even when they do, it's another UNIX: "New operating systems today tend to be just ways of reimplementing Unix. If they have a novel architecture — and some do — the first thing to build is the Unix emulation layer. How can operating systems research be relevant when the resulting operating systems are all indistinguishable?"

Pike's second presentation is titled "The Good, the Bad, and the Ugly: The Unix Legacy". Pike says that flat text is not universal; it's good, but it doesn't always work: "What makes the system good at what it's good at is also what makes it bad at what it's bad at. Its strengths are also its weaknesses. A simple example: flat text files. Amazing expressive power, huge convenience, but serious problems in pushing past a prototype level of performance or packaging. Compare the famous spell pipeline with an interactive spell-checker". Then: "C hasn't changed much since the 1970s… And — let's face it — it's ugly". Then Pike admits the limitations of pipes connecting simple utilities, as well as the limitations of regexes.

UNIX was genius at the time of its introduction, especially if we remember what tools its authors had. They didn't have a ready-made UNIX to develop UNIX on. They didn't have an IDE. At the beginning, they even developed in assembly. I guess the only things they had were an assembler and a text editor.

At a certain point, the people who stood at the origins of UNIX, including Ken Thompson, Dennis Ritchie and Rob Pike, began to write a new operating system: Plan 9, taking UNIX's numerous mistakes into account. Yet no one puts Plan 9 on a pedestal. Pike mentions Plan 9 in "Systems Software Research is Irrelevant", but still encourages us to write new operating systems.

James Hague, a programming veteran (he's been programming since the eighties), writes the following: "What I was trying to get across is that if you romanticize Unix, if you view it as a thing of perfection, then you lose your ability to imagine better alternatives and become blind to potentially dramatic shifts in thinking". The Source. Read that article and the one it refers to: "Free Your Technical Aesthetic from the 1970s". Actually, if you like my article, you'll like his blog too.

So, I do not want to say that UNIX is a bad system. I'm just drawing your attention to the fact that it has tons of drawbacks, just like other systems do. Nor am I canceling the "UNIX philosophy"; I'm just trying to say that it's not an absolute. This post is mostly addressed to UNIX and GNU/Linux fans. The provocative tone was used just to attract your attention.

Comments

  1. Short note about JSON: it's meant as a data exchange format over networks, not configuration. It is a really bad idea to use it for config files:
     - no comments possible
     - very picky syntax, easy to break it
     - not very well readable
     See https://arp242.net/weblog/json_as_configuration_files-_please_dont for more.
  2. Unix sucks; but everything else sucks more.

    I wish we could do away with all the legacy and “redo” Unix, sort of like Plan9 attempted. It’s ridiculous that I can’t map control+i in my shell or Vim without remapping the Tab key.

    But there’s a reason Plan9 failed to get any traction… Breaking backwards compatibility is a pain.

    I don’t know what the plan forward is, but I do know that ranting about the same things over-and-over again isn’t it. Many of the same points were addressed in the Unix Hater’s Handbook from 1993, for example, and are hardly new insights.

    As a final point, this article conflates “Unix philosophy” with “Unix implementation(s)”. Those are two very different things. The Unix philosophy is sound, but I agree that some implementations are less so (in part due to the long history).

  3. Pike is full of shit himself. Syntax highlighting is evil, golang stupid design, who needs tunable GC and so on and so on…
  4. This has got to be the most uninformed article on unix that I've read in a decade… naaaa, ever. Unix does have its flaws, to be sure, but the author manages to get all of them wrong. All the ones I've read, at least. I quit in disgust at about 75%. Wanna read about bits of bad design in Unix? Go fetch the venerable pages of the "Unix haters" forum - at least they understood what they wrote about and did their research (rather than going on and on with "I do not remember where I read it, but some guy down the road told me…"). And the Unix "philosophy" (assuming there is even one) obviously has nothing to do with this crap.
  5. Is this article supposed to be an elaborate joke, or some other kind of performance art? I don’t mean to offend, but I’m genuinely confused. It’s like hearing Florence Foster Jenkins explain Unix design.
  6. And don’t forget the fact, that UNIX system failed to keep T-Rex into park. It was even on TV, I think it was a documentary, but go look it up yourself.