
Everything is a Stream

Published on 2023-12-13


A few years ago I was in my first year of university-level computer science and we were learning Java. One of the concepts my professor introduced to us was the "stream," which she defined as follows:

A stream is a sequence of bytes representing a flow of data from a source to a destination

A flash card that sits in my Anki deck to this day. This is notably not what java.util.Stream represents, so we were probably talking about something like java.io.InputStream. It was a while ago. But nonetheless, this strangely philosophical definition has stuck with me through the years.

A lot of things in computer science can be thought of as flows of information from a source to a destination, or more elaborately, a network of flows. I mean, you could make a pretty convincing argument that all of computing is input and output, and the network of channels through which that information travels. The act of computer programming is, then, a lot like building a watershed of data.

How a given tool chain models IO tells you a lot about its relationship to this philosophy.

For example, most of the C-like programming languages I've worked with (and, for that matter, the entire Unix philosophy) represent IO as two parallel streams that never intersect. Working with IO is kind of like dipping a bucket into the input stream (stdin), manipulating its contents in some way, and then pouring it into the output stream (stdout). This model has its advantages and disadvantages. From the perspective of programming language design, this approach feels "honest," at least on Unix systems, because that's genuinely what IO means in Unix. It's pretty modular, and it works well for practical purposes, but I feel like it misses the bigger picture. I've always been inclined to believe that IO is a continuous cycle of action and reaction, and I don't think this model of two parallel streams properly captures the nuances of that idea.

It feels too... Euclidean.
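To make the bucket metaphor concrete, here's a minimal sketch of that kind of filter. I'm writing it in Haskell so it lines up with the example in the next section, but the shape is the same in C or anything else that exposes stdin and stdout: scoop up the input, apply some transformation (upper-casing here, purely as a placeholder), and pour the result back out.

    import Data.Char (toUpper)

    main :: IO ()
    main = do
      input <- getContents            -- dip the bucket into the input stream (stdin)
      putStr (map toUpper input)      -- pour the transformed contents into the output stream (stdout)

Save it as, say, Filter.hs and run it in a pipeline (echo hello | runghc Filter.hs), and it behaves exactly the way the metaphor suggests: data flows in one side and out the other, and nothing ever flows back upstream.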

Haskell has a very different and interesting way of thinking about IO. Haskell is a "pure" functional programming language, which for our purposes means that functions can't have any side effects. IO is kind of inherently side-effectful, at least in our classic two-parallel-streams model. Eventually, you're going to need to produce a side effect. You're going to need to take your bucket (the program) and pour it into the stream (stdout). The way Haskell gets around this without violating its purity in principle is by adopting what feels to me like more of a whole-systems approach to IO. That is, side-effectful functions take the whole state of the computer and return a new state, which the computer then embodies. The function is a pure transformation of the whole machine's state. At the same time, it feels a little like this model reduces IO to a black box.
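Here's a minimal sketch of that mental model. The World and Action types below are illustrative stand-ins I'm inventing for this post, not anything from GHC; real Haskell hides the world-state entirely and just has you sequence IO actions, as in the main at the bottom.

    -- A toy picture of the "whole-systems" model: an action is a pure function
    -- from the state of the entire world to a value paired with a new world.
    newtype World = World ()                        -- stand-in for everything outside the program
    newtype Action a = Action (World -> (a, World)) -- not a real GHC type, just the idea

    -- In actual Haskell you never see the world; you just write IO actions,
    -- and the runtime threads the machine's state through them for you:
    main :: IO ()
    main = do
      line <- getLine                               -- a world with one less line of input
      putStrLn ("You said: " ++ line)               -- a world with one more line of output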

The most common model of IO I encounter is a lot more boring. In the cases of Python and Node.js, IO is hidden behind a series of helper functions that obscure the true process with layers of abstraction. This is, in my opinion, the easiest way to do IO, but I don't think that's strictly a good thing.

I think this stuff matters, and that it's not just a mental exercise, or me being obtuse, because the way we model computing influences the way we write programs, and dare I say: the kinds of programs we write. This sort of thing comes up a lot in more practical-seeming ways when we make programming language design decisions to help programmers write better programs, or to discourage them from bad behaviour. Rust does this, for example, by preventing people from writing memory-unsafe programs: its ownership and borrowing rules make bugs like use-after-free impossible in safe code. This is a design decision; someone thought people shouldn't be allowed to make those mistakes, and they made Rust to realize it. Guaranteeing memory safety is good for many reasons, but it also eliminates a lot of the flexibility that languages like C give you.

A long time ago, someone made a lot of very fundamental decisions about how we should conceptualize computing, and it's quite rare that anyone tries to go against them. Like, for example, that IO should be modelled as two parallel streams rather than, say, two ends of a continuous flow of information. But maybe that's because the stdin/stdout dichotomy just works, and people only want to build on things that work.

For better or for worse, I got into computers for reasons beyond them "just working."

(it torments me every day)

Respond to this article

If you have thoughts you'd like to share, send me an email!

See here for ways to reach out