Ian Fisher

CS644 week 9: Advanced I/O

The problem

This week's lecture will be a brief exploration of a common problem: What if you want to do lots of I/O at once?

Maybe you are writing a web server that wants to talk to many clients at once. Or an SSH client, which must read from and write to the terminal (i.e., standard input and output) as well as a network socket connected to the remote machine.

The root of the problem is that I/O operations can block: they can hang indefinitely while waiting for input to read, or in some cases for room to write output (e.g., a full pipe). And if you need to do I/O on multiple devices at once, you can't block on any one device because you don't know which one will be ready first.

This week, we will focus on network I/O and terminal I/O rather than disk I/O. Disk I/O is slow compared to other syscalls, or to regular function calls, but it will not block forever like network I/O or terminal I/O can. And with a single disk device, doing multiple I/O operations concurrently isn't necessarily going to be faster.

We've already learned one solution to this problem: multithreading. You can do separate I/O operations on different threads, and then each thread can block without interfering with the others.

But as we have seen, multithreading can greatly complicate your program. Fortunately, Linux has alternate solutions that do not require multiple threads.

O_NONBLOCK

open takes an O_NONBLOCK flag that causes subsequent read and write operations on the file descriptor not to block. Instead, if an operation can't be completed immediately, the call returns -1 with errno set to EAGAIN or EWOULDBLOCK (these are the same value on Linux). Sockets can be made non-blocking by passing SOCK_NONBLOCK to socket, by passing it to accept4, or by setting O_NONBLOCK on an existing descriptor with fcntl.
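As a minimal sketch of the fcntl route: the helper names below (set_nonblocking and empty_pipe_read_would_block) are made up for illustration. The first adds O_NONBLOCK to a descriptor that is already open, and the second checks that reading from an empty pipe then fails with EAGAIN instead of hanging.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: add O_NONBLOCK to an already-open descriptor,
// preserving whatever status flags were set before.
// Returns 0 on success, -1 on error (errno is set by fcntl).
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Demonstration: read from an empty pipe after making it non-blocking.
// Returns 1 if read failed with EAGAIN/EWOULDBLOCK, 0 if it did not,
// and -1 if the pipe could not be set up.
int empty_pipe_read_would_block(void) {
    int p[2];
    char buf[16];
    if (pipe(p) < 0) {
        return -1;
    }
    if (set_nonblocking(p[0]) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ssize_t n = read(p[0], buf, sizeof buf);
    int saved_errno = errno;  // close() below could overwrite errno
    close(p[0]);
    close(p[1]);
    return n < 0 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
}
```

Without the set_nonblocking call, the read in the second function would hang forever, since nothing is ever written to the pipe.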

So, you could write something like:

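int fds[N];  // assumed to be already opened with O_NONBLOCK
char buf[BUFSZ];
for (int i = 0; i < N; i++) {
    ssize_t nread = read(fds[i], buf, BUFSZ);
    if (nread < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // nothing to read on this descriptor yet; try the next one
            continue;
        } else {
            // handle I/O error
        }
    } else {
        // do something with data (nread == 0 means end of file)
    }
}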

The problem with this solution is that you are busy polling – wasting CPU time waiting for something to happen. It would be much better to be put to sleep until I/O is ready to be done.

epoll

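epoll is an API to do exactly that: inform the kernel of the file descriptors you are interested in, and then go to sleep until one of them is ready. It's a newer, more scalable alternative to the older poll and select syscalls. To use epoll, you create an epoll instance with epoll_create1, register each file descriptor you care about with epoll_ctl, and then call epoll_wait to sleep until at least one of them is ready.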

Conveniently, epoll_wait also takes a timeout parameter, so you can wait for only a finite amount of time.
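Here is a rough sketch of that sequence. The function name wait_for_readable, its return codes, and MAX_EVENTS are choices made for this example, not part of the epoll API. It registers a set of descriptors for readability, sleeps in epoll_wait for up to timeout_ms milliseconds, and reports the first descriptor that is ready.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 8  // how many ready events to accept per epoll_wait call

// Hypothetical helper: wait up to timeout_ms for any of the nfds descriptors
// in fds to become readable. Returns the first ready descriptor, -1 on
// timeout, or -2 on error (including interruption by a signal).
int wait_for_readable(const int *fds, int nfds, int timeout_ms) {
    // Step 1: create the epoll instance.
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        return -2;
    }

    // Step 2: register interest in each descriptor.
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;   // wake up when there is data to read
        ev.data.fd = fds[i];   // handed back to us by epoll_wait
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -2;
        }
    }

    // Step 3: sleep until something is ready or the timeout expires.
    struct epoll_event ready[MAX_EVENTS];
    int n = epoll_wait(epfd, ready, MAX_EVENTS, timeout_ms);
    close(epfd);
    if (n < 0) {
        return -2;
    }
    if (n == 0) {
        return -1;  // timed out with nothing ready
    }
    return ready[0].data.fd;
}
```

A long-running program would normally create the epoll instance once and call epoll_wait in a loop, rather than rebuilding it on every call as this sketch does. Note also that epoll works on pipes, sockets, and terminals, but epoll_ctl rejects regular files.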

io_uring

io_uring originated as an asynchronous interface for I/O, although now it is closer to an "alternative system-call interface for Linux that is inherently asynchronous" as it can do many things other than I/O. (source) Along with eBPF, it's one of the most significant new developments in the Linux kernel in the past decade.

The basis of io_uring is two ring buffers, a submission queue and a completion queue, that are shared between userspace and the kernel. Your program adds entries to the submission queue and reads results off the completion queue. Unlike with epoll, the kernel will actually perform the operation for you, not just notify you when the file descriptor is ready. Because of this, and because you can batch multiple operations together (including in chains of execution), io_uring programs can make far fewer syscalls. In fact, you can enable a mode (IORING_SETUP_SQPOLL) in which a kernel thread polls the submission queue, so that submitting work need not involve a syscall at all!
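To make the two-queue data flow concrete, here is a toy model in plain C. It is not the io_uring API: every name in it (toy_ring, toy_submit, toy_kernel_run, toy_reap) is invented for illustration, the "kernel" is an ordinary function, and the only "operation" is doubling a number. The real interface shares the rings through mmap and needs memory barriers, which this sketch leaves out.

```c
#include <stdint.h>

#define RING_SIZE 4  // power of two, so (index & (RING_SIZE - 1)) wraps around

// Hypothetical entry types; the real submission and completion entries
// carry many more fields.
struct toy_sqe { int arg; uint64_t user_data; };
struct toy_cqe { int res; uint64_t user_data; };

struct toy_ring {
    struct toy_sqe sq[RING_SIZE];  // submission queue
    struct toy_cqe cq[RING_SIZE];  // completion queue
    unsigned sq_head, sq_tail;     // consumer advances head, producer advances tail
    unsigned cq_head, cq_tail;
};

// "Userspace" side: queue a request. Returns 0, or -1 if the queue is full.
int toy_submit(struct toy_ring *r, int arg, uint64_t user_data) {
    if (r->sq_tail - r->sq_head == RING_SIZE) {
        return -1;
    }
    r->sq[r->sq_tail & (RING_SIZE - 1)] = (struct toy_sqe){ arg, user_data };
    r->sq_tail++;
    return 0;
}

// Stand-in for the kernel: drain the submission queue, "perform" each
// operation (here, doubling arg), and post a completion tagged with the
// same user_data. Returns the number of requests processed.
int toy_kernel_run(struct toy_ring *r) {
    int done = 0;
    while (r->sq_head != r->sq_tail && r->cq_tail - r->cq_head < RING_SIZE) {
        struct toy_sqe s = r->sq[r->sq_head & (RING_SIZE - 1)];
        r->sq_head++;
        r->cq[r->cq_tail & (RING_SIZE - 1)] = (struct toy_cqe){ s.arg * 2, s.user_data };
        r->cq_tail++;
        done++;
    }
    return done;
}

// "Userspace" side: pop one completion. Returns 0, or -1 if none is pending.
int toy_reap(struct toy_ring *r, struct toy_cqe *out) {
    if (r->cq_head == r->cq_tail) {
        return -1;
    }
    *out = r->cq[r->cq_head & (RING_SIZE - 1)];
    r->cq_head++;
    return 0;
}
```

The user_data field is how a program matches a completion back to the request that caused it. In this model several requests can be queued before toy_kernel_run is called once, which mirrors how batching reduces the number of kernel crossings.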

Case studies

Homework exercises

None for this week. Get caught up on the old exercises and your final project!

Further reading