Week 9: Advanced I/O
The problem
This week's lecture will be a brief exploration of a common problem: What if you want to do lots of I/O at once?
Maybe you are writing a web server that wants to talk to many clients at once. Or an SSH client, which must read from and write to the terminal (i.e., standard input and output) as well as a network socket connected to the remote machine.
The root of the problem is that I/O operations can block: they can hang indefinitely while waiting for input to read, or in some cases for room to write output (e.g., a full pipe). And if you need to do I/O on multiple devices at once, you can't block on any one device because you don't know which one will be ready first.
This week, we will focus on network I/O and terminal I/O rather than disk I/O. Disk I/O is slow compared to other syscalls, or to regular function calls, but it will not block forever like network I/O or terminal I/O can. And with a single disk device, doing multiple I/O operations concurrently isn't necessarily going to be faster.
We've already learned one solution to this problem: multithreading. You can do separate I/O operations on different threads, and then each thread can block without interfering with the others.
But as we have seen, multithreading can greatly complicate your program. Fortunately, Linux has alternate solutions that do not require multiple threads.
O_NONBLOCK
open takes an O_NONBLOCK flag that causes subsequent read and write operations to not block. Instead, if the operation can't be completed immediately, the call returns -1 with errno set to EAGAIN or EWOULDBLOCK.
So, you could write something like:
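#include <errno.h>
#include <unistd.h>

int fds[N];        // each opened with O_NONBLOCK
char buf[BUFSZ];

// One pass over the descriptors; repeat this loop to keep polling.
for (int i = 0; i < N; i++) {
    ssize_t nread = read(fds[i], buf, BUFSZ);
    if (nread < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            continue;  // nothing to read yet; try the next descriptor
        } else {
            // handle I/O error
        }
    } else if (nread == 0) {
        // end of file (e.g., the peer closed the connection)
    } else {
        // do something with the nread bytes in buf
    }
}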
The problem with this solution is that you are busy polling – wasting CPU time waiting for something to happen. It would be much better to be put to sleep until I/O is ready to be done.
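Descriptors that you did not open yourself (standard input, a pipe, an accepted socket) can be switched to non-blocking mode after the fact with fcntl. The following is a minimal sketch of that, not part of the lecture's code: set_nonblocking and try_read are hypothetical helper names, and the -1 / -2 return codes of try_read are a convention chosen here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: switch an already-open descriptor to non-blocking
// mode. Returns 0 on success, -1 on error (errno is set by fcntl).
int set_nonblocking(int fd) {
    assert(fd >= 0);  // caller must pass an open descriptor
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Hypothetical helper: attempt one read without blocking.
// Returns the number of bytes read (0 means end of file),
// -1 if no data is ready yet, or -2 on a real I/O error.
ssize_t try_read(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0) return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return -1;
    return -2;
}
```

For example, after creating a pipe p and calling set_nonblocking(p[0]), try_read on the still-empty pipe returns -1 right away instead of hanging. Once something has been written to p[1], the same call returns the number of bytes that were read.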
epoll
epoll is an API to do exactly that: inform the kernel of the file descriptors you are interested in, and then go to sleep until one of them is ready. It's a newer version of the old poll and select syscalls. To use epoll, you:
- Create an epoll instance with epoll_create.
- Register your interest in certain file descriptors with epoll_ctl.
- Wait for file descriptors to be ready with epoll_wait.
Conveniently, epoll_wait also takes a timeout parameter, so you can wait for only a finite amount of time.
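The three steps can be sketched roughly as follows. This is an illustrative sketch rather than a definitive implementation: first_readable is a hypothetical helper name, its -1 / -2 return codes are a convention chosen here, and it uses epoll_create1(0), the newer variant of epoll_create. It also creates and destroys the epoll instance on every call to keep the example short; a real program would normally create the instance once and reuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: sleep until one of the nfds descriptors in fds is
// readable, or until timeout_ms milliseconds pass (-1 waits forever).
// Returns a ready descriptor, -1 on timeout, or -2 on error.
int first_readable(const int *fds, int nfds, int timeout_ms) {
    assert(fds != NULL && nfds > 0);

    // 1. Create the epoll instance.
    int epfd = epoll_create1(0);
    if (epfd < 0) return -2;

    // 2. Register interest in "readable" events for each descriptor.
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];  // handed back to us by epoll_wait
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -2;
        }
    }

    // 3. Sleep until something is ready or the timeout expires.
    struct epoll_event ready;
    int n = epoll_wait(epfd, &ready, 1, timeout_ms);
    close(epfd);

    if (n < 0) return -2;   // error (including interruption by a signal)
    if (n == 0) return -1;  // timeout: nothing became readable
    return ready.data.fd;
}
```

Given the read ends of two empty pipes, a call with a short timeout returns -1. After a byte is written to the second pipe, the call returns that pipe's read descriptor without any busy polling in between.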
io_uring
io_uring originated as an asynchronous interface for I/O, although now it is closer to an "alternative system-call interface for Linux that is inherently asynchronous" as it can do many things other than I/O. (source) Along with eBPF, it's one of the most significant new developments in the Linux kernel in the past decade.
Some notable characteristics:
- Shared memory ring buffers allow for efficient submission of I/O operations – in some cases not even requiring a syscall.
- Unlike epoll, io_uring will actually do the requested operation, instead of just notifying you that the file descriptor is ready for it.
- Operations can be chained and executed in order.
Homework exercises
None for this week. Get caught up on the old exercises and your final project!
Further reading
- Efficient IO with io_uring by Jens Axboe, the creator of io_uring
- io_uring guide by Nick Black
- LWN on io_uring