Ian Fisher

Notes on epoll and io_uring

epoll

epoll is a set of Linux syscalls for asynchronous I/O. It's most useful for writing network servers: unlike with blocking I/O, a single-threaded program can achieve high performance, because one thread can wait on many file descriptors at once instead of stalling on a single one.

The basic structure of an epoll loop (error handling omitted):

// create the epoll instance
int epollfd = epoll_create(...);

// register file descriptors of interest
// (`event` is a struct epoll_event naming the events to watch, e.g. EPOLLIN)
epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &event);

// main event loop
struct epoll_event events[MAX_EVENTS];
while (1) {
  int count = epoll_wait(epollfd, events, MAX_EVENTS, -1);
  for (int i = 0; i < count; i++) {
    struct epoll_event event = events[i];
    int fd = event.data.fd;
    // check fd to decide what to do (e.g., call `read`)
  }
}

epoll just notifies when a file descriptor is ready to be read from or written to; you still have to do the actual read or write call yourself.
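To make the "notify, then do the I/O yourself" split concrete, here is a minimal sketch that uses a pipe instead of a socket. The helper names `read_when_ready` and `pipe_demo` are made up for illustration, and error handling is kept to the bare minimum.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: block until `fd` is readable, then read from it.
// Returns the number of bytes read, or -1 on error.
ssize_t read_when_ready(int fd, char *buf, size_t cap) {
  assert(buf != NULL && cap > 0);

  int epollfd = epoll_create1(0);
  if (epollfd < 0) return -1;

  // Ask to be told when `fd` has data to read.
  struct epoll_event ev;
  memset(&ev, 0, sizeof ev);
  ev.events = EPOLLIN;
  ev.data.fd = fd;
  if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
    close(epollfd);
    return -1;
  }

  // epoll_wait only reports readiness; no data has been transferred yet.
  struct epoll_event ready;
  ssize_t n = -1;
  int count = epoll_wait(epollfd, &ready, 1, -1);
  if (count == 1 && (ready.events & EPOLLIN)) {
    // The actual I/O is a separate syscall that we issue ourselves.
    n = read(ready.data.fd, buf, cap);
  }

  close(epollfd);
  return n;
}

// Hypothetical demo: write "ping" into a pipe, then read it back via epoll.
ssize_t pipe_demo(char *buf, size_t cap) {
  int fds[2];
  if (pipe(fds) < 0) return -1;

  ssize_t n = -1;
  if (write(fds[1], "ping", 4) == 4) {
    n = read_when_ready(fds[0], buf, cap);
  }

  close(fds[0]);
  close(fds[1]);
  return n;
}
```

Calling `pipe_demo(buf, sizeof buf)` with a 16-byte buffer should return 4 and leave the bytes `ping` in `buf`. A real server would keep one epoll instance alive for the whole event loop rather than creating one per read.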

Simple example of a TCP echo server here.

A few subtler points:

Further reading

io_uring

io_uring is a Linux kernel interface for making asynchronous syscalls. It aims to be performant: ring buffers shared between userspace and the kernel reduce syscall overhead, and multiple syscall requests can be dispatched at once.

Unlike epoll, the kernel will actually perform the operation for you, instead of just notifying you when a file descriptor is ready for you to perform it yourself.

At the syscall level, you call io_uring_setup and then mmap to set up the data structures, and io_uring_enter to submit requests to the kernel. But usually you use liburing rather than the low-level syscalls, unless you want to deal with the ring buffers and memory barriers yourself.
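As a rough sketch of that low-level flow, the following submits a single no-op request and reads back its completion using raw syscalls. The function name `nop_roundtrip`, the helper `map_ring`, the queue size of 4, and the `user_data` value 42 are all arbitrary choices for illustration. It assumes kernel headers that provide `linux/io_uring.h` and uses the GCC/Clang `__atomic` builtins for the memory barriers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: map one of the regions exposed by the io_uring fd.
static void *map_ring(int ring_fd, size_t len, off_t offset) {
  return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, ring_fd, offset);
}

// Submits one IORING_OP_NOP and waits for its completion.
// Returns 0 on success (completion's user_data stored in *out),
// 1 if io_uring_setup fails (e.g. old kernel or sandbox), -1 otherwise.
int nop_roundtrip(unsigned long long *out) {
  assert(out != NULL);

  struct io_uring_params p;
  memset(&p, 0, sizeof p);
  int ring_fd = (int)syscall(__NR_io_uring_setup, 4u, &p);
  if (ring_fd < 0) return 1;

  // The kernel fills in `p` with the offsets needed to find each field.
  size_t sq_len = p.sq_off.array + p.sq_entries * sizeof(unsigned);
  size_t cq_len = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
  size_t sqes_len = p.sq_entries * sizeof(struct io_uring_sqe);
  char *sq = map_ring(ring_fd, sq_len, IORING_OFF_SQ_RING);
  char *cq = map_ring(ring_fd, cq_len, IORING_OFF_CQ_RING);
  struct io_uring_sqe *sqes = map_ring(ring_fd, sqes_len, IORING_OFF_SQES);

  int rc = -1;
  if (sq != MAP_FAILED && cq != MAP_FAILED && (void *)sqes != MAP_FAILED) {
    unsigned *sq_tail = (unsigned *)(sq + p.sq_off.tail);
    unsigned sq_mask = *(unsigned *)(sq + p.sq_off.ring_mask);
    unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
    unsigned *cq_head = (unsigned *)(cq + p.cq_off.head);
    unsigned *cq_tail = (unsigned *)(cq + p.cq_off.tail);
    unsigned cq_mask = *(unsigned *)(cq + p.cq_off.ring_mask);
    struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

    // Fill in a submission entry, then publish it by advancing the tail.
    unsigned tail = *sq_tail;
    unsigned idx = tail & sq_mask;
    memset(&sqes[idx], 0, sizeof sqes[idx]);
    sqes[idx].opcode = IORING_OP_NOP;
    sqes[idx].user_data = 42;
    sq_array[idx] = idx;
    __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

    // Submit one request and wait for at least one completion.
    long submitted = syscall(__NR_io_uring_enter, ring_fd, 1u, 1u,
                             IORING_ENTER_GETEVENTS, NULL, (size_t)0);

    // Consume the completion entry, then advance the head to release it.
    unsigned head = *cq_head;
    if (submitted == 1 && head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
      *out = cqes[head & cq_mask].user_data;
      __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
      rc = 0;
    }
  }

  if (sq != MAP_FAILED) munmap(sq, sq_len);
  if (cq != MAP_FAILED) munmap(cq, cq_len);
  if ((void *)sqes != MAP_FAILED) munmap(sqes, sqes_len);
  close(ring_fd);
  return rc;
}
```

The `user_data` field is copied unchanged from the submission entry to the completion entry, which is how a program matches completions back to requests. Some container runtimes block the io_uring syscalls, in which case `io_uring_setup` fails and this sketch returns 1.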

If you do want to deal with the ring buffers and memory barriers yourself, here are a few points of confusion that I struggled over:

Further reading

Reference:

LWN coverage:

Case studies: