home blog portfolio Ian Fisher

Week 7: Multithreading

Concurrency and parallelism

The program we have written so far have been sequential: each line executes after the next. Sequential programs are easy to write and think about. But sometimes we want to do work non-sequentially:

Concurrency is when different tasks execute independently in overlapping periods of time. If the tasks execute at the same time (because you have two or more CPU cores), then we have parallelism, but concurrency is possible and useful even if you only have one CPU core.

In fact, we've seen concurrency already: when we used fork to create a child process, the child and the parent executed concurrently (and possibly in parallel). It's not just processes that can execute concurrently, though: Linux allows you to spawn multiple threads of execution within the same process. And unlike processes, threads automatically share the same address space of values in memory, so it's easier for threads to communicate with each other.

Preemption and cooperation

Concurrent programming paradigms differ in whether or not the scheduler can forcibly interrupt a running task. Cooperative multitasking is when each task explicitly yields back to the scheduler. Preemptive multitasking is when tasks can be interrupted in the middle of doing work.

Multithreading on Linux (like process scheduling) is preemptive. If it weren't, then a buggy (or malicious) program with an infinite loop could freeze your system. But it makes it harder to write correct programs, because you must be prepared for your code to be interrupted at any point.

pthreads

The syscall to start a thread is called clone:

int clone(
    int (*fn)(void*),
    void* stack,
    int flags,
    void* arg,
    ...
);

But unless you are writing your own language runtime or threading library, you probably shouldn't call clone yourself. So this week, we are going to depart from making raw syscalls and instead use the standard threading library on Linux: pthreads.

(Technically, pthreads is the name of the interface, and the actual implementation on Linux is called NPTL, but people tend to just call it pthreads.)

The equivalent to clone in pthreads is pthread_create:

int pthread_create(
    pthread_t* thread,
    const pthread_attr_t* attr,
    void* (*fn)(void*),
    void* arg,
);

thread is an out parameter that will be filled in with the thread's ID (represented by an opaque pthread_t type). attr allows you to control some aspects of how the thread is created. fn is the function the thread executes when it starts. It takes in and returns a void pointer, which allows it to handle arbitrary data, since pthreads doesn't know ahead of time what data types your thread will use. arg is the argument passed to fn.

The thread begins executing immediately; unlike fork, pthread_create does not return into both threads: the main thread continues after pthread_create, while the spawned thread starts from fn.

Also unlike fork, there is not a hierarchy between the thread that was spawned and the thread that spawned it. All threads in a process are equals.

Use pthread_join to wait for a thread to finish:

int pthread_join(pthread_t thread, void** retval);

pthread_join will block until the thread exits. retval is an out parameter that is filled in with the void* return value of the thread.

pthread_exit is how you exit a thread (other than simply by returning from fn). Be careful not to call exit() in a multi-threading program unless you intend to exit the entire process!

[[noreturn]] void pthread_exit(void* retval);

pthreads has a function, pthread_cancel, to cancel a thread in the middle of execution. This is often a bad idea – if the thread is holding a lock, for instance, that lock will not be released, which could cause the whole program to deadlock.

Synchronization

Whenever we have concurrent access to a shared resource, we need a way to synchronize.

We've seen this already with file locks (multiple processes accessing the filesystem) and semaphores (multiple processes accessing shared memory).

Multithreading is just the same, except that the shared resource is memory (and the data structures it contains) itself.

Suppose you were writing a subroutine to truncate a string. You might have code like:

s->size = new_size;
s->data[s->size] = '\0';

What happens if the program is interrupted in between the two lines?

The string is in an inconsistent state – the size has been updated, but we haven't put a null terminator in the right place. If another thread starts running that has a reference to this string, it might behave incorrectly.

What you need is a way to synchronize access to the data structure – readers should not be able to read a data structure while a writer is writing to it.

Synchronization brings back a little sequential execution into concurrency. You don't want too much synchronization or else you lose the benefits of concurrency. But you may need some to protect the integrity of your data structures.

Mutexes

Mutexes are a simple synchronization mechanism. A mutex is either in a locked or unlocked state, and has two operations: acquire and release.

int pthread_mutex_init(pthread_mutex_t* mutex, pthread_mutexattr_t* attr);

Mutexes have three functions to lock and unlock. pthread_mutex_lock blocks until the mutex can be acquired, while pthread_mutex_trylock will return immediately with 0 if the mutex was acquired, or EBUSY otherwise. pthread_mutex_unlock releases the mutex and does not block.

int pthread_mutex_lock(pthread_mutex_t* mutex);
int pthread_mutex_trylock(pthread_mutex_t* mutex);
int pthread_mutex_unlock(pthread_mutex_t* mutex);

To clean up a mutex when you are done with it:

int pthread_mutex_destroy(pthread_mutex_t* mutex);

Returning to our example from the previous section, we could rewrite the string truncation operation safely as:

pthread_mutex_lock(s->lock);
s->size = new_size;
s->data[s->size] = '\0';
pthread_mutex_unlock(s->lock);

Beware: this only works if every place that uses the string also acquires the lock first. The operating system has no idea that s->lock is meant to protect the other fields of s, and it won't stop code from accessing them without taking the lock – that's up to the discipline of the programmer.

To spell things out completely, what happens if the truncation function is interrupted in the middle, as before? Another thread starts running, and it tries to access the string. Assuming it was written correctly, it will have to take the lock first. But the lock is already acquired by the first thread, so the second thread will block. Eventually, the first thread will be scheduled again, finish truncating, and release the lock.

In this way, we have sequenced the execution of the program: once the write starts, it must finish before any read happens. We've lost a bit of concurrency – but preserved the integrity of our data structures.

Read-write locks

A downside of mutexes is that they allow exactly one thread to access the value at a time. This restriction may be overly conservative: often, you have both readers and writers, and it's safe for multiple readers to have access as long as no writer does. Enter read-write locks.

Read-write locks allow for unlimited readers, or one writer (but not multiple writers, or a writer and a reader at the same time).

int pthread_rwlock_init(pthread_rwlock_t* rwlock, pthread_rwlockattr_t* attr);

There are separate functions to acquire the lock for reading and writing, with blocking and non-blocking variants:

int pthread_rwlock_rdlock(pthread_rwlock_t* rwlock);
int pthread_rwlock_tryrdlock(pthread_rwlock_t* rwlock);

int pthread_rwlock_wrlock(pthread_rwlock_t* rwlock);
int pthread_rwlock_trywrlock(pthread_rwlock_t* rwlock);

int pthread_rwlock_unlock(pthread_rwlock_t* rwlock);

And for clean-up:

int pthread_rwlock_destroy(pthread_rwlock_t* rwlock);

Why is concurrency hard?

Multithreading tips

Homework exercises

Note: In Python, you'll find threading primitives under the threading library.

  1. (★) What is (a) the syscall and (b) the library function for starting a new thread?
    Solution clone is the syscall; pthread_create is the library function.
  2. (★) What's the difference between pthread_exit and exit?
    Solution pthread_exit terminates a single thread, while exit terminates the entire process and every thread in it.
  3. (★) What does it mean that multithreading is preemptive?
    Solution Preemptive means that the kernel can interrupt a running thread and switch to a different one at arbitrary points in the program – unlike cooperative multitasking, where a task must explicitly yield control.
  4. (★★) Final project (database): It's time to make the database multithreaded! In week 4, you spun up multiple processes to do database operations. Convert these to use threads instead of processes. You can also convert it to use pthreads locks instead of file locks, but think about the pros and cons. Bonus: make the socket interface multithreaded, too.
  5. (★★) Final project (web server): It's time to make the web server multithreaded! It's a good idea for a web server to have multiple threads so it can serve more than one request simultaneously. Instead of using multiple processes, use multiple threads. You can choose whether to start a new thread for every request, or have a fixed pool of threads created at start-up.
  6. (★★) What happens if main returns (without calling exit) while other threads are still active?
    Solution main_returns.c shows that the program exits immediately without waiting for the other threads to finish.
  7. (★★) What syscalls do pthreads locks use under the hood? Write a program and run it under strace to find out.
    Solution locks.c shows that they call futex – but because of the design of the API, a syscall is only necessary in the contended case ("futex" stands for "fast userspace mutex").
  8. (★★) Write a program that demonstrates a deadlock due to acquiring locks in different orders.
    Solution deadlock.c
  9. (★★★) Read the list of thread-unsafe functions in man 7 pthreads. Are there any surprises on the list? Why would a function not be thread-safe?
    Solution It seems to me that some of these functions could be written in a thread-safe way, but the standard doesn't require it. Others, like strtok, have interfaces that are inherently thread-unsafe because they maintain implicit global state between calls. Surprising to me were asctime, getenv, and rand.
  10. (★★★) Many higher-level languages have restrictions on multithreading (e.g., until very recently Python code could not execute in multiple threads simultaneously). Why is this? What makes multithreading in high-level languages hard?
    Solution One thing that is hard about multithreading in high-level languages is that garbage collection has to work properly with code running in multiple threads. Much of PEP 703, the specification for multithreaded Python, is about how to change the garbage collector to make it work.