CS644 week 1: File I/O, part 1

Slides

Course introduction

A brief history lesson

Unix was a 1970s-era multi-user time-sharing operating system from Bell Labs that was hugely influential on OS design. Much of what we study in this course dates from the early days of Unix.

Unix spawned many direct and indirect descendants, among them Linux, an operating system that began in the early 1990s as a hobby project by Linus Torvalds and is now the dominant open-source general-purpose operating system, used on everything from cloud servers to supercomputers to mobile phones to microcontrollers.

When people say "Unix" today, they most often mean to evoke the common characteristics of the whole family of Unix-like operating systems, which besides Linux notably includes macOS, the BSDs, and proprietary systems like IBM's AIX and HP's HP-UX.

The kernel, userspace, and syscalls

Modern computer processors have a concept of privilege level. The kernel is the part of the operating system that runs with elevated privileges. Regular programs run with restricted privileges. Code that is running in an unprivileged mode on the processor is said to be in userspace.

A userspace process cannot directly write to disk, receive keyboard input, communicate with other processes, or even allocate memory. To do so, it must go through the kernel.

Syscalls are the primary interface between the kernel and userspace programs. A syscall looks like a function call, but instead of jumping to another point in the program, it switches out of the program entirely and into the operating system.

Syscalls are particular to an operating system. The syscalls that are available on Linux are different from those available on macOS (and very different from those available on Windows).

This course examines Linux syscalls in great detail – most weeks of the course will involve looking through the set of syscalls that deal with a certain concept, like the filesystem or process control. We do this in order to learn, not because using low-level OS APIs is a good idea for most programs. Most programs should use platform-independent APIs whenever they are available. Nonetheless, syscalls are ultimately how a program accomplishes anything useful other than pure computation, and so understanding the syscall interface is a critical part of understanding software in general.

Getting help: Python docs and man pages

Python's standard library has excellent documentation which you can access online at https://docs.python.org/3/library/.

For this course, we will be using low-level Python functions, mainly from the os module, which are thin wrappers around the underlying syscalls. The syscalls themselves are documented in man pages that are available from the command-line on our shared server and which often include additional details beyond the Python docs.

From a shell, run:

$ man 2 open

to see the documentation for the open syscall. (The "2" means "section 2 of the man pages", which is the section where syscalls are documented. Some names are in multiple sections, so specifying the "2" ensures you get the right page.)

Syscall signatures in the man pages are specified in C.

int open(const char* pathname, int flags);

If you are unfamiliar with C syntax:

int – The value returned by this function is an integer.
open – The name of this function is "open".
const char* pathname – The first parameter is called "pathname" and has the type const char*, which in C represents an array of bytes (corresponding to str or bytes in Python).
int flags – The second parameter is called "flags" and is an integer.

Be careful to run the man command on the shared server or another Linux environment, and not on, e.g., your macOS laptop – man pages also exist on macOS, but macOS syscalls have important differences from Linux syscalls.

The filesystem, with fresh eyes

The filesystem is a global, hierarchical, mutable store of unstructured data.

global – There is one filesystem that all programs have access to.
hierarchical – The filesystem is a hierarchical tree, where the interior nodes are directories and the leaf nodes are files.
mutable – The contents of files can be changed, and the files themselves can be renamed, deleted, and moved. New files and directories can be created.
unstructured – Regular files on Linux are just sequences of bytes. The filesystem itself does not know what they are meant to represent. It doesn't even distinguish between text files and binary files. File extensions like .pdf and .txt are mnemonics for human use. The filesystem doesn't care.

This is a simplified picture that we will flesh out in the coming weeks. Notable omissions include file permissions, symlinks, and namespaces. Nonetheless, we can already infer a couple of consequences from this description:

Because the filesystem is global, it can be used to share data between programs – including programs that were not specifically designed to work with each other.
Because the filesystem is global and mutable, we have to consider the possibility of concurrency bugs and race conditions – what happens if two programs write to the same file at once?

As we shall see, the filesystem is a powerful tool, but careful programs cannot be cavalier in how they use it.

Exercise: Duplicating a file

Exercise: Write a function duplicate(inpath, outpath) that reads the file at inpath and writes its contents to outpath.

To do so, we will need to make use of four file APIs:

# https://docs.python.org/3/library/os.html#os.open
os.open(path: os.PathLike, flags: int) -> int

# https://docs.python.org/3/library/os.html#os.read
os.read(fd: int, n: int) -> bytes

# https://docs.python.org/3/library/os.html#os.write
os.write(fd: int, buf: bytes) -> int

# https://docs.python.org/3/library/os.html#os.close
os.close(fd: int) -> None

os.open (not to be confused with the global open function) opens the file at path and returns a file descriptor, an integer that serves as a handle to the open file.

The first argument to os.open is a pathname. Pathnames in Linux look like /home/alice/hello.txt. Pathnames that begin with / are called absolute; pathnames that do not are called relative and depend on the current working directory. alice/hello.txt is a relative path; if you are in /home/ then it is equivalent /home/alice/hello.txt, but if you are in /usr/shared/ then it is /usr/shared/alice/hello.txt instead. The special .. component means 'the parent of the current directory', so ../bob/hello.txt when inside of /home/alice is equal to /home/bob/hello.txt.

The second argument to os.open controls how the file is opened. You can pass:

os.O_RDONLY to open the file for reading only.
os.O_WRONLY to open the file for writing only.
os.O_RDWR to open the file for reading and writing.
os.O_CREAT to create the file if it does not exist.

There are a few other flags not covered here – refer to man 2 open. To pass more flags, use the bitwise 'or' operator: os.O_WRONLY | os.O_CREAT, for instance.

os.read reads up to n bytes from the file descriptor. If there are fewer than n bytes left in the file, it will return whatever is left. If we are already at the end of the file, the empty bytestring is returned.

Python distinguishes between binary data, which is represented by the bytes type, and textual data, which is represented by the str type.

Use decode("ascii") to convert bytes to str and encode("ascii") to convert str to bytes. (Replace "ascii" with whatever encoding your data actually uses.)

You can also construct a bytes object directly, either by passing in a list of bytes (bytes([65, 66, 67])) or with a bytestring literal (b"ABC" – only works for ASCII characters)

os.write writes the bytes to the file. It returns the number of bytes written, which for regular files will equal len(buf) except under extraordinary circumstances (like running out of disk space mid-write).

os.close closes the open file, after which the file descriptor should not be used. The OS limits how many files a process can have open at once, so it is prudent to close files when you are done with them.

Follow-up

Extend duplicate to support duplicating only part of the file. Add range_start and range_end parameters. Can you do it without reading more than range_end - range_start + 1 bytes? (Hint: Look at os.lseek.)
Change duplicate to throw an error if outpath already exists. (Hint: You will need to pass a flag to os.open.)
Experiment with different block sizes. As the block size increases, does the function run faster or slower? What is the optimal block size?

Exercise: Simultaneous read/write

Exercise: What happens if one process is writing to a file while another is reading from it? Are the writes atomic, or is it possible for the reader to observe a partial write? Formulate a hypothesis, then devise a pair of programs to test it.

Final project milestone

Let's use filesystem APIs to implement the core operations of the key-value server. Your program should have two commands: get and set. The set command takes a key and a value and writes it to disk, and the get command takes a key and prints the value, if it exists. Store all data in a single file (hard-coded path is fine). Use whatever data format you want – this is not a class on data structures or algorithms, so we won't focus on making the key-value server efficient.

Bonus material

Using os.read, write a Python function or class that reads a file line-by-line.
os.readv and os.writev – read/write from multiple buffers at once
os.pread and os.pwrite – read/write from a fixed offset
"Files are hard" by Dan Luu – How do you avoid malformed data in the case of a crash?

Table of contents

CS644 week 1: File I/O, part 1

Course introduction

A brief history lesson

The kernel, userspace, and syscalls

Getting help: Python docs and man pages

The filesystem, with fresh eyes

Exercise: Duplicating a file

Follow-up

Exercise: Simultaneous read/write

Final project milestone

Bonus material