“You are clever, O Samana.”

CS644 week 1: File I/O, part 1

Course introduction

See the syllabus

A brief history lesson

Unix was a 1970s-era multi-user time-sharing operating system from Bell Labs that was hugely influential on OS design. Much of what we study in this course dates from the early days of Unix.

Unix spawned many direct and indirect descendants, among them Linux, an operating system that began in the early 1990s as a hobby project by Linus Torvalds and is now the dominant open-source general-purpose operating system, used on everything from cloud servers to supercomputers to mobile phones to microcontrollers.

When people say "Unix" today, they most often mean to evoke the common characteristics of the whole family of Unix-like operating systems, which besides Linux notably includes macOS, the BSDs, and proprietary systems like IBM's AIX and HP's HP-UX.

The kernel, userspace, and syscalls

Modern computer processors have a concept of privilege level. The kernel is the part of the operating system that runs with elevated privileges. Regular programs run with restricted privileges. Code that is running in an unprivileged mode on the processor is said to be in userspace.

A userspace process cannot directly write to disk, receive keyboard input, communicate with other processes, or even allocate memory. To do so, it must go through the kernel.

Syscalls are the primary interface between the kernel and userspace programs. A syscall looks like a function call, but instead of jumping to another point in the program, it switches out of the program entirely and into the operating system.

Syscalls are particular to an operating system. The syscalls that are available on Linux are different from those available on macOS (and very different from those available on Windows).

This course examines Linux syscalls in great detail – most weeks of the course will involve looking through the set of syscalls that deal with a certain concept, like the filesystem or process control. We do this in order to learn, not because using low-level OS APIs is a good idea for most programs. Most programs should use platform-independent APIs whenever they are available. Nonetheless, syscalls are ultimately how a program accomplishes anything useful other than pure computation, and so understanding the syscall interface is a critical part of understanding software in general.

Getting help: Python docs and man pages

Python's standard library has excellent documentation which you can access online at https://docs.python.org/3/library/.

For this course, we will be using low-level Python functions, mainly from the os module, which are thin wrappers around the underlying syscalls. The syscalls themselves are documented in man pages that are available from the command-line on our shared server and which often include additional details beyond the Python docs.

From a shell, run:

$ man 2 open

to see the documentation for the open syscall. (The "2" means "section 2 of the man pages", which is the section where syscalls are documented. Some names are in multiple sections, so specifying the "2" ensures you get the right page.)

Syscall signatures in the man pages are specified in C.

int open(const char* pathname, int flags);

If you are unfamiliar with C syntax:

Be careful to run the man command on the shared server or another Linux environment, and not on, e.g., your macOS laptop – man pages also exist on macOS, but macOS syscalls have important differences from Linux syscalls.

The filesystem, with fresh eyes

The filesystem is a global, hierarchical, mutable store of unstructured data.

This is a simplified picture that we will flesh out in the coming weeks. Notable omissions include file permissions, symlinks, and namespaces. Nonetheless, we can already infer a couple of consequences from this description:

As we shall see, the filesystem is a powerful tool, but careful programs cannot be cavalier in how they use it.

Exercise: Duplicating a file

Exercise: Write a function duplicate(inpath, outpath) that reads the file at inpath and writes its contents to outpath.


To do so, we will need to make use of four file APIs:

# https://docs.python.org/3/library/os.html#os.open
os.open(path: os.PathLike, flags: int) -> int

# https://docs.python.org/3/library/os.html#os.read
os.read(fd: int, n: int) -> bytes

# https://docs.python.org/3/library/os.html#os.write
os.write(fd: int, buf: bytes) -> int

# https://docs.python.org/3/library/os.html#os.close
os.close(fd: int) -> None

os.open (not to be confused with the global open function) opens the file at path and returns a file descriptor, an integer that serves as a handle to the open file.

The first argument to os.open is a pathname. Pathnames in Linux look like /home/alice/hello.txt. Pathnames that begin with / are called absolute; pathnames that do not are called relative and depend on the current working directory. alice/hello.txt is a relative path; if you are in /home/ then it is equivalent /home/alice/hello.txt, but if you are in /usr/shared/ then it is /usr/shared/alice/hello.txt instead. The special .. component means 'the parent of the current directory', so ../bob/hello.txt when inside of /home/alice is equal to /home/bob/hello.txt.

The second argument to os.open controls how the file is opened. You can pass:

There are a few other flags not covered here – refer to man 2 open. To pass more flags, use the bitwise 'or' operator: os.O_WRONLY | os.O_CREAT, for instance.

os.read reads up to n bytes from the file descriptor. If there are fewer than n bytes left in the file, it will return whatever is left. If we are already at the end of the file, the empty bytestring is returned.

Aside: bytes and strings

Python distinguishes between binary data, which is represented by the bytes type, and textual data, which is represented by the str type.

Use decode("ascii") to convert bytes to str and encode("ascii") to convert str to bytes. (Replace "ascii" with whatever encoding your data actually uses.)

You can also construct a bytes object directly, either by passing in a list of bytes (bytes([65, 66, 67])) or with a bytestring literal (b"ABC" – only works for ASCII characters)

os.write writes the bytes to the file. It returns the number of bytes written, which for regular files will equal len(buf) except under extraordinary circumstances (like running out of disk space mid-write).

os.close closes the open file, after which the file descriptor should not be used. The OS limits how many files a process can have open at once, so it is prudent to close files when you are done with them.

Follow-up

Exercise: Simultaneous read/write

Exercise: What happens if one process is writing to a file while another is reading from it? Are the writes atomic, or is it possible for the reader to observe a partial write? Formulate a hypothesis, then devise a pair of programs to test it.

Final project milestone

Let's use filesystem APIs to implement the core operations of the key-value server. Your program should have two commands: get and set. The set command takes a key and a value and writes it to disk, and the get command takes a key and prints the value, if it exists. Store all data in a single file (hard-coded path is fine). Use whatever data format you want – this is not a class on data structures or algorithms, so we won't focus on making the key-value server efficient.

Further reading