home blog portfolio Ian Fisher

Notes on tree data structures

A plain binary tree can be implemented in a few dozen lines of code but decays into a linked list if nodes are inserted in sorted order; thus the worst-case time complexity for insertion and retrieval is linear instead of logarithmic.

Self-balancing binary trees are a class of data structures that offer logarithmic insertion and retrieval by proactively keeping the tree balanced upon insertion and deletion.

Why use binary trees? To implement key–value maps. Compared to hash tables, they do not require the choice of a hash function, tuning of parameters such as table size, or collision-handling strategies, and they can be traversed in sorted order. They are popular in functional languages (e.g., in OCaml's Base library) as they can be updated non-destructively. The downside is that insertion and retrieval take logarithmic instead of constant time.

Another tree data structure (not a self-balancing binary tree) is the trie, which represents a set of strings and can efficiently find all strings in the set matching a prefix (which hash sets cannot do). In a traditional trie, each node has an array of children whose length is equal to the number of characters in the 'alphabet'. Even for just the ASCII character set, this is wildly inefficient, and for Unicode it is totally intractable. A space-optimized version is the radix tree which stores whole prefixes instead of individual characters at each node.

Bibliography