Ian Fisher

Software lore

CSV as code

At one job, I worked on a system that ingested a lot of data and classified it according to certain attributes. I had been reading through the codebase for a while before I realized that the core function of the system – the classification logic – was not in code at all, but in a large CSV file that was essentially executed like a program: each row represented a classification outcome, and each input was compared against the file row by row and column by column until a row fully matched. There was special syntax for a column to match all values or a fixed set of values, as well as some custom values that, depending on the column, would match a set of inputs determined at runtime. The CSV had blank rows, blank columns, and even a column titled "meaningless description".
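To make the first-match idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the column names, the `*` wildcard, and the `a|b` syntax for a fixed set of values are invented, not the syntax the real file used, and the runtime-determined custom values are left out.

```python
import csv
import io

# Hypothetical rule table. The columns and the "*" / "a|b" cell syntax are
# invented for this sketch, not taken from the real file.
RULES_CSV = """\
kind,region,outcome
invoice,us|ca,north_america_invoice
invoice,*,other_invoice
*,*,unclassified
"""


def cell_matches(pattern, value):
    """Return True if one rule cell accepts one input value."""
    if pattern == "*":  # wildcard: matches every value
        return True
    return value in pattern.split("|")  # fixed set of allowed values


def classify(rules, record):
    """Return the outcome of the first rule whose every column matches."""
    for rule in rules:
        if all(cell_matches(rule[col], record[col]) for col in record):
            return rule["outcome"]
    return None  # no rule matched


rules = list(csv.DictReader(io.StringIO(RULES_CSV)))

print(classify(rules, {"kind": "invoice", "region": "ca"}))  # north_america_invoice
print(classify(rules, {"kind": "invoice", "region": "de"}))  # other_invoice
print(classify(rules, {"kind": "receipt", "region": "us"}))  # unclassified
```

Row order is the whole program here: moving the catch-all row to the top would make every input come out as `unclassified`.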

All of this was implemented by a module which compiled the CSV into an anonymous function and executed it on each line of input. It was a sight to behold. I think the original intention was to allow non-technical people to update the classification logic, but the file got so gnarly that no one really understood it anymore.
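I don't have the original module to show, but one way to "compile" a rule table into an anonymous function is with closures: parse each cell once, up front, into a small predicate, then return a lambda that runs those predicates against each input. The sketch below assumes a made-up rule syntax (`*` for any value, `a|b` for a fixed set) and hypothetical names like `compile_rules`; it is not the real implementation.

```python
import csv
import io

# Hypothetical rule table with invented syntax, for illustration only.
RULES_CSV = """\
kind,region,outcome
invoice,us|ca,north_america_invoice
*,*,unclassified
"""


def compile_cell(column, pattern):
    """Turn one rule cell into a predicate over a whole input record."""
    if pattern == "*":
        return lambda record: True
    allowed = frozenset(pattern.split("|"))  # parsed once, at compile time
    return lambda record: record[column] in allowed


def compile_rules(text):
    """Compile CSV text into a single function: record -> outcome or None."""
    compiled = []
    for row in csv.DictReader(io.StringIO(text)):
        outcome = row.pop("outcome")
        checks = [compile_cell(col, pat) for col, pat in row.items()]
        compiled.append((checks, outcome))
    # The returned lambda closes over `compiled`; no CSV parsing happens
    # when it is called.
    return lambda record: next(
        (outcome for checks, outcome in compiled
         if all(check(record) for check in checks)),
        None,
    )


classifier = compile_rules(RULES_CSV)
print(classifier({"kind": "invoice", "region": "us"}))  # north_america_invoice
print(classifier({"kind": "receipt", "region": "us"}))  # unclassified
```

The point of compiling rather than interpreting is that the parsing cost is paid once, which matters when the function runs on every line of input.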

I tamed it a little bit by getting rid of some of the cruft (MM/DD/YY date formats, inconsistent syntax for boolean values, long-unused classification rules), writing a comprehensive test suite, writing validation checks that looked for unreachable rules, and documenting everything. But because it was the core of the system's logic, it was still a little scary to touch the file.
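An unreachable-rule check can be sketched roughly like this, again under an invented syntax (`*` for any value, `a|b` for a fixed set) and with hypothetical function names. A rule is flagged when a single earlier rule accepts everything it accepts, so it can never be the first match. This is a conservative check, not the one I actually wrote: it will miss a rule that is only shadowed by several earlier rules combined.

```python
def accepted(pattern):
    """Return the set of values a cell accepts, or None for the wildcard."""
    return None if pattern == "*" else frozenset(pattern.split("|"))


def covers(earlier, later):
    """Return True if the earlier cell accepts every value the later cell does."""
    earlier_set, later_set = accepted(earlier), accepted(later)
    if earlier_set is None:  # wildcard covers anything
        return True
    if later_set is None:  # a fixed set is assumed not to cover a wildcard
        return False
    return later_set <= earlier_set


def find_shadowed(rules):
    """Return (rule_index, shadowing_index) pairs for unreachable rules.

    `rules` is a list of tuples of cell patterns, in priority order.
    """
    shadowed = []
    for j, later in enumerate(rules):
        for i in range(j):
            if all(covers(e, l) for e, l in zip(rules[i], later)):
                shadowed.append((j, i))
                break  # one shadowing rule is enough to report
    return shadowed


rules = [
    ("invoice", "us|ca"),
    ("invoice", "us"),   # never reached: rule 0 already matches it
    ("*", "*"),
    ("receipt", "de"),   # never reached: rule 2 matches everything
]
print(find_shadowed(rules))  # [(1, 0), (3, 2)]
```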
