Feed


  1. What I look for in empirical software papers

    (buttondown.email)

    Behind on the talk and still reading a lot of research papers on empirical software engineering (ESE). Evaluating a paper that studies software engineers is a slightly different skill than evaluating other kinds of CS papers, which feel a little closer to hard sciences or mathematics. So even people used to grinding through type theory or AI papers can find ESE studies off-putting.

    So here are some of my thoughts on how I approach it. This is mostly unedited and unorganized, just getting my thoughts down.

    The high-level view

    The goal of ESE is to understand how we develop software better in the real world. Compared to "regular" computer science, ESE has two unique problems:

    1. As with all kinds of knowledge work, software development is highly varied and lots of factors affect the final outcome.
    2. Software engineers develop new tools/processes/paradigms incredibly quickly and there's maybe a 1000:1 ratio of developers to researchers.

    This makes it hard to say anything with a high level of confidence. Instead of thinking of papers as correct or incorrect, I look for things that make me trust them more or less. I'll use this interchangeably with "like/dislike": "I like this study" means the same as "I trust it more".

    Quantitative Metrics

    I like metrics that are clear, unambiguous, and sensible to practitioners. "Quality" and "defects" are too ambiguous, "coupling" and "cohesion" aren't sensible. "Number of support tickets filed" and "distribution of model code lengths" are better.

    In practice, clear and unambiguous metrics end up being more narrowly-scoped. I trust research on narrow metrics more than research on broad metrics, even if the former are less applicable to software at large.

    Clinical Trials

    I don't like "clinical trials", where developers are split into control and experiment groups and asked to complete a task. The problem is sample size: actual clinical trials in medicine are run on thousands of people at a time and cost millions of dollars. Most software clinical trials study fewer than 50 engineers at a time. With such a small sample size, there's just too much room for confounding factors.

    It doesn't help that most studies are done on university students, which may not generalize to more experienced professionals.

    There are a couple of exceptions, though. I'm more inclined to trust clinical trials on how to teach CS, as students are the target population anyway. And I pay more attention to clinical trials that find enormous effects. For example, Need for Sleep found that a night of sleep deprivation reduced the trial group's coding quality by fifty percent.1 That's not the kind of effect you can write off as a confounding factor!

    (I'd still like to see it replicated, though. Also, does mild sleep deprivation affect coding skill?)

    Other Quantitative Research

    Scraping GitHub for data is fine, as long as the researchers are careful to audit the data. Don't publish a groundbreaking paper claiming a correlation between language and program quality but accidentally count each bitcoin bug as 17 C++ bugs.

    Developer surveys are good for measuring things like developer perceptions, emotional state, how widespread a technology is, etc. I don't think they're good at measuring the effectiveness of practices or determining whether those opinions match reality. I.e. if a survey showed that 90% of developers think their Java code is buggier than their Kotlin code, that tells us developers perceive Kotlin as higher quality, not that it actually is!

    Qualitative Studies

    I like qualitative studies: things that aren't gathering hard metrics. If a quantitative study of the correctness of imperative vs. logic programming would ask "which has fewer bugs" or "what kinds of bugs does each have", a qualitative one would be more like "how do imperative and logic programmers debug stuff differently" or "let's dissect this one unusual bug and see if we learn anything interesting."

    IME science fields undervalue the impact of qualitative research. I find a lot less of it than quantitative papers, but I tend to trust the qualitative papers more. Maybe they're just overall higher quality; more likely it's because they're more narrowly scoped. And I believe (without evidence) that a field needs a strong foundation of qualitative knowledge to do stellar quantitative work.

    Some papers do both qualitative and quantitative work. The researchers in Abbreviated vs. Full-word Identifier Names started with a clinical trial, and then shadowed individual developers to see how they navigated a code file. I trust the quantitative sections of hybrid papers a little more than just-quantitative papers, but it's not an enormous factor.

    "Quantitative" and "qualitative" aren't words anymore.

    Reading the Paper

    I use the same basic approach with all CS papers: read the abstract and intro, read the conclusion and post-results discussion, then decide if I care enough to look further. If I like the results, I read the methodology more carefully. I skim background info and "related work" mostly to see if they make any claims I'd want to read up on.

    Sometimes I'll use an AI to generate a paper summary. I only do this if I'm going to read the full paper; the summary is just to help me digest it.

    I trust papers that share their data.

    Threats to validity

    Most software papers have a "threats to validity" section, where they talk about possible confounding factors. In practice it's mostly a way for the writers to show they covered their bases, by explaining why said confounding factors aren't a problem. I like papers which include TtVs they can't dismiss, or which discuss unusual-but-important TtVs. I don't like papers which have bad explanations for why TtVs aren't a problem, or which miss really obvious TtVs.

    Example of doing it wrong: Quantifying Detectable Bugs in JavaScript claims that TypeScript catches 15% of "public bugs for public projects on GitHub". Their threats to validity were "we only looked at public bugs" and "we may have been too inexperienced in TypeScript to catch even more bugs." One TtV they do not list is that they uniformly sampled all of GitHub for bugs and didn't filter the projects at all. When I checked some of their projects, a bunch were simple lesson repos by novices in code bootcamps.2 Why didn't they mention this TtV?

    Example of doing it right: oh God it's already Thursday and I haven't gotten this newsletter out yet

    References

    For many papers, the most useful part is the reference section, because odds are I'll find a more interesting paper there. Even if none of the references look interesting, I'll still check a few to make sure the original paper is using them correctly. I don't like it when a paper says reference R makes claim C about topic T, and then I read R and it never even mentions T.

    I wish journals let papers have two reference sections, one for the really essential references and one for the incidental ones, like background information or related work.

    ...Okay, that's all I got this week. Back to the talk mines.


    Things I'm curious about

    • DORA Report: What's the state of research about the DORA report? Has anybody been able to replicate their findings? Has anyone evaluated them critically?
    • Puzzle Games: Puzzle games like flow free have thousands of levels. Are these programmatically generated? If so, how is it done, and how do they control the difficulty?

    1. Measured as "number of tests passed", from a researcher-provided test suite for the task. 

    2. I checked this back in 2020 or so, but it seems like the data is now offline. Props for making the data available! 

  2. Implementing MVCC and major SQL transaction isolation levels

    (notes.eatonphil.com)

    In this post we'll build a database in 400 lines of code with basic support for five major SQL transaction isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Snapshot Isolation, and Serializable. We'll use multi-version concurrency control (MVCC) and optimistic concurrency control (OCC) to accomplish this. The goal isn't to be perfect but to explain the basics in a minimal way.

    You don't need to know what these terms mean in advance. I did not understand them before doing this project. But if you've ever dealt with SQL databases, transaction isolation levels are likely one of the dark corners you either 1) weren't aware of or 2) wanted not to think about. At least, this is how I felt.

    While there are many blog posts that list out isolation levels, I haven't been able to internalize their lessons. So I built this little database to demonstrate the common isolation levels for myself. It turned out to be simpler than I expected, and made the isolation levels much easier to reason about.

    Thank you to Justin Jaffray, Alex Miller, Sujay Jayakar, Peter Veentjer, and Michael Gasch for providing feedback and suggestions.

    All code is available on GitHub.

    Why do we need transaction isolation?

    If you already know the answer, feel free to skip this section.

    When I first started working with databases in CRUD applications, I did not understand the point of transactions. I was fairly certain that transactions are locks. I was wrong about that, but more on that later.

    I can't remember exact code I wrote, but here's something I could have written:

    with database.transaction() as t:
      users = t.query("SELECT * FROM users WHERE group = 'admin';")
      ids = []
      for user in users:
        if some_complex_logic(user):
          ids.append(user.id)
    
      t.query("UPDATE users SET metadata = 'some value' WHERE id IN ($1);", ids)
    

    I would have thought that exactly those users seen in the initial SELECT and matching the some_complex_logic filter would be the ones updated by my second SQL statement.

    And if I were using SQLite, my guess would have been correct. But if I were using MySQL or Postgres or Oracle or SQL Server, and hadn't made any changes to defaults, that wouldn't necessarily be true! We'll discover exactly why throughout this post.

    For example, some other connection and transaction could have set a user's group to admin after the initial SELECT was executed. That user would then be missed by the some_complex_logic check and by the subsequent UPDATE.

    Or, again after our initial SELECT, some other connection could have modified the group for some user that previously was admin. It would then be incorrectly part of the second UPDATE statement.

    These are just a few examples of what could go wrong.

    This is the realm of transaction isolation. How do multiple transactions running at the same time, interacting with the same data, interact with each other?

    The answer is: it depends. The SQL standard itself loosely prescribes four isolation levels, but every database implements these four levels slightly differently, sometimes using entirely different algorithms. And even among the standard levels, the default isolation level for each database differs.

    Funky bugs can show up across databases and across isolation levels, often dependent on particular details of common ways of implementing isolation levels. These bugs are called "anomalies", and they come with intimidating names like "dirty reads", "write cycles", and G2-Item.

    The topic is so complex that we've got decades of research papers critiquing SQL isolation levels, categorization of common isolation anomalies, walkthroughs of anomalies by Martin Kleppmann in Designing Data-Intensive Applications, Martin Kleppmann's Hermitage project documenting common anomalies across isolation levels in major databases, and the ACIDRain paper showing isolation-related bugs in major open-source ecommerce projects.

    These aren't just random links. They're each quite interesting. And particularly for practitioners who don't know why they should care, check out Designing Data-Intensive Applications and the last link on ACIDRain.

    And this is only a small list of some of the most interesting research and writing on the topic.

    So there's a wide variety of things to consider:

    • Not every database implements transaction isolation levels identically, resulting in different behavior
    • Not all researchers agree, and not all database developers agree, on what any given isolation level means
    • Not every database has the same default isolation level, and most developers tend not to change the default
    • Not every developer is correctly using the isolation level they pick (default or not)

    Transaction isolation levels are basically vibes. The only ground truth for real projects is Martin Kleppmann's Hermitage project, which catalogs actual behavior across databases. And a set of definitions some people align on is Generalized Isolation Level Definitions.

    So while all these linked works above are authoritative, and even though we can see that there might be some anomalies we have to worry about, the research can still be difficult to internalize. And many developers, my recent self included, do not have a great understanding of isolation levels.

    Throughout this post we'll stick to informal definitions of isolation levels to keep things simple.

    Let's dig in.

    Locks? MVCC?

    Historically, databases implemented isolation with locking algorithms such as Two-Phase Locking (not the same thing as Two-Phase Commit). Multi-version concurrency control (MVCC) is an approach that lets us completely avoid locks.

    It's worthwhile to note that while we will not use locks (implementing what is called optimistic concurrency control, or OCC), most MVCC databases do still use locks for certain things (implementing what is called pessimistic concurrency control).

    But this is the story of databases in general. There are numerous ways to implement things.

    We will take the simpler lockless route.

    Consider a key-value database. With MVCC, rather than storing only the value for a key, we would store versions of the value. The version includes the transaction id (a monotonic incrementing integer) wherein the version was created, and the transaction id wherein the version was deleted.

    With MVCC, it is possible to express transaction isolation levels almost solely as a set of different visibility rules for a version of a value; rules that vary by isolation level. So we will build up a general framework first and discuss and implement each isolation level last.

    Scaffolding

    We'll build an in-memory key-value system that acts on transactions. I usually try to stick with only the standard library for projects like this but I really wanted a sorted data structure and Go doesn't implement one.

    In main.go, let's set up basic helpers for assertions and debugging:

    package main
    
    import (
            "fmt"
            "os"
            "slices"
    
            "github.com/tidwall/btree"
    )
    
    func assert(b bool, msg string) {
            if !b {
                    panic(msg)
            }
    }
    
    func assertEq[C comparable](a C, b C, prefix string) {
            if a != b {
                    panic(fmt.Sprintf("%s '%v' != '%v'", prefix, a, b))
            }
    }
    
    var DEBUG = slices.Contains(os.Args, "--debug")
    
    func debug(a ...any) {
            if !DEBUG {
                    return
            }
    
            args := append([]any{"[DEBUG]"}, a...)
            fmt.Println(args...)
    }
    

    As mentioned previously, a value in the database will be defined with start and end transaction ids.

    type Value struct {
            txStartId uint64
            txEndId   uint64
            value     string
    }
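    To make the version bookkeeping concrete, here's a standalone sketch (not part of the post's code) of what a key's history might look like after one transaction writes "a" and a later one overwrites it with "b". The txEndId == 0 check below is a placeholder for the real visibility rules we'll build later.

    ```go
    package main

    import "fmt"

    // Value mirrors the struct from the post: one version of a key's
    // value, bracketed by the transaction that created it and (if any)
    // the transaction that overwrote or deleted it.
    type Value struct {
    	txStartId uint64
    	txEndId   uint64
    	value     string
    }

    func main() {
    	// Hypothetical history for key "x": tx 1 wrote "a", then tx 3
    	// overwrote it with "b". The overwrite marks the old version as
    	// ended by tx 3 and appends a new version started by tx 3.
    	history := []Value{
    		{txStartId: 1, txEndId: 3, value: "a"},
    		{txStartId: 3, txEndId: 0, value: "b"},
    	}

    	// Read the newest visible version by scanning backwards.
    	// (Gross simplification of isvisible: "not yet ended".)
    	for i := len(history) - 1; i >= 0; i-- {
    		if history[i].txEndId == 0 {
    			fmt.Println(history[i].value) // prints "b"
    			break
    		}
    	}
    }
    ```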
    

    Every transaction will be in an in-progress, aborted, or committed state.

    type TransactionState uint8
    const (
            InProgressTransaction TransactionState = iota
            AbortedTransaction
            CommittedTransaction
    )
    

    And we'll support a few major isolation levels.

    // Loosest isolation at the top, strictest isolation at the bottom.
    type IsolationLevel uint8
    const (
            ReadUncommittedIsolation IsolationLevel = iota
            ReadCommittedIsolation
            RepeatableReadIsolation
            SnapshotIsolation
            SerializableIsolation
    )
    

    We'll get into detail about the meaning of the levels later.

    A transaction has an isolation level, an id (a monotonic increasing integer), and a current state. And although we won't make use of this data yet, transactions at stricter isolation levels will need some extra info: which other transactions were in progress when this one started, and which keys this transaction has read and written.

    type Transaction struct {
            isolation  IsolationLevel
            id         uint64
            state      TransactionState
    
            // Used only by Repeatable Read and stricter.
            inprogress btree.Set[uint64]
    
            // Used only by Snapshot Isolation and stricter.
            writeset   btree.Set[string]
            readset    btree.Set[string]
    }
    

    We'll discuss why later.

    Finally, the database itself will have a default isolation level that each transaction will inherit (for our own convenience in tests).

    The database will have a mapping of keys to an array of value versions. Later elements in the array will represent newer versions of a value.

    The database will also store the next free transaction id it will use to assign ids to new transactions.

    type Database struct {
            defaultIsolation  IsolationLevel
            store             map[string][]Value
            transactions      btree.Map[uint64, Transaction]
            nextTransactionId uint64
    }
    
    func newDatabase() Database {
            return Database{
                    defaultIsolation:  ReadCommittedIsolation,
                    store:             map[string][]Value{},
                    // The `0` transaction id will be used to mean that
                    // the id was not set. So all valid transaction ids
                    // must start at 1.
                    nextTransactionId: 1,
            }
    }
    

    To be thread-safe: store, transactions, and nextTransactionId should be guarded by a mutex. But to keep the code small, this post will not use goroutines and thus does not need mutexes.
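    For illustration, here's a minimal sketch of what that guarding could look like, assuming a goroutine-per-connection design. The struct is stripped down to just the id counter; in the real Database the same mutex would also guard store and transactions.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    )

    // Database is simplified here to only the transaction id counter.
    type Database struct {
    	mu                sync.Mutex
    	nextTransactionId uint64
    }

    // newTransactionId assigns ids under the mutex so that concurrent
    // goroutines never receive the same id.
    func (d *Database) newTransactionId() uint64 {
    	d.mu.Lock()
    	defer d.mu.Unlock()
    	id := d.nextTransactionId
    	d.nextTransactionId++
    	return id
    }

    func main() {
    	d := &Database{nextTransactionId: 1}
    	var wg sync.WaitGroup
    	var seen sync.Map
    	for i := 0; i < 100; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			id := d.newTransactionId()
    			// LoadOrStore reports whether the id was already taken.
    			if _, dup := seen.LoadOrStore(id, true); dup {
    				panic("duplicate transaction id")
    			}
    		}()
    	}
    	wg.Wait()
    	fmt.Println("ids issued:", d.nextTransactionId-1)
    }
    ```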

    There's a bit of book-keeping when creating a transaction, so we'll make a dedicated method for this. We must give the new transaction an id, store all in-progress transactions, and add it to the database transaction history.

    func (d *Database) inprogress() btree.Set[uint64] {
            var ids btree.Set[uint64]
            iter := d.transactions.Iter()
            for ok := iter.First(); ok; ok = iter.Next() {
                    if iter.Value().state == InProgressTransaction {
                            ids.Insert(iter.Key())
                    }
            }
            return ids
    }
    
    func (d *Database) newTransaction() *Transaction {
            t := Transaction{}
            t.isolation = d.defaultIsolation
            t.state = InProgressTransaction
    
            // Assign and increment transaction id.
            t.id = d.nextTransactionId
            d.nextTransactionId++
    
            // Store all inprogress transaction ids.
            t.inprogress = d.inprogress()
    
            // Add this transaction to history.
            d.transactions.Set(t.id, t)
    
            debug("starting transaction", t.id)
    
            return &t
    }
    

    And we'll add a few more helpers for completing a transaction, for fetching a transaction by id, and for validating a transaction.

    func (d *Database) completeTransaction(t *Transaction, state TransactionState) error {
            debug("completing transaction ", t.id)
    
            // Update transactions.
            t.state = state
            d.transactions.Set(t.id, *t)
    
            return nil
    }
    
    func (d *Database) transactionState(txId uint64) Transaction {
            t, ok := d.transactions.Get(txId)
            assert(ok, "valid transaction")
            return t
    }
    
    func (d *Database) assertValidTransaction(t *Transaction) {
            assert(t.id > 0, "valid id")
            assert(d.transactionState(t.id).state == InProgressTransaction, "in progress")
    }
    

    The final bit of scaffolding we'll set up is an abstraction for database connections. A connection will have at most one associated transaction. Users must ask the database for a new connection. Then within the connection they can manage a transaction.

    type Connection struct {
            tx *Transaction
            db *Database
    }
    
    func (c *Connection) execCommand(command string, args []string) (string, error) {
            debug(command, args)
    
            // TODO
            return "", fmt.Errorf("unimplemented")
    }
    
    func (c *Connection) mustExecCommand(cmd string, args []string) string {
            res, err := c.execCommand(cmd, args)
            assertEq(err, nil, "unexpected error")
            return res
    }
    
    func (d *Database) newConnection() *Connection {
            return &Connection{
                    db: d,
                    tx: nil,
            }
    }
    
    func main() {
            panic("unimplemented")
    }
    

    And that's it for scaffolding. Now set up the go module and make sure this builds.

    $ go mod init gomvcc
    go: creating new go.mod: module gomvcc
    go: to add module requirements and sums:
            go mod tidy
    $ go mod tidy
    go: finding module for package github.com/tidwall/btree
    go: found github.com/tidwall/btree in github.com/tidwall/btree v1.7.0
    $ go build
    $ ./gomvcc
    panic: unimplemented
    
    goroutine 1 [running]:
    main.main()
            /Users/phil/tmp/main.go:166 +0x2c
    

    Great!

    Transaction management

    When the user asks to begin a transaction, we ask the database for a new transaction and assign it to the current connection.

     func (c *Connection) execCommand(command string, args []string) (string, error) {
             debug(command, args)
    
    +       if command == "begin" {
    +               assertEq(c.tx, nil, "no running transactions")
    +               c.tx = c.db.newTransaction()
    +               c.db.assertValidTransaction(c.tx)
    +               return fmt.Sprintf("%d", c.tx.id), nil
    +       }
    +
             // TODO
             return "", fmt.Errorf("unimplemented")
     }
    

    To abort a transaction, we call the completeTransaction method (which makes sure the database transaction history gets updated) with the AbortedTransaction state.

                    return fmt.Sprintf("%d", c.tx.id), nil
            }
    
    +       if command == "abort" {
    +               c.db.assertValidTransaction(c.tx)
    +               err := c.db.completeTransaction(c.tx, AbortedTransaction)
    +               c.tx = nil
    +               return "", err
    +       }
    +
             // TODO
             return "", fmt.Errorf("unimplemented")
     }
    

    And to commit a transaction is similar.

                    return "", err
            }
    
    +       if command == "commit" {
    +               c.db.assertValidTransaction(c.tx)
    +               err := c.db.completeTransaction(c.tx, CommittedTransaction)
    +               c.tx = nil
    +               return "", err
    +       }
    +
             // TODO
             return "", fmt.Errorf("unimplemented")
     }
    

    The neat thing about MVCC is that beginning, committing, and aborting a transaction is metadata work. Committing a transaction will get a bit more complex when we add support for Snapshot Isolation and Serializable Isolation, but we'll get to that later. Even then, it will not involve modifying any values we get, set, or delete.

    Get, set, delete

    Here is where things get fun. As mentioned earlier, the key-value store is actually map[string][]Value, with the more recent versions of a value at the end of the list of values for the key.

    For get support, we'll iterate the list of value versions backwards for the key. And we'll call a special new isvisible method to determine if this transaction can see this value. The first value that passes the isvisible test is the correct value for the transaction.

                    return "", err
            }
    
    +       if command == "get" {
    +               c.db.assertValidTransaction(c.tx)
    +
    +               key := args[0]
    +
    +               c.tx.readset.Insert(key)
    +
    +               for i := len(c.db.store[key]) - 1; i >= 0; i-- {
    +                       value := c.db.store[key][i]
    +                       debug(value, c.tx, c.db.isvisible(c.tx, value))
    +                       if c.db.isvisible(c.tx, value) {
    +                               return value.value, nil
    +                       }
    +               }
    +
    +               return "", fmt.Errorf("cannot get key that does not exist")
    +       }
    +
             // TODO
             return "", fmt.Errorf("unimplemented")
     }
    

    I snuck in tracking which keys are read, and we'll also soon sneak in tracking which keys are written. This is necessary in stricter isolation levels. More on that later.

    set and delete are similar to get. But this time when we walk the list of value versions, we will set the txEndId for the value to the current transaction id if the value version is visible to this transaction.

    Then, for set, we'll append to the value version list with the new version of the value that starts at this current transaction.

                    return "", err
            }
    
    +       if command == "set" || command == "delete" {
    +               c.db.assertValidTransaction(c.tx)
    +
    +               key := args[0]
    +
    +               // Mark all visible versions as now invalid.
    +               found := false
    +               for i := len(c.db.store[key]) - 1; i >= 0; i-- {
    +                       value := &c.db.store[key][i]
    +                       debug(value, c.tx, c.db.isvisible(c.tx, *value))
    +                       if c.db.isvisible(c.tx, *value) {
    +                               value.txEndId = c.tx.id
    +                               found = true
    +                       }
    +               }
    +               if command == "delete" && !found {
    +                       return "", fmt.Errorf("cannot delete key that does not exist")
    +               }
    +
    +               c.tx.writeset.Insert(key)
    +
    +               // And add a new version if it's a set command.
    +               if command == "set" {
    +                       value := args[1]
    +                       c.db.store[key] = append(c.db.store[key], Value{
    +                               txStartId: c.tx.id,
    +                               txEndId:   0,
    +                               value:     value,
    +                       })
    +
    +                       return value, nil
    +               }
    +
    +               // Delete ok.
    +               return "", nil
    +       }
    +
            if command == "get" {
                    c.db.assertValidTransaction(c.tx)
    

    This time rather than modifying the readset we modify the writeset for the transaction.

    And that is how commands get executed!

    Let's zoom in to the core of the problem we have mentioned but not implemented: MVCC visibility rules and how they differ by isolation levels.

    Isolation levels and MVCC visibility rules

    To varying degrees, transaction isolation levels prevent concurrent transactions from messing with each other. The looser isolation levels barely prevent this at all.

    Here is what the 1999 ANSI SQL standard (page 84) has to say.

    [Figure: isolation level table from the 1999 ANSI SQL standard]

    But as I mentioned in the beginning of the post, we're going to be a bit informal. And we'll mostly refer to the Jepsen summaries of each isolation level.

    Read Uncommitted

    According to Jepsen, the loosest isolation level, Read Uncommitted, has almost no restrictions. We merely read the most recent version of a value, regardless of whether the transaction that wrote it has committed, aborted, or is still in progress.

    func (d *Database) isvisible(t *Transaction, value Value) bool {
           // Read Uncommitted means we simply read the last value
           // written. Even if the transaction that wrote this value has
           // not committed, and even if it has aborted.
           if t.isolation == ReadUncommittedIsolation {
                   return true
           }
    
           assert(false, "unsupported isolation level")
           return false
    }
    

    Let's write a test that demonstrates this. We create two connections, c1 and c2, each with its own transaction, and set a key in c1. The value set for the key in c1 should be immediately visible if c2 asks for that key. In main_test.go:

    package main
    
    import (
            "testing"
    )
    
    func TestReadUncommitted(t *testing.T) {
            database := newDatabase()
            database.defaultIsolation = ReadUncommittedIsolation
    
            c1 := database.newConnection()
            c1.mustExecCommand("begin", nil)
    
            c2 := database.newConnection()
            c2.mustExecCommand("begin", nil)
    
            c1.mustExecCommand("set", []string{"x", "hey"})
    
            // Update is visible to self.
            res := c1.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c1 get x")
    
            // But since read uncommitted, also available to everyone else.
            res = c2.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c2 get x")
    }
    

    That's pretty simple! But also pretty useless if your workload has conflicts. If you can arrange your workload so that no concurrent transactions will ever read or write conflicting keys, though, this could be pretty efficient! The rules will only get more complex (and thus potentially more of a bottleneck) from here on.

    But for the most part, people don't use this isolation level. SQLite, Yugabyte, Cockroach, and Postgres don't even implement it. It is also not the default for any major database that does implement it.

    Let's get a little stricter.

    Read Committed

    We'll pull again from Jepsen:

    Read committed is a consistency model which strengthens read uncommitted by preventing dirty reads: transactions are not allowed to observe writes from transactions which do not commit.

    This sounds pretty simple. In isvisible we'll make sure that the value has a txStartId that is either this transaction or a transaction that has committed. Moreover we will now begin checking against txEndId to make sure the value wasn't deleted by any relevant transaction.

                    return true
            }
    
    +       // Read Committed means we are allowed to read any values that
    +       // are committed at the point in time where we read.
    +       if t.isolation == ReadCommittedIsolation {
    +               // If the value was created by a transaction that is
    +               // not committed, and not this current transaction,
    +               // it's no good.
    +               if value.txStartId != t.id &&
    +                       d.transactionState(value.txStartId).state != CommittedTransaction {
    +                       return false
    +               }
    +
    +               // If the value was deleted in this transaction, it's no good.
    +               if value.txEndId == t.id {
    +                       return false
    +               }
    +
    +               // Or if the value was deleted in some other committed
    +               // transaction, it's no good.
    +               if value.txEndId > 0 &&
    +                       d.transactionState(value.txEndId).state == CommittedTransaction {
    +                       return false
    +               }
    +
    +               // Otherwise the value is good.
    +               return true
    +       }
    +
            assert(false, "unsupported isolation level")
            return false
     }
    

    This begins to look useful! We will never read a value that isn't part of a committed transaction (or isn't part of our own transaction). Indeed this is the default isolation level for many databases including Postgres, Yugabyte, Oracle, and SQL Server.

    Let's add a test to main_test.go. This is a bit long, but give it a slow read. It is thoroughly commented.

    func TestReadCommitted(t *testing.T) {
            database := newDatabase()
            database.defaultIsolation = ReadCommittedIsolation
    
            c1 := database.newConnection()
            c1.mustExecCommand("begin", nil)
    
            c2 := database.newConnection()
            c2.mustExecCommand("begin", nil)
    
            // Local change is visible locally.
            c1.mustExecCommand("set", []string{"x", "hey"})
    
            res := c1.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c1 get x")
    
            // Update not available to this transaction since this is not
            // committed.
            res, err := c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            c1.mustExecCommand("commit", nil)
    
            // Now that it's been committed, it's visible in c2.
            res = c2.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c2 get x")
    
            c3 := database.newConnection()
            c3.mustExecCommand("begin", nil)
    
            // Local change is visible locally.
            c3.mustExecCommand("set", []string{"x", "yall"})
    
            res = c3.mustExecCommand("get", []string{"x"})
            assertEq(res, "yall", "c3 get x")
    
            // But not on the other commit, again.
            res = c2.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c2 get x")
    
            c3.mustExecCommand("abort", nil)
    
            // And still not, if the other transaction aborted.
            res = c2.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c2 get x")
    
            // And if we delete it, it should show up deleted locally.
            c2.mustExecCommand("delete", []string{"x"})
    
            res, err = c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            c2.mustExecCommand("commit", nil)
    
            // It should also show up as deleted in new transactions now
            // that it has been committed.
            c4 := database.newConnection()
            c4.mustExecCommand("begin", nil)
    
            res, err = c4.execCommand("get", []string{"x"})
            assertEq(res, "", "c4 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c4 get x")
    }
    

    Again this seems great. However! You can still easily get inconsistent data within a transaction at this isolation level. If a transaction A has multiple statements, it can see different results per statement, even if transaction A itself did not modify data, because another transaction B may have committed changes between two of transaction A's statements.

    Let's get a little stricter.

    Repeatable Read

    Again as Jepsen says, Repeatable Read is the same as Read Committed but with the following anomaly not allowed (quoting from the ANSI SQL 1999 standard):

    P2 (“Non-repeatable read”): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.

    To support this, we will add additional checks on top of the Read Committed logic so that a value is only visible if it was created (and not deleted) by a transaction that completed before this transaction started, or by this transaction itself.

    As it happens, this is the same logic that will be necessary for Snapshot Isolation and Serializable Isolation. The additional logic (that makes Snapshot Isolation and Serializable Isolation different) happens at commit time.

                    return true
            }
    
    -       assert(false, "unsupported isolation level")
    -       return false
    +       // Repeatable Read, Snapshot Isolation, and Serializable
    +       // further restricts Read Committed so only versions from
    +       // transactions that completed before this one started are
    +       // visible.
    +
    +       // Snapshot Isolation and Serializable will do additional
    +       // checks at commit time.
    +       assert(t.isolation == RepeatableReadIsolation ||
    +               t.isolation == SnapshotIsolation ||
    +               t.isolation == SerializableIsolation, "invalid isolation level")
    +       // Ignore values from transactions started after this one.
    +       if value.txStartId > t.id {
    +               return false
    +       }
    +
    +       // Ignore values created from transactions in progress when
    +       // this one started.
    +       if t.inprogress.Contains(value.txStartId) {
    +               return false
    +       }
    +
    +       // If the value was created by a transaction that is not
    +       // committed, and not this current transaction, it's no good.
    +       if d.transactionState(value.txStartId).state != CommittedTransaction &&
    +               value.txStartId != t.id {
    +               return false
    +       }
    +
    +       // If the value was deleted in this transaction, it's no good.
    +       if value.txEndId == t.id {
    +               return false
    +       }
    +
    +       // Or if the value was deleted in some other committed
    +       // transaction that started before this one, it's no good.
    +       if value.txEndId < t.id &&
    +               value.txEndId > 0 &&
    +               d.transactionState(value.txEndId).state == CommittedTransaction &&
    +               !t.inprogress.Contains(value.txEndId) {
    +               return false
    +       }
    +
    +       return true
     }
    
     type Connection struct {
    

    How do I derive these rules? Mostly by writing tests that should pass or fail and seeing what doesn't make sense. I tried to steal from existing projects, but these rules were not so simple to discover, which is part of what I hope makes this project particularly useful to look at.

    Let's write a test for Repeatable Read. Again, the test is long but well commented.

    func TestRepeatableRead(t *testing.T) {
            database := newDatabase()
            database.defaultIsolation = RepeatableReadIsolation
    
            c1 := database.newConnection()
            c1.mustExecCommand("begin", nil)
    
            c2 := database.newConnection()
            c2.mustExecCommand("begin", nil)
    
            // Local change is visible locally.
            c1.mustExecCommand("set", []string{"x", "hey"})
            res := c1.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c1 get x")
    
            // Update not available to this transaction since this is not
            // committed.
            res, err := c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            c1.mustExecCommand("commit", nil)
    
            // Even after committing, it's not visible in an existing
            // transaction.
            res, err = c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            // But is available in a new transaction.
            c3 := database.newConnection()
            c3.mustExecCommand("begin", nil)
    
            res = c3.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c3 get x")
    
            // Local change is visible locally.
            c3.mustExecCommand("set", []string{"x", "yall"})
            res = c3.mustExecCommand("get", []string{"x"})
            assertEq(res, "yall", "c3 get x")
    
            // But not on the other commit, again.
            res, err = c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            c3.mustExecCommand("abort", nil)
    
            // And still not, regardless of abort, because it's an older
            // transaction.
            res, err = c2.execCommand("get", []string{"x"})
            assertEq(res, "", "c2 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            // And again still the aborted set is still not on a new
            // transaction.
            c4 := database.newConnection()
            res = c4.mustExecCommand("begin", nil)
    
            res = c4.mustExecCommand("get", []string{"x"})
            assertEq(res, "hey", "c4 get x")
    
            c4.mustExecCommand("delete", []string{"x"})
            c4.mustExecCommand("commit", nil)
    
            // But the delete is visible to new transactions now that this
            // has been committed.
            c5 := database.newConnection()
            res = c5.mustExecCommand("begin", nil)
    
            res, err = c5.execCommand("get", []string{"x"})
            assertEq(res, "", "c5 get x")
            assertEq(err.Error(), "cannot get key that does not exist", "c5 get x")
    }
    

    Let's get stricter!

    Snapshot Isolation

    Back to [Jepsen](https://jepsen.io/consistency/models/snapshot-isolation) for a definition:

    In a snapshot isolated system, each transaction appears to operate on an independent, consistent snapshot of the database. Its changes are visible only to that transaction until commit time, when all changes become visible atomically to any transaction which begins at a later time. If transaction T1 has modified an object x, and another transaction T2 committed a write to x after T1’s snapshot began, and before T1’s commit, then T1 must abort.

    So Snapshot Isolation is the same as Repeatable Read but with one additional rule: the keys written by any two concurrent committed transactions must not overlap.

    This is why we tracked writeset. Every time a transaction modified or deleted a key, we added it to the transaction's writeset. To make sure we abort correctly, we'll add a conflict check to the commit step. (This idea is also well documented in A critique of snapshot isolation. This paper can be hard to find. Email me if you want a copy.)

    When a transaction A goes to commit, it will run a conflict test for any transaction B that has committed since this transaction A started.

    Serializable Isolation is going to have a similar check. So we'll add a helper for iterating through all relevant transactions, running a check function for any transaction that has committed.

    func (d *Database) hasConflict(t1 *Transaction, conflictFn func(*Transaction, *Transaction) bool) bool {
            iter := d.transactions.Iter()
    
            // First see if there is any conflict with transactions that
            // were in progress when this one started.
            inprogressIter := t1.inprogress.Iter()
            for ok := inprogressIter.First(); ok; ok = inprogressIter.Next() {
                    id := inprogressIter.Key()
                    found := iter.Seek(id)
                    if !found {
                            continue
                    }
                    t2 := iter.Value()
                    if t2.state == CommittedTransaction {
                            if conflictFn(t1, &t2) {
                                    return true
                            }
                    }
            }
    
            // Then see if there is any conflict with transactions that
            // started and committed after this one started.
            for id := t1.id; id < d.nextTransactionId; id++ {
                    found := iter.Seek(id)
                    if !found {
                            continue
                    }
    
                    t2 := iter.Value()
                    if t2.state == CommittedTransaction {
                            if conflictFn(t1, &t2) {
                                    return true
                            }
                    }
            }
    
            return false
    }
    

    It was around this point that I decided I did really need a B-Tree implementation and could not just stick to vanilla Go data structures.

    Now we can modify completeTransaction to do this check if the transaction intends to commit. If the current transaction A's write set intersects with any other transaction B committed since transaction A started, we must abort.

     func (d *Database) completeTransaction(t *Transaction, state TransactionState) error {
             debug("completing transaction ", t.id)
    
    +
    +       if state == CommittedTransaction {
    +               // Snapshot Isolation imposes the additional constraint that
    +               // no transaction A may commit after writing any of the same
    +               // keys as transaction B has written and committed during
    +               // transaction A's life.
    +               if t.isolation == SnapshotIsolation && d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {
    +                       return setsShareItem(t1.writeset, t2.writeset)
    +               }) {
    +                       d.completeTransaction(t, AbortedTransaction)
    +                       return fmt.Errorf("write-write conflict")
    +               }
    +       }
    +
             // Update transactions.
             t.state = state
             d.transactions.Set(t.id, *t)
    

    Lastly, the definition of setsShareItem.

    func setsShareItem(s1 btree.Set[string], s2 btree.Set[string]) bool {
            s1Iter := s1.Iter()
            s2Iter := s2.Iter()
            for ok := s1Iter.First(); ok; ok = s1Iter.Next() {
                    s1Key := s1Iter.Key()
            // Seek moves to the first key >= s1Key, so also confirm the
            // key it lands on is actually equal before reporting overlap.
            found := s2Iter.Seek(s1Key)
            if found && s2Iter.Key() == s1Key {
                    return true
            }
            }
    
            return false
    }
    

    Since Snapshot Isolation shares all the same visibility rules as Repeatable Read, the tests get to be a little simpler! We'll simply test that two transactions attempting to commit a write to the same key fail. Or specifically: that the second transaction cannot commit.

    func TestSnapshotIsolation_writewrite_conflict(t *testing.T) {
            database := newDatabase()
            database.defaultIsolation = SnapshotIsolation
    
            c1 := database.newConnection()
            c1.mustExecCommand("begin", nil)
    
            c2 := database.newConnection()
            c2.mustExecCommand("begin", nil)
    
            c3 := database.newConnection()
            c3.mustExecCommand("begin", nil)
    
            c1.mustExecCommand("set", []string{"x", "hey"})
            c1.mustExecCommand("commit", nil)
    
            c2.mustExecCommand("set", []string{"x", "hey"})
    
            res, err := c2.execCommand("commit", nil)
            assertEq(res, "", "c2 commit")
            assertEq(err.Error(), "write-write conflict", "c2 commit")
    
            // But unrelated keys cause no conflict.
            c3.mustExecCommand("set", []string{"y", "no conflict"})
            c3.mustExecCommand("commit", nil)
    }
    

    Not bad! But let's get stricter.

    Serializable Isolation

    In terms of end-result, this is the simplest isolation level to reason about. Serializable Isolation must appear as if only a single transaction were executing at a time. Some systems, like SQLite and TigerBeetle, do Actually Serial Execution where only one transaction runs at a time. But few databases implement Serializable like this because it rules out a number of perfectly valid concurrent execution histories, such as two concurrent read-only transactions.

    Postgres implements serializability via Serializable Snapshot Isolation. MySQL implements serializability via Two-Phase Locking. FoundationDB implements serializability via sequential timestamp assignment and conflict detection.

    But the paper, A critique of snapshot isolation, provides a simple (though not necessarily efficient; I have no clue) approach via what they call Write Snapshot Isolation. In their algorithm, a transaction must abort if its read set intersects another concurrent transaction's write set (notably, it is not write sets intersecting that is checked). And this (plus the Repeatable Read rules) is sufficient for Serializability.

    I leave it to that paper for the proof of correctness. In terms of implementation though, it's quite simple and very similar to the Snapshot Isolation check we just added.

    Inside of completeTransaction add:

                    }) {
                            d.completeTransaction(t, AbortedTransaction)
                            return fmt.Errorf("write-write conflict")
    +               }
    +
    +               // Serializable Isolation imposes the additional constraint that
    +               // no transaction A may commit after reading any of the same
    +               // keys as transaction B has written and committed during
    +               // transaction A's life, or vice-versa.
    +               if t.isolation == SerializableIsolation && d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {
    +                       return setsShareItem(t1.readset, t2.writeset) ||
    +                               setsShareItem(t1.writeset, t2.readset)
    +               }) {
    +                       d.completeTransaction(t, AbortedTransaction)
    +                       return fmt.Errorf("read-write conflict")
                    }
            }
    

    And if we add a test for read-write conflicts:

    func TestSerializableIsolation_readwrite_conflict(t *testing.T) {
            database := newDatabase()
            database.defaultIsolation = SerializableIsolation
    
            c1 := database.newConnection()
            c1.mustExecCommand("begin", nil)
    
            c2 := database.newConnection()
            c2.mustExecCommand("begin", nil)
    
            c3 := database.newConnection()
            c3.mustExecCommand("begin", nil)
    
            c1.mustExecCommand("set", []string{"x", "hey"})
            c1.mustExecCommand("commit", nil)
    
            _, err := c2.execCommand("get", []string{"x"})
            assertEq(err.Error(), "cannot get key that does not exist", "c2 get x")
    
            res, err := c2.execCommand("commit", nil)
            assertEq(res, "", "c2 commit")
            assertEq(err.Error(), "read-write conflict", "c2 commit")
    
            // But unrelated keys cause no conflict.
            c3.mustExecCommand("set", []string{"y", "no conflict"})
            c3.mustExecCommand("commit", nil)
    }
    

    We see it work! And that's it for a basic implementation of MVCC and major transaction isolation levels.

    Production-quality testing

    There are two major projects I'm aware of that help you test transaction implementations: Elle and Hermitage. These are probably where I'd go looking if I were implementing this for real.

    This project took me long enough on its own, and my tests left me reasonably comfortable that the gist of my logic was right, so I did not test further. For that reason it surely has bugs.

    Vacuuming and cleanup

    One of the major things this implementation does not do is clean up old data. Eventually, no transaction will need the older versions of a value anymore, and they should be removed from the value's version array. Similarly, older transactions will eventually be referenced by no live transaction and should be removed from the database's transaction history list.

    Even with a vacuuming process in place, though, consider some extreme use patterns. What if a key's value were always 1GB long? And what if multiple transactions made only small changes to that 1GB of data? We'd be duplicating a lot of the value across versions.

    It sounds less extreme when thinking about storing rows of data rather than key-value data. If a row has 100 columns and a user repeatedly updates just one of them, in our scheme we'd end up storing a ton of duplicate cell data for that row.

    This is a real-world issue in Postgres that was called out by Andy Pavlo and the OtterTune folks. It turns out that Postgres alone among major databases stores the entire value for every version. In contrast, other major databases like MySQL store a diff.

    Conclusion

    This post only begins to demonstrate that database behavior differs quite a bit both in terms of results and in terms of optimizations. Everyone implements the ideas differently and to varying degrees.

    Moreover, we have only begun to implement the behavior a real SQL database supports. For example, how do visibility rules and conflict detection work with range queries? What about sub-transactions, and save points? These will have to be covered another time.

    Hopefully seeing this simple implementation of MVCC and visibility rules helps to clarify at least some of the basics.

  3. Ask questions first, shoot later

    (blog.danslimmon.com)

    The fact that fixing and diagnosing often converge to the same actions doesn't change the fact that these two concurrent activities have different goals. The goal of fixing is to bring the system into line with your mental model of how it's supposed to function. The goal of diagnosing is to bring your mental model into line with the way the system is actually behaving.

  4. Parallel Data Fetching

    (martinfowler.com)

    The second pattern in Juntao Qiu's series on data fetching is on how to avoid the dreaded Request Waterfall. Parallel Data Fetching queries multiple sources in parallel, so that the latency of the overall fetch is the largest of the individual queries rather than their sum.

    more…

  5. Data Fetching Patterns in Single-Page Applications

    (martinfowler.com)

    Juntao Qiu is a thoughtful front-end developer experienced with the React programming environment. He's contributed a couple of useful articles to this site, describing helpful patterns for front-end programming. In this article he describes patterns for how single-page applications fetch data. This first installment describes how asynchronous queries can be wrapped in a handler to provide information about the state of the query.

    more…

  6. Light speed with Python and JS

    (bitecode.dev)

    We've gone to plaid!

  7. sphinxcontrib-sqltable 2.1.1 - db cursor fix

    (doughellmann.com)

    What’s new in 2.1.1?

    • Access cursor before closing the db connection (contributions by dopas21)
    • Use the readthedocs.org theme for docs

  8. Going to the cinema is a data visualization problem

    (tonsky.me)

    How I build a website for choosing movies

  9. photostream 131

    (martinfowler.com)

  10. Weekly Update 399

    (troyhunt.com)

    Presently sponsored by: Kolide is an endpoint security solution for teams that want to meet SOC2 compliance goals without sacrificing privacy. Learn more here.

    The Post Millennial breach in this week's video is an interesting one, most notably because of the presence of the mailing lists. Now, as I've said in every piece of communication I've put out on this incident, the lists are what whoever defaced the

  11. Should I learn Computer Science?

    (petemillspaugh.com)

    Devoting time to self-study Computer Science is a classic debate for the self-taught, bootcamp grad programmer like me. I’ve kicked around versions of this question in my head before, but what really got me thinking critically about it recently is Oz Nova’s teachyourselfcs.com.

    I am familiar with Oz from his podcast CS Primer with Charlie Harrington. I really like their show—they record highly casual, meandering conversations related to programming education.

    The full curriculum

    The teachyourselfcs.com suggested curriculum is no joke. 100-200 hours for each of nine topics. Even if you did 100 hours per subject at a fairly dedicated four hours per week, that would take almost five years! Depending on your job, it’s likely you have more than four hours free per week, but so much competes for those hours: side projects, other learning, writing, reading, cooking, exercising, time with friends, etc.

    Tim Urban’s essay on how to pick a career visualizes the time we have and how we spend it (see "Time Breakdown of a Long Human Life" diagram under subheading "The Cook and the Chef"). I like using that framework on a smaller scale of days and weeks, counting out how many free hours I actually have to allocate.

    Without a full time job, the math changes quite a bit, obviously. Like, taking n months between jobs and dedicating a large portion of that time to CS sounds more palatable. But I think if I did that I’d rather spend the lion’s share of my time building an ambitious project. The kind of side project you dream about but there simply aren’t enough weekends for. Or join a program like Recurse, where I suppose you could do both.

    Starting smaller

    Oz notes that if the full curriculum is too much then you should start with two books:

    Those are 562 pages and 978 pages, respectively. That might not intimidate a prolific reader, but I am not a prolific reader. I like to read, and I’ve read a couple of thousand(ish)-page books, but I’ve also failed to read more than a couple. And it takes a while! Who is a prolific reader? I heard Retool CEO David Hsu on a podcast say that he reads one book a week. He didn’t say audiobook, but maybe that’s it? I do worry about sacrificing understanding and retention with audiobooks, though.

    And of course there are countless other well regarded, thick books outside of the nine Oz recommends. Like Artificial Intelligence: A Modern Approach (podcast rec from James Brady who leads engineering at Elicit), which is also over a thousand pages.

    All that said, I do think reading those books would be worthwhile. Designing Data-Intensive Applications has a 4.71 on Goodreads! That’s an outstanding rating. I read a handful of chapters and did find it interesting, although as a frontend-leaning web developer at the time it felt less relevant to me than it probably would have for someone gearing up to be a CTO at a growing startup.

    The value of pure learning

    Pure, academic learning—as opposed to practical, applied learning on actual projects—is engaging but can feel like a waste of precious time. Part of my motivation is to uncover unknown unknowns, i.e. things I don’t know that I don’t know but may come in handy down the road. I don’t feel restricted by my lack of CS knowledge as a web developer right now, but (1) there may come a day when I do feel that limitation, and (2) I might be applying that knowledge day-to-day in ways I can’t even see right now.

    My overarching learning strategy on the scale of years and decades is to build a broad base of STEM fundamentals. And not just for work. Learning about human biology and nutrition is useful as a runner, for example. Learning about food chemistry is useful as a casual chef. Learning some mechanics might come in handy when I need to fix my bike. Brilliant is great for this sort of learning.

    My intention is to allocate most time to short- and medium-term things and smaller chunks of time to this sort of pure learning. Maybe 80/20. And even if certain knowledge doesn’t come in handy at work or in personal pursuits, the act of learning itself is pleasurable.

    Anki

    It’s important how you learn, too. Without applying new knowledge, most of what I wrote down while reading Kleppmann’s book has been slowly pruned from my brain. But that was before I started using Anki, which can make this kind of pure learning more durable.

    I like Michael Nielsen’s approach described in his essay Augmenting Long-Term Memory. He advises reading thick papers outside of your domain expertise in phases and layers: do a quick first read, looking things up now and again but mostly skipping sections that are beyond your depth; make Anki cards for the new concepts you can understand relatively quickly; after building a foundation of knowledge from that paper and related study, do a second pass at the paper; and then a third pass, and so on.

    I like this a lot. It’s all about deciding where to draw the line in the sand for abstractions in your mental models. Writing Anki cards helps me with this. I find myself saying in my head, “ok I get the basics, but I don’t know how [that] would be implemented, and I can’t reconcile [this other thing], but that would be too big a rabbit hole for now.”

    Learning CS by learning Rust

    The middle ground I’ve settled on for now is to learn some Computer Science concepts while also learning a practical skill—Rust. I’ve just started learning Rust, and my hope is that I’ll naturally pick up some CS fundamentals as I go along.

    After floating that idea in my head, I stumbled upon a testimony of sorts from Paul Butler on devtools.fm (starts around 28:00):

    “It kind of teaches you Computer Science almost as you do it. Like, even though there’s a learning curve with it I always feel like I’ve learned something not just about Rust but about computation when I get past one of those hurdles. So I like that about Rust.”

    The Rust book itself even advertises a similar goal in learning the language:

    Rust is for students and those who are interested in learning about systems concepts. Using Rust, many people have learned about topics like operating systems development.

    I’ll start with Rust and go from there.

  12. Protobuffed contracts

    (rednafi.com)

    People typically associate Google’s Protocol Buffers with gRPC services, and rightfully so. But things often get confusing when discussing protobufs because the term can mean different things: a binary protocol for efficiently serializing structured data, and a language used to specify how this data should be structured. In gRPC services, you usually use both: the protobuf language in proto files defines the service interface, and then the clients use the same proto files to communicate with the services.

  13. A decade of code

    (joshcollinsworth.com)

    A personal (read: meandering) post inspired by the realization that I first began to learn HTML and CSS exactly ten years ago, reflecting on the lucky turning points that brought me to where I am today.

  14. Bringing supply chain security to PyCon US 2024

    (sethmlarson.dev)

    Published 2024-05-10 by Seth Larson

    This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

    Next week is PyCon US 2024, one of my favorite times of year. If you'll also be in Pittsburgh, reach out to me on Signal (sethmlarson.99) and we'll meet up sometime during the conference.

    Here's where you'll find me during the week:


    Secure snek 🐍🛡️
    • Talk on "State of Supply Chain Security" with Michael Winser of Alpha-Omega.
    • Open space on Vulnerability Disclosure and Management with Madison Oliver of GitHub Security.
    • Blogger for the Python Language Summit.
    • Spreading security knowledge along with Mike Fiedler, the PyPI Safety and Security Engineer.
    • Working in-person with Python core developers during sprints.

    I'll also be bringing along some exclusive "secure snek" stickers, so if you see me at the conference ask about those while supplies last!

    If you're interested in security, I recommend considering these other tutorials and talks:

    That's all for this week! 👋 If you're interested in more you can read last week's report.

    Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.


    This work is licensed under CC BY-SA 4.0

  15. Using the Page Visibility API

    (developer.mozilla.org)

    This post takes a look at what page visibility is, how you can use the Page Visibility API in your applications, and describes pitfalls to avoid if you build features around this functionality.

  16. Browsertech Digest: SF&LA events, Kyle Barron interview, Arrow/Parquet

    (digest.browsertech.com)

    Hey folks, welcome to the digest!

    Events

    I'll be in SF & LA in a few weeks and hosting some Browsertech events. On May 22 in SF we're co-hosting a happy hour with Krea. Then on May 23 we're hosting the first Browsertech LA. If you're in the SF or LA area, I hope to see you there!

    Kyle Barron interview

    On the podcast, I talked with Kyle Barron of Development Seed about his work in open-source geospatial tooling in the browser. Kyle is a prolific developer who has worked on bindings for Apache Arrow, Parquet, GeoArrow, and Deck.gl, culminating in Lonboard.

    We talked about why he needed to look beyond GeoJSON and Leaflet to scale to millions of points, where Arrow and Parquet fit into the picture, and optimizing the path of data from a server-side Python DataFrame to the GPU.

    The ubiquity of Arrow / Parquet

    Something that stood out as I was editing this episode is that the sibling Apache projects Parquet and Arrow keep coming up on the podcast.

    We get into them on the episode with Kyle, but the tl;dr (tl;dl?) is that they are both data formats for generic, typed tabular data. Parquet optimizes for space, so is often used on disk and on the wire. Arrow is an in-memory format that optimizes for fast random access. They are an almost equivalent data model, so a common approach is to read Parquet from disk/network directly into Arrow in memory.
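    A minimal sketch of that round trip using the pyarrow library (the column names and file path here are illustrative, not from the episode):

    ```python
    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Arrow: an in-memory columnar format, optimized for fast random access.
    table = pa.table({"lon": [-122.4, -118.2], "lat": [37.8, 34.1]})

    # Parquet: optimized for space, so often used on disk and on the wire.
    path = os.path.join(tempfile.gettempdir(), "points.parquet")
    pq.write_table(table, path)

    # Reading Parquet lands directly back in Arrow memory: no per-row
    # parsing, no intermediate JavaScript-object-style representation.
    roundtrip = pq.read_table(path)
    assert roundtrip.equals(table)
    ```

    The same pattern applies in the browser, where JavaScript Arrow/Parquet bindings let the bytes flow from the wire toward the GPU with minimal reshaping.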

    Episode 2 guests Andrew and Eric also talked about how they use Arrow in Prospective. In episode 3, Ben Schmidt of Nomic talked about his use of Arrow as well.

    A common theme across these episodes is eliminating transformations from data as it travels through the system. A naive approach to browser-based apps is to go from an in-memory representation on the server, to JSON on the wire, to being parsed into in-memory JavaScript objects, and then the relevant attributes are packed into attribute buffers on the GPU (often with one draw call for each object).

    To be clear, this works for a lot of things! If you just need to annotate a map with a few hundred points, you won't notice the overhead.

    At scale, though, each of those transformations takes linear time, so as your data grows, so does the compute time. The Arrow approach of using a singular data representation (with Parquet as a cheap compression step across the wire) means apps can work with orders of magnitude more data without sacrificing responsiveness.

    Until next time,

    Paul

  17. It's always TCP_NODELAY. Every damn time.

    (brooker.co.za)

    It's not the 1980s anymore, thankfully.

    The first thing I check when debugging latency issues in distributed systems is whether TCP_NODELAY is enabled. And it’s not just me. Every distributed system builder I know has lost hours to latency issues quickly fixed by enabling this simple socket option, suggesting that the default behavior is wrong, and perhaps that the whole concept is outmoded.

    First, let’s be clear about what we’re talking about. There’s no better source than John Nagle’s RFC896 from 1984 [1]. To start, the problem statement:

    There is a special problem associated with small packets. When TCP is used for the transmission of single-character messages originating at a keyboard, the typical result is that 41 byte packets (one byte of data, 40 bytes of header) are transmitted for each byte of useful data. This 4000% overhead is annoying but tolerable on lightly loaded networks.

    In short, Nagle was interested in better amortizing the cost of TCP headers, to get better throughput out of the network. Up to 40x better throughput! These tiny packets had two main causes: human-interactive applications like shells, where folks were typing a byte at a time, and poorly implemented programs that dribbled messages out to the kernel through many write calls. Nagle’s proposal for fixing this was simple and smart:

    A simple and elegant solution has been discovered.

    The solution is to inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged.

    When many people talk about Nagle’s algorithm, they talk about timers, but RFC896 doesn’t use any kind of timer other than the round-trip time on the network.
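    The rule from RFC896 can be sketched as a small decision function (a simplified illustration with hypothetical names, not the kernel’s actual implementation):

    ```python
    def nagle_should_send(unacked_bytes: int, pending_bytes: int, mss: int) -> bool:
        """Simplified Nagle decision: send a new segment immediately only if
        the connection is idle (no data in flight) or we can fill a full
        segment; otherwise wait for the outstanding ACK."""
        return unacked_bytes == 0 or pending_bytes >= mss

    # An idle connection sends even a tiny write right away...
    assert nagle_should_send(unacked_bytes=0, pending_bytes=1, mss=1460)
    # ...a full segment goes out even with data in flight...
    assert nagle_should_send(unacked_bytes=100, pending_bytes=1460, mss=1460)
    # ...but a tiny write while data is unacknowledged waits for the ACK.
    assert not nagle_should_send(unacked_bytes=100, pending_bytes=1, mss=1460)
    ```

    Note there is no timer anywhere in this logic: the waiting is bounded only by how long the peer takes to acknowledge.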

    Nagle’s Algorithm and Delayed Acks

    Nagle’s nice, clean proposal interacted poorly with another TCP feature: delayed ACK. The idea behind delayed ACK is to delay sending the acknowledgement of a packet at least until there’s some data to send back (e.g. a telnet session echoing back the user’s typing), or until a timer expires. RFC813 from 1982 is the first that seems to propose delaying ACKs:

    The receiver of data will refrain from sending an acknowledgement under certain circumstances, in which case it must set a timer which will cause the acknowledgement to be sent later. However, the receiver should do this only where it is a reasonable guess that some other event will intervene and prevent the necessity of the timer interrupt.

    which is then formalized further in RFC1122 from 1989. The interaction between these two features causes a problem: Nagle’s algorithm blocks sending more data until an ACK is received, but delayed ACK delays that ACK until a response is ready. Great for keeping packets full, not so great for latency-sensitive pipelined applications.

    This is a point Nagle has made himself several times. For example in this Hacker News comment:

    That still irks me. The real problem is not tinygram prevention. It’s ACK delays, and that stupid fixed timer. They both went into TCP around the same time, but independently. I did tinygram prevention (the Nagle algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The combination of the two is awful.

    As systems builders, this should be a familiar situation: two reasonable features of the system that interact to create an undesirable behavior. This kind of interaction is one of the things that makes protocol design so hard.

    Is Nagle blameless?

    Unfortunately, it’s not just delayed ACK [2]. Even without delayed ACK and that stupid fixed timer, the behavior of Nagle’s algorithm probably isn’t what we want in distributed systems. A single in-datacenter RTT is typically around 500μs, a couple of milliseconds between datacenters in the same region, and up to hundreds of milliseconds going around the globe. Given the vast amount of work a modern server can do in even a few hundred microseconds, delaying sending data for even one RTT isn’t clearly a win.

    To make a clearer case, let’s turn back to the justification behind Nagle’s algorithm: amortizing the cost of headers and avoiding that 40x overhead on single-byte packets. But does anybody send single-byte packets anymore? Most distributed databases and systems don’t. Partially that’s because they simply have more to say, partially it’s because of the additional overhead of protocols like TLS, and partially it’s because of encoding and serialization overhead. But mostly, they have more to say.

    The core concern of not sending tiny messages is still a very real one, but we’ve very effectively pushed that into the application layer. Sending a byte at a time wrapped in JSON isn’t going to be very efficient, no matter what Nagle’s algorithm does.

    Is Nagle needed?

    First, the uncontroversial take: if you’re building a latency-sensitive distributed system running on modern datacenter-class hardware, enable TCP_NODELAY (disable Nagle’s algorithm) without worries. You don’t need to feel bad. It’s not a sin. It’s OK. Just go ahead.
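    Enabling it is a one-liner in most languages. For example, in Python:

    ```python
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm: write() calls go out immediately
    # instead of waiting for outstanding data to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    assert sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    sock.close()
    ```

    Most networking libraries expose the same option under a similar name; many high-performance clients and servers already set it for you.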

    More controversially, I suspect that Nagle’s algorithm just isn’t needed on modern systems, given the traffic and application mix, and the capabilities of the hardware we have today. In other words, TCP_NODELAY should be the default. That’s going to make some “write every byte” code slower than it would otherwise be, but those applications should be fixed anyway if we care about efficiency.

    Footnotes

    1. I won’t go into it here, but RFC896 is also one of the earliest statements I can find of metastable behavior in computer networks. In it, Nagle says: “This condition is stable. Once the saturation point has been reached, if the algorithm for selecting packets to be dropped is fair, the network will continue to operate in a degraded condition.”
    2. As this has gone around the internet, a number of folks have asked about TCP_QUICKACK. I don’t tend to reach for it for a few reasons, including lack of portability, and weird semantics (seriously, read the man page). The bigger problem is that TCP_QUICKACK doesn’t fix the fundamental problem of the kernel hanging on to data longer than my program wants it to. When I say write(), I mean write().

  18. Podcast appearance: The Debrief from Incident.io

    (blog.danslimmon.com)

    I’m so grateful to Incident.io for the opportunity to shout from their rooftop about Clinical troubleshooting, which I firmly believe is the way we should all be diagnosing system failures. Enjoy the full episode!

  19. Looking at People

    (chriscoyier.net)

    At one time I got interested in the “eye contact”, or lack of it, on video calls. I was going to present at an online conference, and I wanted people watching to have it look like I was looking right at them. Plus, possibly even more importantly, one-on-one calls with people. I ended up buying […]

  20. Things I found interesting in Bonnie Nardi’s “A Small Matter of Programming”

    (petemillspaugh.com)

    I read A Small Matter of Programming by Bonnie Nardi with some others in the Future of Coding book club.

    It’s amazing how current this book reads given it was published over 30 years ago in 1993. That was only two years after the world’s first website and two years before JavaScript was invented.

    On what end users care about (xii, 5)

    [End users] have little or no training in computer science, and often little interest in computers per se. Most end users regard computers as useful—even indispensable—machines capable of helping them work more productively, creatively, and with greater pleasure. This is not the same as finding computers themselves intrinsically interesting—as a computer scientist or engineer would—and it dictates an entirely different set of tools for end users.

    This is a timeless reminder from Nardi that users don’t care about our tech stack. They don’t care that I’m using Next instead of Remix, or Svelte or Astro or whatever. The thing that always sticks in my mind is Jerome Hardaway saying that users only care if your website (1) works, (2) is fast, and (3) looks good. In that order, too.

    A user of a software system was heard to remark, "I didn’t have much time, so I did it the long way." ... The converse of the statement, putatively from a collection of MIT graffiti, goes: “I would rather write programs to help me write programs than write programs.”

    End users like computers to get work done. Programmers like computers because programming is fun and interesting.

    I feel this when showing websites to friends and family. The little touches and polish that satisfy me often go unnoticed by others. The way I navigate websites can be quite different from the way someone who doesn’t create websites for a living navigates websites.

    That’s not to say end users aren’t sophisticated, though.

    End users are not "casual", "novice" or "naive" users; they are people such as chemists, librarians, teachers, architects, and accountants, who have computational needs and want to make serious use of computers, but who are not interested in becoming professional programmers.

    Nardi nicely tees up the main point of the book with this sentence. My paraphrased tl;dr of that main point is: end-user software should be easy enough for a user to do something useful within a couple hours of learning, but advanced enough that a user can become an expert over time, without limitation.

    That quotation about end users’ Serious Use of Computers is the most inspiring tidbit of the whole book for me. If you the programmer build powerful tools for your end users, they will build really impressive things. We as programmers have a high leverage opportunity to arm lots of smart, motivated people with the tools that will allow them to build sophisticated programs. Observable’s collection of examples is a powerful illustration of this concept.

    On design (3, 6, 132)

    I like the idea of designing and writing software for one user, to start. But if your goal is for others to use your software, inevitably you’ll run into the trap that your wishes won’t perfectly match those of your users. Nardi’s vision of end-user programming accounts for emergent and surprising use cases of your software by offering flexibility to users.

    On anticipating those user wishes, Nardi writes:

    As has been shown time and again, no matter how much designers and programmers try to anticipate and provide for what users will need, the effort always falls short because it is impossible to know in advance what may be needed.

    And a few pages later, my favorite quotation from the entire book, hands down:

    Of course design still is, and almost certainly always will be, a black art whose most crucial elements remain an incalculable mix of imagination, intuition, and intellectual interaction with one’s fellows.

    This really resonates. You have to hold the thing in your hands, collaborate with teammates, and talk to users to find the right design.

    More on that toward the end of the book:

    The chief point is that application development merges with user interface design: one does not build "the application" and then tack on a user interface; rather, the two processes are more closely interwoven, and the user interface, by being subsumed in the application development process, comes to be a critical aspect of organizing and presenting application semantics—not an afterthought.

    On natural language-based human computer communication (10, 15, 83)

    We begin, in chapter 2, by asking why we need end user programming systems at all. Aren’t natural language-based systems that will allow us to just tell the computer what we want right around the corner? We...argue that conversation is not the best model for human-computer communication.

    If by “right around the corner” Nardi—or really the crowd she’s speaking on behalf of—meant November 2022, then yes, natural language conversation with computers was just around the corner in 1993. This section of the book reads as the AI chapter, with Nardi as the AI skeptic.

    I think she nails the idea that language models enable a programmer to iterate more quickly but don’t (yet) take the programmer out of the loop:

    While one can conceive of a system that could automatically generate programs for end users, the probability of such a system producing right-the-first-time programs appears to be low. There may well be a role for high-level programming languages and environments that enable users to specify programs, evaluate them, and iterate toward desirable ones. These will probably not deliver the miracle of "automatic programming" that one might hope for, however.

    I like the label right-the-first-time programs. And even with powerful language models now available for workflows like generating code, brainstorming, and researching, a lot of work is left to be done for computer agents to act on our natural language wishes. Nardi discusses context as the missing piece.

    The computer fails to understand and to speak because much of what is understood in a conversation attaches to the context in which the participants of a conversation find themselves, and of which they have knowledge.

    She provides the example of asking a computer to distribute the latest draft to the group and let [me] know when they’ve read it. That simple-for-a-human command is full of context and would require complex logic for an agent to complete.

    On alternatives to natural language (24)

    Nardi also makes a compelling case that, depending on the activity, we already use many forms of communication better suited to their given task than natural language. Driving a car is a great (and funny) example. A steering wheel is way better suited to the task of driving than talking to your car would be.

    This reminds me of something I read recently in Make It Stick—that there is no empirical support for the claim that instruction in one’s preferred learning style improves learning outcomes. For example, we often hear people (myself included) say "I am a visual learner," but as yet there’s no evidence to back it up. From the book:

    The popular notion that you learn better when you receive instruction in a form consistent with your preferred learning style, for example as an auditory or visual learner, is not supported by the empirical research...It is more important that the mode of instruction match the nature of the subject being taught: visual instruction for geometry and geography, verbal instruction for poetry, and so on.

    Nardi hits on that point:

    For many problems, a graphical representation is much the most "natural"...Again, the problem is one of matching the practical problem at hand to the design of the technology, rather than assuming a priori the primacy of one form of communication, i.e. conversation.

    It’s the subject matter that dictates the right medium of communication.

    On baseball (28, 30-33, 85)

    Nardi argues that formal languages and notations are widely used by people in all walks of life. My favorite example she goes into is baseball, which is full of specialized communication: catchers signaling to pitchers, base coaches signaling to runners and batters, fans and statisticians keeping scorecards, etc. She quotes from Fred Schwed, Jr. in Baseball by the Rules:

    The baseball box score is the pithiest form of written communication in America today. It is two or three hours...of complex activity, virtually inscribed on the head of a pin.

    Baseball score keeping is a particularly fascinating example because it’s common for fans to fill out scorecards while attending a game, purely for fun. My grandfather actually used to do it when I went to Baltimore Orioles games with him. Fans are motivated to learn a fairly complex set of rules and notation to fill out scorecards, without any requirement or monetary incentive to do so.

    On knitting (34)

    Nardi follows her explanation of baseball notation with a similar exploration of knitting notation. This back-to-back sequence produced an interesting effect for me as the reader.

    As a casual baseball fan who hasn’t ever kept scorecards, it was fun and engaging to read and think about baseball score keeping and notation. But as a knitting novice, reading knitting shorthand produced something similar to the eyes-glaze-over effect you’d see when showing code to a non-coder or math formulas to a non-mathematician. Knitting instructions are intimidating to me because of my lack of familiarity with the domain whereas baseball scorecards are interesting like a puzzle or game.

    On end user programming as good fun (35)

    As with baseball scorecards or knitting, end user programs should be fun and satisfying.

    In addition to being useful, formal systems have an appeal all their own; people naturally gravitate to them. Games are perhaps the clearest example of the way we spontaneously engage in interaction with formal systems.

    Board games are another fantastic example. Playing Catan requires learning a non-trivial set of rules, for example.

    But it’s not just leisurely activities that can be fun—or maybe satisfying is a better word for it. When I worked as an investment banker, I found great satisfaction in working on spreadsheets. Nardi makes the point that only when people have a particular interest or motivation in something—be it baseball, knitting, or forecasting financials—will they learn a formal language to accomplish their goal.

    And familiarity with a domain doesn’t just lead to fun or satisfaction in learning and using a formal notation. It also unlocks a better ability and confidence to learn that formal skill. Nardi cites studies that show participants' math and logic skills are much better when problems are framed in things they are familiar with—like cars—than when framed abstractly as pure math.

    On UI density (65, 85-87)

    One major challenge with visual programming and visual end user programs is information density in the UI. Text often wins out because its density can be hard to match, but there are counterexamples, like spreadsheets organizing computation in a grid. The book argues that a hybrid approach with text and graphics mixed together—like in spreadsheets—makes the most sense, and that it’s unrealistic to disregard text.

    Sometimes a few words are worth a thousand pictures.

    Nardi notes that spreadsheet users have a strong preference for seeing as much data as possible without scrolling. I like to view large code files and long documents on a vertical monitor for the same reason.

    On spreadsheets

    The more Nardi discusses spreadsheets, the more in awe I am of their elegance, which I did not fully appreciate when I used to spend ~12-16 hours per day working in Excel. I am always impressed by UIs that balance density and polish. But what I think is most elegant about spreadsheets is that they are simple enough to be useful to a beginner within the first hour of use and powerful enough for experts to perform sophisticated tasks over hundreds of hours.

    Nardi describes this concept as layers:

    ...the system must incorporate the advanced features that make sophisticated customizations and extensions possible in the first place. But...the system should also be modular...thought of as being in layers, such that an end user may access a first layer that is easy to use and learn, while succeeding layers house modules for more advanced use.

    Spreadsheets are not without shortcomings, of course. For example, I am floored in retrospect by how bad our version control system was in Excel at my old job. We basically just versioned up the file whenever it felt like a good breaking point, so you’d end up with files like shopify_v346.xlsx. Even worse, two people couldn’t reasonably work on the same model simultaneously without git-style branching available. Version control with git was one of my happiest discoveries when I started coding.

    On associating information with its physical location (87)

    Nardi includes conversations from ethnographic studies of spreadsheet users who talk about knowing where to find stuff in their models. One user knows that if she’s in the middle of her spreadsheet around the Municipal Bonds section then the Preferred Stocks section must be above that.

    Remembering where some information is in a spreadsheet feels similar to remembering where I read something in a physical book. I can remember that some piece of information was in a smaller paragraph near the bottom of a left-hand page, for example, then scan through the book until I find it. I don’t often experience that kind of feeling with a codebase.

    On modern end-user programming applications and research

    Nardi includes something of a call for research in the beginning of the book:

    One hope for this book is that it will stimulate further empirical research on existing end user software systems so that we can learn more about what works and what doesn’t work.

    I am new to this field, so I’d appreciate any recommendations on papers to read or examples to check out.

    For the research side, I’d like to spend some time using Elicit as a tool to survey the end-user programming research in the last 30 years. So far, other than this book I’ve read the Ink & Switch essay on end-user programming and Steve Krouse’s essay on end-programmer programming, and I’ve listened to the Future of Coding podcast episode about this book (discussion of the book starts around the 20m mark).

    As for modern examples of end-user programming in the wild, Observable and Notion are two programs I’ve actually used that come to mind. Then there are apps I haven’t used like Zapier, iOS custom shortcuts, and website builders (e.g. Bubble, Webflow).

    Steve—in collaboration with Ellen Chisa and Paul Biggar when they were working on Dark—created The Whole Code Catalog. That is a great list, but it is now 5 years old, and it includes some more programmer-oriented tools like Glitch and Zeit (Vercel) that I don’t think of as end-user tools (although I know the lines are blurry). I’m curious to learn about the best modern examples of end-user programs, so feel free to reply below or on Twitter/wherever if you have examples to share.

  21. Programming mantras are proverbs

    (lukeplant.me.uk)

    Proverbs are supposed to encapsulate a bit of wisdom, but you still need to know when to apply it.

  22. Paradigms succeed when you can strip them for parts

    (buttondown.email)

    I'm speaking at DDD Europe about Empirical Software Engineering! [1] I have complicated thoughts about ESE and foolishly decided to update my talk to cover studies on DDD, so I'm going to be spending a lot of time doing research. Newsletters for the next few weeks may be light.

    The other day I was catching myself up on the recent ABC conjecture drama (if you know you know) and got reminded of this old comment:

    The methods in the papers in question can be used relatively quickly to obtain new non-trivial results of interest (or even a new proof of an existing non-trivial result) in an existing field. In the case of Perelman’s work, already by the fifth page of the first paper Perelman had a novel interpretation of Ricci flow as a gradient flow which looked very promising, and by the seventh page he had used this interpretation to establish a “no breathers” theorem for the Ricci flow that, while being far short of what was needed to finish off the Poincare conjecture, was already a new and interesting result, and I think was one of the reasons why experts in the field were immediately convinced that there was lots of good stuff in these papers. — Terence Tao

    "Perelman's work" was proving the Poincaré conjecture. His sketch was 68 pages of extremely dense math. Tao's heuristic told him it was worth working through is that it had interesting ideas that were useful to other mathematicians, even if the final paper was potentially wrong.

    I use this idea a lot in thinking about software: the most interesting paradigms are ones you can "scavenge" ideas from. Even if the whole paradigm isn't useful to you, you can pull out some isolated ideas, throw away the rest, and still benefit from it. These paradigms are the ones that are most likely to spread, as opposed to paradigms that require total buyin.

    Let's take functional programming (FP) as an example. ML/Haskell-style FP has a lot of interconnected ideas: higher-order functions, pattern matching, algebraic data types, immutable data structures, etc. But if you set all of that aside and stick to writing everything in idiomatic Java/Python/C, and the only change you make is writing more pure functions, then you've already won.
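    As a toy illustration of that one scavenged idea: the first version below mutates shared state, while the pure version depends only on its inputs and is trivially testable (the names here are made up for the example):

    ```python
    # Impure: reads and mutates state outside the function.
    totals = {}

    def record_sale_impure(item: str, amount: int) -> None:
        totals[item] = totals.get(item, 0) + amount

    # Pure: output depends only on inputs; the caller's data is untouched.
    def record_sale(totals: dict, item: str, amount: int) -> dict:
        return {**totals, item: totals.get(item, 0) + amount}

    before = {"book": 5}
    after = record_sale(before, "book", 3)
    assert after == {"book": 8}
    assert before == {"book": 5}  # original dict is unchanged
    ```

    Nothing else from FP is required to get this benefit, which is exactly what makes the idea easy to scavenge.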

    This has several positive effects on FP adoption. It immediately demonstrates that FP has good ideas and is not just some weird cult practice. It also means people can "dip their toes in" and benefit without having to go all in. Then they can judge if they're satisfied or want to go further into learning FP. And if they do go further, they can learn new ideas gradually in stages.

    By contrast, compare array programming languages (APLs). There's a lot of cool ideas in APL, but they're so interconnected it's hard to scavenge anything out. If I want to use multidimensional arrays in Python I can't just whip it up myself, I have to import an external library like numpy and learn a whole bunch of new ideas at once. So most of the ideas in APL stay in APLs. [3]

    A related idea is that techniques are only adopted if you can "half-ass" them. TDD is useful even if you only use it sometimes and phone in the "green-refactor" steps, which helped adoption. Cleanroom depends on everybody doing it properly all the time, which hindered adoption.

    Scavenging from formal methods

    Part of the reason I'm thinking about this is that business has been real slow lately and I'm thinking about how I can make formal methods (FM) more popular. What ideas can we scavenge from writing specifications?

    Obvious ones are predicate logic and the implication operator, but they're both poorly supported by existing programming languages, and I'm not sure how to best pitch them. [2] Also, they're "too big", and it takes a lot of practice to wrap your mind around using quantifiers and implication.

    Sets, definitely. Sets are unordered, unique collections of objects. Lots of things we store as lists could be sets instead. I use sets way more after learning some FM.
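    A quick illustration of why a set is often the right shape when you care about membership rather than order or duplicates (example names are made up):

    ```python
    # A set of user IDs we've already processed.
    seen_users = set()
    seen_users.add("alice")
    seen_users.add("bob")
    seen_users.add("alice")  # duplicates are ignored automatically

    assert len(seen_users) == 2
    assert "alice" in seen_users  # O(1) membership test, vs O(n) for a list
    ```

    Reaching for a set instead of a list also documents intent: it tells the reader that ordering and repetition don't matter for this collection.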

    There's also a lot of high-level ideas in FM that usefully apply to all other code. System invariants are things that should always hold in every possible state. More broadly, we can categorize properties: "safety" is roughly "bad things don't happen", "reachability" is "good things always could happen", and "liveness" is "good things always do happen". Fairness, maybe. Refinement is definitely too in-the-weeds to be scavengable.

    I'd also call out decision tables as a particular technique that's extractable. You can teach decision tables in five minutes to a completely nontechnical person.
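    A decision table is just an exhaustive mapping from combinations of conditions to outcomes; in code it can be as simple as a dictionary (a toy example with made-up rules):

    ```python
    # Conditions: (is_member, order_over_50) -> shipping cost in dollars.
    SHIPPING = {
        (True,  True):  0,
        (True,  False): 3,
        (False, True):  5,
        (False, False): 8,
    }

    assert SHIPPING[(True, False)] == 3
    # Exhaustiveness check: all combinations of the two conditions appear.
    assert len(SHIPPING) == 2 ** 2
    ```

    The exhaustiveness check is the part worth teaching: a decision table makes it obvious when a case has been forgotten, which plain if/else chains do not.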

    As I said, short newsletters for the rest of this month. Have a great week!


    Things I'm curious about

    • Logic Programming: What ideas are scavengable from LP? I know that pattern matching originated in LP and found its way to FP from there, but that's associated with FP more now.
    • Nontechnical People: What's a good word for them? "Nondev" seems like it leaves out data scientists, DBAs, etc. "Normie" and "Muggle" are too offensive. "Layfolk" just sounds weird.
    • TLA+ workshops: Would people be interested in an "intermediate" TLA+ workshop, for people who've already written production specs but want to get better? I'm thinking half a day, 6-8 people, with people bringing up what they want to learn about in advance. If there's enough people who'd want this I can plan something out.

    1. Domain-Driven Design. 

    2. This is one big reason my attempt to write a book on predicate logic has been in a long stall. 

    3. One idea that mathematicians scavenged from APLs is the Iverson bracket: [P] = P ? 1 : 0

  23. Testing with Python (part 4): why and what to test?

    (bitecode.dev)

    Delaying the chores by second-guessing all your decisions

  24. The World Record for Loneliness

    (blog.danslimmon.com)

    What's the farthest any person has been from the nearest other person?

  25. Backup Game Boy ROMs and saves on Ubuntu

    (sethmlarson.dev)

    Published 2024-05-06 by Seth Larson

    I'm a big fan of retro video games, specifically the Game Boy Color, Advance, and GameCube collections. The physicality of cartridges, link cables, and accessories before the internet was widely available for gaming has a special place in my heart.

    With the recent changes to the App Store to allow emulators (and judging by the influx of issues opened on the Delta emulator GitHub repo) there is a growing interest in playing these games on mobile devices.

    So if you're using Ubuntu like me, how can you backup your ROMs and saves?


    Using GB Operator with my copy of Pokémon FireRed

    What you'll need

    To backup data from Game Boy cartridges I used the following software and hardware:

    • Game Boy, GBC, or GBA cartridge
    • GB Operator from Epilogue ($50 USD + shipping)
    • Playback software from Epilogue
    • Epilogue includes a USB-C to USB-A connector, so an adapter may be needed

    There are other options for backing up Game Boy ROMs and saves, some of which are less expensive than the GB Operator, but I went with the GB Operator because it explicitly listed Linux support and from watching reviews appeared to provide a streamlined experience.

    Getting started with GB Operator and Playback

    Download the Playback AppImage for Linux and the CPU architecture you have. Make the AppImage file executable with:

    $ chmod a+x ./Playback.AppImage
    

    If you try to execute this file ($ ./Playback.AppImage) and receive this error:

    dlopen(): error loading libfuse.so.2
    
    AppImages require FUSE to run. 
    You might still be able to extract the contents of this AppImage 
    if you run it with the --appimage-extract option. 
    See https://github.com/AppImage/AppImageKit/wiki/FUSE 
    for more information
    

    You'll need to install FUSE on Ubuntu:

    $ sudo apt-get install libfuse2
    

    After this you should be able to run Playback:

    $ ./Playback.AppImage
    

    From here the application should launch, but even if your GB Operator is plugged in there's a chance it won't be detected: your current user may not have access to the USB device. Epilogue provides guides on how to enable access.

    After following the above guide and logging out and back in for the changes to take effect, your GB Operator should be detected. Connecting a cartridge and navigating to "Data" in the menus provides you with options to "Backup Game" and "Backup Save".

    Selecting these options might trigger a crash with the following error when starting the export process:

    (Playback.AppImage:44475): Gtk-WARNING **: 15:05:20.886: Could not load a pixbuf from icon theme.
    This may indicate that pixbuf loaders or the mime database could not be found.
    **
    Gtk:ERROR:../../../../gtk/gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): Failed to load /usr/share/icons/Yaru/16x16/status/image-missing.png: Unrecognized image file format (gdk-pixbuf-error-quark, 3)
    Bail out! Gtk:ERROR:../../../../gtk/gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): Failed to load /usr/share/icons/Yaru/16x16/status/image-missing.png: Unrecognized image file format (gdk-pixbuf-error-quark, 3)
    Aborted (core dumped)
    

    The fix that worked for me came from a Redditor who talked with Epilogue support and received the following answer:

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-svg.so \
      ./Playback.AppImage
    

    Running the AppImage with the LD_PRELOAD value set fixed my issue. I've since added this shim to an alias so I don't have to remember it. Hopefully in a future version of Playback this won't be an issue.
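    For example, a shell alias along these lines (my own sketch; adjust the loader path for your architecture and distro) keeps the workaround out of sight:

```shell
# Wrap the LD_PRELOAD workaround from the error above in an alias.
# The loader path is the one reported on x86_64 Ubuntu; it may differ elsewhere.
alias playback='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-svg.so ./Playback.AppImage'
alias playback   # print the definition to confirm it's set
```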

    From here backing up your ROMs and saves should work as expected. Happy gaming!

    Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.


    This work is licensed under CC BY-SA 4.0

  26. sphinxcontrib-sqltable 2.1.0 - SQLAlchemy 2.0 support

    (doughellmann.com)

    What’s new in 2.1.0?

    • Update packaging to use pyproject.toml
    • Update sqltable.py to support SQLAlchemy 2.0 (contributions by Gabriel Gaona)
    • Pin SQLAlchemy>=2.0 to prevent backwards incompatibility with changes to the query execution interface (contributions by Gabriel Gaona)
    • Fix up the Python version support list
    • Fix the obsolete link to Bitbucket (contributions by kbaikov)
    • Find the sample schema file regardless of how Sphinx is invoked
    • Move the sample database to /tmp for rtd.org

  27. (chriscoyier.net)

    If you long for a web of yore where things were better, somehow: The thing is: none of this is gone. Nothing about the web has changed that prevents us from going back. If anything, it’s become a lot easier. We can return. Better yet: we can restore the things we loved about the old web while incorporating […]

  28. Weekly Update 398

    (troyhunt.com)

    Presently sponsored by: Kolide is an endpoint security solution for teams that want to meet SOC2 compliance goals without sacrificing privacy. Learn more here.

    How many different angles can you have on one data breach? Facial recognition (which probably isn't actual biometrics), gambling, offshore developers, unpaid bills, extortion, sloppy password practices and now, an arrest. On pondering it more after today's livestream, it's the unfathomable stupidity of publishing

  29. A year of publishing the MDN Blog

    (developer.mozilla.org)

    We've been writing about web development and the web platform on the MDN Blog since May 2023. Here's our highlights and top posts along with our favorites.

  30. What's up Python? Lots of Wasm, Pydantic on fire, and Guido in a crossword...

    (bitecode.dev)

    April, 2024

  31. Isolating risk in the CPython release process

    (sethmlarson.dev)

    Published 2024-05-02 by Seth Larson

    This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

    The first stage of the CPython release process produces source and docs artifacts. In terms of "supply chain integrity", the source artifacts are the most important artifact produced by this process. These tarballs are what propagates down into containers, pyenv, and operating system distributions, so reducing the risk that these artifacts are modified in-flight is critical.

    A few weeks ago I published that CPython's release process for source and docs artifacts had moved from developers' machines onto GitHub Actions, which provides an isolated build environment.

    This already reduces the risk of artifacts being accidentally or maliciously modified during the release process. Previously, the build and release process used a single script that built the software from source, built the docs, and then ran the tests, all in the same job. That was fine on a developer's machine, where no isolation between stages is possible anyway.

    [Diagram: the build-and-release pipeline before and after the split — Build Dependencies, Docs Dependencies, Build Source, Build Docs, and Testing stages producing the Source and Docs Artifacts]

    Before and after splitting up build stages

    With GitHub Actions we can isolate each stage from the others and remove the need to install all dependencies for all jobs into the same stage. This drastically reduces the number of dependencies, each representing a small amount of risk, for the stages that are critical for supply chain security of CPython (specifically the building of source artifacts).

    Above you can see on the left the previous process which pulls all dependencies into the same job (represented as a gray box) and the right being the new process having split up the builds and testing and the source and docs builds.

    After doing this split, the "Build Source" task only needs ~170 dependencies instead of over 800 (mostly LaTeX and PDF tooling for documentation), and all of those dependencies either come with the operating system, and thus can't be reduced further, or are pinned in a lock file.

    The testing stage still has access to the source artifacts, but only after they've been uploaded to GitHub Actions Artifacts, so the tests aren't able to modify them.

    I plan to write a separate in-depth article about dependencies, pinning, and related topics; stay tuned for that.

    SOSS Community Day 2024 recordings

    The recordings for my talk and the OpenSSF tabletop session have been published to YouTube:

    Embrace the Differences: Securing open source software ecosystems where they are

    In the talk I discuss the technical and also social aspects of why it's sometimes difficult to adopt security changes into an open source ecosystem. Ecosystem-agnostic work (think memory safety, provenance, reproducible builds, vulnerabilities) tends to operate at a much higher level than the individual ecosystems where the work ends up being applied.

    OpenSSF Tabletop Session

    The tabletop session had many contributors representing all the different aspects of discovering, debugging, disclosing, fixing, and patching a zero-day vulnerability in an open source component that's affecting production systems.

    Tabletop Session moderated by Dana Wang

    Mentoring for Google Summer of Code

    Google Summer of Code 2024 recently published its program and among the projects and contributors accepted was CPython's project for adopting the Hardened Compiler Options Guide for C/C++. I'm mentoring the contributor through the process of contributing to CPython and hopefully being successful in adopting hardened compiler options.

    Other items

    That's all for this week! 👋 If you're interested in more you can read next week's report or last week's report.


  32. "Integration tests" are just vibes

    (buttondown.email)

    New blog post! Software Friction is about how all the small issues we run into developing software cause us to miss deadlines, and how we can address some of them. Patreon here.

    "Integration tests" are just vibes

    You should write more unit tests than integration tests. You should write more integration tests than unit tests. "Integrated tests" are a scam.

    But what exactly is an integration test? Is it tests that cross module boundaries? Is it tests that depend on several pieces of behavior, even in one module? Is it anything that tests "emergence"1? Anything with side effects? If get_active_users() runs a database query, are all tests against it integration tests?

    No wonder both Twitter and Google have (at one point) claimed that "integration test" isn't a useful term. There are just too many different things people mean by "integration".2

    Even if we can't define "integration test", we can still recognize one. There are tests that are bigger than our "unit tests" and smaller than an end-to-end test, and people have strong opinions about writing more or fewer of them. We can have a sense of something being an "integration test" even if we can't strictly define it.

    I think it's more useful to think of integration tests in terms of family resemblances. There are certain measurable properties of tests, and integration tests measure differently on them than unit tests do. One such property is test speed: unit tests are "fast", integration tests are "slow". That doesn't mean every integration test is slower than every unit test (unit property tests can be very slow), but a slow test is more likely to be an integration test than a unit test.

    "Integration tests" are a vibe we get based on how we perceive a test's properties. Some of these properties are:

    1. The number of lines of production code executed as part of the test.

    2. Number of boundaries crossed, module/object or otherwise.

    3. Number of unit tests you can decompose the test into.

    4. Number of external services required to run the test.

    5. Amount of engineering effort to write/run the tests. Does it need separate files, a separate language, setting up lots of docker instances?

    6. Nonlocality of discovered bugs: if the test fails, how many lines you have to look at to find the problem.

    7. Likelihood a test failure is due to "emergence": things are correct in isolation but interact in a problematic way.

    The more of these have high values, the more likely a test is to be an integration test.

    Miscellaneous Thoughts

    A test that hits the database (external service) can still be a unit test if it runs quickly and covers a small amount of code. Tests that make HTTP calls are less likely to be fast, so more likely to be integration tests.
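    For example (my own sketch, not from the post): a test against an in-memory SQLite database touches an "external service" in name only; it's fast, hermetic, and covers few lines, so it still reads as a unit test.

```python
import sqlite3

def get_active_users(conn: sqlite3.Connection) -> list:
    # The few lines of production code under test.
    rows = conn.execute(
        "SELECT name FROM users WHERE active = 1 ORDER BY name"
    )
    return [name for (name,) in rows]

# Fast, hermetic setup: the whole "database" lives in memory for this test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 1), ("bob", 0), ("carol", 1)],
)

assert get_active_users(conn) == ["alice", "carol"]
```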

    Mocks and pact-style testing push some of these values down, by reducing boundaries crossed and lines of code executed at the cost of increased test infrastructure. Contracts and assertions increase bug locality, since the test fails the moment a pre/postcondition is broken instead of when the test completes.

    If you push the values high enough, do you get end-to-end tests instead of integration tests? Or are E2E tests fundamentally different from integration tests in some way?

    Is it possible for a unit test to uncover emergence, or is emergence a sufficient condition to say that something is definitely an integration test?


    Things I'm curious about

    Gonna make this a weekly thing maybe (or whenever I have enough new stuff to ask about).

    • Sokoban. Sokoban is "easier" than Chess or Go (PSPACE vs EXPTIME), but Sokoban AIs regularly fail at simple puzzles that humans can do (ref ex1 ex2). What makes Sokoban "so much harder" for AIs? I don't think it's just a lack of interest, because I've found Google and DeepMind papers on solving Sokoban (and not making much progress).
    • NFCs: I impulse-bought a bunch of NFC chips. I'm already using them to open train trackers, what else can I do with them? I'm not a mobile dev but can whip up Taskers and server code pretty quickly.

    If you know something about either of these, feel free to share! Or write about them publicly and share the link with me. You can just reply directly to this email.


    1. Geeze I can't believe I tried to make "automanual" a thing 

    2. "Unit test" is also fuzzy but we at least agree that add(2, 3) == 5 is a unit test. 

  33. Strum Machine

    (chriscoyier.net)

    I mostly play old time and bluegrass music. Particularly in old time and fiddle songs, the songs have a fairly rigid and repetitious structure. That’s not to say players don’t do interesting things with it, but unless a song is intentionally crooked, the structure of the song remains the same throughout. A typical structure is […]

  34. Humble Chronicles: Shape of the Component

    (tonsky.me)

    Looking for an ergonomic way to define components

  35. Humble Chronicles: The Inescapable Objects

    (tonsky.me)

    Why we need OOP, even in Clojure

  36. Forbidden Links

    (chriscoyier.net)

    Malcolm Coles: 10+ years ago I created an annual list of websites that FORBADE you from linking to them, DEMANDED you write to ask for permission or LIMITED links to only their home page. Royal Mail even promised to post me a paper licence. Now a decade has passed, let’s see who’s still doing it … And […]

  37. Weekly Update 397

    (troyhunt.com)

    Presently sponsored by: Kolide is an endpoint security solution for teams that want to meet SOC2 compliance goals without sacrificing privacy. Learn more here.

    Banks. They screw us on interest rates, they screw us on fees and they screw us on passwords. Remember the old "bank grade security" adage? I took this saying to task almost a decade ago now but it seems that at least as far as password advice goes,

  38. TypeIs does what I thought TypeGuard would do in Python

    (rednafi.com)

    The handful of times I’ve reached for typing.TypeGuard in Python, I’ve always been confused by its behavior and ended up ditching it with a # type: ignore comment. For the uninitiated, TypeGuard allows you to apply custom type narrowing1. For example, let’s say you have a function named pretty_print that accepts a few different types and prints them differently onto the console:

    from typing import assert_never

    def pretty_print(val: int | float | str) -> None:
        if isinstance(val, int):
            # assert_type(val, int)
            print(f"Integer: {val}")
        elif isinstance(val, float):
            # assert_type(val, float)
            print(f"Float: {val}")
        elif isinstance(val, str):
            # assert_type(val, str)
            print(f"String: {val}")
        else:
            assert_never(val)

    If you run it through mypy, in each branch, the type checker automatically narrows the type and knows exactly what the type of val is.

  39. Feedbin Email Newsletter… Emails

    (chriscoyier.net)

    Feedbin has custom email addresses now so you can use a unique email address for each newsletter and not worry about a spam leak.

  40. Testing with Python (part 3): pytest setup

    (bitecode.dev)

    Testing your patience

  41. Podcast: Computers aren't really designed for people @ Думаем дальше

    (tonsky.me)

    Ilya talks about how we squandered multitasking, and Nikita rails against Git's command-line interface.

  42. MemoryDB: Speed, Durability, and Composition.

    (brooker.co.za)

    Blocks are fun.

    Earlier this week, my colleagues Yacine Taleb, Kevin McGehee, Nan Yan, Shawn Wang, Stefan Mueller, and Allen Samuels published Amazon MemoryDB: A fast and durable memory-first cloud database1. I’m excited about this paper, both because it’s a very cool system and because it gives us an opportunity to talk about the power of composition in distributed systems, and about the power of distributed systems in general.

    But first, what is MemoryDB?

    Amazon MemoryDB for Redis is a durable database with microsecond reads, low single-digit millisecond writes, scalability, and enterprise security. MemoryDB delivers 99.99% availability and near instantaneous recovery without any data loss.

    or, from the paper:

    We describe how, using this architecture, we are able to remain fully compatible with Redis, while providing single-digit millisecond write and microsecond-scale read latencies, strong consistency, and high availability.

    This is remarkable: MemoryDB keeps compatibility with an existing in-memory data store, adds multi-AZ (multi-datacenter) durability, adds high availability, and adds strong consistency on failover, while still improving read performance and with fairly little cost to write performance.

    How does that work? As usual, there’s a lot of important details, but the basic idea is composing the in-memory store (Redis) with our existing fast, multi-AZ transaction journal2 service (a system we use in many places inside AWS).

    Composition

    What’s particularly interesting about this architecture is that the journal service doesn’t only provide durability. Instead, it provides multiple different benefits:

    • durability (by synchronously replicating writes onto storage in multiple AZs),
    • fan-out (by being the replication stream replicas can consume),
    • leader election (by having strongly-consistent fencing APIs that make it easy to ensure there’s a single leader per shard),
    • safety during reconfiguration and resharding (using those same fencing APIs), and
    • the ability to move bulk data tasks like snapshotting off the latency-sensitive leader boxes.

    Moving these concerns into the Journal greatly simplifies the job of the leader and minimizes the amount the team needed to modify Redis. In turn, this makes keeping up with new Redis (or Valkey) developments much easier. From an organizational perspective, it also allows the team that owns Journal to really focus on the performance, safety, and cost of the journal without having to worry about the complexities of offering a rich API to customers. Each investment in performance means better performance for a number of AWS services, and similarly for cost, investments in formal methods, and so on. As an engineer and engineering leader, I’m always on the lookout for these leverage opportunities.

    Of course, the idea of breaking systems down into pieces separated by interfaces isn’t new. It’s one of the most venerable ideas in computing. Still, this is a great reminder of how composition can reduce overall system complexity. The journal service is a relatively (conceptually) simple system, presenting a simple API. But, by carefully designing that API with affordances like fencing (more on that later), it can remove the need to have complex things like consensus implementations inside its clients (see Section 2.2 of the paper for a great discussion of some of this complexity).

    As Andy Jassy says:

    Primitives, done well, rapidly accelerate builders’ ability to innovate.

    Distribution

    It’s well known that distributed systems can improve durability (by making multiple copies of data on multiple machines), availability (by allowing another machine to take over if one fails), integrity (by allowing machines with potentially corrupted data to drop out), and scalability (by allowing multiple machines to do work). However, it’s often incorrectly assumed that this value comes at the cost of complexity and performance. This paper is a great reminder that this assumption is not true.

    Let’s zoom in on one aspect of performance: consistent latency while taking snapshots. MemoryDB moves snapshotting off the database nodes themselves, and into a separate service dedicated to maintaining snapshots.

    This snapshotting service doesn’t really care about latency (at least not the sub-millisecond read latencies that the database nodes worry about). It’s a throughput-optimized operation, where we want to stream tons of data in the most throughput-efficient way possible. By moving it into a different service, we get to avoid having throughput-optimized and latency-optimized processes running at the same time (with all the cache and scheduling issues that come with that). The system also gets to avoid some implementation complexities of snapshotting in-place. From the paper, talking about the on-box BGSave snapshotting mechanism:

    However, there is a spike on P100 latency reaching up to 67 milliseconds for request response times. This is due to the fork system call which clones the entire memory page table. Based on our internal measurement, this process takes about 12ms per GB of memory.

    and things get worse if there’s not enough memory for the copy-on-write (CoW) copy of the data:

    Once the instance exhausts all the DRAM capacity and starts to use swap to page out memory pages, the latency increases and the throughput drops significantly. […] The tail latency increases over a second and throughput drops close to 0…

    the conclusion being that to avoid this effect database nodes need to keep extra RAM around (up to double) just to support snapshotting. An expensive proposition in an in-memory database! Moving snapshotting off-box avoids this cost: memory can be shared between snapshotting tasks, which significantly improves utilization of that memory.

    The upshot is that, in MemoryDB with off-box snapshotting, performance impact is entirely avoided. Distributed systems can optimize components for the kind of work they do, and can use multi-tenancy to reduce costs.

    Conclusion

    Go check out the MemoryDB team’s paper. There’s a lot of great content in there, including a smart way to ensure consistency between the leader and the log, a description of the formal methods the team used, and operational concerns around version upgrades. This is what real system building looks like.

    Bonus: Fencing

    Above, I mentioned how fencing in the journal service API is something that makes the service much more powerful, and a better building block for real-world distributed systems. To understand what I mean, let’s consider a journal service (a simple ordered stream service) with the following API:

    write(payload) -> seq
    read() -> (payload, seq) or none
    

    You call write, and when the payload has been durably replicated it returns a totally-ordered sequence number for your write. That’s powerful enough, but in most systems it would require an additional leader election mechanism to ensure that the writes being sent make some logical sense.

    We can extend the API to avoid this case:

    write(payload, last_seq) -> seq
    read() -> (payload, seq) or none
    

    In this version, writers can ensure they are up-to-date with all reads before doing a write, and make sure they’re not racing with another writer. That’s sufficient to ensure consistency, but isn’t particularly efficient (multiple leaders could always be racing), and doesn’t allow a leader to offer consistent operations that don’t call write (like the in-memory reads MemoryDB offers). It also makes pipelining difficult (unless the leader can make an assumption about the density of the sequences). An alternative design is to offer a lease service:

    try_take_lease() -> (uuid, deadline)
    renew_lease(uuid) -> deadline
    write(payload) -> seq
    read() -> (payload, seq) or none
    

    A leader who believes they hold the lease (i.e. their current time is comfortably before the deadline) can assume they’re the only leader, and can go back to using the original write API. If they end up taking the lease, they poll read until the stream is empty, and then can take over as the single leader. This approach offers strong consistency, but only if leaders absolutely obey their contract that they don’t call write unless they hold the lease.

    That’s easily said, but harder to do. For example, consider the following code:

    if current_time < deadline:
      <gc or scheduler pause>
      write(payload)
    

    Those kinds of pauses are really hard to avoid. They come from GC, from page faults, from swapping, from memory pressure, from scheduling, from background tasks, and many many other things. And that’s not even to mention the possible sources of error in current_time. We can avoid this issue with a small adaptation to our API:

    try_take_lease() -> (uuid, deadline)
    renew_lease(uuid) -> deadline
    write(payload, lease_holder_uuid) -> seq
    read() -> (payload, seq) or none
    

    If write can enforce that the writer is the current lease holder, we can avoid all of these races while still allowing writers to pipeline things as deeply as they like. This still-simple API provides an extremely powerful building block for building systems like MemoryDB.

    Finally, we may not need to compose our lease service with the journal service, because we may want to use other leader election mechanisms. We can avoid that by offering a relatively simple compare-and-set in the journal API:

    set_leader_uuid(new_uuid, old_uuid) -> old_uuid
    write(payload, leader_uuid) -> seq
    read() -> (payload, seq) or none
    

    Now we have a super powerful composable primitive that can offer both safety to writers, and liveness if the leader election system is reasonably well behaved.
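    As a toy, single-process sketch of that final API (my own illustration, not AWS's implementation), here's how the compare-and-set plus fenced write keeps a deposed leader from writing:

```python
import itertools

class Journal:
    """Toy fenced journal: models the ordering and fencing semantics only.
    The real service durably replicates writes across multiple AZs."""

    def __init__(self):
        self.entries = []   # list of (seq, payload)
        self.leader = None  # current leader uuid
        self._seq = itertools.count()

    def set_leader_uuid(self, new_uuid, old_uuid):
        # Compare-and-set: only succeeds if the caller knows the current leader.
        if self.leader != old_uuid:
            raise RuntimeError("CAS failed: stale old_uuid")
        prev, self.leader = self.leader, new_uuid
        return prev

    def write(self, payload, leader_uuid):
        # Fencing: writes from deposed leaders are rejected, even if they
        # still believe they hold a lease.
        if leader_uuid != self.leader:
            raise RuntimeError("fenced: not the current leader")
        seq = next(self._seq)
        self.entries.append((seq, payload))
        return seq

j = Journal()
j.set_leader_uuid("leader-1", None)
assert j.write("a", "leader-1") == 0
j.set_leader_uuid("leader-2", "leader-1")   # failover
try:
    j.write("b", "leader-1")                # the old leader is fenced out
except RuntimeError:
    pass
assert j.write("b", "leader-2") == 1
```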

    Footnotes

    1. To appear at SIGMOD’24.
    2. The paper calls it a log service, which is technically correct, but a term I tend to avoid because it’s easily confused with logging in the observability sense.

  43. Open Source Summit North America 2024

    (sethmlarson.dev)

    Published 2024-04-24 by Seth Larson

    This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

    Last week I attended SOSS Community Day and OSS Summit. It was great to catch up with friends and to meet new people for the first time at a cross-ecosystem open source event.

    I gave a talk, "Embrace the Differences: Securing software ecosystems where they are", which funnily enough was paired with a complementary talk about the ways software repositories can collaborate for security.

    My talk focused on how security standards and tools typically want to operate across software ecosystems and differences in standards, tools, maintainers, and user expectations between ecosystems can make that difficult.

    You can download my slides and the recording will be available eventually on YouTube.

    OpenSSF Tabletop Session

    I also participated in the first OpenSSF Tabletop Session organized and hosted by Dana Wang. I played the role of "open source maintainer" and represented how an exploited zero-day vulnerability would appear from the perspective of an open source project.

    I emphasized the realities of vulnerability disclosure to open source projects like under-resourcing, most maintainers being volunteers, and stress caused during times of crisis.

    Cast of the tabletop session!

    So many people!

    I also met up with many folks doing open source security, maintenance, and funding:

    • Met with many folks from the Alpha-Omega cohort. I'm looking forward to having more cross-functional discussions about new approaches to securing open source.
    • Met with Michael Winser from Alpha-Omega to work on our PyCon US 2024 talk State of Supply Chain Security for Python.
    • Met with my friend William Woodruff from Trail of Bits and discussed the system TLS proposal and build provenance for Homebrew (and what could be learned for Python).
    • Met with Samuel Giddins and Martin Emde from the Ruby ecosystem to discuss shared challenges for introducing security into an ecosystem.
    • Met Lauren Hanford from Tidelift to discuss supporting and funding maintainers.
• Met Mirko from Sovereign Tech Fund to discuss their program for hiring open source maintainers.
    • Attended the talk by Kara Sowles from GitHub on the state of open source funding and learned about "downturn-resilient" funding.
• Spoke with many folks who asked me about security initiatives happening in the Python ecosystem.

    Other items

    Note that I've been summoned for jury duty starting next week, so expect fewer updates over the next two weeks depending on how that goes.

    That's all for this week! 👋 If you're interested in more you can read next week's report or last week's report.

    Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.


    This work is licensed under CC BY-SA 4.0

  44. Hello, Rust

    (petemillspaugh.com)

    I just wrote my first Rust program. Er, it’s not really even my program—I just read an example from a book, tried to implement it from memory a few minutes later, ran cargo run, then used hints from the Rust compiler to fix my code. First impression…the hints are really good!

    Here’s the function:


fn gcd(mut n: u64, mut m: u64) -> u64 {
    assert!(n != 0 && m != 0);
    while m != 0 {
        if m < n {
            let t = m;
            m = n;
            n = t;
        }
        m = m % n;
    }
    n
}

    I’m starting with O’Reilly’s Programming Rust, but I plan to go through The Book in parallel and see what sticks.

I also copied my first test in Rust, which quickly caught a bug in my greatest common divisor implementation.


#[test]
fn test_gcd() {
    assert_eq!(gcd(14, 15), 1);
    assert_eq!(gcd(2 * 3 * 5 * 11 * 13, 3 * 7 * 11 * 19), 3 * 11);
}

    It's pretty nice that cargo includes tests out of the box without having to install a dependency.

  45. Megan Marie Myers

    (chriscoyier.net)

    In Bend, you can’t miss the illustration work of Megan Marie Myers. It’s absolutely everywhere. There are murals. There are calendars for sale everywhere. Prints are hung up at coffeeshops, sometimes as whole exhibitions. She’s got books, stickers, hats, postcards, notecards, and heck, even neck gaiters. If her art wasn’t as charming and well done, […]

  46. "Testing can show the presence of bugs but not the absence"

    (buttondown.email)

Program testing can be used to show the presence of bugs, but never to show their absence! — Edsger Dijkstra, Notes on Structured Programming

Dijkstra was famous for his spicy quips; he'd feel right at home on tech social media. He said things he knew weren't absolutely true but would get people listening to him. In the same essay he discusses how testing could, in specific cases, show the absence of bugs:

    • If a function has a finite set of inputs, like floor(float), you could prove correctness by simply testing all possible inputs.
    • Even if a function has infinite inputs, tests prove the absence of bugs for tested inputs. You could say "date_of_easter works for any date within 10,000 years of today" and have "proven the absence of bugs" for the subset of dates people care about.1
    • You can use testing to complete a proof. For example, you can divide inputs into "equivalence classes", prove that if the function is correct for one value it will also be correct for all values in that class, and then test one value from each class.
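As a concrete sketch of the first point (my example, not Dijkstra's): a function whose entire input space is small can be tested exhaustively, which genuinely proves the absence of bugs relative to the stated property.

```rust
// Hypothetical example: `sign` has only 256 possible inputs (all of i8),
// so testing every one of them is a proof, not a sample.
fn sign(x: i8) -> i8 {
    if x > 0 { 1 } else if x < 0 { -1 } else { 0 }
}

fn main() {
    for x in i8::MIN..=i8::MAX {
        let s = sign(x);
        // The property: the result matches the ordering of x against zero.
        assert!((s == 0) == (x == 0));
        assert!((s == 1) == (x > 0));
        assert!((s == -1) == (x < 0));
    }
    println!("property verified for all 256 inputs");
}
```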

In the majority of cases, however, testing cannot show the absence of bugs. Testing f(16) does not give you any guarantees about f(x) for all x.

    There are two possible takeaways. The reasonable one is that we shouldn't be blindly confident in testing, should use a variety of approaches as part of defense in depth, and always be careful about what we claim. The unreasonable takeaway is that testing is an inferior verification method, and we should use more "rigorous" approaches instead of testing.

It's unreasonable because methods that show the existence of bugs are more robust than methods that show their absence, and this makes them useful more of the time.

    Comparing tests and proof

    Let's say you have a pseudocode function like this:

    fun elem_before_x(s: int[], x: int) returns (r: int | NULL) {
        let i: int = index(s, x);
        if (i == NULL) {
            return NULL;
        } else return s[i-1];
    }
    
    Property:
     if r == NULL then x notin s
     else some i in 0..<len(s): s[i] == r && s[i+1] == x
    

    The bug here is that we're out-of-bounds if x is the first element of the array. Testing will find this bug if you remember to test that case. Otherwise, we'd have a bug and a passing test suite. Tests can't verify the absence of bugs.

    Now imagine we do find it and fix it:

    else if (i == 0) {
            return s.last;
        } 
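For concreteness, here's one way the fixed function might look in runnable Rust (my translation of the pseudocode, not the author's code):

```rust
// Rust sketch of elem_before_x with the wrap-around fix: if x is the
// first element, return the last element instead of indexing s[-1].
fn elem_before_x(s: &[i32], x: i32) -> Option<i32> {
    match s.iter().position(|&e| e == x) {
        None => None,
        Some(0) => s.last().copied(), // the fix for the out-of-bounds bug
        Some(i) => Some(s[i - 1]),
    }
}

fn main() {
    assert_eq!(elem_before_x(&[3, 1, 4], 1), Some(3));
    assert_eq!(elem_before_x(&[3, 1, 4], 3), Some(4)); // the once-buggy case
    assert_eq!(elem_before_x(&[3, 1, 4], 9), None);
}
```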
    

    Modifying the property is left as an exercise to the reader. Once again, the passing test suite means we haven't verified the absence of the bug. Now, how does formal verification do on the same problem? I wrote a Dafny version of the same problem, here, both with and without the fix. Looking at v1, we see it finds the s[0] error:

    >   var v := index.Extract();
    >   return Some(s[v-1]);
    
    (19,16): Error: index out of range    
       |
    19 |     return Some(s[v-1]);
    

    This error also goes away in v2: the bug is proved to be absent. But v2 adds a new error:

    > ensures if r.None? then x !in s
    >           else if s[|s|-1] == r.value then s[0] == x
    >           else s[IndexOf(s, r.value)+1] == x
    
    (7,20): Error: index out of range
      |
    7 |             else if s[|s|-1] == r.value then s[0] == x
      |                      ^^^^^^^
    

When would s[|s|-1] be an index out of range? When the list is empty. But we know that's not possible: if we return a non-null value then the list must not be empty, because we found x in it! But the prover doesn't know that; it can't infer the relationship between the returned value and what we know about the list.2 We have to annotate our code with proof hints for it to recognize that if we return a number, the list is nonempty.

    In short: the prover definitely finds the out-of-bounds bug, but also raises a spurious "empty list" bug. The test might not find the former but also won't raise the latter. Tests can have false negatives but not false positives, while provers can have false positives but not false negatives.

    (I'd say "formal verification can prove the absence of a bug but never the presence", but that's not true. Dafny can sometimes give you a concrete counterexample to a property, and type systems detect true positives like "you passed an array into an Int -> Int function" all the time.)

Knowing there "might" be a bug is not as useful as knowing there is one. Empirically speaking, people mostly ignore compiler warnings, and plenty of retrospectives find that people ignore "might possibly be a bug" warnings even in mission-critical software. Formal verification often becomes an "all or nothing" deal: either you totally prove the absence of bugs in a program or you get no benefit at all.

And that sucks! It takes a lot of effort to prove something correct, orders of magnitude more than running some tests. And that's why I consider testing "more robust": even a little bit of testing may find a true bug, while a little bit of verification doesn't get you much.

    Correctness is not a thing

    I'll end this by getting on my soapbox. Here's something that's really important to understand about "proving the absence of bugs":

    You do not prove something's correct. You prove conformance to specification.

    It's possible for code to conform to the specification and still have bugs. Two common cases are:

    1. The spec is wrong or incomplete, like v2.dfy is in the above example. Spoiler in the footnote.3
    2. The spec is based on an assumption that's not true in practice, like "no other program is writing to the same file."

    This is also important because it puts the specification front-and-center. You can write tests without having a specification in mind, but you need a spec to do any kind of formal method. Doing FM trains you to think about specs, which in turn helps you write better tests.
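One way spec-first thinking carries over to testing: once the spec is written down, you can encode it as an executable property and check it over exhaustively enumerated small inputs. A rough sketch (both functions are my own illustration of the pseudocode above, with the wrap-around fix; none of this is the article's Dafny code):

```rust
// My Rust port of the (fixed) elem_before_x pseudocode.
fn elem_before_x(s: &[i32], x: i32) -> Option<i32> {
    match s.iter().position(|&e| e == x) {
        None => None,
        Some(0) => s.last().copied(),
        Some(i) => Some(s[i - 1]),
    }
}

// The spec as an executable property, adjusted for the wrap-around fix:
// a None result means x is absent; a Some(r) result means r sits
// immediately before x (cyclically) somewhere in the slice.
fn spec_holds(s: &[i32], x: i32) -> bool {
    match elem_before_x(s, x) {
        None => !s.contains(&x),
        Some(r) => (0..s.len()).any(|i| s[i] == r && s[(i + 1) % s.len()] == x),
    }
}

fn main() {
    let vals = [0i32, 1, 2];
    // Every length-3 slice over a 3-value alphabet, plus the empty slice.
    for &a in &vals {
        for &b in &vals {
            for &c in &vals {
                for &x in &vals {
                    assert!(spec_holds(&[a, b, c], x));
                }
            }
        }
    }
    for &x in &vals {
        assert!(spec_holds(&[], x));
    }
    println!("spec holds on all enumerated cases");
}
```

This is the "equivalence class" idea from earlier in miniature: slices this small already exercise the interesting cases (absent element, first element, middle element, duplicates).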


    Things I am interested in

    1. Crossovers from fields besides classical engineering, who worked professionally as something else before switching to software. What ideas from your old job as a journalist/librarian/social worker/acrobat could be useful in software, and vice versa?
    2. Wargaming in software. Have you designed or participated in a devops/security/system design/whatever simulated game? Was it helpful?

    If you have insights into either of these, feel free to share! Or write about them publicly and share the link with me. You can just reply directly to this email.


    1. Though you could argue this isn't sufficient for security. If easter(-1e8) causes a memory overflow then that's a bug even if a normal user won't ever call it. 

    2. We're lucky that the Dafny standard library defines a postcondition for index functions saying the output is less than the length of the list. Otherwise the prover would raise a potential out-of-bounds error on s[IndexOf(s, x)]

    3. s[|s|-1] == r.value then s[0] == x isn't true for the input [a, r, x, r]. The good news is that Dafny refuses to verify the code because it didn't match the spec. I only realized this when I was error-checking this piece. (This doesn't refute the point I made above about false positives: the false positive was that s[|s|-1] was out of bounds!) 

  47. Weekly Update 396

    (troyhunt.com)

    Presently sponsored by: Kolide is an endpoint security solution for teams that want to meet SOC2 compliance goals without sacrificing privacy. Learn more here.

    "More Data Breaches Than You Can Shake a Stick At". That seems like a reasonable summary and I suggest there are two main reasons for this observation. Firstly, there are simply loads of breaches happening and you know this already because, well, you read my stuff! Secondly, There

  48. lilos v1.0 released

    (cliffle.com)

    After five years of development, something like seven art projects, one commercial product, and many changes to the dark corners of the Rust language, I’ve decided lilos is ready for a 1.0 release!

    Some parts I’m excited about include:

    • As of this release, the lilos APIs are entirely cancellation-safe.

    • This release contains contributions from five other people, bringing the total number of contributors to seven! (Want to be number eight? Come say hi!)

    • Thanks to one of those contributors, the operating system tests are now running in CI on QEMU!

    (For anyone who’s new, lilos is a tiny embedded operating system that uses Rust async to allow complex multitasking on very limited microcontrollers without requiring dynamic memory allocation. Read more about lilos on my project page, where I link to the docs and provide a curated collection of blog posts on the topic.)

    See the release notes if you’re curious about what’s changed. If you’ve got firmware written for an earlier version of lilos (particularly the 0.3.x series) and would like to update (you don’t have to!), those release notes will guide you through the process. There have been some breaking API changes, but I promise they’re all improvements.

  49. Setting up service workers on Vultr

    (developer.mozilla.org)

    This guide introduces you to service workers and their lifecycle. Learn how to deploy a project using service workers with HTTPS on Vultr.

  50. (chriscoyier.net)

    Most internet travels by wire. Straight through the dang ocean. Josh Dzieza in a feature for The Verge: These fragile wires are constantly breaking — a precarious system on which everything from banks to governments to TikTok depends. But thanks to a secretive global network of ships on standby, every broken cable is quickly fixed. […]