Wisest Commit in the Kernel

2025-07-30
Transmission 2

Let's break from the drudgery and pursue something whimsical. We are not machines. I have a question for you, and by the end of this we shall have an answer: What is the wisest commit in the Linux kernel's git repo?

It depends on what is meant by 'wise'. (Un)fortunately, computer programmers are not known for their robust philosophy, so we'll go with 'a commit is wise if it changes few lines of code, but has many lines of commit message to explain why'. In other words, the wisest commit is the one with the greatest ratio of explanation to lines changed.

We can start by downloading the git history.

$ # Download from github to abuse microsoft's bandwidth
$ git clone https://github.com/torvalds/linux
$ cd linux

We'll only look at the master branch.

$ # Log all hashes to a file
$ git log --format=format:%H > hashes
$ head -n 5 hashes
5f33ebd2018ced2600b3fad2f8e2052498eb4072
327579671a9becc43369f7c0637febe7e03d4003
4bb01220911d2dd6846573850937e89691f92e20
bef3012b2f6814af2b5c5abd6b5f85921dbb8a01
c37495fe3531647db4ae5787a80699ae1438d7cf
$ wc -l hashes
1369380 hashes

That's a lot of commits.

We must get the commit message and stats of a commit to calculate how wise it is.

Let's look at a random hash: e4d2878369d590bf8455e3678a644e503172eafa

$ hash=e4d2878369d590bf8455e3678a644e503172eafa
$ git log --format=%B -n 1 $hash
rxrpc: Fix irq-disabled in local_bh_enable()

The rxrpc_assess_MTU_size() function calls down into the IP layer to find
out the MTU size for a route.

-- snip --
$ git show $hash --shortstat --format=""
 3 files changed, 4 insertions(+), 4 deletions(-)

With these two commands, we can have all of the information we need to find the wisest commit.

But first, we should turn this boring, drab text into glamorous, colorful numbers. We need the length of the commit message (in bytes, which is what wc -c counts) and the total number of lines changed.

$ len=$(git log --format=%B -n 1 $hash | wc -c)
$ echo $len
2135
$ # ...and the numbers from the stats
$ # for our purposes, we add insertions and deletions together
$ num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc)
$ echo $num_lines
8
$ # get the ratio
$ wiseness=$(echo "scale=2; $len / $num_lines" | bc)
$ echo $wiseness
266.87

This commit has a wiseness score of 266.87.
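
Note that bc truncates rather than rounds: 2135 / 8 is 266.875, which scale=2 cuts to 266.87. As a minimal sketch, the same arithmetic can be reproduced with awk alone. The function name wiseness is made up for illustration, and it assumes non-negative integer inputs.

```shell
# Hypothetical helper: message length divided by lines changed,
# truncated (not rounded) to two decimal places, like bc with scale=2.
# Like bc, it fails with an error if the second argument is 0.
wiseness() {
    awk -v len="$1" -v lines="$2" 'BEGIN {
        scaled = int(len * 100 / lines)   # ratio in hundredths, fraction dropped
        printf "%d.%02d\n", int(scaled / 100), scaled % 100
    }'
}
```

Calling wiseness 2135 8 prints 266.87, matching the bc result above.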

Now, we need only a script to generate wiseness for every hash.

$ for hash in $(cat hashes); do
>     len=$(git log --format=%B -n 1 $hash | wc -c);
>     num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc);
>     wiseness=$(echo "scale=2; $len / $num_lines" | bc);
>     echo "$wiseness $hash";
> done > hashes_wise
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
(standard_in) 2: syntax error
^C^C^C^C^C^C^C

Uh-oh.

There are at least two errors:

  • Some commits apparently change 0 lines.
  • Output in some cases is different than usual.
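
Both failure modes can also be absorbed at parse time. The following is a minimal sketch, not the approach taken in this post: a hypothetical helper, shortstat_lines, that reads a --shortstat summary line on stdin and prints insertions plus deletions, falling back to 0 when the input is empty. It also copes with summaries that list only insertions or only deletions.

```shell
# Hypothetical helper: sum the insertion and deletion counts from a
# `git show --shortstat` summary line read on stdin.
# Prints 0 when there is no summary line at all.
shortstat_lines() {
    awk '
        {
            # The count always sits in the field just before the word
            # "insertion(s)(+)" or "deletion(s)(-)".
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^insertion/ || $i ~ /^deletion/) total += $(i - 1)
            }
        }
        END { print total + 0 }   # "+ 0" turns an unset total into 0
    '
}

# Hypothetical usage inside a kernel checkout:
#   git show "$hash" --shortstat --format="" | shortstat_lines
```

The caller still has to decide what a result of 0 means before dividing by it.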

I could run this script again with debug information or stderr appended to find out which hashes are problematic. But it takes a lot of time and battery for my laptop to run, and luckily I don't have to. Bash carries on after a failed command, so the loop still wrote a line for each problematic hash, just with an empty wiseness. Those lines start with a space.

$ grep '^ ' hashes_wise 
 a4bd43d6f7b72b90e064eb8c22c720126cfc1525
 ddfd1f30b5badcd06c199fa519bec5f0f54892e0
 cb6749b961b684701a5d4bc905c4923407017c88
 ecdb0b32e518a69a724d3cb7a6e5a4d2be2db8a1
 d2cf8ccf5a1871058a083c00efe37d7eb91bf6bd
 51bd73d92f89173e3394276f4b840eed361f11b5

The first five will have zero lines changed. The last will be weird.

$ git show a4bd43d6f7b72b90e064eb8c22c720126cfc1525 --shortstat
commit a4bd43d6f7b72b90e064eb8c22c720126cfc1525
Author: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date:   Fri Apr 25 15:13:39 2025 +0800

    scripts/lib/kdoc: change mode to 0644
    
    The script library here contain just classes. Remove execution
    permission.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    Message-ID: <be0b0a5bde82fa09027a5083f8202f150581eb4e.1745564565.git.mchehab+huawei@kernel.org>

 3 files changed, 0 insertions(+), 0 deletions(-)

It looks like this commit changes file permissions, which is why no lines are changed.

The next four are file renames. And the last?

$ git show 51bd73d92f89173e3394276f4b840eed361f11b5 --shortstat
commit 51bd73d92f89173e3394276f4b840eed361f11b5
Merge: d47c670061b5 13368df520f1
Author: Christian Brauner <brauner@kernel.org>
Date:   Thu Feb 27 11:33:06 2025 +0100

    Merge branch 'vfs-6.15.shared.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
    
    Pull in the VFS changes shared with xfs for making RWF_DONTCACHE work
    with buffered writes for xfs.
    
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Do you see that? No stats summary.

$ git show 51bd73d92f89173e3394276f4b840eed361f11b5 --shortstat --format=""
$ # crickets...

Let's not ponder too deeply how a git commit does not change any files, or why this is the only one that does this.

(EDIT: There are multiple merge commits that do this in the kernel repo. It seems to have something to do specifically with merging tags.)

The fixed script, now with fewer divisions by zero:

$ for hash in $(cat hashes); do
>     len=$(git log --format=%B -n 1 $hash | wc -c);
>     num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc);
>     if [ -z "$num_lines" ] || [ "$num_lines" -eq 0 ]; then
>         num_lines=1;
>     fi;
>     wiseness=$(echo "scale=2; $len / $num_lines" | bc);
>     echo "$wiseness $hash";
> done > hashes_wise

After about two days of processing (and nearly losing all progress in an unrelated system crash), we have a complete wiseness index of every hash in the Linux kernel git repo.
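
With a two-day run, a crash is less painful if the loop can pick up where it left off. A hedged sketch of one way to do that, which is not what was actually run here: a hypothetical helper, remaining_hashes, that prints the hashes not yet present in the results file, so the loop can append with >> instead of starting over.

```shell
# Hypothetical helper: print every hash from file $1 that does not yet
# appear as the last field of a line in results file $2.
remaining_hashes() {
    todo=$1
    done_file=$2
    # With no results yet (missing or empty file), everything is still to do.
    if [ ! -s "$done_file" ]; then
        cat "$todo"
        return
    fi
    # First pass (NR == FNR) records finished hashes; the second pass
    # prints only the hashes that were not recorded.
    awk 'NR == FNR { seen[$NF] = 1; next } !($1 in seen)' "$done_file" "$todo"
}

# Hypothetical usage, appending instead of overwriting:
#   for hash in $(remaining_hashes hashes hashes_wise); do ...; done >> hashes_wise
```

A line cut off mid-write by a crash would still need to be cleaned up by hand.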

Let's have a poke around, shall we?

$ wc -l hashes
1369380 hashes
$ wc -l hashes_wise
1369381 hashes_wise

Apparently my antics have spontaneously generated an additional hash.

Initially, I couldn't figure out why there was an additional line and tried regex-ing for malformed lines.

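In reality, this is because wc -l counts newline characters, and the hashes file lacks the trailing newline (0x0a) that most unix programs leave at the end of files. The format: prefix passed to git log puts newlines between entries rather than after each one, so hashes really holds 1369381 hashes: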

$ printf "\n" | xxd
00000000: 0a
$ # last byte of the file is...
$ tail -c 1 hashes | xxd
00000000: 32
$ tail -c 1 hashes_wise | xxd
00000000: 0a
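
wc -l undercounts by one whenever the last line is unterminated. A small sketch of a count that does not care either way, using awk's record counter (count_entries is a made-up name):

```shell
# Hypothetical helper: count the entries in file $1, including a final
# line that has no trailing newline.
count_entries() {
    awk 'END { print NR }' "$1"
}

# Hypothetical usage:
#   count_entries hashes
```

Alternatively, generating the list with git log --format=tformat:%H terminates every line, including the last, so wc -l and the real count agree.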

This is what the file looks like, by the way:

$ head hashes_wise -n 5
51.63 5f33ebd2018ced2600b3fad2f8e2052498eb4072
16.52 327579671a9becc43369f7c0637febe7e03d4003
153.33 4bb01220911d2dd6846573850937e89691f92e20
21.67 bef3012b2f6814af2b5c5abd6b5f85921dbb8a01
41.45 c37495fe3531647db4ae5787a80699ae1438d7cf

This command finds the average wiseness (bc's default scale is 0, so the result is truncated to an integer):

$ cat hashes_wise | cut -f1 -d ' ' | paste -sd+ | sed 's/^/(/' | sed 's|$|)/1369381|' | bc
88
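
A sketch of the same average computed with awk, which keeps two decimal places and does not need the line count hard-coded. The name mean_wiseness is hypothetical, and the function assumes the "score hash" layout shown above, skipping any line that lacks a score.

```shell
# Hypothetical helper: mean of the first column of "score hash" lines
# read on stdin. Lines without a score (only one field) are ignored.
mean_wiseness() {
    awk 'NF == 2 { sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Hypothetical usage:
#   mean_wiseness < hashes_wise
```

Fed only the five sample lines shown above, it prints 56.92.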

And finally! What are the most wise commits? Since the wiseness score is the first part of a line, we can simply sort numerically.

$ cat hashes_wise | sort -nr | head
15277.00 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
14719.50 11933cf1d91d57da9e5c53822a540bbdc2656c16
13193.00 7e0e63d09516e96994c879f07c5a3c3269d7015e
13062.00 29d47d00e0ae61668ee0c5d90bef2893c8abbafa
11257.00 74472233233f577eaa0ca6d6e17d9017b6e53150
11231.00 61ab9efddf51cbc0d57356a4d650785cf5721fbe
10943.00 6da3700c98cdc8360f55c5510915efae1d66deea
10464.00 ebad8e731c1c06adf04621d6fd327b860c0861b5
10326.00 19391a2ca98baa7b80279306cdf7dd43f81fa595
9919.00 e9bb18c7b95d4dcf8c7f0e14f920ca6f03109e75

Here it is! The wisest commit:

$ git show 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
commit 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Nov 15 11:29:21 2024 +0000

    btrfs: ref-verify: fix use-after-free after invalid ref action

    At btrfs_ref_tree_mod() after we successfully inserted the new ref entry
    (local variable 'ref') into the respective block entry's rbtree (local
    variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF,
    we error out and free the ref entry without removing it from the block
    entry's rbtree. Then in the error path of btrfs_ref_tree_mod() we call
    btrfs_free_ref_cache(), which iterates over all block entries and then
    calls free_block_entry() for each one, and there we will trigger a
    use-after-free when we are called against the block entry to which we
    added the freed ref entry to its rbtree, since the rbtree still points
    to the block entry, as we didn't remove it from the rbtree before freeing
    it in the error path at btrfs_ref_tree_mod(). Fix this by removing the
    new ref entry from the rbtree before freeing it.

    Syzbot report this with the following stack traces:

       BTRFS error (device loop0 state EA):   Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615
          __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
          update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
          btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
          btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
          btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
          btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4314
          btrfs_insert_empty_item fs/btrfs/ctree.h:669 [inline]
          btrfs_insert_orphan_item+0x1f1/0x320 fs/btrfs/orphan.c:23
...skipping...
        entry_SYSCALL_64_after_hwframe+0x77/0x7f

       Memory state around the buggy address:
        ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
        ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
       >ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                               ^
        ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
        ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00

    Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u
    Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool")
    CC: stable@vger.kernel.org # 4.19+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c
index 9522a8b79d22..2928abf7eb82 100644
--- a/fs/btrfs/ref-verify.c
+++ b/fs/btrfs/ref-verify.c
@@ -857,6 +857,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info,
 "dropping a ref for a root that doesn't have a ref on the block");
                        dump_block_entry(fs_info, be);
                        dump_ref_action(fs_info, ra);
+                       rb_erase(&ref->node, &be->refs);
                        kfree(ref);
                        kfree(ra);
                        goto out_unlock;

Congratulations to Filipe! A fix to a use-after-free is perfectly wise.

It is a little disappointing to find that wiseness is mostly about putting a stack trace in your commit message, but we knew the measure was arbitrary going into this. In fact, most of the other wise commits have the output of various programs pasted in.

Except the second-most wise commit, which is actually a rather in-depth explanation of some details relating to mounting devices and clearly had a lot of care put into it.

Perhaps having a lot to say about little loosely correlates with wiseness after all...
