Wisest Commit in the Kernel
Let's break from the drudgery and pursue something whimsical. We are not machines. I have a question for you, and by the end of this we shall have an answer: What is the wisest commit in the Linux kernel's git repo?
It depends on what is meant by 'wise'. (Un)fortunately, computer programmers are not known for their robust philosophy, so we'll go with 'a commit is wise if it changes few lines of code, but has many lines of commit message to explain why'. In other words, the wisest commit is the one with the greatest ratio of explanation to lines changed.
We can start by downloading the git history.
$ # Download from github to abuse microsoft's bandwidth
$ git clone https://github.com/torvalds/linux
$ cd linux
We'll only look at the master branch.
$ # Log all hashes to a file
$ git log --format=format:%H > hashes
$ head -n 5 hashes
5f33ebd2018ced2600b3fad2f8e2052498eb4072
327579671a9becc43369f7c0637febe7e03d4003
4bb01220911d2dd6846573850937e89691f92e20
bef3012b2f6814af2b5c5abd6b5f85921dbb8a01
c37495fe3531647db4ae5787a80699ae1438d7cf
$ wc -l hashes
1369380 hashes
We must get the commit message and stats of a commit to calculate how wise it is.
Let's look at a random hash: e4d2878369d590bf8455e3678a644e503172eafa
$ hash=e4d2878369d590bf8455e3678a644e503172eafa
$ git log --format=%B -n 1 $hash
rxrpc: Fix irq-disabled in local_bh_enable()
The rxrpc_assess_MTU_size() function calls down into the IP layer to find
out the MTU size for a route.
-- snip --
$ git show $hash --shortstat --format=""
3 files changed, 4 insertions(+), 4 deletions(-)
With these two commands, we have all the information we need to find the wisest commit.
But first, we should turn this boring, drab text into glamorous, colorful numbers. We need the length of the commit message (measured in bytes with wc -c), and the total number of lines changed.
$ len=$(git log --format=%B -n 1 $hash | wc -c)
$ echo $len
2135
$ # ...and the numbers from the stats
$ # for our purposes, we add insertions and deletions together
$ num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc)
$ echo $num_lines
8
$ # get the ratio
$ wiseness=$(echo "scale=2; $len / $num_lines" | bc)
$ echo $wiseness
266.87
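The cut-and-sed pipeline above leans on the exact position of each field in the summary line. As a rough alternative sketch, the same sum can be taken by looking for the words 'insertion' and 'deletion' and adding up the number that sits in front of each. The function name changed_lines is made up for this example and is not part of git.

```shell
# changed_lines is a hypothetical helper name, not a git command.
# Reads a `git show --shortstat` summary line on stdin and prints
# insertions + deletions. Prints 0 if there is no summary line at all.
changed_lines() {
    awk '
        {
            # The count always sits in the field just before the word.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(insertion|deletion)/)
                    total += $(i - 1)
        }
        END { print total + 0 }
    '
}

# Sample summary line copied from the output above; prints 8
echo " 3 files changed, 4 insertions(+), 4 deletions(-)" | changed_lines
```

Against a real repository it would presumably be fed the same way as before: git show $hash --shortstat --format="" | changed_lines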
Now, we need only a script to generate wiseness for every hash.
$ for hash in $(cat hashes); do
> len=$(git log --format=%B -n 1 $hash | wc -c);
> num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc);
> wiseness=$(echo "scale=2; $len / $num_lines" | bc);
> echo "$wiseness $hash";
> done > hashes_wise
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
Runtime error (func=(main), adr=13): Divide by zero
(standard_in) 2: syntax error
^C^C^C^C^C^C^C
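There are at least two errors:
- Some commits apparently change 0 lines, which makes bc divide by zero.
- For some commits the stats output differs from the usual format, which leaves bc with an expression it cannot parse.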
I could run this script again with debug information, or with stderr captured, to find out which hashes are problematic. But it takes a lot of time and battery for my laptop to run, and luckily I don't have to. The loop carries on past each error and still prints a line for that hash, which means the hashes I'm looking for are on lines where wiseness is an empty string.
$ grep '^ ' hashes_wise
a4bd43d6f7b72b90e064eb8c22c720126cfc1525
ddfd1f30b5badcd06c199fa519bec5f0f54892e0
cb6749b961b684701a5d4bc905c4923407017c88
ecdb0b32e518a69a724d3cb7a6e5a4d2be2db8a1
d2cf8ccf5a1871058a083c00efe37d7eb91bf6bd
51bd73d92f89173e3394276f4b840eed361f11b5
The first five will have zero lines changed. The last will be weird.
$ git show a4bd43d6f7b72b90e064eb8c22c720126cfc1525 --shortstat
commit a4bd43d6f7b72b90e064eb8c22c720126cfc1525
Author: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Fri Apr 25 15:13:39 2025 +0800
scripts/lib/kdoc: change mode to 0644
The script library here contain just classes. Remove execution
permission.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <be0b0a5bde82fa09027a5083f8202f150581eb4e.1745564565.git.mchehab+huawei@kernel.org>
3 files changed, 0 insertions(+), 0 deletions(-)
It looks like this commit changes file permissions, which is why no lines are changed.
The next four are file renames. And the last?
$ git show 51bd73d92f89173e3394276f4b840eed361f11b5 --shortstat
commit 51bd73d92f89173e3394276f4b840eed361f11b5
Merge: d47c670061b5 13368df520f1
Author: Christian Brauner <brauner@kernel.org>
Date: Thu Feb 27 11:33:06 2025 +0100
Merge branch 'vfs-6.15.shared.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull in the VFS changes shared with xfs for making RWF_DONTCACHE work
with buffered writes for xfs.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Do you see that? No stats summary.
$ git show 51bd73d92f89173e3394276f4b840eed361f11b5 --shortstat --format=""
$ # crickets...
Let's not ponder too deeply how a git commit does not change any files, or why this is the only one that does this.
(EDIT: There are multiple merge commits that do this in the kernel repo. It seems to have something to do specifically with merging tags.)
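One way to see which hashes are merges, sketched under the assumption that a merge is simply any commit with two or more parents: git log's %P placeholder prints the parent hashes, so counting fields is enough. The helper name merge_hashes is invented for this example, and the hashes in the demo line are fake.

```shell
# merge_hashes is a hypothetical helper name.
# Reads lines of the form "<hash> <parent> [<parent> ...]" on stdin
# (as printed by: git log --format='%H %P') and prints only the
# hashes that have two or more parents, i.e. merge commits.
merge_hashes() {
    awk 'NF >= 3 { print $1 }'
}

# Made-up short hashes, purely for illustration; prints ccc333
printf '%s\n' 'aaa111 bbb222' 'ccc333 ddd444 eee555' 'fff666' | merge_hashes
```

In the kernel repo this would be run as git log --format='%H %P' | merge_hashes. It only narrows down the candidates, since it lists every merge rather than just the ones without a stats summary. If merges should be left out of the index altogether, git log also accepts --no-merges.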
The fixed script, now with fewer divisions by zero:
$ for hash in $(cat hashes); do
> len=$(git log --format=%B -n 1 $hash | wc -c);
> num_lines=$(git show $hash --shortstat --format="" | cut -f2,3 -d, | cut -f2,4 -d ' ' | sed 's/ /+/' | bc);
> if [ ! -n "$num_lines" ] || [ "$num_lines" -eq "0" ]; then
> num_lines=1;
> fi;
> wiseness=$(echo "scale=2; $len / $num_lines" | bc);
> echo "$wiseness $hash";
> done > hashes_wise
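The guard and the division can also be folded into one small function, which makes the fallback easier to try out on its own before committing to a two-day run. This is only a sketch: the name wiseness_of is made up, it uses awk instead of bc, and it truncates to two decimals to mimic what bc does with scale=2.

```shell
# wiseness_of is a hypothetical helper name.
# $1 = message length in bytes, $2 = lines changed (may be empty or 0).
wiseness_of() {
    len=$1
    num_lines=${2:-0}
    # Empty, zero, or non-numeric counts fall back to 1, as in the loop above.
    [ "$num_lines" -gt 0 ] 2>/dev/null || num_lines=1
    # Truncate (not round) to two decimals, like bc with scale=2.
    awk -v l="$len" -v n="$num_lines" \
        'BEGIN { printf "%.2f\n", int(l * 100 / n) / 100 }'
}

wiseness_of 2135 8    # numbers from the example commit earlier: 266.87
wiseness_of 500 0     # zero lines changed: divides by 1 instead
wiseness_of 500 ""    # no stats summary at all: also divides by 1
```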
...After about two days of processing (and nearly losing all progress in an unrelated system crash), we have a complete wiseness index of every hash in the Linux kernel git repo.
Let's have a poke around, shall we?
$ wc -l hashes
1369380 hashes
$ wc -l hashes_wise
1369381 hashes_wise
Initially, I couldn't figure out why there was an additional line and tried regex-ing for malformed lines.
In reality, this is because wc counts newlines, and the hashes file does not have the trailing newline (0x0a) that most Unix programs leave at the end of files. It was written with git log's format: prefix, which puts a newline between entries rather than after each one:
$ printf "\n" | xxd
00000000: 0a
$ # last byte of the file is...
$ tail -c 1 hashes | xxd
00000000: 32
$ tail -c 1 hashes_wise | xxd
00000000: 0a
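To count lines without being fooled by a missing final newline, one option is to count records rather than newline bytes. A small sketch, with a throwaway temp file standing in for the real hashes file (the name count_records is made up):

```shell
# count_records is a hypothetical helper name.
# awk counts a final line even when it has no terminating newline,
# whereas wc -l only counts newline bytes.
count_records() {
    awk 'END { print NR }' "$1"
}

tmp=$(mktemp)
printf 'first\nsecond' > "$tmp"    # two lines, no trailing newline
wc -l < "$tmp"                     # counts the single newline byte
count_records "$tmp"               # counts both lines
rm -f "$tmp"
```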
This is what the file looks like, by the way:
$ head hashes_wise -n 5
51.63 5f33ebd2018ced2600b3fad2f8e2052498eb4072
16.52 327579671a9becc43369f7c0637febe7e03d4003
153.33 4bb01220911d2dd6846573850937e89691f92e20
21.67 bef3012b2f6814af2b5c5abd6b5f85921dbb8a01
41.45 c37495fe3531647db4ae5787a80699ae1438d7cf
This command finds the average wiseness:
$ cat hashes_wise | cut -f1 -d ' ' | paste -sd+ | sed 's/^/(/' | sed 's|$|)/1369381|' | bc
88
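The same average can be had in a single awk pass, which avoids building one enormous expression for bc and does not need the line count typed in by hand. A sketch, with the name average_wiseness made up for the occasion:

```shell
# average_wiseness is a hypothetical helper name.
# Reads "<wiseness> <hash>" lines on stdin and prints the mean of column 1.
average_wiseness() {
    awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Three made-up rows in the same layout as hashes_wise; prints 30.00
printf '%s\n' '10.00 aaa' '20.00 bbb' '60.00 ccc' | average_wiseness
```

On the real data it would be run as average_wiseness < hashes_wise. Unlike the bc one-liner, which divides at bc's default scale of 0 and so drops the fraction, this prints two decimals.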
And finally! What are the most wise commits? Since the wiseness score is the first part of a line, we can simply sort numerically.
$ cat hashes_wise | sort -nr | head
15277.00 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
14719.50 11933cf1d91d57da9e5c53822a540bbdc2656c16
13193.00 7e0e63d09516e96994c879f07c5a3c3269d7015e
13062.00 29d47d00e0ae61668ee0c5d90bef2893c8abbafa
11257.00 74472233233f577eaa0ca6d6e17d9017b6e53150
11231.00 61ab9efddf51cbc0d57356a4d650785cf5721fbe
10943.00 6da3700c98cdc8360f55c5510915efae1d66deea
10464.00 ebad8e731c1c06adf04621d6fd327b860c0861b5
10326.00 19391a2ca98baa7b80279306cdf7dd43f81fa595
9919.00 e9bb18c7b95d4dcf8c7f0e14f920ca6f03109e75
Here it is! The wisest commit:
$ git show 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
commit 7c4e39f9d2af4abaf82ca0e315d1fd340456620f
Author: Filipe Manana <fdmanana@suse.com>
Date: Fri Nov 15 11:29:21 2024 +0000
btrfs: ref-verify: fix use-after-free after invalid ref action
At btrfs_ref_tree_mod() after we successfully inserted the new ref entry
(local variable 'ref') into the respective block entry's rbtree (local
variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF,
we error out and free the ref entry without removing it from the block
entry's rbtree. Then in the error path of btrfs_ref_tree_mod() we call
btrfs_free_ref_cache(), which iterates over all block entries and then
calls free_block_entry() for each one, and there we will trigger a
use-after-free when we are called against the block entry to which we
added the freed ref entry to its rbtree, since the rbtree still points
to the block entry, as we didn't remove it from the rbtree before freeing
it in the error path at btrfs_ref_tree_mod(). Fix this by removing the
new ref entry from the rbtree before freeing it.
Syzbot report this with the following stack traces:
BTRFS error (device loop0 state EA): Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4314
btrfs_insert_empty_item fs/btrfs/ctree.h:669 [inline]
btrfs_insert_orphan_item+0x1f1/0x320 fs/btrfs/orphan.c:23
...skipping...
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00
Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u
Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c
index 9522a8b79d22..2928abf7eb82 100644
--- a/fs/btrfs/ref-verify.c
+++ b/fs/btrfs/ref-verify.c
@@ -857,6 +857,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info,
"dropping a ref for a root that doesn't have a ref on the block");
dump_block_entry(fs_info, be);
dump_ref_action(fs_info, ra);
+ rb_erase(&ref->node, &be->refs);
kfree(ref);
kfree(ra);
goto out_unlock;
Congratulations to Filipe! A fix to a use-after-free is perfectly wise.
It is a little disappointing to find that wiseness is mostly about putting a stack trace in your commit message, but we knew the measure was arbitrary going into this. In fact, most of the other wise commits have the output of various programs pasted in.
Except the second-most wise commit, which is actually a rather in-depth explanation of some details relating to mounting devices and clearly had a lot of care put into it.
Perhaps having a lot to say about little loosely correlates with wiseness after all...
References
- https://stackoverflow.com/questions/3357280/print-commit-message-of-a-given-commit-in-git#3357357
- https://stackoverflow.com/questions/53563344/how-to-make-git-show-stat-listing-only-the-number-of-files-changed-insertions#53563472
- https://stackoverflow.com/questions/30398014/divide-two-variables-in-bash#30398256