I stood before a well and wondered how deep it would go.
All I could see was blackness.
Well, I didn't actually leave my cellar and instead stood before Forgejo's git history.
But still, with its 24 thousand commits, Forgejo's history felt just as opaque.
Forgejo is a code forge like GitHub or GitLab.
Forgejo is a place to (collaboratively) develop software and is itself fully Open-Source, so of course I love it!
I've already explained at length why we benefit greatly from Open-Source.
What I'm interested in is Forgejo's publish-subscribe pattern.
It allows packages to publish messages grouped by topics.
Other packages subscribe to those topics and get called when a message for a topic of interest arrives.
forgejo.org/services/notify.Notify is the broker, which facilitates all this.
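To make the mechanics concrete, here is a minimal, self-contained sketch of such a broker. It is not Forgejo's code: Broker, mailNotifier and the string parameters are hypothetical and heavily simplified. Only the general shape (a Notifier interface, a NullNotifier to embed, and a RegisterNotifier call) follows what Forgejo does.

```go
package main

import "fmt"

// Notifier has one method per topic. These two topics are simplified
// stand-ins; the parameter types are assumptions made for this sketch.
type Notifier interface {
	NewIssue(title string)
	PullReviewDismiss(reviewer string)
}

// NullNotifier implements every topic as a no-op, so a subscriber only
// has to override the topics it is interested in.
type NullNotifier struct{}

func (NullNotifier) NewIssue(string)          {}
func (NullNotifier) PullReviewDismiss(string) {}

// Broker (hypothetical) keeps the subscribers and fans messages out to them.
type Broker struct {
	notifiers []Notifier
}

// RegisterNotifier adds a subscriber.
func (b *Broker) RegisterNotifier(n Notifier) {
	b.notifiers = append(b.notifiers, n)
}

// NewIssue publishes a message to the NewIssue topic.
func (b *Broker) NewIssue(title string) {
	for _, n := range b.notifiers {
		n.NewIssue(title)
	}
}

// PullReviewDismiss publishes a message to the PullReviewDismiss topic.
func (b *Broker) PullReviewDismiss(reviewer string) {
	for _, n := range b.notifiers {
		n.PullReviewDismiss(reviewer)
	}
}

// mailNotifier (hypothetical) subscribes to NewIssue only.
type mailNotifier struct {
	NullNotifier
	received []string
}

func (m *mailNotifier) NewIssue(title string) {
	m.received = append(m.received, title)
}

func main() {
	b := &Broker{}
	m := &mailNotifier{}
	b.RegisterNotifier(m)

	b.NewIssue("the well is too deep")
	b.PullReviewDismiss("alice") // falls through to the no-op in NullNotifier

	fmt.Println(m.received)
}
```

The embedded NullNotifier is what keeps subscribers small: adding a topic to the interface only requires one more no-op method there, not a change in every subscribing package.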
I've created a visualization of this pub-sub pattern.
Packages are Forgejo-orange and topics Codeberg-blue.
Every arrow goes in the direction messages flow; from sender to receiver.
Please use a device with a large screen like a laptop; it doesn't work on a smartphone.
The list on the right shows all commits in Forgejo's history that change something about the pub-sub pattern. Select a commit to see what Forgejo looked like back then. Once you've selected a commit you can use the arrow keys, too.
How did I create this visualization? Firstly, let's take a look at how Forgejo's pub-sub pattern works. Every participating package defines a notifier struct and uses the broker like this:
    // Define the notifier.
    type actionsNotifier struct {
        notify_service.NullNotifier
    }

    // Ensure that this struct fulfills the Notifier interface.
    var _ notify_service.Notifier = &actionsNotifier{}

    // Declare functions for all topics the package is interested in.
    func (n *actionsNotifier) NewIssue(/* --snip-- */) {
        // --snip--
    }

    // Tell the broker there's a new notifier to be notified.
    notify_service.RegisterNotifier(&actionsNotifier{})

    // Send a message to some topic.
    notify_service.PullReviewDismiss(ctx, doer, review, comment)

So what I had to do was find these places in Forgejo's source code and compile that data into a usable format.
I thought about doing this with grep and simple string matching but figured something more robust wouldn't hurt.
Therefore, I used Go's abstract syntax tree (AST) directly.
The AST is an intermediate state in the Go compiler; the compiler generates the AST from the source code and then the machine instructions from the AST.
To my delight, there is the go/ast package to walk through the AST and go/types for type checking.
There even is the convenient golang.org/x/tools/go/packages package to tie it all together.
Take a look at my parser main.go if you're interested in some details.
This script spits out a single JSON file.
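As a rough illustration of the AST approach, the following sketch finds calls on a package alias using only go/parser and go/ast. It is a simplification and not the actual parser: it matches the alias by name instead of resolving identifiers with go/types and go/packages, and the input file as well as the function name publishedTopics are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical input: a tiny file that sends messages to two topics.
const src = `package demo

import notify_service "forgejo.org/services/notify"

func f() {
	notify_service.NewIssue(nil)
	notify_service.PullReviewDismiss(nil, nil, nil, nil)
	other.Call()
}
`

// publishedTopics returns the names of all functions called on the given
// package alias, in source order. Matching by alias name is an assumption
// of this sketch; it would also report non-topic functions of that package.
func publishedTopics(source, alias string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}

	var topics []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call, keep descending
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true // a call like f(), not pkg.F()
		}
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == alias {
			topics = append(topics, sel.Sel.Name)
		}
		return true
	})
	return topics, nil
}

func main() {
	topics, err := publishedTopics(src, "notify_service")
	if err != nil {
		panic(err)
	}
	fmt.Println(topics)
}
```

Compared to grep, this ignores comments and string literals for free, and it keeps working when a call is split across several lines.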
For the pretty visualization I used D3.js's force simulation.
It simulates repelling forces between all nodes and attracting forces between connected ones.
Take a look at the viewer source code for more on that.
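The idea behind such a force simulation can be sketched in a few lines. This is a toy version written in Go for illustration, not how D3.js implements it: the force formulas, the parameters repel and attract, and the names Node, Link and step are all assumptions of this sketch.

```go
package main

import "fmt"

// Node is a point in the plane.
type Node struct{ X, Y float64 }

// Link connects two nodes by their index in the node slice.
type Link struct{ A, B int }

// step applies one round of forces and returns the moved nodes.
func step(nodes []Node, links []Link, repel, attract float64) []Node {
	dx := make([]float64, len(nodes))
	dy := make([]float64, len(nodes))

	// Repulsion: every pair of nodes pushes apart, weaker with distance.
	for i := range nodes {
		for j := i + 1; j < len(nodes); j++ {
			vx, vy := nodes[i].X-nodes[j].X, nodes[i].Y-nodes[j].Y
			d2 := vx*vx + vy*vy
			if d2 == 0 {
				continue // coincident nodes have no direction to push in
			}
			dx[i] += repel * vx / d2
			dy[i] += repel * vy / d2
			dx[j] -= repel * vx / d2
			dy[j] -= repel * vy / d2
		}
	}

	// Attraction: connected nodes pull together like a spring.
	for _, l := range links {
		vx, vy := nodes[l.B].X-nodes[l.A].X, nodes[l.B].Y-nodes[l.A].Y
		dx[l.A] += attract * vx
		dy[l.A] += attract * vy
		dx[l.B] -= attract * vx
		dy[l.B] -= attract * vy
	}

	out := make([]Node, len(nodes))
	for i, n := range nodes {
		out[i] = Node{n.X + dx[i], n.Y + dy[i]}
	}
	return out
}

func main() {
	nodes := []Node{{0, 0}, {10, 0}, {5, 1}}
	links := []Link{{0, 1}}
	for i := 0; i < 3; i++ {
		nodes = step(nodes, links, 0.1, 0.1)
	}
	fmt.Println(nodes)
}
```

Repeating step many times lets connected packages and topics settle close to each other while unrelated ones drift apart.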
Now that I had a way to extract this data for one state of Forgejo's code base, a bigger idea came to me. Why not do this for every commit in Forgejo's history? How difficult could it be? As you can see above, that did work out but not without hiccups. So, I invite you to come on a little journey down Forgejo's history: Let's throw a light down that well!
I started in front of the well:
A few days ago I created a Bash script that runs the parser script, checks out the next-older commit and repeats the process.
This script ran for days creating thousands of JSON files.
I created another Go script to clean up this mess, e.g., by deleting duplicates.
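Deleting duplicates could look roughly like the sketch below. It is an assumption about the approach, not the actual cleanup script: the Snapshot type, its fields and the function dropUnchanged are hypothetical, and the input is assumed to be ordered oldest first so that the commit that introduced a change is the one that survives.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Snapshot pairs a commit with the JSON the parser produced for it.
// Both field names are hypothetical.
type Snapshot struct {
	Commit string
	JSON   []byte
}

// dropUnchanged keeps a snapshot only if its JSON differs from the
// previously kept one. Input is expected oldest first.
func dropUnchanged(snaps []Snapshot) []Snapshot {
	var kept []Snapshot
	var last [sha256.Size]byte
	for i, s := range snaps {
		sum := sha256.Sum256(s.JSON)
		if i > 0 && sum == last {
			continue // same pub-sub state as the commit before
		}
		kept = append(kept, s)
		last = sum
	}
	return kept
}

func main() {
	snaps := []Snapshot{
		{"c1", []byte(`{"topics":1}`)},
		{"c2", []byte(`{"topics":1}`)},
		{"c3", []byte(`{"topics":2}`)},
	}
	for _, s := range dropUnchanged(snaps) {
		fmt.Println(s.Commit)
	}
}
```

Comparing only neighbouring snapshots matters here: a state that reappears later in history (for example after a revert) is a real change and must not be dropped.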
At 346f87d I let my script run and fall into the darkness below.
I didn't know what it would uncover and if it would encounter some hurdles.
Let's say I started at ground level and every commit I go down is another meter below the surface.
This was my first hiccup; my Go script crashed. Apparently this commit removed a type of token that my script didn't expect (a SelectorExpr with a Selector that doesn't map to any object). So, all commits older than this one crash my script. I added a nil check and dropped it back in the well.
    pkg.TypesInfo.Uses[f.Sel] != nil

And I encountered another hiccup; in March Forgejo changed the Go module path from code.gitea.io/gitea to forgejo.org.
After all, Forgejo is a fork of Gitea.
I adjusted my Go script and made the viewer remove the forgejo.org and code.gitea.io/gitea prefixes.
And whush, I just dropped through to before the Forgejo fork.
Here the Gitea developers changed the pub-sub broker package from code.gitea.io/gitea/modules/notification/base.Notifier to code.gitea.io/gitea/services/notify.Notifier.
I adjusted the script and continued.
My script crashed just below this commit, which upgrades from go1.17 to go1.18.
Apparently golang.org/x/tools/go/packages requires go mod tidy to run on a go1.17 project.
I added go mod tidy to my script and let it run for the night.
When I woke up the next morning I noticed that my Nextcloud instance was down.
After quickly logging into my server I realized it entirely ran out of disk space, whups.
The ~/go directory grew so large with Forgejo's many old dependencies it completely crippled my server.
Running my script in a Docker container didn't help here.
Now my script auto-deletes the ~/go dir when it gets too large.
Here a problem occurred when I re-ran my script with older versions of Go (more info on that below).
Somehow go1.14 mod tidy failed below this commit.
My up-to-date go1.25.3 did work, though, so now my script uses the new Go version whenever the old one fails.
It's a hack, but it works.
My script crashed again, this time for a most peculiar reason; go mod tidy failed.
I had been using the current go1.25.3 tools.
By now I had gotten so far down, so far back to when Forgejo used go1.12, that go1.25.3 had no idea what to make of the old project.
Therefore, I had to adjust my script to look for the version of Go Forgejo used at that time, install and use that instead.
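Finding the Go version a commit expects boils down to reading the go directive from go.mod. A minimal sketch, assuming a simple line-based parse is good enough (the function name goDirective is hypothetical; golang.org/x/mod/modfile would be the more robust choice):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goDirective returns the version named by the "go" directive of a go.mod
// file, e.g. "1.12", or "" if the file has no such directive.
func goDirective(gomod string) string {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1]
		}
	}
	return ""
}

func main() {
	// Hypothetical go.mod content of an old commit.
	gomod := "module code.gitea.io/gitea\n\ngo 1.12\n\nrequire (\n)\n"
	fmt.Println(goDirective(gomod))
}
```

An empty result is worth handling explicitly, since a go.mod without a go directive is valid and needs some fallback version.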
I find this so very fascinating. Say you're writing a script that parses some questionnaires filled out by a lot of people. Then every person's questionnaire will have the same structure. Apart from a few outliers, who spilled their coffee over the questionnaire, you implement one parser for the entire data-set and all is well. But here, with history, an assumption you make based on the newest version doesn't have to hold for all older versions. Like in this case, where I expected the source code to have changed from commit to commit. But I didn't expect the Go tooling to have changed, too.
Btw, I forgot what commit this problem occurred on but did remember that it happened with the switch from go1.12 to go1.13.
What commit is the first that uses go1.13?
git bisect is a great tool to answer questions like these.
You simply give it one commit that is definitely old (i.e., uses go1.12 or older) and one commit that is definitely new (i.e., uses go1.13 or newer).
Then git bisect spits you out at a commit somewhere in the middle and asks you, "Is this commit old or new?"
You tell it git bisect old or git bisect new and it spits you out in a new place.
You perform a binary search that lets you comb through thousands of commits in a breeze.
If you're working on a code base you don't know the authors of, this is a great tool for figuring out what explanations the authors left in their commit messages.
Especially when you have a question git blame can't simply answer, git bisect is a great tool.
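The binary search behind this can be sketched with sort.Search. The sketch assumes a linear history ordered oldest first and a property that, once true, stays true for all newer commits; real histories are graphs, which git bisect handles for you. The commit names, the version table and the function firstNew are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// firstNew returns the index of the first commit for which isNew reports
// true. Commits must be ordered oldest first, and every commit after the
// first "new" one must be "new" as well. It returns len(commits) if no
// commit is new.
func firstNew(commits []string, isNew func(commit string) bool) int {
	return sort.Search(len(commits), func(i int) bool {
		return isNew(commits[i])
	})
}

func main() {
	// Hypothetical linear history and the Go minor version each commit uses.
	commits := []string{"c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"}
	minor := map[string]int{
		"c0": 11, "c1": 12, "c2": 12, "c3": 12,
		"c4": 12, "c5": 13, "c6": 13, "c7": 14,
	}

	probes := 0
	i := firstNew(commits, func(c string) bool {
		probes++ // each probe corresponds to one "old or new?" question
		return minor[c] >= 13
	})
	fmt.Println("first commit using go1.13 or newer:", commits[i])
	fmt.Println("commits inspected:", probes)
}
```

The number of questions grows with the logarithm of the history length, which is why even tens of thousands of commits need only a handful of answers.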
    git bisect start
    git bisect old d77176912bccf1dc0ad93366df55f00fee23b498
    git bisect new forgejo
    cat go.mod | grep -P '^go ' | cut -d ' ' -f2
    # 1.20
    git bisect new
    # --snip--

This commit moves the code base from code.gitea.io/git to code.gitea.io/gitea.
Uff, okay, I added another rename option.
And another rename.
This commit moves the broker struct from modules/notification/base/base.go to modules/notification/base/notifier.go.
There are a lot of problems down here at 16km below the surface.
Now there's a problem with the xorm dependency.
This commit updates to xorm v0.7.4 but the old v0.7.3-0.20190620151208-f1b4f8368459 fails with go mod tidy.
I don't know why but my fix is as hacky as it is simple:
    sed -i 's#github.com/go-xorm/xorm v0.7.3-0.20190620151208-f1b4f8368459#github.com/go-xorm/xorm v0.7.3#' go.mod

Btw, this is the old xorm repo on GitHub before it got moved to a Gitea server.
I'm glad the old repo is only archived and not deleted.
Otherwise, I'd have to do more work here.
Come to think of it, there are a lot of old dependency versions we rarely care about.
If they are lost, we won't be able to build old versions of our software.
That might be real trouble if we want to reproduce some problem with outdated software.
We are so far below the surface we don't even have a go.mod file anymore.
Gitea used Gopkg.lock before this commit.
Now my script creates its own go.mod if required.
    if [ ! -f go.mod ]; then
      go mod init
    fi

Here, my script threw its final error: Empty output.
There's no more pub-sub pattern to look for below this.
This commit is where the pub-sub pattern was created.
This commit is the bottom of the well our light just hit and thus the end of our pub-sub journey.
Alas, we could still check how much more rock there is below the well.
And this is it, the very first commit in Forgejo's history. We are so far back, the project isn't named Forgejo or even Gitea any longer; down here the project goes by the name Gogs. Now there really isn't anywhere deeper to go.
Let's climb back out of the well and recollect what we've encountered.
Firstly, there used to be a topic no package ever sent a message to, NotifyPullRevieweDismiss.
It was removed in June and we can see that in the visualization.
Then there's my own contribution to Forgejo: the ActionRunNowDone topic.
The visualization shows how I implemented the topic without any receiving packages at first.
In a second commit I attached the services/mailer and in a third the services/webhook package.
Oh and I found some very strange behaviour from December 2022 to September 2023:
c53f802 removed the Notify prefix from all topics.
On the very next day, however, a89b399 reverted that.
This goes back and forth a lot, which explains the jarring visualization around this time.
In the end 540bf9f removed the Notify prefix for the last time.
Maybe this is a broken bit in the Forgejo git history?
I colored these commits red in the visualization.
Finally, as of 9524b8c there's still some dead code:
No package ever sends a message to the Run topic, and no package listens to the DeleteIssue topic.
Maybe those topics should be removed.
One could automate a check like this, maybe in CI.
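Such a check could be sketched like this. The Topic type, the function deadTopics and the package names in the example data are hypothetical; the parser's real JSON may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Topic records which packages send to and listen on a topic.
// The shape is an assumption made for this sketch.
type Topic struct {
	Senders   []string
	Receivers []string
}

// deadTopics returns the topics nobody sends to and the topics nobody
// listens on, each sorted by name.
func deadTopics(topics map[string]Topic) (unsent, unheard []string) {
	for name, t := range topics {
		if len(t.Senders) == 0 {
			unsent = append(unsent, name)
		}
		if len(t.Receivers) == 0 {
			unheard = append(unheard, name)
		}
	}
	sort.Strings(unsent)
	sort.Strings(unheard)
	return unsent, unheard
}

func main() {
	// Example data; the package names are made up.
	topics := map[string]Topic{
		"NewIssue":    {Senders: []string{"services/issue"}, Receivers: []string{"services/mailer"}},
		"Run":         {Receivers: []string{"services/actions"}},
		"DeleteIssue": {Senders: []string{"services/issue"}},
	}
	unsent, unheard := deadTopics(topics)
	fmt.Println("never sent to:", unsent)
	fmt.Println("never listened to:", unheard)
}
```

A CI job would run the parser, feed its output into a check like this and exit with a non-zero status when either list is not empty.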
This really felt like diving into the abyss. I never knew what lay below or when my script would fail next. It gave me chills to work every day with code that has such a history. There is so much knowledge we stand upon. And there are a few general things I learned:
I created this visualization for my Forgejo Actions Notification Development article and Forgejo Architecture Deep Dive talk.
I used these commands to generate the "depth" and date for each commit:
    TZ=UTC0 git show --no-patch --date=local '--format=%ci' 16dbc0efd350cdc15760c2e40346c1e9fbb0bd01
    git rev-list --count 16dbc0efd350cdc15760c2e40346c1e9fbb0bd01..346f87d7a26d7c3678867961c74487e5b759cbf0