How a Go project lives and dies

Leandro Santiago

April 29th 2025

And how I am trying to revive it

Disclaimer

  • This talk is not really about Go.

All starts a long time ago

The year is…

computer.png

2019

An idea

promise.png

Lifetime

history.png

Current status

---
Language         files   blank  comment    code
---
Go                 387   10282     3234   39579
Vuejs Component     32     669      221    6890
PO File             11     734      838    1770
JavaScript          14     204      109    1141
SVG                 13       1       42     964
Markdown            20     511       32     962
Text                13     177        0     873
YAML                 3      24        6     323
Python               5     129       49     254
XML                  1       6        0     195
CSS                  1      30        0     141
make                 1      64        8     125
Bourne Shell         9      42       21     110
Properties           5      26       25      30
Dockerfile           1      11        1      26
HTML                 1       1        2      17
---
SUM:               525   12911     4588  102061
---

Developer Traction

$ git shortlog -s -n
1012  Leandro Santiago
 301  Sam Tuke
 266  Marcel Edmund Franke
 173  suela
  41  suelaP
  14  Weblate Admin
  11  nico
   5  Kristina Qejvanaj
   5  Weblate
   2  Lightmeter Weblate Bot
   2  Slavomir
   1  Dirk Weise
   1  Ergis
   1  N J
   1  Suela
   1  Suela Palushi

User traction

docker-count.png

Project activity

issues.png

Me in 2019

  • No idea about e-mail hosting
  • Previous experience mostly in embedded systems and C++
  • No experience with large Go codebases

Why Go

  • R is nice for experimenting and prototyping
  • Large prototype Docker image (~300MB?)
  • Go seemed a better option for production, though

Why Go

  • Static linking + CGO + musl
  • Simple deployment
  • Small docker image
docker.png
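
A minimal sketch of the kind of build command this combination enables (flags are illustrative; the project's actual build may differ):

$ CGO_ENABLED=1 CC=musl-gcc go build \
    -ldflags '-linkmode external -extldflags "-static"' \
    -o lightmeter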

Chosen tech stack

  • Go for the backend
  • Vue and JavaScript for the front-end
  • SQLite as data storage
  • Definitely not cloud-friendly

Why Postfix?

  • Largest market share in e-mail self-hosting
  • Previous experience in the team
  • We had to start somewhere

All about logs

Jan 20 19:48:05 srv0 postfix/smtpd[2467304]: connect from unknown[1.2.3.4]
Jan 20 19:48:04 srv0 postfix/smtpd[2467304]: B9996EABB6: client=unknown[1.2.3.4], sasl_method=PLAIN, sasl_username=sender@internal.org
Jan 20 19:48:05 srv0 postfix/cleanup[2467311]: B9996EABB6: message-id=<h-74f3afb0208ad285a794d760c8feb0eee631@internal.org>
Jan 20 19:48:05 srv0 opendkim[12864]: B9996EABB6: DKIM-Signature field added (s=key, d=internal.org)
Jan 20 19:48:05 srv0 postfix/qmgr[1937864]: B9996EABB6: from=<sender@internal.org>, size=2239, nrcpt=2 (queue active)
Jan 20 19:48:05 srv0 postfix/smtpd[2467304]: disconnect from unknown[1.2.3.4] ehlo=1 auth=1 mail=1 rcpt=2 data=1 quit=1 commands=7
Jan 20 19:48:07 srv0 postfix/smtp[2467312]: B9996EABB6: to=<recipient1@external.org>, relay=example-com.mail.protection.outlook.com[12.11.12.13]:25, delay=2.7, delays=1.3/0.06/0.33/1, dsn=2.0.0, status=sent (250 2.0.0 OK  1642704487 v125si7680590wme.216 - smtp)
Jan 20 19:48:07 srv0 postfix/smtp[2467312]: B9996EABB6: to=<recipient2@external.org>, relay=example-com.mail.protection.outlook.com[13.11.12.13]:25, delay=2.7, delays=1.3/0.06/0.33/1, dsn=2.0.0, status=sent (250 2.0.0 OK  1642704487 v125si7680590wme.216 - smtp)
Jan 20 19:48:07 srv0 postfix/qmgr[1937864]: B9996EABB6: removed

Challenges

  • We must track the state and properties of each message
  • Large diversity of ways to run Postfix

It’s all log parsing, right?

Postfix log tracking is painful

paper.png

Why not simply use grok patterns for parsing?

  • Performance
  • We use Ragel to generate parsing code at build time
  • Downsides: complex build system and maintenance cost of Ragel code
ragel.png

Why not simply use ElasticSearch or Logstash?

  • Why write a crappy log management system from scratch?
  • Why use a relational database instead of a TSDB?
elasticsearch.png
elastic-search-forbes.png
forbes-lightmeter.png
hacker-news-trademark.png

My traumas

  • Complicated deployments on embedded systems
  • Complicated database migrations
  • Slow process startups (dozens of seconds)
  • Multi-gigabyte Docker images causing systems to run out of disk space
  • Complex updates due to dynamically linked code

Basic Usage

./lightmeter -workspace \
  /var/lib/lightmeter/workspace \
  -watch_dir /var/log/

Beliefs/Approach

small system

Fast startup, small download.

monolith first

First rule of distributed systems: Do not distribute.

simple deployment and setup

  • Single statically linked binary, distribution independent
  • Docker image published for convenience
  • (although some users were independently deploying it with Nix)

reusability as emergent property

  • When design is simple enough, reusability emerges naturally

computers are fast, first try to scale vertically

Optimize for single node: memory is much faster than network

network is liability

Prefer local IPC (including Go channels) and Unix domain sockets for communication

crash first

$ rg errorutil.MustSucceed -g '!*_test.go' | wc -l
86
$ rg '\brecover\(\)' -g '*.go'
deliverydb/recover_release.go
16:     if r := recover(); r != nil {

deliverydb/recover_dev.go
16:     if r := recover(); r != nil {
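
A minimal sketch of what such a crash-first helper can look like (the project's actual errorutil.MustSucceed may differ):

package errorutil

// MustSucceed panics if err is non-nil, turning unexpected failures
// into immediate crashes instead of silently handled errors.
func MustSucceed(err error) {
  if err != nil {
    panic(err)
  }
}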

else-less code

$ rg '\selse\s' -g '*.go' 
recommendation/cmd/gen.go
86: } else if resp.StatusCode >= 400 {

tracking/tracking_test.go
248: // as there are use cases where a message is trriggered via something else than authenticated SMTP (local scripts, for instance)

dashboard/dashboard.go
142: case d.direction when @Outbound then d.sender_local_part     else d.recipient_local_part     end as local_part,
143: case d.direction when @Outbound then d.sender_domain_part_id else d.recipient_domain_part_id end as domain_part_id,

tools/poutil/poutil.go
56: } else {

tools/go2po/main.go
167: } else if ident.Name == FuncI18n {
205: } else {
237: } else {
240: } else {

notification/email/email_test.go
99:  Recipients:   "recipient@example2.com, Someone else <recipient2@some.other.address.com>, a.third.one@lala.com",
130: Recipients:   "recipient@example2.com, Someone else <recipient2@some.other.address.com>, a.third.one@lala.com",
163: So(msg.Header.Get("To"), ShouldEqual, "recipient@example2.com, Someone else <recipient2@some.other.address.com>, a.third.one@lala.com")
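
The idea: prefer early returns over else branches. A minimal illustrative sketch (localPart is a hypothetical helper, not from the codebase):

import (
  "errors"
  "strings"
)

// localPart returns everything before the “@” in an e-mail address.
// The failure case exits early; the happy path stays flat, with no else.
func localPart(addr string) (string, error) {
  i := strings.IndexByte(addr, '@')
  if i < 0 {
    return "", errors.New("not an e-mail address")
  }

  return addr[:i], nil
}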

Why SQLite?

Being SQLite-specific allows using it to its full power instead of the “least common denominator” of SQL.

Why SQLite?

  • No support for stored procedures (and why this is a good thing)
  • Stable data format

Why no ORM (personal opinion)

  • An ORM makes writing simple queries simple, but it won’t help with complex queries.
  • Especially when using SQLite, fewer abstraction layers give you more power.
  • We prepare queries manually instead (sketch below).
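
A minimal sketch of that approach with database/sql (table and column names are illustrative):

import (
  "database/sql"
  "time"
)

// countDeliveries: plain SQL, manually prepared, no ORM layer.
func countDeliveries(db *sql.DB, direction int, from, to time.Time) (int, error) {
  stmt, err := db.Prepare(`
    select count(*) from deliveries
    where direction = ? and delivery_ts between ? and ?`)
  if err != nil {
    return 0, err
  }
  defer stmt.Close()

  var count int
  err = stmt.QueryRow(direction, from.Unix(), to.Unix()).Scan(&count)
  return count, err
}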

Where is the database?

“microservices” inside a monolith?

Where is the database?

  • Each component inside the application manages its own goroutines.
  • Each of them owns and encapsulates its own data, having its own SQLite database.
  • Components communicate with each other via messaging (using Go channels); see the sketch after the listing below
$ ls /var/lib/lightmeter/workspace/
auth.db
auth.db-shm
auth.db-wal
connections.db
connections.db-shm
connections.db-wal
http_sessions
insights.db
insights.db-shm
insights.db-wal
intel-collector.db
intel-collector.db-shm
intel-collector.db-wal
logs.db
logs.db-shm
logs.db-wal
logtracker.db
logtracker.db-shm
logtracker.db-wal
master.db
master.db-shm
master.db-wal
rawlogs.db
rawlogs.db-shm
rawlogs.db-wal
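
A minimal sketch of this pattern, with hypothetical names: each component owns its goroutine and its database, and exposes only a channel.

import "database/sql"

// Record is a simplified message type for illustration.
type Record struct{ Line string }

// connTracker owns its goroutine and its own SQLite database
// (e.g. connections.db); the rest of the application reaches it
// only through the records channel.
type connTracker struct {
  db      *sql.DB
  records chan Record
}

func (t *connTracker) run() {
  for r := range t.records {
    // All access to t.db happens in this single goroutine.
    if _, err := t.db.Exec(`insert into lines(line) values(?)`, r.Line); err != nil {
      panic(err) // crash first
    }
  }
}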

SQLite

  • SQLite is not a database server (it’s an alternative to fopen(3)).
  • No need to handle database users and permissions.
  • No need to worry about connection timeouts: crash if anything goes wrong.

SQLite

  • It runs in the same process as the rest of your application.
  • You have to handle database schema changes yourself.
  • It can commit very few write transactions per second (with default settings).
  • Not designed for horizontal scaling

How to make SQLite fast

TL;DR: have a single writer and an arbitrary number of readers.

Solution

  • Separate connections into two distinct types:
    • readers
    • writers
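
A minimal sketch of that split, assuming the mattn/go-sqlite3 driver (path and DSN parameters are illustrative):

import (
  "database/sql"

  _ "github.com/mattn/go-sqlite3" // assumed driver
)

// openSplit opens a single writer connection and a pool of readers
// over the same SQLite file.
func openSplit(path string) (writer, readers *sql.DB, err error) {
  // Single writer: WAL mode, all writes serialized on one connection.
  writer, err = sql.Open("sqlite3", "file:"+path+"?_journal_mode=WAL&_busy_timeout=5000")
  if err != nil {
    return nil, nil, err
  }
  writer.SetMaxOpenConns(1)

  // Readers: a read-only pool; under WAL they are not blocked by the writer.
  readers, err = sql.Open("sqlite3", "file:"+path+"?mode=ro&_journal_mode=WAL")
  if err != nil {
    return nil, nil, err
  }
  readers.SetMaxOpenConns(8)

  return writer, readers, nil
}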

Writers

  • Each database has only one writer goroutine, responsible for receiving “actions”
  • Actions arrive via a channel, and writes happen inside transactions

The writer connection and its transactions are never exposed, to prevent misuse.

Not really a new idea.

Apple introduced a similar idea with Grand Central Dispatch in 2009.

package postfix

type Record struct {
  Time     time.Time
  Header   parser.Header
  Location RecordLocation
  Payload  parser.Payload
  Line     string
  Sum      Sum
}

type Publisher interface {
  Publish(Record)
}

// publisher implements the Publisher interface above
type publisher struct {
  // A queue of actions, which are simply functions
  actions chan<- dbrunner.Action
}

func (pub *publisher) Publish(r postfix.Record) {
  // Queue a new action; the writer goroutine executes it
  // later, inside a transaction
  pub.actions <- func(_ *sql.Tx, stmts dbconn.TxPreparedStmts) error {
    _, err := stmts.Get(insertLogLineKey).Exec(r.Time.Unix(), r.Sum, r.Line)
    return err
  }
}

Writer implementation here.

Known Issues

  • Lots of pressure on the GC in hot loops
  • Fire-and-forget async writes, with no synchronization point
  • If any error happens during writing, the application crashes

Readers

  • Connection pool
  • Readers live in a different “universe” than writers

Speed

  • Up to hundreds of thousands of writes per second (or more).
  • No limit on readers.
  • Your hardware is the limit.

No stored procedures

You can define custom functions and they execute in the same process as the rest of your application.

Definition

Registration

Hairy SQL
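
A minimal standalone sketch of defining and registering such a function, assuming the mattn/go-sqlite3 driver (function name and body are illustrative; the project's actual code may differ):

import (
  "database/sql"
  "strings"

  sqlite3 "github.com/mattn/go-sqlite3"
)

// domainOf runs in-process, like any other Go code.
func domainOf(email string) string {
  if i := strings.IndexByte(email, '@'); i >= 0 {
    return email[i+1:]
  }

  return ""
}

func init() {
  sql.Register("sqlite3_custom", &sqlite3.SQLiteDriver{
    ConnectHook: func(conn *sqlite3.SQLiteConn) error {
      // The last argument marks the function as pure (deterministic),
      // which SQLite requires for use in indexes on expressions.
      return conn.RegisterFunc("domain_of", domainOf, true)
    },
  })
}

Connections opened via sql.Open("sqlite3_custom", …) can then call domain_of(column) directly in SQL.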

No stored procedures

Uncle Ben’s quote goes here.

Deterministic functions must be mathematically pure.

Otherwise bad things happen, especially if you use them to create indexes on expressions.

sqlite-pure.png

CGO can add a big performance overhead when calling custom functions in SQLite.

Maybe C?

If that overhead is too much, you should probably write your functions in C/C++/Rust/Zig.

Backups, which backups?

High level architecture

arch.png

Denial and death

history.png

Denial and death

...
release/1.9.0 Mon Oct  4 15:49:52 2021 +0000
release/1.9.1 Mon Oct 11 12:54:13 2021 +0000
release/2.0.0-RC1 Dec 11 19:31:31 2021 +0000
release/2.0.0-RC2 Dec 13 19:44:15 2021 +0000
release/2.0.0-RC3 Dec 15 22:10:12 2021 +0000
release/2.0.0-RC4 Jan 21 23:01:12 2022 +0000
release/2.0.0-RC5 Feb 24 19:12:05 2022 +0000
release/2.0.0-RC6 Mar  8 11:14:42 2022 +0000
release/2.0.0-RC7 Mar 16 13:39:51 2022 +0000 --
release/2.0.0-RC8 Feb 28 09:36:09 2023 +0000 --
release/2.0.0-RC9 Aug 22 09:16:35 2023 +0000 --

There has never been a 2.0 release

“Upsell”?

Control Center ultimately became a client for a subscription service we were trying to sell

Users were not happy about it

Users are not customers, though

ngi.png
50keur.png

New directions and a new reality

yc-lightmeter.png
lightmeter-today.png

Code decay on the backend code

  • Updating dependencies in the Go backend was almost flawless: a few API breakages, some failing unit tests.
  • The backend still avoids modern Go features (no go:embed, no generics, etc.).

Code decay on the front-end code

  • The front-end is completely broken: Vue 2 is no longer supported.
  • The build system no longer works.
  • Acceptance tests no longer run.
  • In short: everything that depends on NPM/JavaScript is broken.

Current status

  • I’ve forked the project.
  • Getting the backend to build and its unit tests to run was easy after updating the third-party dependencies.
  • The front-end still doesn’t fully work; it took multiple days to get it building again after migrating the build system to Vite and upgrading Vue from 2 to 3.

Current status

  • Many dependencies no longer support Vue 2 and require the dependent code to be completely rewritten.
  • User acceptance tests are still broken.
  • A handful of unit tests still fail.

Test coverage

  • ~80% on backend code via unit testing
  • Zero unit tests on front-end
  • Basic user acceptance tests (using Taiko+Gauge)

Next steps

  • Finish the front-end migration from Vue 2 to Vue 3.
  • Migrate from JavaScript to TypeScript*.
  • Remove the features that drove many users away.
  • Completely rebrand the project and launch it to the world.
  • Implement the long-planned UI redesign.
  • Finish multi-node support.

Help!

  • I no longer work with e-mail and don’t have much time to maintain the project.
  • If you are interested in e-mail self-hosting, any help is welcome.
  • I want to make it a community-driven project.

Why 2025?

  • Most users use Gmail and Outlook.

  • There was big demand for Lightmeter Control Center in 2019.

Why 2025?

2025 has shown us that the world can no longer depend on American companies.

Why 2025?

  • I believe the need to host e-mail independently will grow again.

  • We (Lightmeter) failed to understand the users’ needs and build a sustainable product to address them.

  • I believe I am in a better position to understand them now.

Thank you

https://gitlab.com/leandrosansilva/controlcenter

  • https://www.zdnet.com/article/lightmeter-will-soon-help-you-tune-up-your-email-server/
  • https://www.heise.de/news/Lightmeter-Neues-Monitoring-Werkzeug-fuer-den-E-Mail-Server-4647424.html
  • https://www.forbes.com/sites/davidjeans/2021/03/01/elastic-war-on-amazon-web-services/

  • https://github.com/tobinjt/ASO

  • https://www.johntobin.ie/publications/sgai-2008.pdf

  • https://sqlite.org/deterministic.html

Any questions/comments/complaints/suggestions?

Leandro Santiago

social@setefaces.org