Rewriting our RabbitMQ client: Actix to pure Tokio

Over the last few weeks I finished and shipped a full rewrite of our RabbitMQ pub/sub client at work — replacing the old Actix actor-based implementation with a pure Tokio one. It's been running in production on our pricing platform since, and the results landed better than I'd hoped. The part I like most: the client is generated from schema, so this wasn't one service — the whole fleet gets the same typed, consistent client. The codegen lives in the open as schema-tools Jinja2 templates.

The previous client was built on the Actix actor model: a supervisor plus separate actors for the connection manager, the producer and each consumer. It worked, but every message crossed an actor boundary — boxed Handler trait objects, ResponseActFuture, context switches and supervision machinery on a path that is really just "read a delivery, route it, ack it". I wanted something leaner and more predictable, with a smaller dependency tree and fewer moving parts to reason about at 3am.

The numbers below were measured in production, with one caveat up front: our database is still the primary bottleneck, so the messaging layer had headroom to give. Consumer throughput went up ~20%, and producer throughput rose 2–3× in messaging-heavy workloads. Memory dropped 30–40% under heavy real-world load, and by 2–3 MB per idle pod. Dropping the Actix dependency tree cut service build times by ~10%, which compounds across CI/CD. And the metric I care about most: messaging-related alerts went from recurring nightly incidents to effectively zero since rollout.

A few of the design choices behind that. Connections are read lock-free through ArcSwap — the hot path checks for a ready connection without taking a mutex. Reconnects are funnelled through a single gate, so a blip never triggers a thundering-herd reconnect storm: the first task to notice reconnects, everyone else waits and rechecks. Connections are wrapped in Arc and reused across workers instead of being re-established or cloned around.
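
To make the "first task reconnects, everyone else waits and rechecks" idea concrete, here is a minimal std-only sketch. It is not the generated client's code: Conn, ConnManager and dials_after_stampede are hypothetical names, threads stand in for Tokio tasks, and an RwLock stands in for ArcSwap (so the read path here is not actually lock-free; in the real client that read would be an ArcSwap load).

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for a broker connection.
struct Conn {
    healthy: AtomicBool,
}

struct ConnManager {
    /// Current connection. RwLock stands in for ArcSwap in this std-only sketch.
    current: RwLock<Option<Arc<Conn>>>,
    /// The single gate: only one caller at a time may reconnect.
    gate: Mutex<()>,
    /// How many times we actually dialled, kept only for the demonstration.
    dials: AtomicUsize,
}

impl ConnManager {
    fn new() -> Self {
        ConnManager {
            current: RwLock::new(None),
            gate: Mutex::new(()),
            dials: AtomicUsize::new(0),
        }
    }

    /// Hot path: hand out the shared connection if one is ready.
    fn ready(&self) -> Option<Arc<Conn>> {
        let slot = self.current.read().unwrap();
        let found = match &*slot {
            Some(c) if c.healthy.load(Ordering::Acquire) => Some(Arc::clone(c)),
            _ => None,
        };
        found
    }

    /// Return a ready connection, reconnecting through the gate if needed.
    fn get(&self) -> Arc<Conn> {
        if let Some(c) = self.ready() {
            return c;
        }
        // Slow path: queue up behind the gate.
        let _gate = self.gate.lock().unwrap();
        // Recheck: whoever held the gate before us may already have reconnected.
        if let Some(c) = self.ready() {
            return c;
        }
        // We are the first to notice, so we dial (simulated here).
        self.dials.fetch_add(1, Ordering::Relaxed);
        let fresh = Arc::new(Conn { healthy: AtomicBool::new(true) });
        *self.current.write().unwrap() = Some(Arc::clone(&fresh));
        fresh
    }
}

/// Let `workers` threads ask for a connection at once; return the dial count.
fn dials_after_stampede(workers: usize) -> usize {
    let manager = Arc::new(ConnManager::new());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let m = Arc::clone(&manager);
            thread::spawn(move || {
                m.get();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    manager.dials.load(Ordering::Relaxed)
}

fn main() {
    // Eight workers race for a connection, but only one of them dials.
    println!("dials: {}", dials_after_stampede(8));
}
```

The recheck after taking the gate is the part that prevents the storm: every waiter finds the fresh connection and returns it instead of dialling again, and all of them share the same Arc.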

The consumer is plain Tokio primitives, no actor runtime. A tokio::select! (biased, so shutdown always wins over the next delivery) drives the loop; a Semaphore sized to the prefetch limit gives natural backpressure so we never spawn unbounded work; in-flight deliveries live on a JoinSet that drains on shutdown. Routing is statically typed through a Handler trait keyed by message type in an FxHashMap that is shrink_to_fit-ed before the workers start. Concurrency is configurable — we run a single consumer channel per pod today, but it scales out when a workload needs it.
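
The routing piece can be sketched on its own. This is one possible shape under stated assumptions, not the template's actual code: Message, Handler, Outcome, Router, PriceChanged and Repricer are all hypothetical names, std's HashMap stands in for FxHashMap, and handlers are synchronous and stateless to keep the example short (real handlers would presumably be async). Each route is a plain function pointer monomorphised per message type, so nothing is boxed.

```rust
use std::collections::HashMap;

/// What the consumer should tell the broker after handling a delivery.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ack,
    Reject,
}

/// A typed message that knows its routing key and how to decode itself.
trait Message: Sized {
    const TYPE: &'static str;
    fn decode(payload: &[u8]) -> Option<Self>;
}

/// A handler for exactly one message type.
trait Handler<M: Message> {
    fn handle(msg: M) -> Outcome;
}

/// Signature shared by every route in the table.
type RouteFn = fn(&[u8]) -> Outcome;

/// Decode, then call the typed handler; compiled once per (M, H) pair.
fn dispatch<M: Message, H: Handler<M>>(payload: &[u8]) -> Outcome {
    match M::decode(payload) {
        Some(msg) => H::handle(msg),
        None => Outcome::Reject,
    }
}

struct Router {
    routes: HashMap<&'static str, RouteFn>,
}

impl Router {
    fn new() -> Self {
        Router { routes: HashMap::new() }
    }

    /// Register handler H for message type M under M::TYPE.
    fn register<M: Message, H: Handler<M>>(&mut self) {
        let f: RouteFn = dispatch::<M, H>;
        self.routes.insert(M::TYPE, f);
    }

    /// Call once all routes are registered, before workers start.
    fn freeze(mut self) -> Self {
        self.routes.shrink_to_fit();
        self
    }

    /// Unknown message types are rejected rather than silently dropped.
    fn route(&self, message_type: &str, payload: &[u8]) -> Outcome {
        match self.routes.get(message_type) {
            Some(f) => f(payload),
            None => Outcome::Reject,
        }
    }
}

/// Hypothetical message: a price in cents, sent as ASCII digits.
struct PriceChanged {
    cents: u64,
}

impl Message for PriceChanged {
    const TYPE: &'static str = "price.changed";
    fn decode(payload: &[u8]) -> Option<Self> {
        let text = std::str::from_utf8(payload).ok()?;
        let cents = text.trim().parse::<u64>().ok()?;
        Some(PriceChanged { cents })
    }
}

/// Hypothetical handler: accepts any non-zero price.
struct Repricer;

impl Handler<PriceChanged> for Repricer {
    fn handle(msg: PriceChanged) -> Outcome {
        if msg.cents > 0 { Outcome::Ack } else { Outcome::Reject }
    }
}

fn main() {
    let mut router = Router::new();
    router.register::<PriceChanged, Repricer>();
    let router = router.freeze();
    println!("{:?}", router.route("price.changed", b"1299"));
    println!("{:?}", router.route("price.changed", b"not a number"));
}
```

Because the table is built once and never mutated afterwards, it can be shared read-only across workers, which is also why shrinking it before they start costs nothing at runtime.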

On the producer side the big win is batching. A single publish awaits its confirm inline; the batch path publishes many messages and then calls wait_for_confirms once, amortising the confirm round-trip across the whole batch. Payloads are serialised once and the connection manager is shared via Arc. The batch producer is already deployed; a batch consumption API with batched confirms is next — it won't help every use case, but it should give synchronisation jobs and large producers another solid bump.
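
The amortisation is easy to see with a toy model. The sketch below uses a hypothetical in-memory MockChannel that only counts confirm round-trips; its publish and wait_for_confirms methods are simplified, synchronous stand-ins and not the API of any real AMQP library. publish_each, publish_batch and round_trips are likewise made-up helper names.

```rust
/// Hypothetical in-memory channel; it only counts confirm round-trips.
#[derive(Default)]
struct MockChannel {
    unconfirmed: usize,
    confirmed: usize,
    round_trips: usize,
}

impl MockChannel {
    /// Hand a payload to the broker without waiting for its confirm.
    fn publish(&mut self, _payload: &[u8]) {
        self.unconfirmed += 1;
    }

    /// One round-trip that confirms everything published so far.
    fn wait_for_confirms(&mut self) -> usize {
        self.round_trips += 1;
        let n = self.unconfirmed;
        self.unconfirmed = 0;
        self.confirmed += n;
        n
    }
}

/// Single-publish path: every message waits for its own confirm.
fn publish_each(ch: &mut MockChannel, payloads: &[Vec<u8>]) {
    for p in payloads {
        ch.publish(p);
        ch.wait_for_confirms();
    }
}

/// Batch path: publish everything, then wait for confirms once.
fn publish_batch(ch: &mut MockChannel, payloads: &[Vec<u8>]) -> usize {
    for p in payloads {
        ch.publish(p);
    }
    ch.wait_for_confirms()
}

/// Publish `n` messages on a fresh channel; return the confirm round-trips used.
fn round_trips(n: usize, batched: bool) -> usize {
    // Serialise each payload once, up front.
    let payloads: Vec<Vec<u8>> = (0..n)
        .map(|i| format!("{{\"id\":{}}}", i).into_bytes())
        .collect();
    let mut ch = MockChannel::default();
    if batched {
        publish_batch(&mut ch, &payloads);
    } else {
        publish_each(&mut ch, &payloads);
    }
    // Either way, every message ends up confirmed.
    assert_eq!(ch.confirmed, n);
    ch.round_trips
}

fn main() {
    println!("single: {} round-trips", round_trips(100, false));
    println!("batch:  {} round-trips", round_trips(100, true));
}
```

In this model 100 single publishes cost 100 confirm waits and one batch of 100 costs a single wait. The trade-off, which the model does not show, is that a failed confirm now concerns the whole batch, so the caller has to decide what to retry.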

If you'd rather read the code than my description of it, the templates are public: the new Tokio client and the old Actix one sit side by side, so the difference in approach is easy to read. Generating it from schema means every service gets the same battle-tested client for free — fix it once, everyone benefits. The database is still the bottleneck, and that's fine. The messaging layer just isn't the thing paging me anymore.


© Wojciech Bator