<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>RocketMQ on Liu Bo</title>
        <link>https://csliubo.com/tags/rocketmq/</link>
        <description>Recent content in RocketMQ on Liu Bo</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://csliubo.com/tags/rocketmq/index.xml" rel="self" type="application/rss+xml" /><item>
            <title>When Long-Stable Code Suddenly Starts Failing</title>
            <link>https://csliubo.com/p/latent-bug-threadlocal-pollution/</link>
            <pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate>
            <guid>https://csliubo.com/p/latent-bug-threadlocal-pollution/</guid>
            <description>&lt;p&gt;A colleague pulled me aside at 5pm on a Friday with two screenshots open in his IDE. Internal operations users had been reporting that some asynchronous workloads were occasionally failing. The RocketMQ consumer couldn&amp;rsquo;t find the row the producer had just committed. Manual retry always succeeded. The code in question was something I&amp;rsquo;d originally written years back. Other engineers had touched it since, but the last meaningful change was more than six months earlier, well before the failures started.&lt;/p&gt;&#xA;&lt;p&gt;We read the producer. We read the consumer. The code looked correct. He had already asked Claude Code to review the full path. Claude&amp;rsquo;s review also concluded the code was correct.&lt;/p&gt;&#xA;&lt;p&gt;Three reviewers, two human and one AI, all said: no bug here. But the bug was real. The RocketMQ DLQ was filling with messages that retried successfully on demand.&lt;/p&gt;&#xA;&lt;p&gt;This post is about what we eventually found. But the bug itself isn&amp;rsquo;t the interesting part. The &lt;em&gt;class&lt;/em&gt; of bug is. The code was locally correct. It had been correct for years. It passed both human and AI review on the day it broke. It broke anyway, because something external to the code had changed.&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;ll call this kind of defect a &lt;strong&gt;latent bug&lt;/strong&gt;: a flaw whose harmlessness depends on invariants that nobody wrote down, that nobody knew were load-bearing, until somebody unknowingly broke them.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&#xA;&lt;/h2&gt;&lt;ol&gt;&#xA;&lt;li&gt;A long-standing RocketMQ consumer started intermittently failing to find data the producer had just committed. Manual retry always recovered.&lt;/li&gt;&#xA;&lt;li&gt;Both human and AI code review found nothing. 
The code, read in isolation, is correct.&lt;/li&gt;&#xA;&lt;li&gt;Audit logs (enabled months earlier for an unrelated project) revealed the smoking gun: the producer wrote to &lt;code&gt;shard_main&lt;/code&gt;; the failing consumer&amp;rsquo;s SELECT was routed to &lt;code&gt;shard_alt_2&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Root cause: &lt;code&gt;ThreadLocal&lt;/code&gt; pollution. An AOP advice wrote the message&amp;rsquo;s target shard into a per-thread &lt;code&gt;ShardContextHolder&lt;/code&gt; but never cleared it. When a thread was reused for a message with no explicit shard, the residue from the previous alt-shard message routed the query to the wrong shard.&lt;/li&gt;&#xA;&lt;li&gt;The bug had existed since day one. It was harmless until six months earlier, when tenant sharding shipped. That release produced the first messages that ever wrote to the holder, breaking an unwritten invariant the missing &lt;code&gt;finally&lt;/code&gt; had implicitly relied on.&lt;/li&gt;&#xA;&lt;li&gt;Fix: &lt;code&gt;try/finally&lt;/code&gt; with unconditional &lt;code&gt;clear()&lt;/code&gt;. Defend at the entry-point boundary, not at each call site.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;the-failing-path&#34;&gt;The Failing Path&#xA;&lt;/h2&gt;&lt;p&gt;The application is a multi-tenant SaaS. As tenant data grew, a handful of high-volume tenants were migrated to dedicated database shards (&lt;code&gt;shard_alt_1&lt;/code&gt;, &lt;code&gt;shard_alt_2&lt;/code&gt;, and so on). The vast majority of tenants still live on &lt;code&gt;shard_main&lt;/code&gt;. A routing library reads &lt;code&gt;ShardContextHolder.getCurrentShard()&lt;/code&gt;, which is backed by a &lt;code&gt;ThreadLocal&amp;lt;String&amp;gt;&lt;/code&gt;, and uses it to pick the data source per query. 
When the holder is empty, queries route to &lt;code&gt;shard_main&lt;/code&gt; by default.&lt;/p&gt;&#xA;&lt;p&gt;A tenant&amp;rsquo;s shard assignment is static: a tenant is either on &lt;code&gt;shard_main&lt;/code&gt; or has been migrated to one of the &lt;code&gt;shard_alt_*&lt;/code&gt; shards. The convention that matters for this bug: &lt;em&gt;most messages don&amp;rsquo;t carry an explicit shard, because most tenants are on the default shard&lt;/em&gt;. Only messages for alt-shard tenants set the shard in their metadata.&lt;/p&gt;&#xA;&lt;p&gt;The flow, end to end:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;An internal admin endpoint hands the request to &lt;code&gt;JobIntakeService.submit(request)&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Inside a &lt;code&gt;@Transactional&lt;/code&gt; block, the service INSERTs a row into &lt;code&gt;ingest_jobs&lt;/code&gt;. The producer&amp;rsquo;s transaction is already bound to the tenant&amp;rsquo;s correct shard via a separate routing context, so the row lands on the right shard.&lt;/li&gt;&#xA;&lt;li&gt;Before returning, the service registers an &lt;code&gt;afterCommit&lt;/code&gt; callback via &lt;code&gt;TransactionSynchronizationManager&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;On commit, the callback sends a RocketMQ message containing the row&amp;rsquo;s ID, plus an optional &lt;code&gt;shard&lt;/code&gt; field set only when the tenant is on an alt shard.&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;JobWorker.onMessage&lt;/code&gt; receives the message and calls &lt;code&gt;findById(jobId)&lt;/code&gt; to fetch the row and start the actual work.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The failure mode: &lt;code&gt;findById&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt;, the consumer throws &lt;code&gt;&amp;quot;job not found&amp;quot;&lt;/code&gt;, and the message lands in the DLQ. Manual replay always succeeds. 
The row is, in fact, present in the database, on the shard where the producer correctly stored it.&lt;/p&gt;&#xA;&lt;p&gt;The producer code (simplified):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;14&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;15&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;16&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@Transactional&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span 
style=&#34;color:#50fa7b&#34;&gt;submit&lt;/span&gt;(JobIntakeRequest request) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    IngestJob job &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; IngestJob();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    job.&lt;span style=&#34;color:#50fa7b&#34;&gt;setTenantId&lt;/span&gt;(request.&lt;span style=&#34;color:#50fa7b&#34;&gt;getTenantId&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#6272a4&#34;&gt;// ... populate other fields ...&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ingestJobRepo.&lt;span style=&#34;color:#50fa7b&#34;&gt;save&lt;/span&gt;(job);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    TransactionSynchronizationManager.&lt;span style=&#34;color:#50fa7b&#34;&gt;registerSynchronization&lt;/span&gt;(&lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; TransactionSynchronization() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        @Override&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;afterCommit&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            JobNotification notification &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; JobNotification(job.&lt;span 
style=&#34;color:#50fa7b&#34;&gt;getId&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            notification.&lt;span style=&#34;color:#50fa7b&#34;&gt;setShard&lt;/span&gt;(shardConfig.&lt;span style=&#34;color:#50fa7b&#34;&gt;shardFor&lt;/span&gt;(request.&lt;span style=&#34;color:#50fa7b&#34;&gt;getTenantId&lt;/span&gt;()));  &lt;span style=&#34;color:#6272a4&#34;&gt;// null for main-shard tenants&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            mq.&lt;span style=&#34;color:#50fa7b&#34;&gt;send&lt;/span&gt;(Topics.&lt;span style=&#34;color:#50fa7b&#34;&gt;JOB_INTAKE&lt;/span&gt;, notification);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    });&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;The consumer:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 
2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; 
style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@RocketMQMessageListener(topic &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; Topics.&lt;span style=&#34;color:#50fa7b&#34;&gt;JOB_INTAKE&lt;/span&gt;, ...)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;JobWorker&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;implements&lt;/span&gt; RocketMQListener&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;JobNotification&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    @Override&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;onMessage&lt;/span&gt;(JobNotification notification) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6272a4&#34;&gt;// MyBatis-Plus repo: findById returns T or null, not Optional&amp;lt;T&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        IngestJob job &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; ingestJobRepo.&lt;span style=&#34;color:#50fa7b&#34;&gt;findById&lt;/span&gt;(notification.&lt;span style=&#34;color:#50fa7b&#34;&gt;getId&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (job &lt;span style=&#34;color:#ff79c6&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            log.&lt;span style=&#34;color:#50fa7b&#34;&gt;error&lt;/span&gt;(&lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;job not found, id = {}&amp;#34;&lt;/span&gt;, notification.&lt;span style=&#34;color:#50fa7b&#34;&gt;getId&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;throw&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; RuntimeException(&lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;job not found&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6272a4&#34;&gt;// ... process ...&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;There&amp;rsquo;s no bug visible in either snippet. The bug isn&amp;rsquo;t in this code.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;what-three-reviewers-missed&#34;&gt;What Three Reviewers Missed&#xA;&lt;/h2&gt;&lt;p&gt;We worked through the obvious hypotheses. 
Each was easy to rule out, and every dismissal added to the mystery.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Replication lag?&lt;/strong&gt; The natural first guess for &amp;ldquo;consumer can&amp;rsquo;t find what producer just wrote.&amp;rdquo; But the system uses a single MySQL primary per shard, no read replicas, no read/write splitting middleware. Ruled out.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Snapshot isolation?&lt;/strong&gt; If the consumer was running inside a long-lived transaction, its &lt;code&gt;REPEATABLE READ&lt;/code&gt; snapshot might predate the producer&amp;rsquo;s commit. We checked: &lt;code&gt;findById&lt;/code&gt; runs in autocommit mode, no surrounding &lt;code&gt;@Transactional&lt;/code&gt;. Ruled out.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Cache inconsistency?&lt;/strong&gt; No L2 cache on this entity, and MyBatis-Plus L1 cache is session-scoped, fresh session every call. Ruled out.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;RocketMQ delivery weirdness?&lt;/strong&gt; Async dispatch happens after the JDBC commit completes; the data is durable by the time the producer&amp;rsquo;s MQ client invokes send. Ruled out.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Nested transactions in the producer?&lt;/strong&gt; Was the &lt;code&gt;afterCommit&lt;/code&gt; somehow firing before the actual commit? We traced the call graph end-to-end. No nesting, no &lt;code&gt;REQUIRES_NEW&lt;/code&gt; shenanigans. The &lt;code&gt;afterCommit&lt;/code&gt; fires exactly when Spring&amp;rsquo;s &lt;code&gt;AbstractPlatformTransactionManager.processCommit()&lt;/code&gt; says it should: after &lt;code&gt;doCommit()&lt;/code&gt;, while the connection is still thread-bound. &lt;em&gt;(I&amp;rsquo;d written about this exact sequence in a previous post on &lt;code&gt;afterCommit&lt;/code&gt; deadlocks.)&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;So: the producer&amp;rsquo;s &lt;code&gt;INSERT&lt;/code&gt; is durable on its shard before the message ever leaves the producer. 
The consumer reads from the same shards the producer writes to. There is no caching, no replica, no snapshot.&lt;/p&gt;&#xA;&lt;p&gt;And yet, occasionally, &lt;code&gt;SELECT * FROM ingest_jobs WHERE id = ?&lt;/code&gt; returns zero rows for an ID the producer had just INSERTed seconds earlier.&lt;/p&gt;&#xA;&lt;p&gt;By the time we&amp;rsquo;d exhausted the conventional explanations, we&amp;rsquo;d burned an hour. The code still looked correct. Claude&amp;rsquo;s review still looked correct. The bug was somewhere we weren&amp;rsquo;t looking.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;the-breakthrough-an-audit-log-enabled-for-an-unrelated-project&#34;&gt;The Breakthrough: An Audit Log Enabled for an Unrelated Project&#xA;&lt;/h2&gt;&lt;p&gt;A few months earlier, for a completely separate effort (workload governance for our database tier, where I needed to aggregate SQL patterns across services), I had enabled the cloud provider&amp;rsquo;s MySQL audit log feature. It captures every query against our production databases with timestamps, originating source, target shard, and the SQL text.&lt;/p&gt;&#xA;&lt;p&gt;The audit log had nothing to do with this bug, conceptually. But it was running. I had access to it. So I went looking.&lt;/p&gt;&#xA;&lt;p&gt;I pulled one failing case from the application logs and queried the audit log for any statement touching that row&amp;rsquo;s job ID. 
Three matches came back:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Timestamp&lt;/th&gt;&#xA;          &lt;th&gt;Source&lt;/th&gt;&#xA;          &lt;th&gt;Target shard&lt;/th&gt;&#xA;          &lt;th&gt;SQL (abbreviated)&lt;/th&gt;&#xA;          &lt;th&gt;Rows&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;16:49:34.000&lt;/td&gt;&#xA;          &lt;td&gt;producer-pod&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;shard_main&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;INSERT INTO ingest_jobs (id, ...) VALUES (...)&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;1&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;16:49:34.558&lt;/td&gt;&#xA;          &lt;td&gt;consumer-pod-A&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;&lt;code&gt;shard_alt_2&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;SELECT * FROM ingest_jobs WHERE id = ?&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;16:52:57.467&lt;/td&gt;&#xA;          &lt;td&gt;consumer-pod-B&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;shard_main&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;SELECT * FROM ingest_jobs WHERE id = ?&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;1&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Three things jumped out:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;The producer&amp;rsquo;s INSERT correctly landed on &lt;code&gt;shard_main&lt;/code&gt;, the tenant&amp;rsquo;s actual shard.&lt;/li&gt;&#xA;&lt;li&gt;The consumer&amp;rsquo;s failing SELECT, 558 milliseconds later, was routed to &lt;code&gt;shard_alt_2&lt;/code&gt;, a completely different shard used by a different set of tenants.&lt;/li&gt;&#xA;&lt;li&gt;The successful manual retry, 
three minutes later, was routed to &lt;code&gt;shard_main&lt;/code&gt;, the correct shard.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The application&amp;rsquo;s source code didn&amp;rsquo;t write &lt;code&gt;USE shard_alt_2&lt;/code&gt; anywhere. Shard selection happened inside the routing library, which reads from the &lt;code&gt;ShardContextHolder&lt;/code&gt; &lt;code&gt;ThreadLocal&lt;/code&gt;. So the audit log was telling me, in six words: &lt;em&gt;the ThreadLocal had the wrong value&lt;/em&gt;.&lt;/p&gt;&#xA;&lt;p&gt;This wasn&amp;rsquo;t a database visibility problem. It wasn&amp;rsquo;t replication. It wasn&amp;rsquo;t isolation. The application was sending the query to the wrong shard entirely.&lt;/p&gt;&#xA;&lt;p&gt;The moment I saw the &lt;code&gt;shard_alt_2&lt;/code&gt; row, I knew where to look. I&amp;rsquo;d hit a structurally identical bug years ago on a Tomcat thread pool: &lt;code&gt;ThreadLocal&lt;/code&gt; state leaking across requests on a reused worker thread, and the shape of the failure was unmistakable.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;the-mechanism-pollution-in-reused-threads&#34;&gt;The Mechanism: Pollution in Reused Threads&#xA;&lt;/h2&gt;&lt;p&gt;The relevant &lt;code&gt;ThreadLocal&lt;/code&gt; lives in a thin wrapper class:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;ShardContextHolder&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;private&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;final&lt;/span&gt; ThreadLocal&lt;span 
style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;String&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; CURRENT_SHARD &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; ThreadLocal&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;setCurrentShard&lt;/span&gt;(String shard) { CURRENT_SHARD.&lt;span style=&#34;color:#50fa7b&#34;&gt;set&lt;/span&gt;(shard); }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; String &lt;span style=&#34;color:#50fa7b&#34;&gt;getCurrentShard&lt;/span&gt;()           { &lt;span style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; CURRENT_SHARD.&lt;span style=&#34;color:#50fa7b&#34;&gt;get&lt;/span&gt;(); }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;clear&lt;/span&gt;()                       { CURRENT_SHARD.&lt;span style=&#34;color:#50fa7b&#34;&gt;remove&lt;/span&gt;(); }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;The routing library reads this on every query:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;ShardAwareDataSource&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;implements&lt;/span&gt; DataSource {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    @Override&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; Connection &lt;span style=&#34;color:#50fa7b&#34;&gt;getConnection&lt;/span&gt;() &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;throws&lt;/span&gt; SQLException {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        String shard &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; 
ShardContextHolder.&lt;span style=&#34;color:#50fa7b&#34;&gt;getCurrentShard&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (shard &lt;span style=&#34;color:#ff79c6&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            shard &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_main&amp;#34;&lt;/span&gt;;  &lt;span style=&#34;color:#6272a4&#34;&gt;// default routing&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; shardRegistry.&lt;span style=&#34;color:#50fa7b&#34;&gt;get&lt;/span&gt;(shard).&lt;span style=&#34;color:#50fa7b&#34;&gt;getConnection&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#6272a4&#34;&gt;// ...&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;On the consumer side, all &lt;code&gt;@RocketMQMessageListener&lt;/code&gt; methods are wrapped by an AOP advice that decides the shard context based on the incoming message&amp;rsquo;s metadata:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table 
style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@Around(&lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;pointCut()&amp;#34;&lt;/span&gt;)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; Object &lt;span style=&#34;color:#50fa7b&#34;&gt;around&lt;/span&gt;(ProceedingJoinPoint pjp) &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;throws&lt;/span&gt; Throwable {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    JobNotification msg &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; (JobNotification) pjp.&lt;span style=&#34;color:#50fa7b&#34;&gt;getArgs&lt;/span&gt;()&lt;span style=&#34;color:#ff79c6&#34;&gt;[&lt;/span&gt;0&lt;span style=&#34;color:#ff79c6&#34;&gt;]&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    String shard &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;getShard&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; 
(shard &lt;span style=&#34;color:#ff79c6&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ShardContextHolder.&lt;span style=&#34;color:#50fa7b&#34;&gt;setCurrentShard&lt;/span&gt;(shard);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; pjp.&lt;span style=&#34;color:#50fa7b&#34;&gt;proceed&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#6272a4&#34;&gt;// ❌ no try/finally. nothing clears.&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Read that last comment again. There is no &lt;code&gt;finally&lt;/code&gt;. The advice writes into the &lt;code&gt;ThreadLocal&lt;/code&gt; and never removes what it wrote.&lt;/p&gt;&#xA;&lt;p&gt;The mechanism doesn&amp;rsquo;t require concurrency. A single-threaded executor processing the same mixed message stream sequentially would break in exactly the same way: alt-shard message A writes the holder; main-shard message B with &lt;code&gt;meta.shard == null&lt;/code&gt; reads the residue. The bug&amp;rsquo;s true precondition is &lt;strong&gt;thread reuse across messages with different shard contexts&lt;/strong&gt;. 
Concurrency just raises the rate at which heterogeneous messages share a thread.&lt;/p&gt;&#xA;&lt;p&gt;A &lt;code&gt;@RocketMQMessageListener&lt;/code&gt; (the annotation comes from &lt;code&gt;rocketmq-spring&lt;/code&gt;) runs inside a &lt;code&gt;DefaultRocketMQListenerContainer&lt;/code&gt;, whose push consumer dispatches messages through &lt;code&gt;ConsumeMessageConcurrentlyService&lt;/code&gt;. That service owns a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; whose threads live for the lifetime of the application. By default (as configured by that wrapper) the pool has 20 to 64 threads named &lt;code&gt;ConsumeMessageThread_&amp;lt;consumerGroup&amp;gt;_&amp;lt;idx&amp;gt;&lt;/code&gt;. Each thread processes thousands of messages over its lifetime. The &lt;code&gt;ThreadLocal&lt;/code&gt; lives with the thread, not the message.&lt;/p&gt;&#xA;&lt;p&gt;The failure plays out like this:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;strong&gt;Message A arrives&lt;/strong&gt;, for a tenant on &lt;code&gt;shard_alt_2&lt;/code&gt;. The advice sees &lt;code&gt;msg.shard == &amp;quot;shard_alt_2&amp;quot;&lt;/code&gt; and calls &lt;code&gt;setCurrentShard(&amp;quot;shard_alt_2&amp;quot;)&lt;/code&gt;. The consumer processes the message successfully. The advice returns. &lt;strong&gt;Nothing clears.&lt;/strong&gt; The thread&amp;rsquo;s &lt;code&gt;ShardContextHolder&lt;/code&gt; still holds &lt;code&gt;&amp;quot;shard_alt_2&amp;quot;&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Message B arrives&lt;/strong&gt; on the same thread, for a tenant on &lt;code&gt;shard_main&lt;/code&gt;. The producer leaves &lt;code&gt;shard&lt;/code&gt; null for main-shard tenants (main is implicit; alt shards are explicit), so &lt;code&gt;msg.shard == null&lt;/code&gt;. The advice sees &lt;code&gt;shard == null&lt;/code&gt; and &lt;em&gt;doesn&amp;rsquo;t write anything&lt;/em&gt;. &lt;code&gt;proceed()&lt;/code&gt; runs. The consumer calls &lt;code&gt;findById(jobId)&lt;/code&gt;. 
The routing library reads &lt;code&gt;ShardContextHolder.getCurrentShard()&lt;/code&gt;, sees &lt;code&gt;&amp;quot;shard_alt_2&amp;quot;&lt;/code&gt; (the residue from message A), and sends the query to &lt;code&gt;shard_alt_2&lt;/code&gt;. The row exists on &lt;code&gt;shard_main&lt;/code&gt;. Zero results.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The query returns zero rows. The consumer throws &amp;ldquo;job not found.&amp;rdquo; The message lands in DLQ.&lt;/p&gt;&#xA;&lt;p&gt;When the operations user hits &amp;ldquo;retry,&amp;rdquo; the replayed message lands on a different thread whose &lt;code&gt;ShardContextHolder&lt;/code&gt; happens to be empty (or, less commonly, has been overwritten by an earlier main-shard-bound code path). The query routes correctly, the message succeeds. The failure looks transient. The data looks like it has a visibility delay. It doesn&amp;rsquo;t. The first query asked the wrong shard.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;why-it-was-harmless-why-it-broke&#34;&gt;Why It Was Harmless, Why It Broke&#xA;&lt;/h2&gt;&lt;p&gt;The system&amp;rsquo;s sharding model is &lt;strong&gt;default plus override&lt;/strong&gt;: every tenant lives on &lt;code&gt;shard_main&lt;/code&gt; unless explicitly migrated. The producer&amp;rsquo;s metadata reflects this. Only alt-shard tenants set &lt;code&gt;notification.shard&lt;/code&gt;. Main-shard tenants leave it null and rely on the routing library&amp;rsquo;s default.&lt;/p&gt;&#xA;&lt;p&gt;In isolation this convention is fine. &lt;code&gt;ShardAwareDataSource&lt;/code&gt; correctly defaults to &lt;code&gt;shard_main&lt;/code&gt; when the holder is empty. The bug isn&amp;rsquo;t in the convention. It&amp;rsquo;s in the implicit assumption that the holder &lt;em&gt;will&lt;/em&gt; be empty when a main-shard message arrives. On a brand-new thread, or after an explicit clear, yes. 
On a long-lived worker that&amp;rsquo;s been processing messages for hours, only if every previous message either cleared the holder (which the AOP advice never did) or never wrote to it (which was true for years, before sharding shipped).&lt;/p&gt;&#xA;&lt;p&gt;That last clause is the load-bearing one. Here&amp;rsquo;s the unwritten invariant the original code depended on:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;em&gt;No message carries an explicit shard, so the &lt;code&gt;ThreadLocal&lt;/code&gt; is never written, so the missing &lt;code&gt;finally&lt;/code&gt; doesn&amp;rsquo;t matter.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;The invariant had held since the AOP advice was written, years earlier. The &lt;code&gt;if (shard != null)&lt;/code&gt; check was defensive code, written speculatively to support a sharding scheme that didn&amp;rsquo;t yet exist. The condition never fired. Pollution was impossible because there was nothing to leak.&lt;/p&gt;&#xA;&lt;p&gt;Six months ago, the team rolled out tenant sharding to handle data growth. The first migration was a single tenant. Over the following months, more migrated, one or two at a time as the data team identified candidates. Each migration meant some messages started carrying a non-null &lt;code&gt;msg.shard&lt;/code&gt;. The dormant &lt;code&gt;if&lt;/code&gt; branch finally executed, the &lt;code&gt;ThreadLocal&lt;/code&gt; started getting non-null writes, and the missing &lt;code&gt;finally&lt;/code&gt; became actively dangerous. The mismatch rate grew with the share of alt-shard traffic. While adoption was tiny the leak was invisible; once the failures became frequent enough to notice, the alerts started.&lt;/p&gt;&#xA;&lt;p&gt;Neither &lt;code&gt;JobWorker&lt;/code&gt; nor the AOP advice changed in this window. 
What changed was the &lt;em&gt;shape of the message stream feeding them&lt;/em&gt;, and that was enough to wake a bug that had been asleep for years.&lt;/p&gt;&#xA;&lt;p&gt;(A parallel pollution problem exists on the producer side too: the producer&amp;rsquo;s Tomcat threads accumulate stale &lt;code&gt;ShardContextHolder&lt;/code&gt; state from previous requests. The INSERT happens to land on the right shard because the producer&amp;rsquo;s transaction binds to a shard through a different code path. Anywhere &lt;code&gt;ThreadLocal&lt;/code&gt; is the source of truth for routing, the same trap is waiting.)&lt;/p&gt;&#xA;&lt;p&gt;The specifics here are tenant sharding. But the pattern recurs in any system that reads per-thread context to make decisions: logging MDC traceIds, Spring&amp;rsquo;s &lt;code&gt;SecurityContextHolder&lt;/code&gt;, outbound HTTP client auth headers, distributed tracing span context, A/B test bucketing, feature-flag scoping. If a &lt;code&gt;ThreadLocal&lt;/code&gt; informs &lt;em&gt;any&lt;/em&gt; downstream behavior, the same trap is waiting.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;a-minimal-reproduction&#34;&gt;A Minimal Reproduction&#xA;&lt;/h2&gt;&lt;p&gt;To convince myself the mechanism was real, I reproduced the bug in about 40 lines of pure JDK, no Spring, no RocketMQ:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 
1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;14&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;15&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;16&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;17&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;18&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;19&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;20&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;21&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;22&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;23&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;24&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;25&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;26&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;27&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;28&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;29&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;30&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;31&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;32&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;33&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;34&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;35&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;36&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;37&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;38&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;39&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;40&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;41&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;42&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;43&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;import&lt;/span&gt; java.util.concurrent.*;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;ShardLeakDemo&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span 
style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;final&lt;/span&gt; ThreadLocal&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;String&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; CURRENT_SHARD &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; ThreadLocal&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;record&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;Message&lt;/span&gt;(&lt;span style=&#34;color:#8be9fd&#34;&gt;long&lt;/span&gt; jobId, String metaShard, String actualShard) {}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#6272a4&#34;&gt;/** Models the routing library: returns the shard the next query will hit. 
*/&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; String &lt;span style=&#34;color:#50fa7b&#34;&gt;resolveTargetShard&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        String current &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; CURRENT_SHARD.&lt;span style=&#34;color:#50fa7b&#34;&gt;get&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; current &lt;span style=&#34;color:#ff79c6&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;?&lt;/span&gt; current : &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_main&amp;#34;&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;handle&lt;/span&gt;(Message msg) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6272a4&#34;&gt;// Mimics the AOP advice&amp;#39;s conditional set-without-finally pattern.&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;metaShard&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;) 
{&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            CURRENT_SHARD.&lt;span style=&#34;color:#50fa7b&#34;&gt;set&lt;/span&gt;(msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;metaShard&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6272a4&#34;&gt;// No finally. Nothing clears.&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        String resolved &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; resolveTargetShard();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#8be9fd&#34;&gt;boolean&lt;/span&gt; leaked &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;metaShard&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;!&lt;/span&gt;resolved.&lt;span style=&#34;color:#50fa7b&#34;&gt;equals&lt;/span&gt;(msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;actualShard&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        System.&lt;span style=&#34;color:#50fa7b&#34;&gt;out&lt;/span&gt;.&lt;span style=&#34;color:#50fa7b&#34;&gt;printf&lt;/span&gt;(&lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;[%s] jobId=%d actualShard=%-12s resolved=%-12s%s%n&amp;#34;&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            Thread.&lt;span style=&#34;color:#50fa7b&#34;&gt;currentThread&lt;/span&gt;().&lt;span 
style=&#34;color:#50fa7b&#34;&gt;getName&lt;/span&gt;(), msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;jobId&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;actualShard&lt;/span&gt;, resolved,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            leaked &lt;span style=&#34;color:#ff79c6&#34;&gt;?&lt;/span&gt; &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;  &amp;lt;-- LEAK (resolved shard inherited from a previous message)&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#8be9fd&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;main&lt;/span&gt;(String&lt;span style=&#34;color:#ff79c6&#34;&gt;[]&lt;/span&gt; args) &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;throws&lt;/span&gt; InterruptedException {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ExecutorService pool &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; Executors.&lt;span style=&#34;color:#50fa7b&#34;&gt;newSingleThreadExecutor&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Message&lt;span style=&#34;color:#ff79c6&#34;&gt;[]&lt;/span&gt; msgs &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span 
style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(1001, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_1&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_1&amp;#34;&lt;/span&gt;),  &lt;span style=&#34;color:#6272a4&#34;&gt;// alt-shard tenant, explicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(1002, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_2&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_2&amp;#34;&lt;/span&gt;),  &lt;span style=&#34;color:#6272a4&#34;&gt;// alt-shard tenant, explicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(1003, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_1&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_alt_1&amp;#34;&lt;/span&gt;),  &lt;span style=&#34;color:#6272a4&#34;&gt;// alt-shard tenant, explicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(2001, &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;,          &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_main&amp;#34;&lt;/span&gt;),   &lt;span style=&#34;color:#6272a4&#34;&gt;// main-shard tenant, implicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(2002, &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;,          &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_main&amp;#34;&lt;/span&gt;),   &lt;span style=&#34;color:#6272a4&#34;&gt;// main-shard tenant, 
implicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; Message(2003, &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;,          &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;shard_main&amp;#34;&lt;/span&gt;),   &lt;span style=&#34;color:#6272a4&#34;&gt;// main-shard tenant, implicit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        };&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;for&lt;/span&gt; (Message m : msgs) pool.&lt;span style=&#34;color:#50fa7b&#34;&gt;submit&lt;/span&gt;(() &lt;span style=&#34;color:#ff79c6&#34;&gt;-&amp;gt;&lt;/span&gt; handle(m));&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        pool.&lt;span style=&#34;color:#50fa7b&#34;&gt;shutdown&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        pool.&lt;span style=&#34;color:#50fa7b&#34;&gt;awaitTermination&lt;/span&gt;(5, TimeUnit.&lt;span style=&#34;color:#50fa7b&#34;&gt;SECONDS&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Run it. The first three messages each set &lt;code&gt;CURRENT_SHARD&lt;/code&gt; to an alt shard. The last three, meant to operate on &lt;code&gt;shard_main&lt;/code&gt;, set nothing, so the resolved shard is whichever alt shard was last written on that thread (here &lt;code&gt;shard_alt_1&lt;/code&gt;, left behind by job 1003). Every main-shard message logs &lt;code&gt;LEAK&lt;/code&gt;. 
The same mechanism manifests identically in any long-lived thread pool: bare &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;, Tomcat&amp;rsquo;s &lt;code&gt;http-nio-*-exec-*&lt;/code&gt;, RocketMQ&amp;rsquo;s &lt;code&gt;ConsumeMessageThread_*&lt;/code&gt;, Kafka consumer pools, &lt;code&gt;@Async&lt;/code&gt; and &lt;code&gt;@Scheduled&lt;/code&gt; pools. The framework is incidental.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;the-fix&#34;&gt;The Fix&#xA;&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@Around(&lt;span 
style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;pointCut()&amp;#34;&lt;/span&gt;)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; Object &lt;span style=&#34;color:#50fa7b&#34;&gt;around&lt;/span&gt;(ProceedingJoinPoint pjp) &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;throws&lt;/span&gt; Throwable {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    JobNotification msg &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; (JobNotification) pjp.&lt;span style=&#34;color:#50fa7b&#34;&gt;getArgs&lt;/span&gt;()&lt;span style=&#34;color:#ff79c6&#34;&gt;[&lt;/span&gt;0&lt;span style=&#34;color:#ff79c6&#34;&gt;]&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    String shard &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; msg.&lt;span style=&#34;color:#50fa7b&#34;&gt;getShard&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (shard &lt;span style=&#34;color:#ff79c6&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;null&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ShardContextHolder.&lt;span style=&#34;color:#50fa7b&#34;&gt;setCurrentShard&lt;/span&gt;(shard);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;try&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span 
style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; pjp.&lt;span style=&#34;color:#50fa7b&#34;&gt;proceed&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    } &lt;span style=&#34;color:#ff79c6&#34;&gt;finally&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ShardContextHolder.&lt;span style=&#34;color:#50fa7b&#34;&gt;clear&lt;/span&gt;();   &lt;span style=&#34;color:#6272a4&#34;&gt;// ← the missing line&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Two questions worth answering about this fix.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;clear()&lt;/code&gt; unconditionally, even when the message had no explicit shard and we therefore wrote nothing?&lt;/strong&gt; Because we cannot trust that the body of &lt;code&gt;proceed()&lt;/code&gt; didn&amp;rsquo;t write the &lt;code&gt;ThreadLocal&lt;/code&gt; for its own purposes (a downstream &lt;code&gt;@DS&lt;/code&gt;-style annotation, a manual &lt;code&gt;setCurrentShard&lt;/code&gt; inside the consumer body, a library that uses the same holder). Regardless of how the value got onto the thread, &lt;code&gt;clear()&lt;/code&gt; returns the thread to a known-clean state on the way out. The next message starts from zero, just like a brand-new thread would. 
Anything weaker leaks state across messages, and we&amp;rsquo;ve just spent a hard incident learning what that costs.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;What about restoring the previous value, the way Spring&amp;rsquo;s &lt;code&gt;RequestContextHolder&lt;/code&gt; does with a save/restore?&lt;/strong&gt; Restoration is the right pattern when you have a sensible &amp;ldquo;outer&amp;rdquo; context to restore to: a child operation temporarily overriding its parent&amp;rsquo;s context, for example. Consumer threads don&amp;rsquo;t have an outer context. Their natural state between messages is &lt;em&gt;no context&lt;/em&gt;, and &lt;code&gt;clear()&lt;/code&gt; is what enforces that. If you find yourself reaching for save/restore here, you&amp;rsquo;re probably modeling consumer messages as nested operations of some outer scope they don&amp;rsquo;t actually have.&lt;/p&gt;&#xA;&lt;p&gt;We made matching fixes for the producer-side leak (the HTTP filter now clears &lt;code&gt;ShardContextHolder&lt;/code&gt; in &lt;code&gt;afterCompletion&lt;/code&gt;, instead of relying on whatever happens to come next), and we audited every other thread-pool entry point in the codebase: &lt;code&gt;@Scheduled&lt;/code&gt;, &lt;code&gt;@Async&lt;/code&gt;, custom &lt;code&gt;ExecutorService&lt;/code&gt;s, Feign async callbacks. Several had the same pattern.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;why-static-review-missed-it&#34;&gt;Why Static Review Missed It&#xA;&lt;/h2&gt;&lt;p&gt;When my colleague brought this bug to me, he, Claude, and I all read the same code and all concluded it was correct. We weren&amp;rsquo;t being careless. We were doing exactly what code review asks of us: reading a function, tracing its logic, checking its edge cases.&lt;/p&gt;&#xA;&lt;p&gt;The thing is, every single read of every single function was &lt;em&gt;correct&lt;/em&gt;. The producer correctly INSERTs the row on the right shard. The consumer correctly queries by ID. 
The AOP advice correctly sets the shard for messages that carry one. The routing library correctly defaults to &lt;code&gt;shard_main&lt;/code&gt; when the holder is empty. The &lt;code&gt;afterCommit&lt;/code&gt; callback correctly fires after the commit. Read in isolation, every component does what it&amp;rsquo;s supposed to do.&lt;/p&gt;&#xA;&lt;p&gt;The defect lives in the space &lt;em&gt;between&lt;/em&gt; components, in an unwritten contract about thread state that gets violated when two pieces of code that each look correct execute in sequence on the same long-lived thread. The bug isn&amp;rsquo;t in the worker, the AOP advice, or the routing library. It&amp;rsquo;s in the relationship: the worker depends on a &lt;code&gt;ThreadLocal&lt;/code&gt; value that &lt;em&gt;some prior caller, on the same thread, possibly minutes earlier&lt;/em&gt; was supposed to have left in a particular state.&lt;/p&gt;&#xA;&lt;p&gt;This is why static review fails for this class. Review reads code. The bug isn&amp;rsquo;t in any single piece of code. It&amp;rsquo;s in the temporal contract across pieces of code, conditioned on the runtime behavior of a thread pool nobody is reading.&lt;/p&gt;&#xA;&lt;p&gt;The audit log cracked the case because it captured &lt;em&gt;runtime&lt;/em&gt; behavior, a specific query routed to a specific shard at a specific moment. Once you can see a query land on &lt;code&gt;shard_alt_2&lt;/code&gt; when it should have hit &lt;code&gt;shard_main&lt;/code&gt;, the rest of the inference is mechanical. No amount of staring at source code would have produced that observation. It had to come from the running system.&lt;/p&gt;&#xA;&lt;p&gt;If a bug depends on cross-call thread state and an invariant that isn&amp;rsquo;t written down, observability, not review, is your primary defense. 
Code review remains necessary, but it can&amp;rsquo;t be the only line.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;latent-bugs-as-a-class&#34;&gt;Latent Bugs as a Class&#xA;&lt;/h2&gt;&lt;p&gt;The specific mechanism here is incidental. The underlying pattern recurs across many forms:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;A cache key collision that doesn&amp;rsquo;t matter until two tenants are sharded onto the same node.&lt;/li&gt;&#xA;&lt;li&gt;A non-idempotent retry that works fine until a load balancer starts double-delivering.&lt;/li&gt;&#xA;&lt;li&gt;A foreign key constraint that holds until a backfill job runs out of order.&lt;/li&gt;&#xA;&lt;li&gt;A &amp;ldquo;this can never be null&amp;rdquo; assumption that holds until someone adds a new entry point that bypasses validation.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;In every case, the code is locally correct. It becomes wrong because the environment shifts under it: the invariant it implicitly relied on stops holding. Tests don&amp;rsquo;t catch this; the test environment doesn&amp;rsquo;t reproduce the new condition. Review doesn&amp;rsquo;t catch it; review reads code that is, in fact, correct under the &lt;em&gt;old&lt;/em&gt; invariants.&lt;/p&gt;&#xA;&lt;p&gt;Three things help, in increasing order of leverage:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;strong&gt;Write down the load-bearing invariants when you write the code.&lt;/strong&gt; A comment like &lt;code&gt;// assumes: no message carries an explicit shard yet, revisit if sharding ships&lt;/code&gt; would have given whoever later rolled out sharding a chance to ask the right question.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Defend at boundaries, not at each call site.&lt;/strong&gt; One &lt;code&gt;try/finally&lt;/code&gt; in the AOP entry point neutralizes every downstream leak. 
Reviewing each caller for &lt;code&gt;ThreadLocal&lt;/code&gt; hygiene doesn&amp;rsquo;t scale; making the boundary trustworthy makes the call sites irrelevant.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Invest in observability before you need it.&lt;/strong&gt; The audit logs I used to crack this case were running because of an unrelated project. That wasn&amp;rsquo;t planning. Observability you have &lt;em&gt;before&lt;/em&gt; the incident is worth orders of magnitude more than observability you scramble to add &lt;em&gt;during&lt;/em&gt; one.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;operational-checklist&#34;&gt;Operational Checklist&#xA;&lt;/h2&gt;&lt;p&gt;If you operate any Java service that combines a long-lived thread pool with &lt;code&gt;ThreadLocal&lt;/code&gt;-based context (shard routing, request-scoped auth, MDC for logging, distributed tracing span context, &amp;hellip;), here&amp;rsquo;s the audit I would run before next Monday:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;code&gt;grep -r &amp;quot;ThreadLocal&amp;quot; src/main/java&lt;/code&gt;. List every &lt;code&gt;ThreadLocal&lt;/code&gt; field and every wrapper holder (&lt;code&gt;*ContextHolder&lt;/code&gt;, &lt;code&gt;MDC&lt;/code&gt;, &lt;code&gt;TtlContext&lt;/code&gt;, etc.).&lt;/li&gt;&#xA;&lt;li&gt;For every &lt;code&gt;set&lt;/code&gt; / &lt;code&gt;push&lt;/code&gt; / &lt;code&gt;put&lt;/code&gt;, find the matching &lt;code&gt;remove&lt;/code&gt; / &lt;code&gt;clear&lt;/code&gt;. Is it in the same lexical scope? Is it inside a &lt;code&gt;finally&lt;/code&gt;? If either answer is no, that&amp;rsquo;s a leak candidate.&lt;/li&gt;&#xA;&lt;li&gt;List every long-lived thread-pool entry point: HTTP filters, &lt;code&gt;@RocketMQMessageListener&lt;/code&gt;, Kafka listeners, &lt;code&gt;@Async&lt;/code&gt;, &lt;code&gt;@Scheduled&lt;/code&gt;, custom &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; workers, Feign async callbacks. 
Each must, at the entry-point boundary, &lt;em&gt;clear&lt;/em&gt; all per-thread state on exit.&lt;/li&gt;&#xA;&lt;li&gt;Prefer unconditional &lt;code&gt;clear()&lt;/code&gt; at entry-point boundaries over &amp;ldquo;restore previous&amp;rdquo;. Restoration is for nested contexts; thread-pool workers don&amp;rsquo;t have an outer context, so the right reset is to nothing.&lt;/li&gt;&#xA;&lt;li&gt;Add a guard at the top of each entry-point advice: if the &lt;code&gt;ThreadLocal&lt;/code&gt; you&amp;rsquo;re about to populate is &lt;em&gt;not&lt;/em&gt; empty on entry, log a warning. This turns leaked state into a noisy, debuggable signal long before it causes a production incident.&lt;/li&gt;&#xA;&lt;li&gt;Audit any feature that introduces a new &lt;em&gt;category&lt;/em&gt; of inbound traffic, or starts exercising a previously dormant branch. Ask explicitly: &lt;em&gt;which invariants did the previous traffic pattern satisfy, and does my new category break any of them?&lt;/em&gt; Write the invariants down in the same commit.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The code wasn&amp;rsquo;t wrong. The contract it depended on had stopped holding. That contract, not the source, is what you have to keep current.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;&lt;em&gt;References:&lt;/em&gt;&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/transaction/support/TransactionSynchronizationManager.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;&#xA;    &gt;Spring Framework Javadoc: &lt;code&gt;TransactionSynchronizationManager&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://shardingsphere.apache.org/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;&#xA;    &gt;Apache ShardingSphere&lt;/a&gt;: one representative tenant-aware sharding library. 
The mechanism described in this post applies to any router whose decision is read from per-thread state.&lt;/li&gt;&#xA;&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/apache/rocketmq-spring&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;&#xA;    &gt;RocketMQ Spring: &lt;code&gt;DefaultRocketMQListenerContainer&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Companion post: &lt;a class=&#34;link&#34; href=&#34;https://csliubo.com/p/spring-aftercommit-deadlock/&#34; &gt;&lt;em&gt;When Does Spring Actually Release Your Database Connection?&lt;/em&gt;&lt;/a&gt;, the other half of the &amp;ldquo;the bug is in the runtime, not the code&amp;rdquo; pair.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;&lt;em&gt;A note on this post&amp;rsquo;s code samples: identifiers, table and shard names, and some structural details have been changed from the actual incident to remove sensitive information. The production code differs in its specific routing setup. The underlying mechanism (&lt;code&gt;ThreadLocal&lt;/code&gt; pollution at a thread-pool entry-point boundary, woken by a feature that introduced a previously dormant code path) and the timeline (the bug had been latent for years; the triggering feature shipped six months earlier; the failure rate climbed as adoption grew) are accurate to what happened in production.&lt;/em&gt;&lt;/p&gt;&#xA;</description>
        </item></channel>
</rss>
