<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Debezium on Liu Bo</title>
        <link>https://csliubo.com/tags/debezium/</link>
        <description>Recent content in Debezium on Liu Bo</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://csliubo.com/tags/debezium/index.xml" rel="self" type="application/rss+xml" /><item>
            <title>Why Is My Flink TaskManager Eating 8 GB on a 10K-Row UPDATE?</title>
            <link>https://csliubo.com/p/flink-cdc-source-queue-oom/</link>
            <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
            <guid>https://csliubo.com/p/flink-cdc-source-queue-oom/</guid>
            <description>&lt;blockquote&gt;&#xA;&lt;p&gt;Measured: 2 GiB TaskManager survives a 10K-row UPDATE under a specific config, on a fat-row-rich slice where 13% of rows are 800 KiB+. Extrapolated via the Flink memory-model formula to a 12 GiB production TM. Every claim below is tagged measured vs extrapolated.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&#xA;&lt;/h2&gt;&lt;p&gt;A stock-config Flink CDC pipeline OOMs when one MySQL transaction is large and the UPDATE hits an id range clustered with fat rows. The root cause is two &lt;strong&gt;independent&lt;/strong&gt; source-side queues: Debezium&amp;rsquo;s &lt;code&gt;ChangeEventQueue&lt;/code&gt; (default cap: 8192 events) and Flink&amp;rsquo;s &lt;code&gt;FutureCompletingBlockingQueue&lt;/code&gt; (default cap: 2 elements, each holding up to &lt;code&gt;max.batch.size&lt;/code&gt; events). Together they can hold gigabytes of in-flight events. Each fat event also sits on heap at roughly &lt;code&gt;MySQL byte size × 2 (BEFORE+AFTER) × 2 (UTF-16 compact-string fallback)&lt;/code&gt; — about 3.1 MiB for an 800 KiB JSON row.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt; (three config lines, no code change):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.queue.size=100&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.batch.size=50&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;taskmanager.memory.managed.fraction=0.1&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Measured at 2 GiB TM only; production numbers are formula-extrapolated. Uniform-distribution UPDATEs would be far lighter than the fat-rich slice used here.&lt;/p&gt;&#xA;&lt;h2 id=&#34;0-the-setup&#34;&gt;0. The Setup&#xA;&lt;/h2&gt;&lt;p&gt;Our MySQL-to-Doris CDC pipeline (Flink CDC + flink-doris-connector) had a recurring problem: after a large UPDATE on one particular table, the downstream Flink TaskManager would OOM.&lt;/p&gt;&#xA;&lt;p&gt;The table belongs to a &lt;strong&gt;rich-text clinical record system&lt;/strong&gt;. Historical content had been stored as &lt;strong&gt;inline base64-encoded images&lt;/strong&gt; inside a JSON column. A new tool had shipped to rewrite those blobs into object-storage links, but un-migrated document templates still produced the old inline form. 
A backfill UPDATE to convert legacy records to the new layout is what triggered this OOM.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Fact check (confirmed post-mortem)&lt;/strong&gt;: production had &lt;strong&gt;zero Debezium / Flink CDC tuning&lt;/strong&gt;. &lt;code&gt;max.batch.size=2048&lt;/code&gt;, &lt;code&gt;max.queue.size=8192&lt;/code&gt;, &lt;code&gt;managed.fraction=0.4&lt;/code&gt; were all defaults. TM 12 GiB, JM 24 GiB.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Production&amp;rsquo;s emergency path&lt;/strong&gt; (context only; this investigation didn&amp;rsquo;t drive it):&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Immediately after the OOM, ops &lt;strong&gt;removed the affected table from &lt;code&gt;include-tables&lt;/code&gt;&lt;/strong&gt; so the rest of the tables kept syncing.&lt;/li&gt;&#xA;&lt;li&gt;Then they retried the backfill in smaller batches:&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Month-sized batches: still OOM or not draining.&lt;/li&gt;&#xA;&lt;li&gt;100 rows per UPDATE (app-side loops issuing &lt;code&gt;UPDATE ... WHERE id IN (...)&lt;/code&gt;): still failed.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;50 rows per UPDATE: succeeded.&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;Note that 50 and 100 are &lt;strong&gt;rows per SQL statement at the application layer&lt;/strong&gt;, not Debezium&amp;rsquo;s &lt;code&gt;max.batch.size&lt;/code&gt;. Production&amp;rsquo;s CDC config was never changed.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Questions this investigation set out to answer&lt;/strong&gt;:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;With stock defaults, why does a single large transaction OOM the TaskManager? Where is the root cause?&lt;/li&gt;&#xA;&lt;li&gt;Is there a durable config change, &lt;strong&gt;not requiring application-side batch splitting&lt;/strong&gt;, that lets the pipeline absorb large transactions on its own?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;Spoiler: yes. See §5. 
But with capacity caveats.&lt;/p&gt;&#xA;&lt;h2 id=&#34;1-terminology-three-layers-of-event-size&#34;&gt;1. Terminology: three layers of event size&#xA;&lt;/h2&gt;&lt;p&gt;Every &amp;ldquo;bytes per event&amp;rdquo; number in the rest of the post falls into one of these three tiers. I won&amp;rsquo;t mix them.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Layer&lt;/th&gt;&#xA;          &lt;th&gt;Symbol&lt;/th&gt;&#xA;          &lt;th&gt;Typical value&lt;/th&gt;&#xA;          &lt;th&gt;How measured&lt;/th&gt;&#xA;          &lt;th&gt;Meaning&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;MySQL binlog raw bytes&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;R_binlog&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;~135 KiB/event&lt;/strong&gt; (non-fat slice); &lt;strong&gt;~230 KiB/event&lt;/strong&gt; (fat-rich slice, §3.6)&lt;/td&gt;&#xA;          &lt;td&gt;§3.1 probe: 1000-row UPDATE → binlog POS +135 MiB&lt;/td&gt;&#xA;          &lt;td&gt;UTF-8 bytes inside the binlog file&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Debezium &lt;code&gt;DataChangeEvent&lt;/code&gt; on heap&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;R_dbz&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;~230 KiB/event&lt;/strong&gt; (single heap-dump observation)&lt;/td&gt;&#xA;          &lt;td&gt;§3.4 MAT: Debezium &lt;code&gt;ChangeEventQueue&lt;/code&gt; retained 23 MiB / 100 events&lt;/td&gt;&#xA;          &lt;td&gt;Target-table events only (&lt;code&gt;table.include.list&lt;/code&gt; applied; §3.5)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Flink &lt;code&gt;SourceRecord&lt;/code&gt; on heap&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;R_flink&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;~970 KiB/event&lt;/strong&gt; (batch-avg on fat slice); &lt;strong&gt;~14 
MiB&lt;/strong&gt; for a single max-fat row&lt;/td&gt;&#xA;          &lt;td&gt;§3.4 MAT: &lt;code&gt;MySqlRecords&lt;/code&gt; retained 97 MiB / 100 events&lt;/td&gt;&#xA;          &lt;td&gt;Target-table only, past the Debezium queue&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;A small naming hazard worth knowing upfront: &lt;strong&gt;&lt;code&gt;SourceRecord&lt;/code&gt; (singular, Kafka Connect) is the event object&lt;/strong&gt;; &lt;strong&gt;&lt;code&gt;SourceRecords&lt;/code&gt; (plural, Flink CDC) is a batch container&lt;/strong&gt; wrapping many &lt;code&gt;SourceRecord&lt;/code&gt;s. The MAT suspect ranking in §3.2 mixes these names, so keep the singular/plural distinction in mind when reading.&lt;/p&gt;&#xA;&lt;p&gt;About the 4× gap between &lt;code&gt;R_dbz&lt;/code&gt; (230 KiB) and &lt;code&gt;R_flink&lt;/code&gt; (970 KiB): the two layers hold the same &lt;code&gt;SourceRecord&lt;/code&gt; objects (Debezium&amp;rsquo;s &lt;code&gt;DataChangeEvent&lt;/code&gt; has just one &lt;code&gt;SourceRecord record&lt;/code&gt; field, no other state), and both queues only contain target-table events. The 4× gap is most likely a single-snapshot timing artifact; §3.11.2 shows the supporting experiment. This detail doesn&amp;rsquo;t affect the §5.2 capacity math, where the worst-case is computed against max-fat rows (14 MiB per event on both layers).&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Unit convention&lt;/strong&gt;: base-2 throughout (KiB / MiB / GiB). &lt;code&gt;information_schema.data_length&lt;/code&gt; is also treated as base-2 per MySQL docs.&lt;/p&gt;&#xA;&lt;h2 id=&#34;2-environment-and-reproduction&#34;&gt;2. Environment and reproduction&#xA;&lt;/h2&gt;&lt;h3 id=&#34;21-cluster-and-target-table&#34;&gt;2.1 Cluster and target table&#xA;&lt;/h3&gt;&lt;ul&gt;&#xA;&lt;li&gt;k8s + Flink Operator + FlinkDeployment CR. 
&lt;code&gt;namespace=cdc-dev&lt;/code&gt;, standalone deployment named &lt;code&gt;cdc-oom-repro&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Target: &lt;code&gt;app_db.clinical_records&lt;/code&gt;.&#xA;&lt;ul&gt;&#xA;&lt;li&gt;254,832 rows (&lt;code&gt;SELECT COUNT(*)&lt;/code&gt;, with a +17-row drift during our queries; normal dev-db churn).&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;data_length = 16.2 GiB&lt;/code&gt; (from &lt;code&gt;information_schema.tables&lt;/code&gt;: clustered-index columns + InnoDB page overhead + fragmentation; excludes secondary indexes and off-page BLOBs).&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;avg_row_length = 83 KiB&lt;/code&gt; (same scope, all columns).&lt;/li&gt;&#xA;&lt;li&gt;Summing only the five &amp;ldquo;large text columns&amp;rdquo; (&lt;code&gt;content_json&lt;/code&gt; + four others) gives a weighted mean of &lt;strong&gt;65 KiB/row&lt;/strong&gt;. The 18 KiB delta is the other ~22 columns + row overhead. Plausible.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;22-nacos-config-scope-down-to-one-table&#34;&gt;2.2 Nacos config: scope down to one table&#xA;&lt;/h3&gt;&lt;p&gt;Three changes to the pipeline&amp;rsquo;s Nacos &lt;code&gt;[mysql-1]&lt;/code&gt; section:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-diff&#34; data-lang=&#34;diff&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f55&#34;&gt;- include-tables=.*&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#50fa7b;font-weight:bold&#34;&gt;+ include-tables=clinical_records&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f55&#34;&gt;- table-name=^app_db\.(?!undo_log|tmp).*$&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#50fa7b;font-weight:bold&#34;&gt;+ table-name=^app_db\.clinical_records$&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#50fa7b;font-weight:bold&#34;&gt;+ 
scan.startup.mode=latest-offset&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;h3 id=&#34;23-flinkdeployment-diagnostic-flags&#34;&gt;2.3 FlinkDeployment diagnostic flags&#xA;&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;14&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;15&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;16&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;17&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;18&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;19&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;20&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;21&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;22&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;23&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;24&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;25&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;26&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;flinkConfiguration&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ff79c6&#34;&gt;kubernetes.operator.job.restart.failed&lt;/span&gt;: &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;false&amp;#34;&lt;/span&gt;   &lt;span style=&#34;color:#6272a4&#34;&gt;# keep OOM state; operator won&amp;#39;t rebuild the 
pod&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ff79c6&#34;&gt;env.java.opts.taskmanager&lt;/span&gt;: &amp;gt;-&lt;span style=&#34;color:#f1fa8c&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f1fa8c&#34;&gt;    -XX:+HeapDumpOnOutOfMemoryError&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f1fa8c&#34;&gt;    -XX:HeapDumpPath=/tmp/dumps&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f1fa8c&#34;&gt;    -XX:+ExitOnOutOfMemoryError&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f1fa8c&#34;&gt;    -Xlog:gc*:file=/tmp/dumps/gc-%p.log:time,uptime,level,tags:filecount=5,filesize=20M&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;jobManager&lt;/span&gt;:           &lt;span style=&#34;color:#6272a4&#34;&gt;# fixed throughout; not varied&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ff79c6&#34;&gt;resource&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;4g&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#bd93f9&#34;&gt;1&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;taskManager&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ff79c6&#34;&gt;resource&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;2g&amp;#34;&lt;/span&gt;    &lt;span style=&#34;color:#6272a4&#34;&gt;# the only knob I varied across runs&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#bd93f9&#34;&gt;2&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ff79c6&#34;&gt;podTemplate&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;spec&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#ff79c6&#34;&gt;containers&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        - &lt;span style=&#34;color:#ff79c6&#34;&gt;name&lt;/span&gt;: flink-main-container&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#ff79c6&#34;&gt;volumeMounts&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            - { &lt;span style=&#34;color:#ff79c6&#34;&gt;name: dump-volume, mountPath&lt;/span&gt;: /tmp/dumps }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span 
style=&#34;color:#ff79c6&#34;&gt;volumes&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        - &lt;span style=&#34;color:#ff79c6&#34;&gt;name&lt;/span&gt;: dump-volume&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#ff79c6&#34;&gt;persistentVolumeClaim&lt;/span&gt;: { &lt;span style=&#34;color:#ff79c6&#34;&gt;claimName&lt;/span&gt;: oom-dump-pvc }   &lt;span style=&#34;color:#6272a4&#34;&gt;# 40Gi NFS-SSD&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;kubernetes.operator.job.restart.failed: &amp;quot;false&amp;quot;&lt;/code&gt; prevents operator-level pod rebuilds on failure. The PVC ensures the heap dump survives pod termination so I can pull it out later with a dump-reader pod plus &lt;code&gt;kubectl cp&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;h3 id=&#34;24-fixed-workload-and-a-caveat-about-representativeness&#34;&gt;2.4 Fixed workload (and a caveat about representativeness)&#xA;&lt;/h3&gt;&lt;p&gt;Every run: a 10,000-row UPDATE, &lt;code&gt;WHERE id BETWEEN 1870319408066945025 AND 1913089719906832385&lt;/code&gt;, &lt;code&gt;SET modify_time = NOW()&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;The row-size distribution (§3.6) shows 4.1% of the whole table falls in the 500 KiB to 1 MiB &amp;ldquo;fat&amp;rdquo; bucket, but this particular id range is &lt;strong&gt;13.3% fat&lt;/strong&gt;, 3× the table-wide concentration. &lt;strong&gt;Conclusions here do not represent an arbitrary 10k UPDATE&lt;/strong&gt;. 
They hold only when the workload has this kind of fat-row clustering, which is plausible in production but not the common case.&lt;/p&gt;&#xA;&lt;h3 id=&#34;25-a-structural-dev-vs-prod-gap-i-cant-close&#34;&gt;2.5 A structural dev-vs-prod gap I can&amp;rsquo;t close&#xA;&lt;/h3&gt;&lt;p&gt;My experiments run in a dev cluster with &lt;code&gt;include-tables&lt;/code&gt; narrowed to the single target table. Debezium&amp;rsquo;s &lt;code&gt;table.include.list&lt;/code&gt; contains just that one table, and its queue is nearly empty before each burst.&lt;/p&gt;&#xA;&lt;p&gt;Production is different:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;include-tables=.*&lt;/code&gt;. Debezium listens to tens to hundreds of tables.&lt;/li&gt;&#xA;&lt;li&gt;Real users continuously generate binlog events.&lt;/li&gt;&#xA;&lt;li&gt;Debezium&amp;rsquo;s queue is never empty. At any moment it holds dozens to hundreds of events from other tables.&lt;/li&gt;&#xA;&lt;li&gt;A large transaction&amp;rsquo;s commit burst layers on top of that existing queue, not onto empty.&lt;/li&gt;&#xA;&lt;li&gt;Average event size is smaller in multi-table mode (most tables are narrow), but a fat-table burst produces a heap mixture of many small events plus some fat events.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;This gap is structurally unreproducible in dev: even if I set &lt;code&gt;include-tables=.*&lt;/code&gt; in dev Nacos, the dev MySQL has no real users and negligible background traffic.&lt;/p&gt;&#xA;&lt;p&gt;Implications:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The upper-bound formulas below (&lt;code&gt;queue.size × R_per_event&lt;/code&gt;, summed across layers) still apply. &lt;code&gt;queue=100&lt;/code&gt; is a hard cap in any workload; the heap ceiling is bounded regardless of event mix.&lt;/li&gt;&#xA;&lt;li&gt;The exact &amp;ldquo;50 rows succeeds, 100 rows fails&amp;rdquo; threshold observed in production is not reproducible in dev. I don&amp;rsquo;t have a data-level explanation. 
My guess is that background-queue depth plus fat-event arrival patterns interact, but this isn&amp;rsquo;t verified.&lt;/li&gt;&#xA;&lt;li&gt;The proposed fix (&lt;code&gt;queue=100 + fraction=0.1&lt;/code&gt;) holds by design for prod. Why &amp;ldquo;50 vs 100&amp;rdquo; specifically is the production threshold is out of scope here.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;3-investigation&#34;&gt;3. Investigation&#xA;&lt;/h2&gt;&lt;h3 id=&#34;31-ruling-out-partial-flush-during-commit&#34;&gt;3.1 Ruling out &amp;ldquo;partial flush during commit&amp;rdquo;&#xA;&lt;/h3&gt;&lt;p&gt;Early on I carried an assumption: &amp;ldquo;for a very large transaction, MySQL flushes binlog incrementally during execution.&amp;rdquo; Reviewing my own reasoning I noticed this contradicts 2PC semantics, so I probed it:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;8&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;-- Poll SHOW MASTER STATUS once per second while a 1000-row UPDATE runs&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;14:31:57   POS=511,024,257    +2.8 MiB/s background&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;14:31:58   POS=513,860,063&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;14:31:59   POS=515,768,179&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=== FIRE UPDATE @14:32:00.026 ===&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;14:32:00.616   POS=651,155,487   ← +135 MiB in 0.6s&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=== UPDATE RETURNED @14:32:01.595 ===&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;14:32:02.056   POS=652,415,594   +1.2 MiB (back to background rate)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Binlog is flushed &lt;strong&gt;at once during the 2PC commit phase&lt;/strong&gt;, before the client receives its ACK. Consistent with durability-before-ack semantics. &amp;ldquo;Incremental flush&amp;rdquo; was the wrong mental model.&lt;/p&gt;&#xA;&lt;p&gt;(This probe yields an initial &lt;code&gt;R_binlog = 135 KiB/event&lt;/code&gt;, but those 1000 rows fell in a non-fat id range. The 10k fat-rich slice gets recomputed in §3.6.)&lt;/p&gt;&#xA;&lt;p&gt;Side observation: &lt;code&gt;SHOW BINARY LOGS&lt;/code&gt; shows &lt;code&gt;mysql-bin.001440&lt;/code&gt; reached 2.7 GiB, confirming that a big transaction can push a binlog file past &lt;code&gt;max_binlog_size&lt;/code&gt; (MySQL 8.0 default 1 GiB). That 2.7 GiB is our 10k UPDATE transaction plus concurrent background writes — roughly 2.2 GiB for our transaction (slice big-field sum 1.1 GiB × 2 for BEFORE+AFTER, see §3.6) plus background traffic.&lt;/p&gt;&#xA;&lt;h3 id=&#34;32-first-reproduction-2g--stock-defaults&#34;&gt;3.2 First reproduction: 2g + stock defaults&#xA;&lt;/h3&gt;&lt;ul&gt;&#xA;&lt;li&gt;14:17:16 fire 10k UPDATE. InnoDB grinds through PK-ordered pages for ~5m45s (the latency is non-linear vs 1k UPDATE, see Appendix A).&lt;/li&gt;&#xA;&lt;li&gt;14:23:02 MySQL commit.&lt;/li&gt;&#xA;&lt;li&gt;Immediately after commit, TM OOMs. Flink internally restarts it 8 times (taskmanager-1-1 through 1-8) in a replay death spiral until I manually suspend the job.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;First heap dump: &lt;strong&gt;453 MiB&lt;/strong&gt; (triggered by &lt;code&gt;-XX:+HeapDumpOnOutOfMemoryError&lt;/code&gt;; the specific &lt;code&gt;OutOfMemoryError&lt;/code&gt; subclass wasn&amp;rsquo;t captured in logs). 
MAT Leak Suspects:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Rank&lt;/th&gt;&#xA;          &lt;th&gt;Share&lt;/th&gt;&#xA;          &lt;th&gt;Object&lt;/th&gt;&#xA;          &lt;th&gt;Thread&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;1&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;29%&lt;/strong&gt; / 126 MiB&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;FutureCompletingBlockingQueue&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;Source Data Fetcher&lt;/code&gt;, blocked at &lt;code&gt;FutureCompletingBlockingQueue.java:203 waitOnPut&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;2&lt;/td&gt;&#xA;          &lt;td&gt;18% / 75 MiB&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;MySqlRecords&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;same fetcher, mid-&lt;code&gt;put&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;3&lt;/td&gt;&#xA;          &lt;td&gt;17% / 73 MiB&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;SourceRecords&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;Source: mysql-1&lt;/code&gt; main thread, &lt;code&gt;SourceReaderBase.pollNext:160&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;4&lt;/td&gt;&#xA;          &lt;td&gt;11% / 46 MiB&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;MySqlRecords&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;main thread, &lt;code&gt;SourceReaderBase.pollNext:173&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;&lt;strong&gt;Σ = 75%&lt;/strong&gt;. 
MAT&amp;rsquo;s &amp;ldquo;related via common path&amp;rdquo; hint notes that suspects 1+2 and 3+4 share dominator paths, and MAT doesn&amp;rsquo;t strictly deduplicate retained sizes, so the naive sum is qualitative only.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&amp;ldquo;Flink self-restarts&amp;rdquo; above means Flink&amp;rsquo;s internal &lt;code&gt;restart-strategy&lt;/code&gt; (default &lt;code&gt;FixedDelayRestartBackoffTimeStrategy&lt;/code&gt;, &lt;code&gt;maxAttempts=2,147,483,647&lt;/code&gt;) respawning tasks. That is a different layer from the &lt;code&gt;kubernetes.operator.job.restart.failed: &amp;quot;false&amp;quot;&lt;/code&gt; in §2.3. The operator flag prevents &lt;strong&gt;pod/deployment-level&lt;/strong&gt; rebuilds; Flink&amp;rsquo;s own scheduler can still restart tasks inside a living TM. This is also why the 4g run below &amp;ldquo;looked fine&amp;rdquo; while actually OOMing mid-drain.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;The decisive finding: &lt;strong&gt;roughly 75% of retention is on the Flink CDC source side&lt;/strong&gt;. Doris-sink objects (&lt;code&gt;DorisBatchStreamLoad&lt;/code&gt;, &lt;code&gt;BatchRecordBuffer&lt;/code&gt;) don&amp;rsquo;t even appear in the top 10.&lt;/p&gt;&#xA;&lt;h3 id=&#34;33-heap-ladder-248g-measured-1216g-formula-extrapolated&#34;&gt;3.3 Heap ladder (2/4/8g measured; 12/16g formula-extrapolated)&#xA;&lt;/h3&gt;&lt;p&gt;&lt;code&gt;task.heap&lt;/code&gt; is computed via the Flink 1.18 memory model: &lt;code&gt;taskmanager.memory.process.size&lt;/code&gt; minus metaspace/overhead/managed/network/framework. 
At the 2g tier the formula matches the TM startup log exactly (validated in §3.10).&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;flink.size  = process.size - metaspace(256 MiB) - overhead(clamp(10% × process.size, 192, 1024))&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;managed     = fraction × flink.size&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;network     = max(10% × 
flink.size, 64)   # no upper cap in Flink 1.18 by default&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;task.heap   = flink.size - framework.heap(128) - framework.off(128) - managed - network&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Network note: &lt;code&gt;taskmanager.memory.network.max&lt;/code&gt; defaults to &lt;code&gt;MemorySize.MAX_VALUE&lt;/code&gt; (no cap) in Flink 1.18.1. Only JVM-overhead is capped at 1 GiB. Earlier drafts of this post assumed network was capped at 1 GiB too, which was wrong.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;TM process&lt;/th&gt;&#xA;          &lt;th&gt;managed.fraction&lt;/th&gt;&#xA;          &lt;th&gt;managed (MiB)&lt;/th&gt;&#xA;          &lt;th&gt;network (MiB)&lt;/th&gt;&#xA;          &lt;th&gt;&lt;strong&gt;task.heap&lt;/strong&gt;&lt;/th&gt;&#xA;          &lt;th&gt;10k UPDATE result&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;2 GiB&lt;/td&gt;&#xA;          &lt;td&gt;0.4 (default)&lt;/td&gt;&#xA;          &lt;td&gt;635&lt;/td&gt;&#xA;          &lt;td&gt;159&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;537 MiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;✗ OOM at commit (measured; formula matches TM log)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;4 GiB&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;1,372&lt;/td&gt;&#xA;          &lt;td&gt;343&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;1,459 MiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;✗ silent OOM during drain (hprof 1.2 GiB), Flink self-restart masked it&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;8 GiB&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;2,847&lt;/td&gt;&#xA;          
&lt;td&gt;712&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;3,302 MiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;✓ no OOM, drain 3m33s, container RSS peak 4,680 MiB (includes managed off-heap)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;12 GiB (prod)&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;4,403&lt;/td&gt;&#xA;          &lt;td&gt;1,101&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;5,248 MiB ≈ 5.1 GiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;not measured at 12g; formula-extrapolated&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;16 GiB&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;6,042&lt;/td&gt;&#xA;          &lt;td&gt;1,510&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;7,296 MiB ≈ 7.1 GiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;not measured at 16g&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Visually:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;TM process    task.heap  (fraction=0.4 default)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  2 GiB         537 MiB  ▓░░░░░░░░░░░░░░░░░░░░░░  OOM at commit           ✗&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  4 GiB       1,459 MiB  ▓▓▓▓░░░░░░░░░░░░░░░░░░░  silent OOM during drain ✗&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  8 GiB       3,302 MiB  ▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░  no OOM                  ✓&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 12 GiB       5,248 MiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░  formula-extrapolated&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 16 GiB       7,296 MiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░  formula-extrapolated&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;The 4g tier is a trap. 
Monitoring showed the job &amp;ldquo;finished,&amp;rdquo; but the PVC had a 1.2 GiB hprof sitting in it. The TM actually OOMed mid-drain and Flink&amp;rsquo;s auto-restart masked the OOM. Always check the dump directory, not just the &amp;ldquo;is the job running&amp;rdquo; signal.&lt;/p&gt;&#xA;&lt;h3 id=&#34;34-hypothesis-one-cap-batch-and-queue-to-100&#34;&gt;3.4 Hypothesis one: cap batch and queue to 100&#xA;&lt;/h3&gt;&lt;p&gt;From MAT Suspect #2: the default &lt;code&gt;MySqlRecords&lt;/code&gt; batch averages 75 MiB. Intuition: &amp;ldquo;shrink the batch, problem solved.&amp;rdquo;&lt;/p&gt;&#xA;&lt;p&gt;I changed both knobs at the same time in a single Nacos update (an early draft of this post mentioned only &lt;code&gt;batch&lt;/code&gt;, which was a slip):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.batch.size=100&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.queue.size=100&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Still OOM (2g, hprof 446 MiB). MAT Top Consumers:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;MySqlRecords @0xda4da748        retained 97 MiB   ← one batch, 100 events&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ChangeEventQueue @0xd977d4a0    retained 23 MiB   ← Debezium layer&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;So 
&lt;code&gt;R_flink = 97 MiB / 100 events ≈ 0.97 MiB/event (≈ 993 KiB) retained&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;ChangeEventQueue&lt;/code&gt; dropped from its theoretical &lt;code&gt;8192 × R_dbz ≈ 1.8 GiB&lt;/code&gt; upper bound down to 23 MiB, so &lt;code&gt;max.queue.size=100&lt;/code&gt; did take effect (otherwise it&amp;rsquo;d still be climbing). But the Flink queue holds 2 × 97 MiB = 194 MiB, and the main thread holds another batch on top of that. Together these still overflow the 537 MiB &lt;code&gt;task.heap&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;h3 id=&#34;35-the-key-insight-two-independent-queues-two-different-capacity-units&#34;&gt;3.5 The key insight: two independent queues, two different capacity units&#xA;&lt;/h3&gt;&lt;p&gt;A critical question surfaced here, and only the code could answer it: &lt;strong&gt;is &lt;code&gt;FutureCompletingBlockingQueue&lt;/code&gt; sized by &lt;code&gt;max.batch.size&lt;/code&gt;, or is it independent?&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;From Apache Flink CDC &lt;code&gt;release-3.2.1&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;p&gt;At &lt;code&gt;MySqlSource.java:167&lt;/code&gt;, the Flink-side queue is constructed with the no-arg form:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;FutureCompletingBlockingQueue&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;RecordsWithSplitIds&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;SourceRecords&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; elementsQueue &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; FutureCompletingBlockingQueue&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt;();   &lt;span style=&#34;color:#6272a4&#34;&gt;// no-arg → default capacity&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Flink 1.18.1 &lt;code&gt;FutureCompletingBlockingQueue.java:109&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;FutureCompletingBlockingQueue&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;this&lt;/span&gt;(SourceReaderOptions.&lt;span style=&#34;color:#50fa7b&#34;&gt;ELEMENT_QUEUE_CAPACITY&lt;/span&gt;.&lt;span style=&#34;color:#50fa7b&#34;&gt;defaultValue&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;SourceReaderOptions.java:36-40&lt;/code&gt;: &lt;code&gt;ELEMENT_QUEUE_CAPACITY&lt;/code&gt; defaults to &lt;strong&gt;2 elements&lt;/strong&gt; (config key &lt;code&gt;source.reader.element.queue.capacity&lt;/code&gt;).&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;FutureCompletingBlockingQueue.java:193&lt;/code&gt; — &lt;code&gt;put()&lt;/code&gt; bounds by element count, not bytes:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div 
style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;while&lt;/span&gt; (queue.&lt;span style=&#34;color:#50fa7b&#34;&gt;size&lt;/span&gt;() &lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;=&lt;/span&gt; capacity) {   &lt;span style=&#34;color:#6272a4&#34;&gt;// ★ counts elements, not bytes&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ...&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;    waitOnPut(threadIndex);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;BinlogSplitReader.java:147-162&lt;/code&gt; — one important detail:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; Iterator&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;SourceRecords&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;pollSplitRecords&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;final&lt;/span&gt; List&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;SourceRecord&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; sourceRecords &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span 
style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; ArrayList&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (currentTaskRunning) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        List&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;DataChangeEvent&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; batch &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; queue.&lt;span style=&#34;color:#50fa7b&#34;&gt;poll&lt;/span&gt;();   &lt;span style=&#34;color:#6272a4&#34;&gt;// ← Debezium ChangeEventQueue.poll()&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#ff79c6&#34;&gt;for&lt;/span&gt; (DataChangeEvent event : batch) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (shouldEmit(event.&lt;span style=&#34;color:#50fa7b&#34;&gt;getRecord&lt;/span&gt;())) {      &lt;span style=&#34;color:#6272a4&#34;&gt;// ★ filter happens AFTER poll&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                sourceRecords.&lt;span style=&#34;color:#50fa7b&#34;&gt;add&lt;/span&gt;(event.&lt;span style=&#34;color:#50fa7b&#34;&gt;getRecord&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ...&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;StatefulTaskContext.java:139-151&lt;/code&gt; — Debezium&amp;rsquo;s queue is built with:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; ChangeEventQueue.&lt;span style=&#34;color:#50fa7b&#34;&gt;Builder&lt;/span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;DataChangeEvent&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt;()&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        .&lt;span style=&#34;color:#50fa7b&#34;&gt;maxBatchSize&lt;/span&gt;(connectorConfig.&lt;span style=&#34;color:#50fa7b&#34;&gt;getMaxBatchSize&lt;/span&gt;())&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        .&lt;span style=&#34;color:#50fa7b&#34;&gt;maxQueueSize&lt;/span&gt;(queueSize)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        .&lt;span style=&#34;color:#50fa7b&#34;&gt;maxQueueSizeInBytes&lt;/span&gt;(connectorConfig.&lt;span style=&#34;color:#50fa7b&#34;&gt;getMaxQueueSizeInBytes&lt;/span&gt;())  &lt;span style=&#34;color:#6272a4&#34;&gt;// default 0 = no byte cap&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        .&lt;span style=&#34;color:#50fa7b&#34;&gt;build&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Debezium &lt;code&gt;ChangeEventQueue.poll()&lt;/code&gt; (v1.9.8.Final, simplified — actual impl also honors &lt;code&gt;maxQueueSizeInBytes&lt;/code&gt; and a &lt;code&gt;pollInterval&lt;/code&gt; timeout):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; 
style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;8&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8be9fd;font-style:italic&#34;&gt;public&lt;/span&gt; List&lt;span 
style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;T&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#50fa7b&#34;&gt;poll&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    List&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt;T&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;gt;&lt;/span&gt; records &lt;span style=&#34;color:#ff79c6&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ff79c6&#34;&gt;new&lt;/span&gt; ArrayList&lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ...&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;while&lt;/span&gt; (&lt;span style=&#34;color:#ff79c6&#34;&gt;!&lt;/span&gt;queue.&lt;span style=&#34;color:#50fa7b&#34;&gt;isEmpty&lt;/span&gt;() &lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; records.&lt;span style=&#34;color:#50fa7b&#34;&gt;size&lt;/span&gt;() &lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;&lt;/span&gt; maxBatchSize) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        records.&lt;span style=&#34;color:#50fa7b&#34;&gt;add&lt;/span&gt;(queue.&lt;span style=&#34;color:#50fa7b&#34;&gt;poll&lt;/span&gt;());&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ff79c6&#34;&gt;return&lt;/span&gt; records;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;poll()&lt;/code&gt; returns up to &lt;code&gt;maxBatchSize&lt;/code&gt; events; it 
stops when the queue is empty. With &lt;code&gt;queue.size=100&lt;/code&gt; and &lt;code&gt;batch.size=2048&lt;/code&gt;, &lt;code&gt;poll()&lt;/code&gt; takes at most 100 (capped by current queue depth); no exception.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Risk note&lt;/strong&gt;: Debezium&amp;rsquo;s &lt;code&gt;CommonConnectorConfig.java:344&lt;/code&gt; has a &lt;code&gt;validateMaxQueueSize&lt;/code&gt; assertion:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-java&#34; data-lang=&#34;java&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;if&lt;/span&gt; (maxQueueSize &lt;span style=&#34;color:#ff79c6&#34;&gt;&amp;lt;=&lt;/span&gt; maxBatchSize) 
{&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    problems.&lt;span style=&#34;color:#50fa7b&#34;&gt;accept&lt;/span&gt;(field, maxQueueSize, &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;Must be larger than the maximum batch size&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;My &lt;code&gt;queue=100 + batch=2048&lt;/code&gt; (default) configuration &lt;strong&gt;violates&lt;/strong&gt; this invariant. It ran fine in experiment 5 (§4) without complaint. I haven&amp;rsquo;t chased down why (whether Flink CDC&amp;rsquo;s init path skips this validator, or whether it&amp;rsquo;s non-fatal in this version). But Debezium&amp;rsquo;s docs and code both expect &lt;code&gt;queue &amp;gt; batch&lt;/code&gt;. A safer production config sets &lt;code&gt;batch=50&lt;/code&gt; so the invariant holds; see §5.&lt;/p&gt;&#xA;&lt;h4 id=&#34;two-independent-queues-two-independent-caps&#34;&gt;Two independent queues, two independent caps&#xA;&lt;/h4&gt;&lt;p&gt;Both queues hold only target-table events. 
&lt;code&gt;table.include.list&lt;/code&gt; is threaded from Nacos &lt;code&gt;table-name&lt;/code&gt; through &lt;code&gt;MysqlDatabaseSync.buildCdcSource()&lt;/code&gt; into Debezium (&lt;code&gt;MySqlSourceConfigFactory.java:340-341&lt;/code&gt;).&lt;/p&gt;&#xA;&lt;p&gt;The structure:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 7&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;14&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;15&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;16&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;17&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;18&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;19&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;20&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;21&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;22&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;23&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;24&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;25&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;26&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;27&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;28&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;29&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;30&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;31&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;32&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;33&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;34&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;35&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;36&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;37&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;38&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;┌─ Flink TaskManager (task.heap) ────────────────────────────────────────┐&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                                                                        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  Debezium reader thread                                                │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  ┌──────────────────────────────────────────────────────────────────┐  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │ Debezium ChangeEventQueue                                        │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │   capacity = max.queue.size EVENTS (default 8192)                │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │   holds DataChangeEvent (thin wrapper around SourceRecord)       │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  └──────────────────────────────────────────────────────────────────┘  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │ poll() returns up to max.batch.size events        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │          (default 2048)                           │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    ▼                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  BinlogSplitReader.pollSplitRecords():                                 │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│     for each event: if shouldEmit(e) add to sourceRecords              │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│     wrap as one MySqlRecords                                           │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │                         
                          │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │ put(), blocks when queue.size() &amp;gt;= capacity       │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    ▼                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  ┌──────────────────────────────────────────────────────────────────┐  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │ Flink FutureCompletingBlockingQueue                              │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │   capacity = 2 ELEMENTS (source.reader.element.queue.capacity)   │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  │   each element = one MySqlRecords (up to max.batch.size events)  │  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  └──────────────────────────────────────────────────────────────────┘  │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │ SourceReaderBase.pollNext()                       │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    ▼                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  Main source thread: holds 1 in-flight batch                           │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    │ filter, serialize, forward                        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                    ▼                                                   │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                ... sink writer                                         │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                                                                        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│  ▲ SIMULTANEOUSLY ALIVE in heap, at worst case:                        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│      Debezium queue             :  max.queue.size × R_per_event        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│   +  Flink queue (capacity = 2) :  2 × max.batch.size × R_per_event    │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│   +  Main-thread in-flight batch:      max.batch.size × R_per_event    │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│                                                                        │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;└────────────────────────────────────────────────────────────────────────┘&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Numerically, at 12g prod with fat-rich workload:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Position&lt;/th&gt;&#xA;          &lt;th&gt;Holds&lt;/th&gt;&#xA;          &lt;th&gt;Upper-bound formula&lt;/th&gt;&#xA;          &lt;th&gt;Default-config 
estimate&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Debezium &lt;code&gt;ChangeEventQueue&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;DataChangeEvent&lt;/code&gt; wrapping &lt;code&gt;SourceRecord&lt;/code&gt; (target table only)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;max.queue.size × R_per_event&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;batch-avg: 8192 × 230 KiB ≈ &lt;strong&gt;1.8 GiB&lt;/strong&gt;. Pathological: 8192 × 14 MiB ≈ 110 GiB.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Flink &lt;code&gt;FutureCompletingBlockingQueue&lt;/code&gt; (cap=2)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;MySqlRecords&lt;/code&gt; wrapping &lt;code&gt;SourceRecords&lt;/code&gt; (target table only)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;2 × max.batch.size × R_per_event&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;batch-avg: 2 × 2048 × 970 KiB ≈ &lt;strong&gt;4 GiB&lt;/strong&gt;. Pathological: 2 × 2048 × 14 MiB ≈ 56 GiB.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Main-thread in-flight batch&lt;/td&gt;&#xA;          &lt;td&gt;same&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;max.batch.size × R_per_event&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;batch-avg: 2048 × 970 KiB ≈ &lt;strong&gt;2 GiB&lt;/strong&gt;. Pathological: 2048 × 14 MiB ≈ 28 GiB.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;&amp;ldquo;batch-avg&amp;rdquo; is the fat-rich-slice MAT observation from §3.4; &amp;ldquo;pathological&amp;rdquo; is the §5.2 assumption that every event is a max-fat row. Both columns are target-table only; the difference is the event-size assumption.&lt;/p&gt;&#xA;&lt;p&gt;All three theoretically stack on the same &lt;code&gt;task.heap&lt;/code&gt;, but in steady state they sit well below their caps. 
Runtime event flow is throttled by backpressure: the main thread keeps consuming, so the Flink queue usually holds just 1 batch, and the Debezium queue stays far below 8192 on average. All three sitting near their caps simultaneously is the extreme instant of &amp;ldquo;main thread can&amp;rsquo;t drain while Debezium is still filling,&amp;rdquo; not steady state.&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;max.batch.size&lt;/code&gt; alone is insufficient. Shrinking it by itself trims only the lower two rows; the upstream Debezium queue at default &lt;code&gt;queue.size=8192&lt;/code&gt; can still accumulate ~1.8 GiB (batch-avg, fat-rich workload). Production never changed &lt;code&gt;max.batch.size&lt;/code&gt;, but the duality matters: even if someone tried &amp;ldquo;just turn the batch down&amp;rdquo; as a future fix, it would still fail because the upstream queue is capped only by event count (8192 by default), with no byte cap.&lt;/p&gt;&#xA;&lt;h3 id=&#34;36-row-size-long-tail-and-slice-selectivity&#34;&gt;3.6 Row-size long tail and slice selectivity&#xA;&lt;/h3&gt;&lt;p&gt;Whole-table bucket distribution (&lt;code&gt;SELECT ... 
GROUP BY bucket&lt;/code&gt;, column = sum of five large text columns = &lt;code&gt;content_json + 4 others&lt;/code&gt;):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;8&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td 
style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| bucket        | rows    | avg_KiB | cumulative_KiB |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| a &amp;lt;10 KiB     | 57,597  |   5.3   |       305,264  |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| b 10-50 KiB   | 180,973 |  38.8   |     7,021,752  |   ← 71%, the main mode&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| c 50-100 KiB  |   5,318 |  56.8   |       302,062  |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| d 100-500 KiB |     337 | 219.3   |        73,926  |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| e 500 KiB-1 MiB| 10,426 | 798.4   |     8,324,118  |   ← 4.1% but 51% of total bytes&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| f 1-5 MiB     |     198 |1,672.9  |       331,234  |   ← 0.08%, max = 3.47 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| total         | 254,849 |         |    16,358,356  |   ≈ 15.60 GiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Weighted mean: &lt;strong&gt;64 KiB/row&lt;/strong&gt; across the five big columns. Compared to &lt;code&gt;avg_row_length = 83 KiB&lt;/code&gt; (all columns + page overhead), the 19 KiB delta is reasonable.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Bimodal distribution&lt;/strong&gt;: 71% is normal 10-50 KiB business rows; 4% is 500 KiB+ fat rows. 
The 83 KiB average is a mean driven by outliers. Don&amp;rsquo;t let it fool you.&lt;/p&gt;&#xA;&lt;p&gt;Slice-specific distribution (&lt;code&gt;id BETWEEN 1870319408066945025 AND 1913089719906832385&lt;/code&gt;, a snowflake-ID range corresponding to ~4 months of business data):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td 
style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| a &amp;lt;10 KiB        |  5,170 |   5.3   |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| b 10-50 KiB      |  3,468 |  13.5   |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| c 50-100 KiB     |     29 |  65.9   |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| d 100-500 KiB    |      0 |    -    |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| e 500 KiB-1 MiB  |  1,332 | 801.9   |   ← 13.3%, 3× the table-wide concentration&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| f 1-5 MiB        |      1 |1,518.6  |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| total 10,000 rows, 1,145,780 KiB ≈ 1.1 GiB big-field bytes |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Fat ratio in this slice is 3× the whole-table rate. Id range and fatness are correlated here. Why the correlation exists at the business layer isn&amp;rsquo;t verified; it could be that a certain category of records was predominantly created during that time window, or it could be a snowflake-id-to-business coincidence. &lt;strong&gt;Every OOM conclusion below assumes production will encounter similarly fat-rich UPDATEs&lt;/strong&gt;. 
Uniformly distributed UPDATEs would be significantly lighter.&lt;/p&gt;&#xA;&lt;p&gt;Weighted binlog bytes per row for this slice: &lt;code&gt;1.1 GiB × 2 (BEFORE+AFTER, binlog_row_image=FULL) / 10000 rows ≈ 230 KiB/row&lt;/code&gt;. (Small columns and binlog event header overhead are &amp;lt;10%, ignored here.) This matches the 2.7 GiB binlog file observation from §3.1 (2.2 GiB from our transaction + 0.5 GiB background).&lt;/p&gt;&#xA;&lt;h3 id=&#34;37-hypothesis-two-where-the-heap-bloat-comes-from&#34;&gt;3.7 Hypothesis two: where the heap bloat comes from&#xA;&lt;/h3&gt;&lt;p&gt;When I paused to review the numbers, something looked off: Java object-header overhead shouldn&amp;rsquo;t be on the order of 7×. The header is tens of bytes, and a hundred-KiB string shouldn&amp;rsquo;t balloon that much. That question pushed me back to pin down the actual source of the bloat. Spoiler: it&amp;rsquo;s not headers; it&amp;rsquo;s two independent 2× factors stacking.&lt;/p&gt;&#xA;&lt;p&gt;Arthas measurements (&lt;code&gt;com.taobao.arthas 4.1.8&lt;/code&gt;, injected via &lt;code&gt;jattach&lt;/code&gt; into a JRE-only container):&lt;/p&gt;&#xA;&lt;p&gt;Verify that the runtime config is actually applied:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;gt; vmtool --action getInstances --className io.debezium.connector.base.ChangeEventQueue \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    --limit 2 --express &amp;#39;instances[0].maxBatchSize + &amp;#34;|&amp;#34; + instances[0].maxQueueSize&amp;#39;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@String[100|100]   ✓&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Verify schema is shared (rule out &amp;ldquo;each record carries its own Schema&amp;rdquo;):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;gt; vmtool ... ConnectSchema --limit 1 --express &amp;#39;instances.length&amp;#39;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@String[ConnectSchema_count=1]   ✓ one global instance&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Drill into a fat row:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;gt; UPDATE ... WHERE id = 1886996929695858689   (single-column content_json = 801 KiB UTF-8)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;gt; vmtool ... SourceRecord ... --express &amp;#39;instances[0].value.get(&amp;#34;before&amp;#34;).get(&amp;#34;content_json&amp;#34;).length()&amp;#39;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@String[before.content_json.length=817292]   ← 817,292 chars&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;gt; vmtool ... SourceRecord ... 
--express &amp;#39;instances[0].value.get(&amp;#34;after&amp;#34;).get(&amp;#34;content_json&amp;#34;).length()&amp;#39;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;@String[after.content_json.length=817292]&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Note: the §3.6 &amp;ldquo;e&amp;rdquo; bucket average of 801.9 KiB is computed over the sum of five columns (&lt;code&gt;content_json&lt;/code&gt; + four others). The 801 KiB here is &lt;code&gt;content_json&lt;/code&gt; alone. The two figures are close only because the other four columns are nearly empty on fat rows; &lt;code&gt;content_json&lt;/code&gt; dominates. The &amp;ldquo;800 KiB&amp;rdquo; used in the breakdown below refers to the measured &lt;code&gt;content_json&lt;/code&gt; value.&lt;/p&gt;&#xA;&lt;p&gt;Analysis:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;MySQL &lt;code&gt;content_json&lt;/code&gt; is 801 KiB (UTF-8 bytes).&lt;/li&gt;&#xA;&lt;li&gt;Java &lt;code&gt;String.length()&lt;/code&gt; = 817,292 chars.&lt;/li&gt;&#xA;&lt;li&gt;The JSON contains non-Latin-1 characters (Chinese clinical text embedded in the values), so JDK&amp;rsquo;s compact-string optimization falls back to UTF-16 encoding. 
The backing &lt;code&gt;byte[]&lt;/code&gt; is 2 bytes × 817,292 chars ≈ 1.6 MiB on heap.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Bloat chain (this behavior requires mostly-ASCII content, such as English keys, plus at least some non-Latin-1 values; pure-ASCII wouldn&amp;rsquo;t trigger the UTF-16 fallback, and pure-Chinese text actually shrinks to ~0.67× its size going UTF-8 → UTF-16):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 
7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt; 9&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;10&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;11&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;12&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;13&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;14&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   MySQL column `content_json`            801 KiB (UTF-8 bytes)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │ Debezium parses the binlog row into Java&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     
▼&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   Java String (coder=1, UTF-16)        ≈ 1.6 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │                    (char[] × 2 bytes;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │                     triggered by any non-Latin-1 char)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     │ binlog_row_image=FULL → Envelope.before + Envelope.after&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                     ▼&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        BEFORE String    ≈ 1.6 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     +  AFTER  String    ≈ 1.6 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     ──────────────────────────────&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     One fat event on heap ≈ 3.1 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Step-by-step evidence:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Step&lt;/th&gt;&#xA;          &lt;th&gt;Number&lt;/th&gt;&#xA;          &lt;th&gt;Evidence&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;MySQL &lt;code&gt;content_json&lt;/code&gt; UTF-8 bytes&lt;/td&gt;&#xA;          &lt;td&gt;801 KiB&lt;/td&gt;&#xA;          &lt;td&gt;Direct: &lt;code&gt;SELECT LENGTH(content_json)&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      
&lt;tr&gt;&#xA;          &lt;td&gt;Corresponding Java &lt;code&gt;String.length()&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;813,965 to 817,292 chars (varies per row)&lt;/td&gt;&#xA;          &lt;td&gt;Direct: arthas &lt;code&gt;instances[0].value.get(&amp;quot;before&amp;quot;).get(&amp;quot;content_json&amp;quot;).length()&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Java &lt;code&gt;String.coder&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;coder=1&lt;/code&gt; (UTF-16)&lt;/td&gt;&#xA;          &lt;td&gt;Direct (added for this writeup): arthas &lt;code&gt;instances[0].value.get(&amp;quot;before&amp;quot;).get(&amp;quot;content_json&amp;quot;).coder&lt;/code&gt;. BEFORE and AFTER both return 1.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Java String internal &lt;code&gt;byte[]&lt;/code&gt; length&lt;/td&gt;&#xA;          &lt;td&gt;1,626,476 bytes ≈ 1.55 MiB&lt;/td&gt;&#xA;          &lt;td&gt;Direct: arthas &lt;code&gt;value.length&lt;/code&gt;. 
Matches &lt;code&gt;chars × 2&lt;/code&gt; to a few bytes (alignment/padding).&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;BEFORE + AFTER are two independent Strings&lt;/td&gt;&#xA;          &lt;td&gt;× 2 ≈ 3.1 MiB&lt;/td&gt;&#xA;          &lt;td&gt;Code fact: &lt;code&gt;binlog_row_image=FULL&lt;/code&gt; carries both images; Debezium&amp;rsquo;s &lt;code&gt;Envelope&lt;/code&gt; stores &lt;code&gt;before&lt;/code&gt; and &lt;code&gt;after&lt;/code&gt; as independent Struct fields.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Struct / Schema / HashMap fixed overhead&lt;/td&gt;&#xA;          &lt;td&gt;negligible (~KB)&lt;/td&gt;&#xA;          &lt;td&gt;Arthas direct: ConnectSchema global &lt;code&gt;instance_count=1&lt;/code&gt;; Struct internals are &lt;code&gt;Object[]&lt;/code&gt;.&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Single fat event on heap, total estimate&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;≈ 3.1 MiB&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Sum of the above. Every key step is directly measured (&lt;code&gt;coder=1&lt;/code&gt;, &lt;code&gt;byte[].length&lt;/code&gt;).&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Cross-check against §3.4&amp;rsquo;s MAT observation (&lt;code&gt;97 MiB / 100 events = 970 KiB/event&lt;/code&gt;): with 30% fat + 70% normal composition (§3.8 derives this), the per-event mean comes to &lt;code&gt;0.30 × 3.1 MiB + 0.70 × 0.05 MiB = 0.965 MiB ≈ 988 KiB&lt;/code&gt;. Off from 970 KiB by 18 KiB, about 1.9%. The bloat chain closes.&lt;/p&gt;&#xA;&lt;p&gt;The &amp;ldquo;Java-headers-shouldn&amp;rsquo;t-be-7×&amp;rdquo; instinct was right. The real bloat sources are BEFORE+AFTER and UTF-16, two independent 2× factors stacked, not object-header overhead. 
(Both factors are table-shape-dependent.)&lt;/p&gt;&#xA;&lt;p&gt;One more thing: BEFORE lives on the heap through the entire source → queue → main-thread pipeline. The sink-side default &lt;code&gt;ignoreUpdateBefore=true&lt;/code&gt; (&lt;code&gt;DorisExecutionOptions.java:291&lt;/code&gt;) does &lt;strong&gt;not&lt;/strong&gt; help upstream heap occupancy. It only controls whether &lt;code&gt;JsonDebeziumDataChange.extractUpdate&lt;/code&gt; writes BEFORE into the Doris stream-load body. &lt;strong&gt;&lt;code&gt;ignoreUpdateBefore&lt;/code&gt; cannot rescue a source-side OOM.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;h3 id=&#34;38-back-solving-the-97-mib-mysqlrecords-batch-flink-layer&#34;&gt;3.8 Back-solving the 97 MiB &lt;code&gt;MySqlRecords&lt;/code&gt; batch (Flink layer)&#xA;&lt;/h3&gt;&lt;p&gt;The fat concentration inside one batch can easily exceed the slice-wide 13%. The binlog carries the row events in PK order, and fat rows cluster on adjacent ids, so a whole fat cluster can land in a single batch. 
Back-solving from the MAT retained:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;R_flink = 97 MiB / 100 events = 970 KiB/event (batch-avg, retained)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Estimate with fat event = 3.1 MiB (from §3.7&amp;rsquo;s direct measurement) and normal event ≈ 0.05 MiB (slice bucket-b average = 13.5 KiB × 4× BEFORE+AFTER+UTF-16 bloat ≈ 54 KiB ≈ 0.05 MiB):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; 
style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;p × 3.1 MiB + (1 - p) × 0.05 MiB = 0.95 MiB (970 KiB)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;→ p ≈ 30%&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;About 30% of the batch is fat rows (about 30 out of 100), vs the slice&amp;rsquo;s overall 13%. A plausible-but-unverified explanation: InnoDB executes UPDATEs in PK order, and the binlog is written in PK order too, so if fat rows cluster in id space, individual batches can have a higher fat concentration than the slice average. 
I didn&amp;rsquo;t directly measure event id-vs-size correlation; this is a consistency argument, not proof.&lt;/p&gt;&#xA;&lt;h3 id=&#34;39-live-queue-backpressure-arthas&#34;&gt;3.9 Live-queue backpressure (arthas)&#xA;&lt;/h3&gt;&lt;p&gt;With &lt;code&gt;2g + queue=100 + batch=100&lt;/code&gt;, I re-ran 10k UPDATE and polled both queues every 3 to 6 seconds (sampling cadence misses some instantaneous peaks, but 5 consecutive &lt;code&gt;FlinkQ=2/2&lt;/code&gt; observations are sufficient evidence of saturation):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;8&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:52:30 DebeziumQ=0/100   FlinkQ=0/2      pre-commit&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:52:35 DebeziumQ=47/100  FlinkQ=2/2 (full) burst arrives&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:52:40 DebeziumQ=1/100   FlinkQ=2/2      fetcher blocked on put&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:52:52 DebeziumQ=11/100  FlinkQ=2/2&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:53:02 DebeziumQ=26/100  FlinkQ=2/2&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:53:18 DebeziumQ=77/100  FlinkQ=2/2&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:53:24 arthas SIGKILL (exit 137)  ← TM OOM&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;15:53:28 pod NotFound&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Frame before OOM: &lt;code&gt;2×100 (Flink, SourceRecords inside MySqlRecords) + 100 (main thread, SourceRecords) + 77 (Debezium, 
DataChangeEvent) = 377 event objects alive simultaneously&lt;/code&gt; (300 SourceRecord + 77 DataChangeEvent; all point to SourceRecord objects, since DataChangeEvent is a thin wrapper). Backpressure is working exactly as designed. &lt;strong&gt;The container heap is simply too small.&lt;/strong&gt; (§3.10 shows &lt;code&gt;managed.fraction=0.1&lt;/code&gt; as a way to reclaim heap without adding RAM.)&lt;/p&gt;&#xA;&lt;h3 id=&#34;310-final-validation-managedfraction01-with-caveats-on-single-diff-coverage&#34;&gt;3.10 Final validation: &lt;code&gt;managed.fraction=0.1&lt;/code&gt; (with caveats on single-diff coverage)&#xA;&lt;/h3&gt;&lt;p&gt;Flink 1.18 sets &lt;code&gt;taskmanager.memory.managed.fraction=0.4&lt;/code&gt; by default, reserving 40% of &lt;code&gt;flink.size&lt;/code&gt; as off-heap managed memory, used mainly by the RocksDB state backend. The pipeline uses very little state (binlog offset + small sink state, well under 1 GiB). Most of that 40% is recoverable.&lt;/p&gt;&#xA;&lt;p&gt;At 2g, plug into the formula:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;flink.size      = 2048 - 256(metaspace) - 205(overhead)        = 1587 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fraction=0.4: managed=635, network=159, task.heap = 1587 - 128 - 128 - 635 - 159 = 537 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fraction=0.1: managed=159, network=159, task.heap = 1587 - 128 - 128 - 159 - 159 = 1014 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;TM startup log confirms:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;taskmanager.memory.managed.size=166429984b    ≈ 159 MiB   ✓&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;taskmanager.memory.task.heap.size=1063004400b ≈ 1014 MiB  ✓&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;-Xmx1197222128                                ≈ 1142 MiB = task.heap + framework.heap&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;code&gt;task.heap&lt;/code&gt; goes from 537 MiB to 1014 MiB, a &lt;strong&gt;+89%&lt;/strong&gt; gain.&lt;/p&gt;&#xA;&lt;p&gt;Experiment 5 config: &lt;code&gt;2g TM + debezium.max.queue.size=100 + taskmanager.memory.managed.fraction=0.1 + batch.size=2048 (default, capped by queue=100 at runtime)&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Result:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;16:35:54 MySQL commit (10k rows)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;16:36:18 TM mem 1263 → 1670 MiB (+407),  CPU 1046m → 1579m&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;16:36:34 CPU 2000m (2 cores saturated),  mem 1659 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;~3 min period CPU stays at ~2000m,  mem stable 1660-1704 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;16:39:36 CPU drops to 1020m,  mem 1683 MiB   ← processing complete, 
3m42s&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;TM stays Running for the following 12 min&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Doris-side consistency: &lt;code&gt;SELECT COUNT(*) WHERE id IN range AND modify_time &amp;gt;= &#39;2026-04-21 16:35:00&#39;&lt;/code&gt; returns &lt;strong&gt;10,000/10,000&lt;/strong&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Methodological caveat (caught in post-hoc review): the transition from experiment 4 to experiment 5 changed two variables (&lt;code&gt;batch=100 → 2048&lt;/code&gt; + &lt;code&gt;fraction=0.4 → 0.1&lt;/code&gt;). No single-variable control was run. Reasoning about each alone:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;fraction=0.1&lt;/code&gt; alone (leave &lt;code&gt;queue.size=8192&lt;/code&gt; default): &lt;code&gt;task.heap&lt;/code&gt; becomes 1014 MiB (formula-matched with TM log). But the Debezium queue&amp;rsquo;s theoretical upper bound is &lt;code&gt;max.queue.size × R_per_event = 8192 × 230 KiB ≈ 1.8 GiB&lt;/code&gt;, already exceeding the 1014 MiB &lt;code&gt;task.heap&lt;/code&gt;. Whether it actually OOMs depends on whether runtime backpressure lets the queue get near that bound. Not measured. &amp;ldquo;Theoretical cap &amp;gt; available heap&amp;rdquo; is not a safe plan.&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;queue=100&lt;/code&gt; alone (leave &lt;code&gt;fraction=0.4&lt;/code&gt;): experiment 4 (&lt;code&gt;2g + queue=100 + batch=100 + fraction=0.4&lt;/code&gt;) measured OOM. &lt;code&gt;task.heap&lt;/code&gt; is only 537 MiB, and MAT data (§3.4) shows the Flink queue&amp;rsquo;s two batches + the main thread&amp;rsquo;s one batch already eat ~260 MiB + JVM baseline.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;So the provable conclusion: at the 2g tier, experiment 5&amp;rsquo;s combination passed, experiment 4 (queue=100 alone) failed. 
&amp;ldquo;Both diffs are necessary&amp;rdquo; is an inference from the experiment 5 vs 4 differential. &amp;ldquo;Is &lt;code&gt;fraction=0.1&lt;/code&gt; alone enough?&amp;rdquo; was not directly tested. The 12g/16g extrapolations are formula + assumption, not measured.&lt;/p&gt;&#xA;&lt;h3 id=&#34;311-follow-up-validations&#34;&gt;3.11 Follow-up validations&#xA;&lt;/h3&gt;&lt;p&gt;Two follow-up experiments were run after the main story ended. Both are confidence-builders, not new conclusions.&lt;/p&gt;&#xA;&lt;h4 id=&#34;3111-control-non-fat-slice-10k-update-on-bucket-4&#34;&gt;3.11.1 Control: non-fat slice (10k UPDATE on bucket 4)&#xA;&lt;/h4&gt;&lt;p&gt;To confirm that the fat-rich slice is a necessary condition for OOM, I ran the same config (&lt;code&gt;2g + queue=100 + fraction=0.1&lt;/code&gt;) against a non-fat slice.&lt;/p&gt;&#xA;&lt;p&gt;First, an id-range distribution scan (table split into 10 id buckets):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| bucket | row_cnt  | fat_pct | min_id               |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 0,1    | ~1.7k    | 0.0%    | 1838... 1879...      |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 2      |   3,144  | 24.1%   |                      |  ← densest fat region&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 3      |  10,807  |  9.3%   |                      |  ← §2.4&amp;#39;s experiment slice is in here&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 4      | 162,929  |  0.9%   | 1921... 1942...      
| ← table body, almost no fat&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 5      |  13,200  | 13.7%   |                      |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;| 6-9    |  ~63k    | 7-12%   |                      |&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Workload: bucket 4, &lt;code&gt;id BETWEEN 1921717482026164226 AND 1942249500063379458 LIMIT 10000&lt;/code&gt; (hit 10,000 exactly).&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Metric&lt;/th&gt;&#xA;          &lt;th&gt;§3.10 fat-rich slice (13.3%)&lt;/th&gt;&#xA;          &lt;th&gt;&lt;strong&gt;Non-fat bucket 4 (0.9%)&lt;/strong&gt;&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;UPDATE InnoDB runtime&lt;/td&gt;&#xA;          &lt;td&gt;5m45s&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;43 seconds&lt;/strong&gt; (small rows, fast page updates)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Drain CPU saturation duration&lt;/td&gt;&#xA;          &lt;td&gt;~3 min (2000m)&lt;/td&gt;&#xA;          &lt;td&gt;~2 min (2000m)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;TM container memory delta&lt;/td&gt;&#xA;          &lt;td&gt;+407 MiB (1263 → 1670)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;+20 MiB&lt;/strong&gt; (1726 → 1746)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Drain completion&lt;/td&gt;&#xA;          &lt;td&gt;commit + ~3m42s&lt;/td&gt;&#xA;          &lt;td&gt;commit + ~2 min&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Delta ratio is ~20×, same order of magnitude as per-event heap occupancy (fat 3.1 MiB / normal ~0.08 MiB ≈ 40×). 
The gap vs 40× is because the 100-event cap (&lt;code&gt;max.queue.size=100&lt;/code&gt;, which also bounds each Flink batch) throttles the in-flight count too; drain rate and sink throughput are basically the same, only the event size differs.&lt;/p&gt;&#xA;&lt;p&gt;Baseline note: the fat-slice run started at 1263 MiB; the bucket-4 run started at 1726 MiB. The TM wasn&amp;rsquo;t restarted between the two, so old gen had accumulated retained objects. The two baselines aren&amp;rsquo;t fully independent. The 20× delta conclusion still holds: the accumulated retained set is shared baseline drift, not something that distorts the delta comparison. A strict apples-to-apples comparison would require a TM restart between runs; not done here.&lt;/p&gt;&#xA;&lt;p&gt;Implication: §5.2&amp;rsquo;s &amp;ldquo;5.6 GiB worst-case upper bound&amp;rdquo; is a pure theoretical ceiling. Real production heap usage is dictated by whether the UPDATE hits a fat-rich id range. Hitting fat rows pushes usage into GB territory; hitting normal rows is nearly a no-op.&lt;/p&gt;&#xA;&lt;h4 id=&#34;3112-simultaneous-two-queue-snapshot&#34;&gt;3.11.2 Simultaneous two-queue snapshot&#xA;&lt;/h4&gt;&lt;p&gt;To stress-test §1&amp;rsquo;s &amp;ldquo;timing-sampling-bias&amp;rdquo; explanation of the 4× &lt;code&gt;R_dbz&lt;/code&gt;-vs-&lt;code&gt;R_flink&lt;/code&gt; gap, I ran a 2000-row UPDATE on bucket 2 (24% fat), with 8 rapid arthas snapshots during drain:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;7&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;8&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;9&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;probe  SR_total  DebeziumQ   FlinkQ   MySqlRecords  DataChangeEvent&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;1      172       1/100       2/2      3             3&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;2       48       1/100       2/2      4             4&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;3       57       1/100       2/2      4             19&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;4       88      21/100       2/2      4             36&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;5      124      16/100       2/2      4             18&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;6      168       0/100       2/2      3             50&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;7      186      13/100       2/2      4             16&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;8      224      51/100       2/2      4             53&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Observations:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;FlinkQ is persistently full at 2/2 (backpressure is stable).&lt;/li&gt;&#xA;&lt;li&gt;DebeziumQ oscillates between 0 and 51 (fetcher tops it up, main thread drains it).&lt;/li&gt;&#xA;&lt;li&gt;SR_total oscillates between 48 and 224, reflecting two-queue + main-thread batch dynamics.&lt;/li&gt;&#xA;&lt;li&gt;The two layers, at any given instant, hold different events. There&amp;rsquo;s no apples-to-apples comparison of per-event retained across them.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;These probes measured queue &lt;strong&gt;depths&lt;/strong&gt; (event counts) only, not per-event size or per-queue fat ratio. 
The &amp;ldquo;composition is out of sync&amp;rdquo; language below is an interpretation of depth oscillation, not a direct measurement.&lt;/p&gt;&#xA;&lt;p&gt;Revisiting §1&amp;rsquo;s claim: the original OOM hprof&amp;rsquo;s 230 vs 970 KiB gap is most consistently explained by timing-sampling bias. The two queues&amp;rsquo; depths oscillate independently during drain, so it&amp;rsquo;s reasonable to infer that their event &lt;strong&gt;composition&lt;/strong&gt; also drifts (otherwise the pipeline would ebb and flow as one unit). There is no mechanical evidence of a per-layer difference (the code confirms an identical object shape).&lt;/p&gt;&#xA;&lt;p&gt;Cross-scenario caveat: §3.11.2 was on bucket 2 (24% fat, 2000 rows); the original OOM hprof was on the §2.4 slice inside bucket 3 (13% fat, 10k rows). I didn&amp;rsquo;t redo multi-snapshot probes on the original OOM scenario. The conclusion is an extrapolation that assumes two-queue fluctuation mechanics don&amp;rsquo;t depend on fat density or event count.&lt;/p&gt;&#xA;&lt;h2 id=&#34;4-experiments-recap&#34;&gt;4. 
Experiments recap&#xA;&lt;/h2&gt;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;#&lt;/th&gt;&#xA;          &lt;th&gt;TM&lt;/th&gt;&#xA;          &lt;th&gt;fraction&lt;/th&gt;&#xA;          &lt;th&gt;batch&lt;/th&gt;&#xA;          &lt;th&gt;queue&lt;/th&gt;&#xA;          &lt;th&gt;Result&lt;/th&gt;&#xA;          &lt;th&gt;Heap peak (task.heap limit)&lt;/th&gt;&#xA;          &lt;th&gt;Notes&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;1&lt;/td&gt;&#xA;          &lt;td&gt;2g&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;2048&lt;/td&gt;&#xA;          &lt;td&gt;8192&lt;/td&gt;&#xA;          &lt;td&gt;✗ OOM&lt;/td&gt;&#xA;          &lt;td&gt;peak ~1,156 MiB; hprof 453 MiB&lt;/td&gt;&#xA;          &lt;td&gt;Flink self-restarted 8×&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;2&lt;/td&gt;&#xA;          &lt;td&gt;4g&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;2048&lt;/td&gt;&#xA;          &lt;td&gt;8192&lt;/td&gt;&#xA;          &lt;td&gt;✗ mid-drain OOM + self-restart&lt;/td&gt;&#xA;          &lt;td&gt;hprof 1.2 GiB&lt;/td&gt;&#xA;          &lt;td&gt;&amp;ldquo;looks like it finished,&amp;rdquo; but PVC has a dump&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;3&lt;/td&gt;&#xA;          &lt;td&gt;8g&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;2048&lt;/td&gt;&#xA;          &lt;td&gt;8192&lt;/td&gt;&#xA;          &lt;td&gt;✓ no OOM&lt;/td&gt;&#xA;          &lt;td&gt;4,680 MiB&lt;/td&gt;&#xA;          &lt;td&gt;drain 3m33s&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;4&lt;/td&gt;&#xA;          &lt;td&gt;2g&lt;/td&gt;&#xA;          &lt;td&gt;0.4&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;&#xA;          
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;✗ OOM&lt;/td&gt;&#xA;          &lt;td&gt;hprof 446 MiB; Debezium q 23 MiB; MySqlRecords 97 MiB&lt;/td&gt;&#xA;          &lt;td&gt;both queue/batch confirmed active&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;2g&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;0.1&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;2048 (default)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;✓ no OOM&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;strong&gt;1,670 MiB&lt;/strong&gt; stable&lt;/td&gt;&#xA;          &lt;td&gt;drain 3m42s; the recommended config&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;6&lt;/td&gt;&#xA;          &lt;td&gt;12g&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;task.heap 5.1 GiB (default) / 8.3 GiB (fraction=0.1)&lt;/td&gt;&#xA;          &lt;td&gt;formula-extrapolated only; see §5&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;7&lt;/td&gt;&#xA;          &lt;td&gt;16g&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;not measured&lt;/td&gt;&#xA;          &lt;td&gt;task.heap 7.1 / 11.5 GiB&lt;/td&gt;&#xA;          &lt;td&gt;formula-extrapolated only&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;&amp;ldquo;N/A heap peak&amp;rdquo; means the TM OOMed before that time and the displayed value is the last monitoring sample, not &amp;ldquo;wasn&amp;rsquo;t 
sampled.&amp;rdquo;&lt;/p&gt;&#xA;&lt;h2 id=&#34;5-production-recommendations&#34;&gt;5. Production recommendations&#xA;&lt;/h2&gt;&lt;h3 id=&#34;quick-pick&#34;&gt;Quick pick&#xA;&lt;/h3&gt;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;If your current TM is&lt;/th&gt;&#xA;          &lt;th&gt;Apply&lt;/th&gt;&#xA;          &lt;th&gt;Expected behavior&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;16 GiB&lt;/strong&gt; (e.g. already expanded during an incident)&lt;/td&gt;&#xA;          &lt;td&gt;the 3 config lines from the TL;DR&lt;/td&gt;&#xA;          &lt;td&gt;zero additional hardware, full safety margin against max-fat pathological&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;12 GiB&lt;/strong&gt; (original prod spec)&lt;/td&gt;&#xA;          &lt;td&gt;the 3 config lines from the TL;DR&lt;/td&gt;&#xA;          &lt;td&gt;safe for fat-average workload; tight against max-fat pathological (see §5.1.1 for full-margin option)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&amp;lt;12 GiB&lt;/td&gt;&#xA;          &lt;td&gt;run an 8g-scale regression first; this post&amp;rsquo;s numbers don&amp;rsquo;t guarantee safety below 12 GiB&lt;/td&gt;&#xA;          &lt;td&gt;see §3.3 heap ladder&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Before rolling to prod, run the full combo in dev against the same fat-row UPDATE workload and verify no OOM plus Doris consistency.&lt;/p&gt;&#xA;&lt;p&gt;The rest of §5 explains &lt;em&gt;why these knobs&lt;/em&gt;, &lt;em&gt;which scenarios they cover&lt;/em&gt;, and &lt;em&gt;where the safety margin breaks&lt;/em&gt; — skip to §5.2 if you care about worst-case capacity math.&lt;/p&gt;&#xA;&lt;h3 id=&#34;50-scope-of-applicability&#34;&gt;5.0 Scope of applicability&#xA;&lt;/h3&gt;&lt;p&gt;All estimates below 
only hold for fat-rich UPDATE workloads (per §2.4 + §3.6: 13% fat in the slice vs 4% table-wide). Uniformly distributed UPDATEs have far lower heap occupancy. Other burst shapes outside the &amp;ldquo;fat-rich failure mode&amp;rdquo; (such as high-concurrency DDL or the snapshot phase) have different mechanisms and aren&amp;rsquo;t covered here.&lt;/p&gt;&#xA;&lt;h3 id=&#34;51-minimum-diff-handles-fat-average-tight-on-max-fat-pathological&#34;&gt;5.1 Minimum diff (handles fat-average; tight on max-fat pathological)&#xA;&lt;/h3&gt;&lt;p&gt;Keep the TM at 12 GiB (no new hardware) and change config in only two places, three lines in total:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Nacos &lt;code&gt;[mysql-1]&lt;/code&gt; section&lt;/strong&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.queue.size=100&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;debezium.max.batch.size=50     ← satisfies the queue &amp;gt; batch invariant&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;FlinkDeployment YAML &lt;code&gt;flinkConfiguration&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ff79c6&#34;&gt;taskmanager.memory.managed.fraction&lt;/span&gt;: &lt;span style=&#34;color:#f1fa8c&#34;&gt;&amp;#34;0.1&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;What this covers&lt;/strong&gt;: fat-average workloads (the batch-avg 970 KiB/event level observed at MAT in §3.4). 
Per §5.2&amp;rsquo;s math, &lt;code&gt;task.heap ≈ 8.3 GiB&lt;/code&gt;, fat-avg in-flight steady state ≈ 310 MiB, nominal utilization ~3.6%. Comfortable.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Where it&amp;rsquo;s tight&lt;/strong&gt;: max-fat pathological (every in-flight event is a 3.47 MiB max row × 2 × 2 ≈ 14 MiB). §5.2 gives in-flight ≈ 5.6 GiB / 8.3 GiB task.heap ≈ 67% nominal. But with a G1 empirical-working-set cap of ~70%, effective capacity is ~5.8 GiB, utilization hits ~97%, and the safety margin is just 3%. Probabilistically unlikely to hit (max row is 0.08% of the table; 400 in a row is astronomically unlikely), but there is very little error budget if it does.&lt;/p&gt;&#xA;&lt;p&gt;About &lt;code&gt;batch=50&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Purpose: satisfy Debezium &lt;code&gt;validateMaxQueueSize&lt;/code&gt;&amp;rsquo;s &lt;code&gt;queue &amp;gt; batch&lt;/code&gt; invariant (§3.5).&lt;/li&gt;&#xA;&lt;li&gt;Throughput: &lt;code&gt;ChangeEventQueue.poll()&lt;/code&gt; only waits &lt;code&gt;poll.interval.ms&lt;/code&gt; (default 500 ms) when the queue is empty. During a burst the queue is non-empty, so &lt;code&gt;poll()&lt;/code&gt; returns immediately; batch size has negligible effect on steady-state throughput. The main overhead is that each &lt;code&gt;MySqlRecords&lt;/code&gt; holds fewer events, so the main thread&amp;rsquo;s &lt;code&gt;pollNext()&lt;/code&gt; is invoked roughly 2× as often as with an effective batch of 100; this is minor.&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;batch=50&lt;/code&gt; itself wasn&amp;rsquo;t directly measured in dev. Experiment 5 ran with &lt;code&gt;queue=100 + batch=2048 (default) + fraction=0.1&lt;/code&gt;. 
If touching &lt;code&gt;batch&lt;/code&gt; feels risky, an alternative is &lt;code&gt;batch=100 + queue=200&lt;/code&gt; (also satisfies the invariant); also not measured.&lt;/li&gt;&#xA;&lt;li&gt;Before rolling to prod, do a full-combo regression in dev: run the full &lt;code&gt;queue=100 + batch=50 + fraction=0.1&lt;/code&gt; combo on the same fat-rich 10k UPDATE and verify no OOM + Doris consistency. Don&amp;rsquo;t verify &lt;code&gt;batch=50&lt;/code&gt; alone; verify the whole target config.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;511-complete-recommendation-covers-max-fat-pathological-too&#34;&gt;5.1.1 Complete recommendation (covers max-fat pathological too)&#xA;&lt;/h3&gt;&lt;p&gt;If max-fat clustering is a real concern (extreme but not negligible), on top of the minimum diff also bump TM to 16 GiB:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code 
class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;TM: 16 GiB   ← original prod was 12g; if 16g already from an emergency expansion, no change needed&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;plus the two config diffs from §5.1&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Cost depends on current prod state:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;If prod is still at the original 12 GiB TM (per §0): +4 GiB = +33% TM memory.&lt;/li&gt;&#xA;&lt;li&gt;If prod was already bumped to 16 GiB during the emergency (one version of the FlinkDeployment YAML we saw had &lt;code&gt;memory: &amp;quot;16g&amp;quot;&lt;/code&gt;): zero additional hardware, just two config lines.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;At 16 GiB + &lt;code&gt;fraction=0.1&lt;/code&gt;, &lt;code&gt;task.heap ≈ 11.5 GiB&lt;/code&gt; (§3.3 formula; not measured at 16g):&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Pathological 5.6 GiB / 11.5 GiB ≈ 49% nominal utilization.&lt;/li&gt;&#xA;&lt;li&gt;G1 70%-working-set view: 5.6 / (11.5 × 0.7) = 5.6 / 8.05 ≈ 70%. Comfortable vs 12g&amp;rsquo;s ~97%.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Trade-off:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Minimum diff (12g + 2 config lines)&lt;/strong&gt;: zero new hardware, two lines, safe for fat-avg, tight for max-fat. Good when you&amp;rsquo;re willing to accept &amp;ldquo;max-fat pathological is improbable&amp;rdquo; and want to return TM to its original size.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Complete recommendation (16g + 2 config lines)&lt;/strong&gt;: +4 GiB (+33%) vs 12g, or zero additional if already expanded during the incident. Covers the max-fat envelope with G1 room to spare. 
Good for conservative deployments or &amp;ldquo;since we already expanded, let&amp;rsquo;s keep it.&amp;rdquo;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Experiments only went up to &lt;code&gt;2g + 2 config lines&lt;/code&gt; (fat-avg passed, max-fat not validated); 12g and 16g are formula-extrapolated. Before production, run a full-config regression in dev or staging either way.&lt;/p&gt;&#xA;&lt;h3 id=&#34;52-12g-prod-capacity-extrapolation-formula-extrapolated-not-measured&#34;&gt;5.2 12g prod capacity extrapolation (formula-extrapolated, not measured)&#xA;&lt;/h3&gt;&lt;p&gt;At 12g + &lt;code&gt;fraction=0.1&lt;/code&gt;, &lt;code&gt;task.heap ≈ 8.3 GiB&lt;/code&gt; (§3.3 formula: flink.size 11008 − framework 256 − managed 1101 − network 1101 ≈ 8550 MiB).&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span 
style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Debezium queue heap upper bound = 100 × R_per_event(230 KiB)    ≈  23 MiB   ← MAT batch-avg&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Flink queue heap upper bound    = 2 × 100 × R_per_event(970 KiB) ≈ 190 MiB  ← MAT batch-avg&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Main-thread batch upper bound   =     100 × R_per_event(970 KiB) ≈  97 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;──────────────────────────────────────────────────────────────&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;In-flight steady state (fat-rich slice, §3.4 MAT)            ≈  310 MiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;Normal-distribution UPDATE steady state (not measured; extrapolated from §3.6 whole-table avg): using bucket-b &lt;code&gt;avg_KiB=38.8&lt;/code&gt; × 4× bloat (BEFORE+AFTER × UTF-16) ≈ 155 KiB/event, &lt;code&gt;400 in-flight events × 155 KiB ≈ 60 MiB&lt;/code&gt;. 
About 5× lighter than fat-rich steady state.&lt;/p&gt;&#xA;&lt;p&gt;(Note on &amp;ldquo;normal&amp;rdquo; baseline drift across sections: §3.8 uses 0.05 MiB/event (slice bucket-b, 13.5 KiB × 4); §3.11.1 uses ~0.08 MiB/event (bucket-4 estimate); §5.2 here uses 0.15 MiB/event (whole-table bucket-b, 38.8 KiB × 4). Each is internally consistent for its own slice, but the numbers aren&amp;rsquo;t interchangeable across sections.)&lt;/p&gt;&#xA;&lt;p&gt;In-flight pathological worst case (assume all 400 in-flight events are max rows):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;div style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&#xA;&lt;table style=&#34;border-spacing:0;padding:0;margin:0;border:0;&#34;&gt;&lt;tr&gt;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;1&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;2&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;3&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;4&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f&#34;&gt;5&#xA;&lt;/span&gt;&lt;span style=&#34;white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 
0.4em;color:#7f7f7f&#34;&gt;6&#xA;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&#xA;&lt;td style=&#34;vertical-align:top;padding:0;margin:0;border:0;;width:100%&#34;&gt;&#xA;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  4 × 100 × 14 MiB = 5.6 GiB&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ↑ 14 MiB/event comes from §3.7&amp;#39;s decomposition:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    3.47 MiB = SQL max for this table; ×2 BEFORE+AFTER is a binlog_row_image=FULL code fact;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ×2 UTF-16 is arthas-confirmed via String.coder=1 + byte[].length directly.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;#34;400 consecutive max rows&amp;#34; is a probabilistically-impossible envelope assumption&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  (max row = 0.08% of the table). Treat this number as an upper-bound envelope, not an expectation.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&lt;p&gt;At 12g + &lt;code&gt;fraction=0.1&lt;/code&gt;, &lt;code&gt;task.heap ≈ 8.3 GiB&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Fat steady state 310 MiB / 8.3 GiB ≈ 3.6% nominal.&lt;/li&gt;&#xA;&lt;li&gt;Pathological worst 5.6 GiB / 8.3 GiB ≈ 67% nominal.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;&lt;strong&gt;What &amp;ldquo;nominal&amp;rdquo; means&lt;/strong&gt;: &lt;code&gt;task.heap&lt;/code&gt; isn&amp;rsquo;t entirely available for in-flight events. 
The leftover &lt;code&gt;task.heap - in-flight&lt;/code&gt; also has to serve:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;G1 young + survivor + old-gen working set (Flink framework resident objects, RocksDB in-heap part, Flink Pekko actor, serde intermediate objects, and so on).&lt;/li&gt;&#xA;&lt;li&gt;Main thread&amp;rsquo;s per-batch temporaries (JSON serialization, Doris HTTP stream-load body construction).&lt;/li&gt;&#xA;&lt;li&gt;GC headroom (by default G1 starts concurrent marking at 45% heap occupancy, &lt;code&gt;-XX:InitiatingHeapOccupancyPercent&lt;/code&gt;, adapted at runtime; as occupancy keeps climbing past that you get mixed GCs, then evacuation failures, and eventually Full GC).&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;The ~70% effective-working-set target for G1 is empirical (common rule-of-thumb among Java backend teams, not something I measured here). Folding that in:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Effective capacity ≈ 8.3 GiB × 0.7 ≈ 5.8 GiB.&lt;/li&gt;&#xA;&lt;li&gt;5.6 GiB / 5.8 GiB ≈ 97% utilization, 3% margin. Tight.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Or, using a looser 80% working-set assumption (8.3 × 0.8 = 6.64 GiB), 5.6 / 6.64 ≈ 84%, 16% margin.&lt;/p&gt;&#xA;&lt;p&gt;So: the &amp;ldquo;35% safety margin&amp;rdquo; I quoted in early drafts was measured against the whole &lt;code&gt;task.heap&lt;/code&gt; (nominal utilization), which is optimistic vs G1&amp;rsquo;s real working-set envelope. A genuine &amp;ldquo;70% utilization, 30% margin&amp;rdquo; promise to production requires the 16g upgrade in §5.1.1. The 12g config only makes sense if you&amp;rsquo;re betting the max-fat pathological case won&amp;rsquo;t happen.&lt;/p&gt;&#xA;&lt;h3 id=&#34;53-keep-jobmanager-reasonable-empirical-not-load-tested&#34;&gt;5.3 Keep JobManager reasonable (empirical, not load-tested)&#xA;&lt;/h3&gt;&lt;p&gt;Production&amp;rsquo;s JM at 24 GiB is over-provisioned:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The job has 1 MySqlSource + 1 filter + 1 sink writer + 1 committer. The ExecutionGraph is tiny.&lt;/li&gt;&#xA;&lt;li&gt;Throughout all experiments JM was fixed at 4 GiB (§2.3). 
Via &lt;code&gt;kubectl top pod&lt;/code&gt; I observed JM peak RSS hovering around 800 to 900 MiB (estimated heap usage ~200 to 300 MiB of the &lt;code&gt;-Xmx ≈ 3.2 GiB&lt;/code&gt;; Pekko, Netty, and metrics reporters account for the rest of RSS). The observation uses &lt;code&gt;kubectl top&lt;/code&gt;, not proper JMX instrumentation, so precision is limited.&lt;/li&gt;&#xA;&lt;li&gt;4 GiB is plenty.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;I didn&amp;rsquo;t specifically burst-test JM memory. If the job later picks up more tables or larger checkpoint state, re-evaluate.&lt;/p&gt;&#xA;&lt;h3 id=&#34;54-untested-alternative-binlog_row_imageminimal&#34;&gt;5.4 Untested alternative: &lt;code&gt;binlog_row_image=MINIMAL&lt;/code&gt;&#xA;&lt;/h3&gt;&lt;p&gt;In principle, switching MySQL&amp;rsquo;s &lt;code&gt;binlog_row_image&lt;/code&gt; from FULL to MINIMAL would make BEFORE carry only the primary-key columns and AFTER only the changed columns, roughly halving heap occupancy. But:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;It&amp;rsquo;s a server-level global variable affecting all binlog consumers.&lt;/li&gt;&#xA;&lt;li&gt;On a managed cloud RDS it&amp;rsquo;s risky to change without testing.&lt;/li&gt;&#xA;&lt;li&gt;I haven&amp;rsquo;t verified that the pipeline&amp;rsquo;s semantics stay correct under MINIMAL. Not changing it for this iteration.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Worth validating if a future need arises and MySQL is yours to tune.&lt;/p&gt;&#xA;&lt;h2 id=&#34;6-source-references-pinned-to-tags&#34;&gt;6. 
Source references (pinned to tags)&#xA;&lt;/h2&gt;&lt;p&gt;Apache Flink CDC &lt;code&gt;release-3.2.1&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/MySqlSource.java:167&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;.../cdc/connectors/mysql/source/reader/MySqlSplitReader.java:105&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;.../cdc/connectors/mysql/debezium/reader/BinlogSplitReader.java:147&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;.../cdc/connectors/mysql/debezium/task/context/StatefulTaskContext.java:139&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-cdc-connect/flink-cdc-source-connectors/flink-connector-debezium/src/main/java/org/apache/flink/cdc/debezium/table/DebeziumOptions.java:25&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Apache Flink &lt;code&gt;release-1.18.1&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/synchronization/FutureCompletingBlockingQueue.java:109&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;.../source/reader/SourceReaderOptions.java:36&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Debezium &lt;code&gt;v1.9.8.Final&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;debezium-core/src/main/java/io/debezium/config/CommonConnectorConfig.java:302&lt;/code&gt; (&lt;code&gt;DEFAULT_MAX_QUEUE_SIZE = 8192&lt;/code&gt;, &lt;code&gt;DEFAULT_MAX_BATCH_SIZE = 2048&lt;/code&gt;)&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;debezium-core/src/main/java/io/debezium/config/CommonConnectorConfig.java:344&lt;/code&gt; (&lt;code&gt;validateMaxQueueSize&lt;/code&gt;)&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;debezium-core/src/main/java/io/debezium/connector/base/ChangeEventQueue.java&lt;/code&gt; (&lt;code&gt;poll()&lt;/code&gt; 
impl)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Our fork of doris-flink-connector (based on apache/doris-flink-connector):&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-doris-connector/src/main/java/org/apache/doris/flink/tools/cdc/mysql/MysqlDatabaseSync.java:215-231&lt;/code&gt; (&lt;code&gt;debezium.*&lt;/code&gt; passthrough)&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-doris-connector/src/main/java/org/apache/doris/flink/cfg/DorisExecutionOptions.java:291&lt;/code&gt; (&lt;code&gt;ignoreUpdateBefore = true&lt;/code&gt;)&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;flink-doris-connector/src/main/java/org/apache/doris/flink/sink/writer/serializer/jsondebezium/JsonDebeziumDataChange.java:89-131&lt;/code&gt; (op dispatch)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;7-tooling-used&#34;&gt;7. Tooling used&#xA;&lt;/h2&gt;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Situation&lt;/th&gt;&#xA;          &lt;th&gt;Tool&lt;/th&gt;&#xA;          &lt;th&gt;Usage&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Heap snapshot&lt;/td&gt;&#xA;          &lt;td&gt;JVM &lt;code&gt;-XX:+HeapDumpOnOutOfMemoryError&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Write to a PVC-mounted &lt;code&gt;/tmp/dumps&lt;/code&gt;; survives pod termination&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Heap analysis (offline)&lt;/td&gt;&#xA;          &lt;td&gt;Eclipse MAT 1.15.0 (&lt;code&gt;20231206-linux.gtk.x86_64&lt;/code&gt;)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;ParseHeapDump.sh heap.hprof org.eclipse.mat.api:suspects&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Top components&lt;/td&gt;&#xA;          &lt;td&gt;MAT&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;org.eclipse.mat.api:top_components&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Live instance 
count&lt;/td&gt;&#xA;          &lt;td&gt;arthas 4.1.8 &lt;code&gt;vmtool --action getInstances&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;--express &#39;instances.length&#39;&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Live instance fields&lt;/td&gt;&#xA;          &lt;td&gt;arthas &lt;code&gt;vmtool ... --express&lt;/code&gt; (OGNL)&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;instances[0].value.get(&amp;quot;before&amp;quot;).get(&amp;quot;col&amp;quot;).length()&lt;/code&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Shaded class path&lt;/td&gt;&#xA;          &lt;td&gt;arthas &lt;code&gt;sc *Keyword&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;JRE-only container attach&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;jattach&lt;/code&gt; (apangin/jattach v2.2)&lt;/td&gt;&#xA;          &lt;td&gt;Doesn&amp;rsquo;t need JDK tools.jar&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Exfiltrate large heap dump from k8s&lt;/td&gt;&#xA;          &lt;td&gt;PVC + dump-reader pod + &lt;code&gt;kubectl cp&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;emptyDir&lt;/code&gt; dies with the pod&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;MySQL binlog observation&lt;/td&gt;&#xA;          &lt;td&gt;dedicated CDC user + &lt;code&gt;SHOW MASTER STATUS&lt;/code&gt; polling&lt;/td&gt;&#xA;          &lt;td&gt;Verify commit timing&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h2 id=&#34;8-takeaways&#34;&gt;8. Takeaways&#xA;&lt;/h2&gt;&lt;p&gt;Six general debugging habits worth keeping:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Build a controllable reproduction first. PVC-backed dumps, operator auto-restart off. 
But remember that Flink&amp;rsquo;s internal restart-strategy can still mask mid-drain OOM, so check the dump directory directly.&lt;/li&gt;&#xA;&lt;li&gt;MAT Leak Suspects is the starting point. The retained ranking tells you &amp;ldquo;who owns the memory&amp;rdquo; directly; no need to guess first.&lt;/li&gt;&#xA;&lt;li&gt;For multi-layer pipelines (CDC / Debezium), line up MAT thread names and stack frames with source. Faster than class-name search.&lt;/li&gt;&#xA;&lt;li&gt;Arthas live-object checks are the fastest way to verify configs and counts at runtime.&lt;/li&gt;&#xA;&lt;li&gt;Reserved-subsystem memory is often tunable. Flink&amp;rsquo;s &lt;code&gt;managed.fraction=0.4&lt;/code&gt; is wasted on low-state jobs.&lt;/li&gt;&#xA;&lt;li&gt;Skim your own reasoning for assumptions that contradict known semantics (§3.1&amp;rsquo;s &amp;ldquo;partial flush during commit&amp;rdquo; error was exactly that kind of self-caught mistake; cheap to verify, embarrassing to miss).&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;Debezium / Flink CDC specifically:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Two independent queues, both must be sized. Debezium &lt;code&gt;max.queue.size&lt;/code&gt; (upstream, holds DataChangeEvent) and Flink &lt;code&gt;source.reader.element.queue.capacity&lt;/code&gt; (downstream, holds SourceRecord). Defaults: 8192 and 2 respectively. The latter is capped by element count, not bytes.&lt;/li&gt;&#xA;&lt;li&gt;Heap size is not binlog bytes. Distinguish &lt;code&gt;R_binlog&lt;/code&gt; (on-disk), &lt;code&gt;R_dbz&lt;/code&gt; (Debezium queue), &lt;code&gt;R_flink&lt;/code&gt; (Flink queue).&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;avg_row_length&lt;/code&gt; is misleading for bimodal distributions. Bucket the table and look at the long tail.&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;ignoreUpdateBefore=true&lt;/code&gt; does not help source-side OOM. 
It only controls whether the sink writes BEFORE; upstream memory is unchanged.&lt;/li&gt;&#xA;&lt;li&gt;Respect the &lt;code&gt;queue &amp;gt; batch&lt;/code&gt; invariant (Debezium &lt;code&gt;validateMaxQueueSize&lt;/code&gt;) even if nothing visibly complains when it&amp;rsquo;s violated.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;&lt;em&gt;All measurements were collected on 2026-04-21 in a dev cluster. The 12g/16g &lt;code&gt;task.heap&lt;/code&gt; numbers come from the Flink 1.18 memory-model formula; those tiers were not measured for OOM. The experimental slice is fat-rich (13% vs 4% table-wide), so extrapolating conclusions to other id ranges requires care.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;appendix-a-update-latency-non-linearity-side-observation&#34;&gt;Appendix A: UPDATE latency non-linearity (side observation)&#xA;&lt;/h2&gt;&lt;ul&gt;&#xA;&lt;li&gt;§3.1&amp;rsquo;s 1000-row UPDATE took 1.6 seconds.&lt;/li&gt;&#xA;&lt;li&gt;§3.2&amp;rsquo;s 10k-row UPDATE took 5 minutes 45 seconds.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;10× rows, ~216× latency (345 s vs 1.6 s). Not fully explained. Possible factors:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Slice difference. The 1k UPDATE used ids &lt;code&gt;1913… → 1914…&lt;/code&gt; (narrow range, non-fat-rich), while the 10k used &lt;code&gt;1870… → 1913…&lt;/code&gt; (wider range, fat-rich).&lt;/li&gt;&#xA;&lt;li&gt;Buffer-pool state (the two runs were hours apart).&lt;/li&gt;&#xA;&lt;li&gt;Fat rows spill into InnoDB overflow pages (off-page BLOB storage), so each write touches more pages and takes more I/O.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;I don&amp;rsquo;t have enough data to decompose these. Tagged as &amp;ldquo;measured anomaly, partial explanation.&amp;rdquo;&lt;/p&gt;&#xA;&lt;h2 id=&#34;appendix-b-how-this-post-was-written&#34;&gt;Appendix B: How this post was written&#xA;&lt;/h2&gt;&lt;p&gt;The investigation was run by me in a dev cluster, paired with Claude Code on the tooling and analysis loop. 
No teammate reviewed or ran the experiments. The &lt;code&gt;kubectl top&lt;/code&gt;/&lt;code&gt;arthas&lt;/code&gt;/MAT commands and their outputs are my own; the conclusions are mine. The production hotfix (app-side batch splitting down to 50 rows per SQL) was a separate, parallel track by the ops team, not part of what&amp;rsquo;s written up here. Mentions of &amp;ldquo;we&amp;rdquo; in the post mean me + the tooling I was driving.&lt;/p&gt;&#xA;</description>
        </item></channel>
</rss>
