Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths |
Date | |
Msg-id | CAEze2Wjyvkip6CiqLJoEL-_BDdcJUHB6QCEReX=RHgQUCfUSuQ@mail.gmail.com |
In response to | Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths (David Zhang <david.zhang@highgo.ca>) |
Responses |
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths |
List | pgsql-hackers |
On Sat, 11 Jun 2022 at 01:32, David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> > > MaxAllocSize is pretty easy:
> > > SELECT pg_logical_emit_message(false, long, long) FROM repeat(repeat(' ', 1024), 1024*1023) as l(long);
> > >
> > > on a standby:
> > >
> > > 2022-03-11 16:41:59.336 PST [3639744][startup][1/0:0] LOG: record length 2145386550 at 0/3000060 too long
> >
> > Thanks for the reference. I was already playing around with 2PC log
> > records (which can theoretically contain >4GB of data); but your
> > example is much easier and takes significantly less time.
>
> A little confused here, does this patch V3 intend to solve this problem "record length 2145386550 at 0/3000060 too long"?

No, not once the record exists. But it does remove Postgres' ability to create such records, thereby solving the problem for all systems that generate WAL through Postgres' WAL writing APIs.

> I set up a simple Primary and Standby stream replication environment, and use the above query to run the test for before and after patch v3. The error message still exist, but with different message.
>
> Before patch v3, the error is showing below,
>
> 2022-06-10 15:32:25.307 PDT [4253] LOG: record length 2145386550 at 0/3000060 too long
> 2022-06-10 15:32:47.763 PDT [4257] FATAL: terminating walreceiver process due to administrator command
> 2022-06-10 15:32:47.763 PDT [4253] LOG: record length 2145386550 at 0/3000060 too long
>
> After patch v3, the error displays differently
>
> 2022-06-10 15:53:53.397 PDT [12848] LOG: record length 2145386550 at 0/3000060 too long
> 2022-06-10 15:54:07.249 PDT [12852] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000045 has already been removed
> 2022-06-10 15:54:07.275 PDT [12848] LOG: record length 2145386550 at 0/3000060 too long
>
> And once the error happens, then the Standby can't continue the replication.

Did you initiate a new cluster or otherwise skip the invalid record you generated when running the instance based on master? It seems to me you're trying to replay the invalid record (len > MaxAllocSize), and this patch does not try to fix that issue.

This patch just tries to forbid emitting records larger than MaxAllocSize, as per the check in XLogRecordAssemble, so that we won't emit unreadable records into the WAL anymore.

Reading unreadable records still won't be possible, but that's also not something I'm trying to fix.

> Is a particular reason to say "more datas" at line 52 in patch v3?
>
> + * more datas than are being accounted for by the XLog infrastructure.

Yes. This error is thrown when you try to register a 34th block, or an Nth rdata where the caller previously only reserved N - 1 data slots. Hence 'datas', matching the num_rdatas and max_rdatas variables.

Thanks for looking at the patch.

- Matthias
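
For illustration only, here is a minimal standalone sketch of the kind of length check discussed above. It is not the code from the patch: the helper name and the `SKETCH_` constant are hypothetical, and the constant merely mirrors the value of PostgreSQL's MaxAllocSize (0x3fffffff, i.e. 1 GB - 1). It shows why the quoted query is a problem: each argument (1024 * 1024 * 1023 = 1072693248 bytes) fits under the limit on its own, but the two payloads together (2145386496 bytes, before any record headers) do not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: mirrors the value of PostgreSQL's MaxAllocSize (1 GB - 1).
 * Redefined under a different name so this sketch compiles on its own.
 */
#define SKETCH_MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/*
 * Hypothetical helper, not part of PostgreSQL: add up the lengths of the
 * data chunks that would make up one record and report whether the total
 * stays within the allocation limit.
 *
 * The sum is kept in 64 bits so that a handful of 32-bit lengths cannot
 * wrap around before the comparison is made.
 */
bool
sketch_record_length_ok(const uint32_t *chunk_lens, size_t nchunks,
                        uint64_t *total_out)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nchunks; i++)
        total += chunk_lens[i];

    if (total_out != NULL)
        *total_out = total;

    return total <= SKETCH_MAX_ALLOC_SIZE;
}
```

Called with a single 1072693248-byte chunk the helper returns true; called with two such chunks it returns false. The point of rejecting the record at assembly time is that the writer fails early instead of producing WAL that no reader can replay.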