Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data
Date
Msg-id CAMsr+YH2-1MzAGHQeGB1HKDKtRVHvxDmeE_UHfYv6HKZJVZSDA@mail.gmail.com
Whole thread Raw
In response to Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data  (Andres Freund <andres@anarazel.de>)
Responses Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data
List pgsql-hackers
On 31 March 2016 at 16:13, Andres Freund <andres@anarazel.de> wrote:
 
It's probably easier to just generate a humongous commit record. You can
do so by having a *lot* of subtransactions. Relatively easy to do with
plpgsql by creating them in a loop (SELECT txid_current() in EXCEPTION
bearing block ought to suffice).


This does the trick and does it quickly on 9.4:

CREATE TABLE trivial(padding text);
ALTER TABLE trivial ALTER COLUMN padding SET STORAGE plain;

DO
LANGUAGE plpgsql
$$
DECLARE
  wal_seg_size integer;
  remaining_size integer;
BEGIN
  wal_seg_size := (select setting from pg_settings where name = 'wal_segment_size')::integer
                * (select setting from pg_settings where name = 'wal_block_size')::integer;
  LOOP
    SELECT wal_seg_size - file_offset FROM pg_xlogfile_name_offset(pg_current_xlog_insert_location()) INTO remaining_size;
    IF remaining_size < 8192
    THEN
        -- Make a nice big commit record right at the end of the segment
        EXIT;
    ELSE
        BEGIN
          -- About 4k
          INSERT INTO trivial(padding) VALUES (repeat('0123456789abcdef', 256));
        EXCEPTION
          WHEN division_by_zero THEN
            CONTINUE;
        END;
    END IF;
  END LOOP;
END;
$$;


I had no issue reproducing the bug on 9.4, but I don't see it in 9.6. At least, not this one, not yet. 

Still, we might want to backpatch the fix; it seems clear enough even if I can't reproduce the issue yet. It doesn't look like it'll affect logical decoding since the snapshot builder has its own unrelated logic for finding the startpoint for decoding, and it certainly doesn't affect WAL replay otherwise we would've seen the fireworks much earlier. The upside is that also makes the backpatch much safer.

I was surprised to see that there are no tests for pg_xlogdump to add this on to, so I've added a trivial test. I've got some more complete xlogdump-helper code in the failover slots tests that I should extract and add to PostgresNode.pm but this'll do in the mean time and is much simpler.

In the process I noticed that pg_xlogdump doesn't follow the rules of the rest of the codebase when it comes to command line handling with --version, returning nonzero on bad options, etc. It might be good to fix that; there's a small BC impact, but I doubt anyone cares when it comes to pg_xlogdump.

I'll attach the new testcase once I either get it to reproduce this bug or give up and leave the basic xlogdump testcase alone.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: [PATCH] Phrase search ported to 9.6
Next
From: "Shulgin, Oleksandr"
Date:
Subject: Re: SSL indicator in psql prompt