Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id 20220402081346.GD3719101@rfd.leadboat.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:
> It seems that 0/B0706F72 is not a random value. Two subscriber logs
> show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
> might show the next field in the pg_subscription catalog, i.e.,
> subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
> CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
> PUBLICATION pub WITH (disable_on_error = true, streaming = on,
> two_phase = on)".
> 
> Given subscription.sql passes, something is wrong when we read the
> subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".

That's a good clue.  We've never made pg_type.typalign able to represent
alignment as it works on AIX.  A uint64 like pg_lsn has 8-byte alignment, so
the C struct follows from that.  At the typalign level, we have only these:

#define  TYPALIGN_CHAR            'c' /* char alignment (i.e. unaligned) */
#define  TYPALIGN_SHORT            's' /* short alignment (typically 2 bytes) */
#define  TYPALIGN_INT            'i' /* int alignment (typically 4 bytes) */
#define  TYPALIGN_DOUBLE        'd' /* double alignment (often 8 bytes) */

On AIX, they are:

#define ALIGNOF_DOUBLE 4 
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8   
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2  

uint64 and pg_lsn use TYPALIGN_DOUBLE.  For AIX, they really need a typalign
corresponding to ALIGNOF_LONG.  Hence, the C struct layout doesn't match the
tuple layout.  Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where
attalign= 'd' and relkind = 'r' and attnotnull and attlen <> -1;
 
    attrelid     │   attname    
─────────────────┼──────────────
 pg_sequence     │ seqstart
 pg_sequence     │ seqincrement
 pg_sequence     │ seqmax
 pg_sequence     │ seqmin
 pg_sequence     │ seqcache
 pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.


Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway.  I've
  confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
- Introduce a new typalign value suitable for uint64.  This is more intrusive,
  but it's more future-proof.  Looking beyond catalog columns, it might
  improve performance by avoiding unaligned reads.

> Is it possible to run the test again with the attached patch?

Logs attached.  The test "passed", though it printed "poll_query_until timed
out" three times and took awhile.

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Andres Freund
Date:
Subject: Re: shared-memory based stats collector - v66