Re: logical decoding / rewrite map vs. maxAllocatedDescs - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding / rewrite map vs. maxAllocatedDescs
Msg-id 470adb65-5101-4659-d213-41bde1eef8f2@2ndquadrant.com
In response to Re: logical decoding / rewrite map vs. maxAllocatedDescs  (Andres Freund <andres@anarazel.de>)
Responses Re: logical decoding / rewrite map vs. maxAllocatedDescs
List pgsql-hackers

On 08/10/2018 11:13 PM, Andres Freund wrote:
> On 2018-08-10 22:57:57 +0200, Tomas Vondra wrote:
>>
>>
>> On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
>>> On 2018-Aug-09, Tomas Vondra wrote:
>>>
>>>> I suppose there are reasons why it's done this way, and admittedly the test
>>>> that happens to trigger this is a bit extreme (essentially running pgbench
>>>> concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
>>>> extreme enough to deem it not an issue, because people using many temporary
>>>> tables often deal with bloat by doing frequent vacuum full on catalogs.
>>>
>>> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
>>> the file descriptor for no good reason.  There's a different
>>> OpenTransientFile call in ReorderBufferRestoreChanges that is not
>>> intended to be closed immediately, but the other one seems a plain bug,
>>> easy enough to fix.
>>>
>>
>> Indeed. Adding a CloseTransientFile to ApplyLogicalMappingFile solves
>> the issue with hitting maxAllocatedDescs. Barring objections I'll commit
>> this shortly.
> 
> Yea, that's clearly a bug. I've not seen a patch, so I can't quite
> formally sign off, but it seems fairly obvious.
> 
> 
>> But while running the tests on this machine, I repeatedly got pgbench
>> failures like this:
>>
>> client 2 aborted in command 0 of script 0; ERROR:  could not read block
>> 3 in file "base/16384/24573": read only 0 of 8192 bytes
>>
>> That kinda reminds me of the issues we're observing on some buildfarm
>> machines; I wonder if it's the same thing.
> 
> Oooh, that's interesting! What's the precise recipe that gets you there?
> 

I don't have an exact reproducer - it's kinda rare and unpredictable,
and I'm not sure how much it depends on the environment etc. But I'm
doing this:

1) one cluster with publication (wal_level=logical)

2) one cluster with subscription to (1)

3) simple table, replicated from (1) to (2)

   -- publisher
   create table t (a serial primary key, b int, c int);
   create publication p for table t;

   -- subscriber
   create table t (a serial primary key, b int, c int);
   create subscription s CONNECTION '...' publication p;

4) pgbench inserting rows into the replicated table

   pgbench -n -c 4 -T 300 -p 5433 -f insert.sql test

5) pgbench doing vacuum full on pg_class

   pgbench -n -f vacuum.sql -T 300 -p 5433 test
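The contents of the two script files aren't shown above; hypothetical single-statement contents consistent with steps (4) and (5) would be something like:

```sql
-- insert.sql (assumed contents: insert rows into the replicated table)
insert into t (b, c) values (1, 2);

-- vacuum.sql (assumed contents: rewrite pg_class, generating mapping files)
vacuum full pg_class;
```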

And once in a while I see failures like this:

   client 0 aborted in command 0 of script 0; ERROR:  could not read
   block 3 in file "base/16384/86242": read only 0 of 8192 bytes

   client 3 aborted in command 0 of script 0; ERROR:  could not read
   block 3 in file "base/16384/86242": read only 0 of 8192 bytes

   client 2 aborted in command 0 of script 0; ERROR:  could not read
   block 3 in file "base/16384/86242": read only 0 of 8192 bytes

or this:

   client 2 aborted in command 0 of script 0; ERROR:  could not read
   block 3 in file "base/16384/89369": read only 0 of 8192 bytes

   client 1 aborted in command 0 of script 0; ERROR:  could not read
   block 3 in file "base/16384/89369": read only 0 of 8192 bytes

I suspect there's some other ingredient, e.g. some manipulation of the
subscription. Or maybe no other ingredient is needed and I'm just
imagining things.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

