Thread: logical replication - negative bitmapset member not allowed

logical replication - negative bitmapset member not allowed

From
Tim Clarke
Date:
I'm getting this message every 5 seconds on a single-master,
single-slave replication of PG10.7->PG10.7 both on CentOS. It's over the
'net but otherwise seems to perform excellently. Any ideas what's
causing it and how to fix?

--
Tim Clarke
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420



Main: +44 (0)1376 503500 | Fax: +44 (0)1376 503550
Web: https://www.manifest.co.uk/



Minerva Analytics Ltd
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | England


----------------------------------------------------------------------------------------------------------------------------

Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee
you must not use or disclose such information, instead please report it to
admin@minerva.info<mailto:admin@minerva.info>
Legal:  Minerva Analytics is the trading name of: Minerva Analytics
Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820
Registered Office at above address. Please Click Here >><https://www.manifest.co.uk/legal/> for further information.
 

Re: logical replication - negative bitmapset member not allowed

From
Tom Lane
Date:
Tim Clarke <tim.clarke@minerva.info> writes:
> I'm getting this message every 5 seconds on a single-master,
> single-slave replication of PG10.7->PG10.7 both on CentOS. It's over the
> 'net but otherwise seems to perform excellently. Any ideas what's
> causing it and how to fix?

That'd certainly be a bug, but we'd need to reproduce it to fix it.
What are you doing that's different from everybody else?  Can you
provide any other info to narrow down the problem?

            regards, tom lane



Re: logical replication - negative bitmapset member not allowed

From
Tim Clarke
Date:
Dang. I just replicated ~380 tables. One was missing an index so I
paused replication, added a unique key on publisher and subscriber,
re-enabled replication and refreshed the subscription.

The table has only 7 columns; I added a primary key with a default value
from a new sequence.

Tim Clarke

On 01/04/2019 15:02, Tom Lane wrote:
> Tim Clarke <tim.clarke@minerva.info> writes:
>> I'm getting this message every 5 seconds on a single-master,
>> single-slave replication of PG10.7->PG10.7 both on Centos. Its over the
>> 'net but otherwise seems to perform excellently. Any ideas what's
>> causing it and how to fix?
> That'd certainly be a bug, but we'd need to reproduce it to fix it.
> What are you doing that's different from everybody else?  Can you
> provide any other info to narrow down the problem?
>
> regards, tom lane


Re: logical replication - negative bitmapset member not allowed

From
Alvaro Herrera
Date:
On 2019-Apr-01, Tom Lane wrote:

> Tim Clarke <tim.clarke@minerva.info> writes:
> > I'm getting this message every 5 seconds on a single-master,
> > single-slave replication of PG10.7->PG10.7 both on Centos. Its over the
> > 'net but otherwise seems to perform excellently. Any ideas what's
> > causing it and how to fix?
> 
> That'd certainly be a bug, but we'd need to reproduce it to fix it.
> What are you doing that's different from everybody else?  Can you
> provide any other info to narrow down the problem?

Maybe the replica identity of a table got set to a unique index on oid?
Or something else involving system columns?  (If replication is
otherwise working, then I suppose there's a separate publication that's
having the error; the first thing to isolate would be to see what tables
are involved in that publication).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical replication - negative bitmapset member not allowed

From
Tom Lane
Date:
Tim Clarke <tim.clarke@minerva.info> writes:
> Dang. I just replicated ~380 tables. One was missing an index so I
> paused replication, added a unique key on publisher and subscriber,
> re-enabled replication and refreshed the subscription.

Well, that's not much help :-(.  Can you provide any info to narrow
down where this is happening?  I mean, you haven't even told us whether
it's the primary or the slave that is complaining.  Does it seem to
be associated with any particular command?  (Turning on log_statement
and/or log_replication_commands would likely help with that.)  Does
data seem to be getting transferred despite the complaint?  If not,
what's missing on the slave?

            regards, tom lane



Re: logical replication - negative bitmapset member not allowed

From
Tim Clarke
Date:
On 02/04/2019 14:59, Tom Lane wrote:
> Well, that's not much help :-(.  Can you provide any info to narrow
> down where this is happening?  I mean, you haven't even told us whether
> it's the primary or the slave that is complaining.  Does it seem to
> be associated with any particular command?  (Turning on log_statement
> and/or log_replication_commands would likely help with that.)  Does
> data seem to be getting transferred despite the complaint?  If not,
> what's missing on the slave?
>
> regards, tom lane


I've been working to narrow it down; the error is being reported on the slave.

The only schema changes have been the two primary keys added to two
tables. The problem occurred during this cycle:

1) Replication proceeding fine for ~380 tables, all added to the publication
individually, not via "all tables".

2) Add primary key on master.

3) Add primary key on slave.

4) Refresh subscription on slave; error starts being reported.

I've cleared it by dropping the slave database, re-creating from the
live schema then fully replicating. It's all running happily now.


Tim Clarke


Re: logical replication - negative bitmapset member not allowed

From
Tom Lane
Date:
Tim Clarke <tim.clarke@minerva.info> writes:
> I've cleared it by dropping the slave database, re-creating from the
> live schema then fully replicating. It's all running happily now.

I'm glad you're out of the woods, but we still have a bug there
waiting to bite the next person.  I wonder if you'd be willing to
spend some time trying to develop a reproduction sequence for this
(obviously, working on a test setup not your live servers).
Presumably there's something in the subscription-alteration logic
that needs work, but I don't think we have enough detail here for
somebody else to reproduce the error without a lot of guesswork.

            regards, tom lane



Re: logical replication - negative bitmapset member not allowed

From
Tim Clarke
Date:
On 02/04/2019 15:46, Tom Lane wrote:
> I'm glad you're out of the woods, but we still have a bug there
> waiting to bite the next person.  I wonder if you'd be willing to
> spend some time trying to develop a reproduction sequence for this
> (obviously, working on a test setup not your live servers).
> Presumably there's something in the subscription-alteration logic
> that needs work, but I don't think we have enough detail here for
> somebody else to reproduce the error without a lot of guesswork.
>
> regards, tom lane


I'll do what I can :)


Tim Clarke


Re: logical replication - negative bitmapset member not allowed

From
Peter Eisentraut
Date:
On 2019-04-01 23:43, Alvaro Herrera wrote:
> Maybe the replica identity of a table got set to a unique index on oid?
> Or something else involving system columns?  (If replication is
> otherwise working, then I suppose there's a separate publication that's
> having the error; the first thing to isolate would be to see what tables
> are involved in that publication).

Looking through the code, the bms_add_member() call in
logicalrep_read_attrs() does not use the usual
FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
problem.
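To illustrate the offset convention Peter mentions, here is a minimal sketch that uses a hypothetical toy bitmap instead of PostgreSQL's real Bitmapset API. Everything prefixed `toy_` is invented for illustration. The sketch assumes FirstLowInvalidHeapAttributeNumber is -8, as in the PostgreSQL 10/11 headers, where system columns use attribute numbers -1 (ctid) through -7 (tableoid) and oid is -2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FirstLowInvalidHeapAttributeNumber; -8 is an
 * assumption matching PostgreSQL 10/11 (system attnums -1 .. -7). */
#define TOY_FIRST_LOW_INVALID_ATTNUM (-8)

/* Toy fixed-size bitmap (256 bits) standing in for a Bitmapset. */
typedef struct
{
    uint64_t words[4];
} ToyBitmap;

/* Like bms_add_member(), refuse negative members. The real function raises
 * an ERROR; this sketch only reports failure. Members >= 256 are refused
 * too, because the toy bitmap has a fixed size. */
static bool toy_add_member(ToyBitmap *bm, int x)
{
    if (x < 0 || x >= 256)
        return false;
    bm->words[x / 64] |= UINT64_C(1) << (x % 64);
    return true;
}

/* Membership test; out-of-range values are simply reported as absent. */
static bool toy_has_member(const ToyBitmap *bm, int x)
{
    if (x < 0 || x >= 256)
        return false;
    return (bm->words[x / 64] >> (x % 64)) & 1;
}

/* The usual convention: offset attribute numbers so that system columns
 * (negative attnums) become non-negative bitmap members. */
static int toy_attnum_to_member(int attnum)
{
    return attnum - TOY_FIRST_LOW_INVALID_ATTNUM;
}
```

Without the offset, a system column such as oid (-2) cannot be stored at all. With the offset, it becomes member 6, and user column 1 becomes member 9. The sketch only shows the convention. It does not show that this code path is the one failing here.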

However, I can't quite reproduce this.  There are various other checks
that prevent this scenario, but it's plausible that with a bit of
whacking around you could hit this error message.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical replication - negative bitmapset member not allowed

From
Tim Clarke
Date:
On 04/04/2019 22:37, Peter Eisentraut wrote:
> On 2019-04-01 23:43, Alvaro Herrera wrote:
>> Maybe the replica identity of a table got set to a unique index on oid?
>> Or something else involving system columns?  (If replication is
>> otherwise working, then I suppose there's a separate publication that's
>> having the error; the first thing to isolate would be to see what tables
>> are involved in that publication).
> Looking through the code, the bms_add_member() call in
> logicalrep_read_attrs() does not use the usual
> FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
> problem.
>
> However, I can't quite reproduce this.  There are various other checks
> that prevent this scenario, but it's plausible that with a bit of
> whacking around you could hit this error message.
>

Promise I've not been whacking around......


Tim Clarke


Re: logical replication - negative bitmapset member not allowed

From
Jehan-Guillaume de Rorthais
Date:
Hello,

On Thu, 4 Apr 2019 23:37:04 +0200
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2019-04-01 23:43, Alvaro Herrera wrote:
> > Maybe the replica identity of a table got set to a unique index on oid?
> > Or something else involving system columns?  (If replication is
> > otherwise working, then I suppose there's a separate publication that's
> > having the error; the first thing to isolate would be to see what tables
> > are involved in that publication).  
> 
> Looking through the code, the bms_add_member() call in
> logicalrep_read_attrs() does not use the usual
> FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
> problem.
> 
> However, I can't quite reproduce this.  There are various other checks
> that prevent this scenario, but it's plausible that with a bit of
> whacking around you could hit this error message.

Here is a script to reproduce it under versions 10, 11 and 12:

################################################
# env
PUB=/tmp/pub
SUB=/tmp/sub
unset PGPORT PGHOST PGDATABASE PGDATA
export PGUSER=postgres

# cleanup from a previous run (errors here are harmless on a first run)
kill %1 2>/dev/null
pg_ctl -w -s -D "$PUB" -m immediate stop; echo $?
pg_ctl -w -s -D "$SUB" -m immediate stop; echo $?
rm -rf "$PUB" "$SUB"

# cluster
initdb -U postgres -N "$PUB" &>/dev/null; echo $?
initdb -U postgres -N "$SUB" &>/dev/null; echo $?
echo "wal_level=logical" >> "$PUB"/postgresql.conf
echo "port=5433" >> "$SUB"/postgresql.conf
pg_ctl -w -s -D "$PUB" -l "$PUB"-"$(date +%FT%T)".log start; echo $?
pg_ctl -w -s -D "$SUB" -l "$SUB"-"$(date +%FT%T)".log start; echo $?
pgbench -p 5432 -qi
pg_dump -p 5432 -s | psql -qXp 5433

# fake activity
pgbench -p 5432 -T 300 -c 2 &

# replication setup
psql -p 5432 -Xc "CREATE PUBLICATION prov FOR ALL TABLES"
psql -p 5433 -Xc "CREATE SUBSCRIPTION sub
                  CONNECTION 'port=5432'
                  PUBLICATION prov"

# wait for the streaming
unset V;
while [ "$V" != "streaming" ]; do sleep 1
    V=$(psql -AtXc "SELECT 'streaming'
                    FROM pg_stat_replication WHERE state='streaming'")
done

# trigger the error message (it is then reported in the subscriber's log)
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub DISABLE"
psql -p 5433 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5432 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub ENABLE"
################################################

Regards,



Re: logical replication - negative bitmapset member not allowed

From
Jehan-Guillaume de Rorthais
Date:
On Thu, 10 Oct 2019 15:15:46 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

[...]
> Here is a script to reproduce it under versions 10, 11 and 12:

I investigated this bug while coming back from pgconf.eu. Below is what I found
so far.

The message "negative bitmapset member not allowed" is raised by bms_is_member()
when called from logicalrep_rel_open().

Every local field that is unknown on the remote side, dropped or generated is
mapped to remote attnum -1. See backend/replication/logical/relation.c:

    if (attr->attisdropped || attr->attgenerated)
    {
        entry->attrmap[i] = -1;
        continue;
    }

    attnum = logicalrep_rel_att_by_name(remoterel, NameStr(attr->attname));

Note that logicalrep_rel_att_by_name returns -1 on unknown fields.
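Below is a simplified sketch of that mapping. It uses hypothetical `toy_` types and functions, not the real relation.c code. Each local column gets the position of the same-named remote column, or -1 when it is dropped, generated, or unknown to the remote side. For example, this applies to a column added on the subscriber that the publisher's relation description does not include yet. Positions are 0-based here, so 0 is a valid mapping and only negative values mean "unmapped".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified local column descriptor. */
typedef struct
{
    const char *name;
    bool        dropped;
    bool        generated;
} ToyLocalAttr;

/* Same idea as logicalrep_rel_att_by_name(): the 0-based position of the
 * remote column with this name, or -1 if the remote side lacks it. */
static int toy_att_by_name(const char *const *remote_names, int nremote,
                           const char *name)
{
    for (int i = 0; i < nremote; i++)
        if (strcmp(remote_names[i], name) == 0)
            return i;
    return -1;
}

/* Fill attrmap[i] with the remote position of local column i, or -1 for
 * dropped, generated, or unknown columns. */
static void toy_build_attrmap(const ToyLocalAttr *local, int nlocal,
                              const char *const *remote_names, int nremote,
                              int *attrmap)
{
    for (int i = 0; i < nlocal; i++)
    {
        if (local[i].dropped || local[i].generated)
        {
            attrmap[i] = -1;
            continue;
        }
        attrmap[i] = toy_att_by_name(remote_names, nremote, local[i].name);
    }
}
```

With a local pgbench_history that has an extra `id` column the remote side does not know, the map for `id` comes out as -1, while `tid` maps to 0.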

Later in the same function, we check whether fields belonging to some PK or
unique index appear in the remote keys as well:

    while ((i = bms_next_member(idkey, i)) >= 0)
    {
        [...]
        if (!bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
        {
            entry->updatable = false;
            break;
        }
    }

However, before checking whether the local attribute belongs to the remote keys,
it should check whether it is actually mapped to a remote one. In other words, I
suppose we should check entry->attrmap[attnum] >= 0 before calling
bms_is_member().

The trivial patch would be:

-        if (!bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
+        if (entry->attrmap[attnum] < 0 ||
+            !bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
         {
             entry->updatable = false;
             break;
         }
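The failure mode and the patched check can be sketched with hypothetical `toy_` helpers. This is an illustration under those assumptions, not the committed fix. In the unguarded version, an identity column that maps to -1 goes straight to the membership test, which treats a negative member as an error, like bms_is_member() does. The guarded version marks the relation non-updatable instead. Remote key columns are modelled as a 64-bit mask, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for bms_is_member(): a negative member is an error (the
 * real function raises ERROR "negative bitmapset member not allowed"). */
static bool toy_bms_is_member(int x, uint64_t set, bool *err)
{
    if (x < 0)
    {
        *err = true;
        return false;
    }
    return x < 64 && ((set >> x) & 1);
}

/* Every local identity column must map to a remote key column for the
 * relation to be updatable. 'guarded' selects the patched behaviour. */
static bool toy_updatable(const int *identity_cols, int nident,
                          const int *attrmap, uint64_t remote_keys,
                          bool guarded, bool *err)
{
    *err = false;
    for (int k = 0; k < nident; k++)
    {
        int remote = attrmap[identity_cols[k]];

        if (guarded && remote < 0)
            return false;       /* unmapped column: just not updatable */
        if (!toy_bms_is_member(remote, remote_keys, err))
            return false;       /* unguarded: -1 sets *err here */
    }
    return true;
}
```

With the unguarded check, an unmapped identity column produces the error. With the guard, it only yields "not updatable", and a properly mapped key column is still accepted.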

I tested it with the attached scenario and it seems to work correctly.

Note that while trying to fix this bug, I found a segmentation fault when building
with assertions enabled. You might want to review/test without --enable-cassert. I
will report it in another thread, as it seems unrelated to this bug or fix.

Attachment

Re: logical replication - negative bitmapset member not allowed

From
Peter Eisentraut
Date:
On 2019-10-25 17:38, Jehan-Guillaume de Rorthais wrote:
> On Thu, 10 Oct 2019 15:15:46 +0200
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> 
> [...]
>> Here is a script to reproduce it under versions 10, 11 and 12:
> 
> I investigated this bug while coming back from pgconf.eu. Below is what I found
> so far.

I have simplified your reproduction steps from the previous message to a
test case, and I can confirm that your proposed fix addresses the issue.
A patch is attached.  Maybe someone can look it over.  I target next
week's minor releases.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: logical replication - negative bitmapset member not allowed

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 5 Nov 2019 16:02:51 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2019-10-25 17:38, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 10 Oct 2019 15:15:46 +0200
> > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> > 
> > [...]
> >> Here is a script to reproduce it under versions 10, 11 and 12:
> > 
> > I investigated this bug while coming back from pgconf.eu. Below is what I
> > found so far.
> 
> I have simplified your reproduction steps from the previous message to a 
> test case, and I can confirm that your proposed fix addresses the issue. 

Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
DISABLE/ENABLE is useful in the test case?

Is it recommended when running DDL on a logically replicated relation? If
so, I suppose we should update the first point of the restrictions chapter in
the documentation:
https://www.postgresql.org/docs/11/logical-replication-restrictions

Regards,



Re: logical replication - negative bitmapset member not allowed

From
Andres Freund
Date:
Hi,

On 2019-11-05 16:02:51 +0100, Peter Eisentraut wrote:
>  $node_publisher->stop('fast');
> +
> +
> +# TODO: https://www.postgresql.org/message-id/flat/a9139c29-7ddd-973b-aa7f-71fed9c38d75%40minerva.info
> +
> +$node_publisher = get_new_node('publisher3');
> +$node_publisher->init(allows_streaming => 'logical');
> +$node_publisher->start;
> +
> +$node_subscriber = get_new_node('subscriber3');
> +$node_subscriber->init(allows_streaming => 'logical');
> +$node_subscriber->start;

Do we really have to create a new subscriber for this test? The creation
of one isn't free. Nor is the amount of test code duplication
negligible.

Greetings,

Andres Freund



Re: logical replication - negative bitmapset member not allowed

From
Peter Eisentraut
Date:
On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:
>> I have simplified your reproduction steps from the previous message to a
>> test case, and I can confirm that your proposed fix addresses the issue.
> 
> Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
> DISABLE/ENABLE is useful in the test case?

Turns out it's not necessary.  Attached is an updated patch that 
simplifies the test even further and moves it into the 
008_diff_schema.pl file.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: logical replication - negative bitmapset member not allowed

From
Peter Eisentraut
Date:
On 2019-11-05 17:18, Andres Freund wrote:
> On 2019-11-05 16:02:51 +0100, Peter Eisentraut wrote:
>>   $node_publisher->stop('fast');
>> +
>> +
>> +# TODO: https://www.postgresql.org/message-id/flat/a9139c29-7ddd-973b-aa7f-71fed9c38d75%40minerva.info
>> +
>> +$node_publisher = get_new_node('publisher3');
>> +$node_publisher->init(allows_streaming => 'logical');
>> +$node_publisher->start;
>> +
>> +$node_subscriber = get_new_node('subscriber3');
>> +$node_subscriber->init(allows_streaming => 'logical');
>> +$node_subscriber->start;
> 
> Do we really have to create a new subscriber for this test? The creation
> of one isn't free. Nor is the amount of test code duplication
> negligible.

I changed that in the v2 patch.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical replication - negative bitmapset member not allowed

From
Jehan-Guillaume de Rorthais
Date:
On Thu, 7 Nov 2019 16:02:21 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:
> >> I have simplified your reproduction steps from the previous message to a
> >> test case, and I can confirm that your proposed fix addresses the issue.  
> > 
> > Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
> > DISABLE/ENABLE is useful in the test case?  
> 
> Turns out it's not necessary.  Attached is an updated patch that 
> simplifies the test even further and moves it into the 
> 008_diff_schema.pl file.

OK. No further comments on my side.

Thanks,



Re: logical replication - negative bitmapset member not allowed

From
Peter Eisentraut
Date:
On 2019-11-07 16:18, Jehan-Guillaume de Rorthais wrote:
> On Thu, 7 Nov 2019 16:02:21 +0100
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> 
>> On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:
>>>> I have simplified your reproduction steps from the previous message to a
>>>> test case, and I can confirm that your proposed fix addresses the issue.
>>>
>>> Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
>>> DISABLE/ENABLE is useful in the test case?
>>
>> Turns out it's not necessary.  Attached is an updated patch that
>> simplifies the test even further and moves it into the
>> 008_diff_schema.pl file.
> 
> OK. No further comments on my side.

Committed and backpatched.  Thanks!

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical replication - negative bitmapset member not allowed

From
Jehan-Guillaume de Rorthais
Date:
On Sat, 9 Nov 2019 09:18:21 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2019-11-07 16:18, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 7 Nov 2019 16:02:21 +0100
> > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> >   
> >> On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:  
> >>>> I have simplified your reproduction steps from the previous message to a
> >>>> test case, and I can confirm that your proposed fix addresses the
> >>>> issue.  
> >>>
> >>> Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
> >>> DISABLE/ENABLE is useful in the test case?  
> >>
> >> Turns out it's not necessary.  Attached is an updated patch that
> >> simplifies the test even further and moves it into the
> >> 008_diff_schema.pl file.  
> > 
> > OK. No further comments on my side.  
> 
> Committed and backpatched.  Thanks!

I'm glad to help!

Thanks,