Thread: Logical Replication - "invalid ordering of speculative insertion changes"

Logical Replication - "invalid ordering of speculative insertion changes"

From
"Joe Wildish"
Date:
Hello,

We have a logical replication publisher (13.7) and subscriber (14.6) where we are seeing the following error on the
subscriber. IP address and publication name changed, otherwise verbatim:
 

2023-01-31 15:24:49 UTC:x.x.x.x(56276):super@pubdb:[1040971]: WARNING:  tables were not subscribed, you will have to
run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
 
2023-01-31 15:24:50 UTC::@:[1040975]: LOG:  logical replication apply worker for subscription "pub" has started
2023-01-31 15:24:50 UTC::@:[1040975]: ERROR:  could not receive data from WAL stream: ERROR:  invalid ordering of
speculative insertion changes
 

This error occurs during the initial set up of the subscription.  We hit REFRESH, and then immediately it goes into
this error state. It then repeats as it is retrying from here onwards and keeps hitting the same error.
 

My understanding is that the subscriber is performing some kind of reordering of the events contained within the WAL
message. As it cannot then consume the message, it aborts, retries, and gets the same message and errors again.  Looking
in the source code it seems there is only one place where this error can be emitted --- reorderbuffer.c:2179.  Moreover
I can't tell if this is an error that I can be expected to recover from as a user.
 

We see this error only sometimes. Other times, we REFRESH the subscription and it makes progress as one would expect.

Can anyone advise on what we are doing wrong here?

-Joe



Re: Logical Replication - "invalid ordering of speculative insertion changes"

From
"Joe Wildish"
Date:
Just a bump on this --- perhaps the error is a bug with the DBMS?

From what I can see "speculative insertion changes" in this context means INSERT..ON CONFLICT DML.  Although I have
some experience writing extensions and simple patches for the code base, I don't know anything as a developer about the
transaction log.  I dug around a bit and it appears that there is a specific record type inside the WAL that
differentiates INSERT from INSERT..ON CONFLICT changes, and it is these changes that cannot be re-ordered when trying to
emit the message to the output plugin for the logical replication slot (which in this case is the internal pgoutput
one). Given that, I don't see how it can be user error.  Unless anyone else knows differently?
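
For illustration, the two kinds of statement being contrasted might look like the sketch below. The table and column names are hypothetical and not taken from the actual schema, and the comments about how each statement is logged reflect the understanding described above rather than a confirmed reading of the source.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE counters (
    id   integer PRIMARY KEY,
    hits integer NOT NULL DEFAULT 0
);

-- A plain INSERT: as far as I can tell, this is logged as an
-- ordinary insert.
INSERT INTO counters (id, hits) VALUES (1, 1);

-- INSERT .. ON CONFLICT: when no conflicting row is found up front,
-- the row appears to be inserted speculatively and confirmed
-- afterwards, which is the kind of change the error message names.
INSERT INTO counters (id, hits) VALUES (2, 1)
ON CONFLICT DO NOTHING;
```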
 

-Joe




Re: Logical Replication - "invalid ordering of speculative insertion changes"

From
Rahila Syed
Date:
Hi Joe,


On Fri, Feb 3, 2023 at 1:42 AM Joe Wildish <joe@lateraljoin.com> wrote:
> Just a bump on this --- perhaps the error is a bug with the DBMS?
>
> From what I can see "speculative insertion changes" in this context
> means INSERT..ON CONFLICT DML.  Although I have some experience
> writing extensions and simple patches for the code base, I don't know
> anything as a developer about the transaction log.  I dug around a bit
> and it appears that there is a specific record type inside the WAL
> that differentiates INSERT from INSERT..ON CONFLICT changes, and it is
> these changes that cannot be re-ordered when trying to emit the
> message to the output plugin for the logical replication slot (which
> in this case is the internal pgoutput one).  Given that, I don't see
> how it can be user error.  Unless anyone else knows differently?

It would be useful if you could provide steps to reproduce this issue.

Thank you,
Rahila Syed

> -Joe
