Thread: Logical Decoding Failover

Logical Decoding Failover

From
Colin Morelli
Date:
Hey all,

I'm using logical decoding in my application to capture change streams and ship them to Kafka. However, logical replication slots aren't included in the WAL and thus don't make it to replicas. In the case of a failover, it's not clear what (if anything) can be done to continue receiving a change stream.

I noticed there was a failover slot patch that looks like it won't be making it for 9.6 (if at all), but does anyone else have suggestions for handling failover + logical decoding? Is there an answer, or is it limited to being a very manual process for now?

Thanks in advance.

Best,
Colin

Re: Logical Decoding Failover

From
Venkata Balaji N
Date:

On Sat, Aug 6, 2016 at 1:17 PM, Colin Morelli <colin.morelli@gmail.com> wrote:
Hey all,

I'm using logical decoding in my application to capture change streams and ship them to Kafka. However, logical replication slots aren't included in the WAL and thus don't make it to replicas. In the case of a failover, it's not clear what (if anything) can be done to continue receiving a change stream.

I noticed there was a failover slot patch that looks like it won't be making it for 9.6 (if at all), but does anyone else have suggestions for handling failover + logical decoding? Is there an answer, or is it limited to being a very manual process for now?

Yes, as of i know, It is a manual process for now. Another alternative you can consider (if possible at all) would be a tool like PgPool-II which is actually used for connection pooling, load balancing and automatic failover. It may not make sense if you are do not wish to pool you connections.

Regards,
Venkata B N

Fujitsu Australia

Re: Logical Decoding Failover

From
Colin Morelli
Date:
Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't create the replication slots on all hosts, and record the LSN in a way that is reusable on the secondary.

This sort of puts logical decoding at odds with HA, correct? In the case of master failover, there's no way to: a) know where in the stream you read to, or b) convert that to something that can be understood by a replica that's about to be promoted.

Are there any scheduled improvements here either for 9.6 or 9.7?

Best,
Colin
On Sun, Aug 7, 2016 at 1:50 AM Venkata Balaji N <nag1010@gmail.com> wrote:
On Sat, Aug 6, 2016 at 1:17 PM, Colin Morelli <colin.morelli@gmail.com> wrote:
Hey all,

I'm using logical decoding in my application to capture change streams and ship them to Kafka. However, logical replication slots aren't included in the WAL and thus don't make it to replicas. In the case of a failover, it's not clear what (if anything) can be done to continue receiving a change stream.

I noticed there was a failover slot patch that looks like it won't be making it for 9.6 (if at all), but does anyone else have suggestions for handling failover + logical decoding? Is there an answer, or is it limited to being a very manual process for now?

Yes, as of i know, It is a manual process for now. Another alternative you can consider (if possible at all) would be a tool like PgPool-II which is actually used for connection pooling, load balancing and automatic failover. It may not make sense if you are do not wish to pool you connections.

Regards,
Venkata B N

Fujitsu Australia

Re: Logical Decoding Failover

From
Venkata Balaji N
Date:

On Sun, Aug 7, 2016 at 9:29 PM, Colin Morelli <colin.morelli@gmail.com> wrote:
Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't create the replication slots on all hosts, and record the LSN in a way that is reusable on the secondary.

Yes, thats correct, pgPool does not have anything to do with replication slots. That is something which you need to manually configure.
 
This sort of puts logical decoding at odds with HA, correct? In the case of master failover, there's no way to: a) know where in the stream you read to, or b) convert that to something that can be understood by a replica that's about to be promoted.

Can you please clarify, what did you exactly mean here ? are you referring to cascading standbys or something ? Failover is something which application has to do, to reconnect to the promoted standby. As far as logical decoding is concerned, you need to have a replication slot configured for the new master to ensure all the WALs which are not replicated to standby are retained. After promotion, the standby database becomes a standalone instance and the replication must be re-enabled. You can automate the standby promotion using tools like pgpool-II or pacemaker.

I am trying to differentiate standby-promotion and failover mechanism here.

Are there any scheduled improvements here either for 9.6 or 9.7?

I am not 100% sure, if there is anything developed from an failover mechanism perspective.

Regards,
Venkata B N

Fujitsu Australia

Re: Logical Decoding Failover

From
Colin Morelli
Date:
Venkata,

No problem - thanks for replying. I'll try to clarify.

Say I have databases A and B, where A is primary and B is a replica. I have an application that executes 3 transactions against A, resulting in WAL log sequence numbers 1, 2, and 3. The WAL is successfully replicated to B.

I have a logical deciding application which is pulling the logical stream, and reads LSN 1 and 2 before the primary (A) fails. I decide to fail over to B, which has all data on it and things work great.

Now the logical decoding client connects to B (the new primary). The replication slot doesn't exist. So, it creates it and starts streaming. This is where the problem lies - as it would begin streaming from LSN 4 (anything after what has already been committed), because I have no way (that I can find) of restoring my "progress" through the WAL on a the replicas.

As a result, my application never sees the event at LSN 3. In fact, I'm not even sure how I could manually do this.

Hopefully that makes more sense.

Best,
Colin
On Mon, Aug 8, 2016 at 10:47 PM Venkata Balaji N <nag1010@gmail.com> wrote:
On Sun, Aug 7, 2016 at 9:29 PM, Colin Morelli <colin.morelli@gmail.com> wrote:
Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't create the replication slots on all hosts, and record the LSN in a way that is reusable on the secondary.

Yes, thats correct, pgPool does not have anything to do with replication slots. That is something which you need to manually configure.
 
This sort of puts logical decoding at odds with HA, correct? In the case of master failover, there's no way to: a) know where in the stream you read to, or b) convert that to something that can be understood by a replica that's about to be promoted.

Can you please clarify, what did you exactly mean here ? are you referring to cascading standbys or something ? Failover is something which application has to do, to reconnect to the promoted standby. As far as logical decoding is concerned, you need to have a replication slot configured for the new master to ensure all the WALs which are not replicated to standby are retained. After promotion, the standby database becomes a standalone instance and the replication must be re-enabled. You can automate the standby promotion using tools like pgpool-II or pacemaker.

I am trying to differentiate standby-promotion and failover mechanism here.

Are there any scheduled improvements here either for 9.6 or 9.7?

I am not 100% sure, if there is anything developed from an failover mechanism perspective.

Regards,
Venkata B N

Fujitsu Australia

Re: Logical Decoding Failover

From
Venkata Balaji N
Date:

Now the logical decoding client connects to B (the new primary). The replication slot doesn't exist. So, it creates it and starts streaming. This is where the problem lies - as it would begin streaming from LSN 4 (anything after what has already been committed), because I have no way (that I can find) of restoring my "progress" through the WAL on a the replicas.

As a result, my application never sees the event at LSN 3. In fact, I'm not even sure how I could manually do this.

Yes, that is the current limitation and as of i know, there is no such patch being developed at the moment. I leave it to any of the developers to respond. This cannot be done manually as the new master will not read through from 
LSN 3 (which is a thing of past unknown to new master) and replication slot at standby is a new one with the latest starting from LSN 4 which, the logical decoder does not know.

Hope that helps !

Regards,
Venkata B N

Fujitsu Australia

Re: Logical Decoding Failover

From
Simon Riggs
Date:
On 6 August 2016 at 04:17, Colin Morelli <colin.morelli@gmail.com> wrote:

> I'm using logical decoding in my application to capture change streams and
> ship them to Kafka. However, logical replication slots aren't included in
> the WAL and thus don't make it to replicas. In the case of a failover, it's
> not clear what (if anything) can be done to continue receiving a change
> stream.
>
> I noticed there was a failover slot patch that looks like it won't be making
> it for 9.6 (if at all), but does anyone else have suggestions for handling
> failover + logical decoding? Is there an answer, or is it limited to being a
> very manual process for now?

Hi Colin,

Yes, its a current issue.

A patch was submitted but not applied to 9.6 for this, so the issue
continues. As a result, 2Q support that as a custom solution for a
number of customers.

A different design will be submitted for the next release, 10.0.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Logical Decoding Failover

From
Colin Morelli
Date:
Simon,

That's too bad. Logical decoding is incredibly powerful, but if it ultimately comes down to picking between HA or logical decoding the choice to use it becomes a lot more difficult.

Are you saying that logical decoding is receiving a different design in 10.0? (For clarify - is 10.0 the next scheduled release after 9.6 - so next year?). If so, I'd greatly appreciate any more info you have on the new design (even patches if you have them).

Best,
Colin

On Wed, Aug 10, 2016 at 12:53 AM Simon Riggs <simon@2ndquadrant.com> wrote:
On 6 August 2016 at 04:17, Colin Morelli <colin.morelli@gmail.com> wrote:

> I'm using logical decoding in my application to capture change streams and
> ship them to Kafka. However, logical replication slots aren't included in
> the WAL and thus don't make it to replicas. In the case of a failover, it's
> not clear what (if anything) can be done to continue receiving a change
> stream.
>
> I noticed there was a failover slot patch that looks like it won't be making
> it for 9.6 (if at all), but does anyone else have suggestions for handling
> failover + logical decoding? Is there an answer, or is it limited to being a
> very manual process for now?

Hi Colin,

Yes, its a current issue.

A patch was submitted but not applied to 9.6 for this, so the issue
continues. As a result, 2Q support that as a custom solution for a
number of customers.

A different design will be submitted for the next release, 10.0.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logical Decoding Failover

From
Simon Riggs
Date:
On 10 August 2016 at 09:03, Colin Morelli <colin.morelli@gmail.com> wrote:

> That's too bad. Logical decoding is incredibly powerful, but if it
> ultimately comes down to picking between HA or logical decoding the choice
> to use it becomes a lot more difficult.

Agreed.

> Are you saying that logical decoding is receiving a different design in
> 10.0? (For clarify - is 10.0 the next scheduled release after 9.6 - so next
> year?). If so, I'd greatly appreciate any more info you have on the new
> design (even patches if you have them).

A new design for the failover aspect of logical replication will be in 10.0.

Yes, 10.0 is the next release, due 2017.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services