Thread: Applying logical replication changes by more than one process

Applying logical replication changes by more than one process

From

Konstantin Knizhnik

Date:

19 March 2016, 10:46:13

Hi,

I am trying to use logical replication mechanism in implementation of PostgreSQL multimaster and faced with one
conceptualproblem.

Originally logical replication was intended to support asynchronous replication. In this case applying changes by
singleprocess should not be a bottleneck.

But if we are using distributed transaction manager to provide global consistency, then applying transaction by one
worker leads to very bad performance and what is worser: cause unintended serialization of transactions, which is not
takenin account by

distributed deadlock detection algorithm and so can cause
undetected deadlocks.

So I have implemented pool of background workers which can apply transactions concurrently.
It works and shows acceptable performance. But now I am thinking about HA and tracking origin LSNs which are needed to
correctlyspecify slot position in case of recovery. And there is a problem: as far as I understand to correctly record
originLSN in WAL

and advance slot position it is necessary to setup session
using replorigin_session_setup. It is not so convenient in case of using pool of background workers, because we have to
setupsession for each commit.

But the main problem is that for each slot session can be associated only with one process:
else if (curstate->acquired_by != 0) { ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE), errmsg("replication identifier %d is already active for PID %d",
curstate->roident, curstate->acquired_by))); }

Which once again means that there can be only one process applying changes.

To provide correct state of replication node it is necessary to enforce that each logical replication record is
replayedexactly once: we should not loose some change or try to apply it twice. So operation of recording original LSN
positionin WAL and

adjusting slot should be atomic. And during recovery we should restore slot current position based on the origin values
extracted from WAL. I wonder if it can be done using current logical replication mechanism when changes of each slot
areapplied by

more than one process? Or the only alternative is to write/read origin LSNs in WAL myself, for example using custom WAL
records?

Thanks in advance!

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

21 March 2016, 12:10:18

Hi,

On 19/03/16 11:46, Konstantin Knizhnik wrote:
> Hi,
>
> I am trying to use logical replication mechanism in implementation of
> PostgreSQL multimaster and faced with one conceptual problem.
> Originally logical replication was intended to support asynchronous
> replication. In this case applying changes by single process should not
> be a bottleneck.
> But if we are using distributed transaction manager to provide global
> consistency, then applying transaction by one worker  leads to very bad
> performance and what is worser: cause unintended serialization of
> transactions, which is not taken in account by distributed deadlock
> detection algorithm and so can cause
> undetected deadlocks.
>
> So I have implemented pool of background workers which can apply
> transactions concurrently.
> It works and shows acceptable performance. But now I am thinking about
> HA and tracking origin LSNs which are needed to correctly specify slot
> position in case of recovery. And there is a problem: as far as I
> understand to correctly record origin LSN in WAL and advance slot
> position it is necessary to setup session
> using replorigin_session_setup. It is not so convenient in case of using
> pool of background workers, because we have to setup session for each
> commit.
> But the main problem is that for each slot session can be associated
> only with one process:
>
>          else if (curstate->acquired_by != 0)
>          {
>              ereport(ERROR,
>                      (errcode(ERRCODE_OBJECT_IN_USE),
>               errmsg("replication identifier %d is already active for
> PID %d",
>                      curstate->roident, curstate->acquired_by)));
>          }
>
> Which once again means that there can be only one process applying changes.
>

That's not true, all it means is that you can do 
replorigin_session_setup for same origin only in one process but you 
don't need to have it setup for session to update it, the 
replorigin_advance() works just fine.


--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

21 March 2016, 12:10:24

Hi,

On 19/03/16 11:46, Konstantin Knizhnik wrote:
> Hi,
>
> I am trying to use logical replication mechanism in implementation of
> PostgreSQL multimaster and faced with one conceptual problem.
> Originally logical replication was intended to support asynchronous
> replication. In this case applying changes by single process should not
> be a bottleneck.
> But if we are using distributed transaction manager to provide global
> consistency, then applying transaction by one worker  leads to very bad
> performance and what is worser: cause unintended serialization of
> transactions, which is not taken in account by distributed deadlock
> detection algorithm and so can cause
> undetected deadlocks.
>
> So I have implemented pool of background workers which can apply
> transactions concurrently.
> It works and shows acceptable performance. But now I am thinking about
> HA and tracking origin LSNs which are needed to correctly specify slot
> position in case of recovery. And there is a problem: as far as I
> understand to correctly record origin LSN in WAL and advance slot
> position it is necessary to setup session
> using replorigin_session_setup. It is not so convenient in case of using
> pool of background workers, because we have to setup session for each
> commit.
> But the main problem is that for each slot session can be associated
> only with one process:
>
>          else if (curstate->acquired_by != 0)
>          {
>              ereport(ERROR,
>                      (errcode(ERRCODE_OBJECT_IN_USE),
>               errmsg("replication identifier %d is already active for
> PID %d",
>                      curstate->roident, curstate->acquired_by)));
>          }
>
> Which once again means that there can be only one process applying changes.
>

That's not true, all it means is that you can do 
replorigin_session_setup for same origin only in one process but you 
don't need to have it setup for session to update it, the 
replorigin_advance() works just fine.


--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

Konstantin Knizhnik

Date:

21 March 2016, 12:44:33


On 21.03.2016 15:10, Petr Jelinek wrote:
> Hi,
>
> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>> Hi,
>>
>> I am trying to use logical replication mechanism in implementation of
>> PostgreSQL multimaster and faced with one conceptual problem.
>> Originally logical replication was intended to support asynchronous
>> replication. In this case applying changes by single process should not
>> be a bottleneck.
>> But if we are using distributed transaction manager to provide global
>> consistency, then applying transaction by one worker  leads to very bad
>> performance and what is worser: cause unintended serialization of
>> transactions, which is not taken in account by distributed deadlock
>> detection algorithm and so can cause
>> undetected deadlocks.
>>
>> So I have implemented pool of background workers which can apply
>> transactions concurrently.
>> It works and shows acceptable performance. But now I am thinking about
>> HA and tracking origin LSNs which are needed to correctly specify slot
>> position in case of recovery. And there is a problem: as far as I
>> understand to correctly record origin LSN in WAL and advance slot
>> position it is necessary to setup session
>> using replorigin_session_setup. It is not so convenient in case of using
>> pool of background workers, because we have to setup session for each
>> commit.
>> But the main problem is that for each slot session can be associated
>> only with one process:
>>
>>          else if (curstate->acquired_by != 0)
>>          {
>>              ereport(ERROR,
>>                      (errcode(ERRCODE_OBJECT_IN_USE),
>>               errmsg("replication identifier %d is already active for
>> PID %d",
>>                      curstate->roident, curstate->acquired_by)));
>>          }
>>
>> Which once again means that there can be only one process applying 
>> changes.
>>
>
> That's not true, all it means is that you can do 
> replorigin_session_setup for same origin only in one process but you 
> don't need to have it setup for session to update it, the 
> replorigin_advance() works just fine.

But RecordTransactionCommit is using replorigin_session_advance, not 
replorigin_advance.
And replorigin_session_advance requires that session was setup:

void
replorigin_session_advance(XLogRecPtr remote_commit, XLogRecPtr 
local_commit)
{    Assert(session_replication_state != NULL);
}

"session_replication_state" is private variable which is set by 
replorigin_session_setup.
But attempt to call replorigin_session_setup from multiple process cause 
above error.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

21 March 2016, 13:09:02

On 21/03/16 13:44, Konstantin Knizhnik wrote:
>
>
> On 21.03.2016 15:10, Petr Jelinek wrote:
>> Hi,
>>
>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>> Hi,
>>>
>>> I am trying to use logical replication mechanism in implementation of
>>> PostgreSQL multimaster and faced with one conceptual problem.
>>> Originally logical replication was intended to support asynchronous
>>> replication. In this case applying changes by single process should not
>>> be a bottleneck.
>>> But if we are using distributed transaction manager to provide global
>>> consistency, then applying transaction by one worker  leads to very bad
>>> performance and what is worser: cause unintended serialization of
>>> transactions, which is not taken in account by distributed deadlock
>>> detection algorithm and so can cause
>>> undetected deadlocks.
>>>
>>> So I have implemented pool of background workers which can apply
>>> transactions concurrently.
>>> It works and shows acceptable performance. But now I am thinking about
>>> HA and tracking origin LSNs which are needed to correctly specify slot
>>> position in case of recovery. And there is a problem: as far as I
>>> understand to correctly record origin LSN in WAL and advance slot
>>> position it is necessary to setup session
>>> using replorigin_session_setup. It is not so convenient in case of using
>>> pool of background workers, because we have to setup session for each
>>> commit.
>>> But the main problem is that for each slot session can be associated
>>> only with one process:
>>>
>>>          else if (curstate->acquired_by != 0)
>>>          {
>>>              ereport(ERROR,
>>>                      (errcode(ERRCODE_OBJECT_IN_USE),
>>>               errmsg("replication identifier %d is already active for
>>> PID %d",
>>>                      curstate->roident, curstate->acquired_by)));
>>>          }
>>>
>>> Which once again means that there can be only one process applying
>>> changes.
>>>
>>
>> That's not true, all it means is that you can do
>> replorigin_session_setup for same origin only in one process but you
>> don't need to have it setup for session to update it, the
>> replorigin_advance() works just fine.
>
> But RecordTransactionCommit is using replorigin_session_advance, not
> replorigin_advance.

Only when the origin is actually setup for the current session. You need 
to call the replorigin_advance yourself from your apply code.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

Andres Freund

Date:

21 March 2016, 13:15:50


On March 21, 2016 2:08:54 PM GMT+01:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>On 21/03/16 13:44, Konstantin Knizhnik wrote:
>>
>>
>> On 21.03.2016 15:10, Petr Jelinek wrote:
>>> Hi,
>>>
>>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>>> Hi,
>>>>
>>>> I am trying to use logical replication mechanism in implementation
>of
>>>> PostgreSQL multimaster and faced with one conceptual problem.
>>>> Originally logical replication was intended to support asynchronous
>>>> replication. In this case applying changes by single process should
>not
>>>> be a bottleneck.
>>>> But if we are using distributed transaction manager to provide
>global
>>>> consistency, then applying transaction by one worker  leads to very
>bad
>>>> performance and what is worser: cause unintended serialization of
>>>> transactions, which is not taken in account by distributed deadlock
>>>> detection algorithm and so can cause
>>>> undetected deadlocks.
>>>>
>>>> So I have implemented pool of background workers which can apply
>>>> transactions concurrently.
>>>> It works and shows acceptable performance. But now I am thinking
>about
>>>> HA and tracking origin LSNs which are needed to correctly specify
>slot
>>>> position in case of recovery. And there is a problem: as far as I
>>>> understand to correctly record origin LSN in WAL and advance slot
>>>> position it is necessary to setup session
>>>> using replorigin_session_setup. It is not so convenient in case of
>using
>>>> pool of background workers, because we have to setup session for
>each
>>>> commit.
>>>> But the main problem is that for each slot session can be
>associated
>>>> only with one process:
>>>>
>>>>          else if (curstate->acquired_by != 0)
>>>>          {
>>>>              ereport(ERROR,
>>>>                      (errcode(ERRCODE_OBJECT_IN_USE),
>>>>               errmsg("replication identifier %d is already active
>for
>>>> PID %d",
>>>>                      curstate->roident, curstate->acquired_by)));
>>>>          }
>>>>
>>>> Which once again means that there can be only one process applying
>>>> changes.
>>>>
>>>
>>> That's not true, all it means is that you can do
>>> replorigin_session_setup for same origin only in one process but you
>>> don't need to have it setup for session to update it, the
>>> replorigin_advance() works just fine.
>>
>> But RecordTransactionCommit is using replorigin_session_advance, not
>> replorigin_advance.
>
>Only when the origin is actually setup for the current session. You
>need 
>to call the replorigin_advance yourself from your apply code.

That's problematic from a durability POV.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

21 March 2016, 13:18:40


On 21/03/16 14:15, Andres Freund wrote:
>
>
> On March 21, 2016 2:08:54 PM GMT+01:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> On 21/03/16 13:44, Konstantin Knizhnik wrote:
>>>
>>>
>>> On 21.03.2016 15:10, Petr Jelinek wrote:
>>>> Hi,
>>>>
>>>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to use logical replication mechanism in implementation
>> of
>>>>> PostgreSQL multimaster and faced with one conceptual problem.
>>>>> Originally logical replication was intended to support asynchronous
>>>>> replication. In this case applying changes by single process should
>> not
>>>>> be a bottleneck.
>>>>> But if we are using distributed transaction manager to provide
>> global
>>>>> consistency, then applying transaction by one worker  leads to very
>> bad
>>>>> performance and what is worser: cause unintended serialization of
>>>>> transactions, which is not taken in account by distributed deadlock
>>>>> detection algorithm and so can cause
>>>>> undetected deadlocks.
>>>>>
>>>>> So I have implemented pool of background workers which can apply
>>>>> transactions concurrently.
>>>>> It works and shows acceptable performance. But now I am thinking
>> about
>>>>> HA and tracking origin LSNs which are needed to correctly specify
>> slot
>>>>> position in case of recovery. And there is a problem: as far as I
>>>>> understand to correctly record origin LSN in WAL and advance slot
>>>>> position it is necessary to setup session
>>>>> using replorigin_session_setup. It is not so convenient in case of
>> using
>>>>> pool of background workers, because we have to setup session for
>> each
>>>>> commit.
>>>>> But the main problem is that for each slot session can be
>> associated
>>>>> only with one process:
>>>>>
>>>>>           else if (curstate->acquired_by != 0)
>>>>>           {
>>>>>               ereport(ERROR,
>>>>>                       (errcode(ERRCODE_OBJECT_IN_USE),
>>>>>                errmsg("replication identifier %d is already active
>> for
>>>>> PID %d",
>>>>>                       curstate->roident, curstate->acquired_by)));
>>>>>           }
>>>>>
>>>>> Which once again means that there can be only one process applying
>>>>> changes.
>>>>>
>>>>
>>>> That's not true, all it means is that you can do
>>>> replorigin_session_setup for same origin only in one process but you
>>>> don't need to have it setup for session to update it, the
>>>> replorigin_advance() works just fine.
>>>
>>> But RecordTransactionCommit is using replorigin_session_advance, not
>>> replorigin_advance.
>>
>> Only when the origin is actually setup for the current session. You
>> need
>> to call the replorigin_advance yourself from your apply code.
>
> That's problematic from a durability POV.
>

Huh? How come?

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

Andres Freund

Date:

21 March 2016, 13:25:26

On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
> On 21/03/16 14:15, Andres Freund wrote:
> >>Only when the origin is actually setup for the current session. You
> >>need
> >>to call the replorigin_advance yourself from your apply code.
> >
> >That's problematic from a durability POV.
> >
> 
> Huh? How come?

If you use the session mechanism the replication progress is synced with
the apply process, even if there are crashes. Crash recovery updates the
progress.  There's no such interlock with apply otherwise, and I don't
see how you can build one with reasonable effort.

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

21 March 2016, 13:30:27

On 21/03/16 14:25, Andres Freund wrote:
> On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
>> On 21/03/16 14:15, Andres Freund wrote:
>>>> Only when the origin is actually setup for the current session. You
>>>> need
>>>> to call the replorigin_advance yourself from your apply code.
>>>
>>> That's problematic from a durability POV.
>>>
>>
>> Huh? How come?
>
> If you use the session mechanism the replication progress is synced with
> the apply process, even if there are crashes. Crash recovery updates the
> progress.  There's no such interlock with apply otherwise, and I don't
> see how you can build one with reasonable effort.
>

Ah you mean because with wal_log=true the origin advance is in different 
WAL record than commit? OK yeah you might be one transaction behind 
then, true.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

konstantin knizhnik

Date:

22 March 2016, 06:32:37

On Mar 21, 2016, at 4:30 PM, Petr Jelinek wrote:

On 21/03/16 14:25, Andres Freund wrote:
On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
On 21/03/16 14:15, Andres Freund wrote:
Only when the origin is actually setup for the current session. You
need
to call the replorigin_advance yourself from your apply code.

That's problematic from a durability POV.

Huh? How come?

If you use the session mechanism the replication progress is synced with
the apply process, even if there are crashes. Crash recovery updates the
progress. There's no such interlock with apply otherwise, and I don't
see how you can build one with reasonable effort.

Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.

It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.

If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).

Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?

I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.

Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?

Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.

It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.

But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.

I didn't think much about such configuration - may be it possible to propose more efficient mechanism for replication in this case.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Applying logical replication changes by more than one process

From

Craig Ringer

Date:

22 March 2016, 07:10:28

On 22 March 2016 at 14:32, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:

Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.

It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.
If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).
Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?

Yes.

How would you expect it to work if you attempted to replorigin_advance without a session? From multiple concurrent backends?

Parallel apply is complicated business. You have to make sure you apply xacts in an order that's free from deadlocks and from insert/delete anomalies - though you can at least detect those, ERROR that xact and all subsequent ones, and retry. For progress tracking to be consistent and correct you'd have to make sure you committed strictly in the same order as upstream. Just before each commit you can set the origin LSN and advance the replication origin, which will commit atomically along with the commit it confirms. I don't really see the problem.

I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.
Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?

Especially since most slot changes LWLock- and/or spinlock-protected already.

The client would have to manage replay confirmations appropriately so that it doesn't confirm past the point where some other connection still needs it.

We'd have to expose a "slot" column in pg_stat_replication and remove the "pid" column from pg_replication_slots to handle the 1:n relationship between slot clients and slots, and it'd be a pain to show which normal user backends were using a slot. Not really sure how to handle that.

To actually make this useful would require a lot more though. A way to request that replay start from a new LSN without a full disconnect/reconnect each time. Client-side parallel consume/apply. Inter-transaction ordering information so the client can work out a viable xact apply order (possibly using SSI information per the discussion with Kevin?). Etc.

I haven't really looked into this and I suspect there are some hairy areas involved in replaying a slot from more than one client. The reason I'm interested in it personally is for initial replica state setup as Oleksandr prototyped and described earlier. We could attach to the slot's initial snapshot then issue a new replication command that, given a table name or oid, scans the table from the snapshot and passes each tuple to a new callback (like, but not the same as, the insert callback) on the output plugin.

That way clients could parallel-copy the initial state of the DB across the same replication protocol they then consume new changes from, with no need to make normal libpq connections and COPY initial state.

I'm interested in being able to do parallel receive of new changes from the slot too, but suspect that'd be a bunch harder.

Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.
It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.
But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.

Sounds like premature optimisation. Deal with it if it comes up in profiles in scale testing with 100 clients. I'll be surprised if it does.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Applying logical replication changes by more than one process

From

Petr Jelinek

Date:

22 March 2016, 08:14:45

On 22/03/16 07:32, konstantin knizhnik wrote:
>
> On Mar 21, 2016, at 4:30 PM, Petr Jelinek wrote:
>
>> On 21/03/16 14:25, Andres Freund wrote:
>>> On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
>>>> On 21/03/16 14:15, Andres Freund wrote:
>>>>>> Only when the origin is actually setup for the current session. You
>>>>>> need
>>>>>> to call the replorigin_advance yourself from your apply code.
>>>>>
>>>>> That's problematic from a durability POV.
>>>>>
>>>>
>>>> Huh? How come?
>>>
>>> If you use the session mechanism the replication progress is synced with
>>> the apply process, even if there are crashes. Crash recovery updates the
>>> progress.  There's no such interlock with apply otherwise, and I don't
>>> see how you can build one with reasonable effort.
>>>
>>
>> Ah you mean because with wal_log=true the origin advance is in
>> different WAL record than commit? OK yeah you might be one transaction
>> behind then, true.
>
> It actually means that we can not enforce database consistency. If we do
> replorigin_advance  before commit and then crash happen, then we will
> loose some changes.
> If we call replorigin_advance after commit but crash happen before, then
> some changes can be applied multiple times. For example we can insert
> some record twice (if there are no unique constraints).
> Look likes the only working scenario is to setup replication session for
> each commit and use locking to prevent concurrent session setup for the
> same slot by multiple process,  doesn't it?

You can do that, or you can move the tracking to the receiving process 
and spill the data to the disk (hurts IO obviously), or save the 
progress to table (also hurts IO), or write patch which solves this (no 
idea how though).

>
> Also I concern about using sequential search for slot location
> in replorigin_session_setup and many other functions - there is loop
> through all max_replication_slots.
> It seems to be not a problem when number of slots is less than 10. For
> multimaster this assumption is true - even Oracle RAC rarely has
> two-digit number of nodes.
> But if we want to perform sharding and use logical replication for
> providing redundancy, then number of nodes and slots can be essentially
> larger.
> I didn't think much about such configuration - may be it possible to
> propose more efficient mechanism for replication in this case.
>

And each slot means connection with logical decoding attached to it so 
you don't really want to have thousands of those anyway. I think you'll 
hit other problems faster than loop over slots becomes problem if you 
plan to keep all of them active.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Applying logical replication changes by more than one process

From

konstantin knizhnik

Date:

22 March 2016, 08:22:03

On Mar 22, 2016, at 10:10 AM, Craig Ringer wrote:

On 22 March 2016 at 14:32, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:

Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.

It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.
If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).
Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?

Yes.

How would you expect it to work if you attempted to replorigin_advance without a session? From multiple concurrent backends?

I would not work. But I wonder why I would need to call replorigin_advance without a session.

Please excuse me, I am not thinking now about the general case of using logical replication, I just focused on multimaster.

What I need is some efficient, durable and atomic mechanism for applying changes.

I see only two ways to provide atomicity:

1. Tracking of origins should be done inside xact as part of normal commit.

2. Use custom WAL records.

1) is easier now and it really works if I correctly synchronize access to slots. And surprisingly it even doesn't add substantial overhead.

Parallel apply is complicated business. You have to make sure you apply xacts in an order that's free from deadlocks and from insert/delete anomalies - though you can at least detect those, ERROR that xact and all subsequent ones, and retry.

Well, this is exactly what our multimaster does. We do not try to enforce order of applying xacts. But we detect global deadlocks and use 2PC to provide data consistency.

So it is not task of logical replication, it is done by DTM overriding visibility checks and transaction commit protocol using XTM.

For progress tracking to be consistent and correct you'd have to make sure you committed strictly in the same order as upstream. Just before each commit you can set the origin LSN and advance the replication origin, which will commit atomically along with the commit it confirms. I don't really see the problem.

Sorry, I do not completely understand you. What you mean by "will commit atomically along with the commit it confirms"? How this atomicity will be enforced?

I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.
Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?

Especially since most slot changes LWLock- and/or spinlock-protected already.

The client would have to manage replay confirmations appropriately so that it doesn't confirm past the point where some other connection still needs it.

We'd have to expose a "slot" column in pg_stat_replication and remove the "pid" column from pg_replication_slots to handle the 1:n relationship between slot clients and slots, and it'd be a pain to show which normal user backends were using a slot. Not really sure how to handle that.

To actually make this useful would require a lot more though. A way to request that replay start from a new LSN without a full disconnect/reconnect each time. Client-side parallel consume/apply. Inter-transaction ordering information so the client can work out a viable xact apply order (possibly using SSI information per the discussion with Kevin?). Etc.

I haven't really looked into this and I suspect there are some hairy areas involved in replaying a slot from more than one client. The reason I'm interested in it personally is for initial replica state setup as Oleksandr prototyped and described earlier. We could attach to the slot's initial snapshot then issue a new replication command that, given a table name or oid, scans the table from the snapshot and passes each tuple to a new callback (like, but not the same as, the insert callback) on the output plugin.

That way clients could parallel-copy the initial state of the DB across the same replication protocol they then consume new changes from, with no need to make normal libpq connections and COPY initial state.

I'm interested in being able to do parallel receive of new changes from the slot too, but suspect that'd be a bunch harder.

Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.
It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.
But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.

Sounds like premature optimisation. Deal with it if it comes up in profiles in scale testing with 100 clients. I'll be surprised if it does.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Applying logical replication changes by more than one process

From

konstantin knizhnik

Date:

22 March 2016, 08:32:24

On Mar 22, 2016, at 11:14 AM, Petr Jelinek wrote:

And each slot means connection with logical decoding attached to it so you don't really want to have thousands of those anyway. I think you'll hit other problems faster than loop over slots becomes problem if you plan to keep all of them active.

Assume that cluster have thousands of nodes and we use sharding to scatter data through cluster nodes.

But to provide HA we want to perform sharding with some level of redundancy, for example save the same record at 3 different nodes.

Once possible approach (pg_shard) is to execute the same query at three different shards.

But there is no warranty that result of execution will be the same at all nodes.

Alternative approach is to execute transaction at one node and then replicate it using logical replication to replicas.

So we do not perform logical replication to all 1000 nodes. Just to 2 of them. But each time it will be different pair of nodes. So we still need to have 1000 active replication slots.

May be logical replication can not be used at all in such scenario - I have not thought much about it yet. Our first step will be multimaster without sharding.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers