Thread: Optimization of vacuum for logical replication

Optimization of vacuum for logical replication

From: Konstantin Knizhnik

Hi, hackers.

Right now, if the WAL level (wal_level) is "replica" or higher,
vacuum of a relation copies all of its data to WAL:


     /*
      * We need to log the copied data in WAL iff WAL archiving/streaming is
      * enabled AND it's a WAL-logged rel.
      */
     use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);

Obviously we have to do this for physical replication and WAL archiving.
But why do we need such an expensive operation (actually copying all table
data three times) if we use only logical replication?
Logically, vacuum doesn't change the relation's contents, so there is no need
to write any data to the log and have the WAL sender process it.

I wonder if we can check that

1. wal_level is "logical"
2. There are no physical replication slots
3. WAL archiving is disabled

and in this case not write the cloned relation to the WAL?
A small patch implementing this behavior is attached to this mail.
It significantly reduces WAL size when performing vacuum on
multimaster, which uses logical replication between cluster nodes.
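
For illustration only, the three conditions above can be sketched as a standalone predicate. This is not the attached patch: ServerState, its fields, and rewrite_needs_wal are hypothetical names, and the WalLevel enum is a local stand-in modeled on the values of wal_level rather than PostgreSQL's own definitions.

```c
#include <stdbool.h>

/* Local stand-in for the ordered wal_level values (assumption, not xlog.h). */
typedef enum
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_REPLICA,
    WAL_LEVEL_LOGICAL
} WalLevel;

/* Hypothetical snapshot of the server settings the check depends on. */
typedef struct ServerState
{
    WalLevel    wal_level;
    int         physical_slot_count;    /* physical replication slots */
    bool        archiving_enabled;      /* WAL archiving switched on */
} ServerState;

/*
 * Returns true when the rewritten heap still has to be copied to WAL.
 * rel_needs_wal plays the role of RelationNeedsWAL(NewHeap).
 */
static bool
rewrite_needs_wal(const ServerState *s, bool rel_needs_wal)
{
    if (!rel_needs_wal)
        return false;       /* e.g. unlogged or temporary relation */

    if (s->wal_level < WAL_LEVEL_REPLICA)
        return false;       /* current behavior below "replica" */

    /* Proposed shortcut: logical level, no physical consumers of WAL. */
    if (s->wal_level == WAL_LEVEL_LOGICAL &&
        s->physical_slot_count == 0 &&
        !s->archiving_enabled)
        return false;

    return true;
}
```

With wal_level at logical, zero physical slots, and archiving off, the sketch returns false; adding a single physical slot or enabling archiving makes it return true again.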

What could be wrong with such an optimization?

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Optimization of vacuum for logical replication

From: Bernd Helmle

On Wednesday, 21.08.2019, 12:20 +0300, Konstantin Knizhnik wrote:
> I wonder if we can check that
> 
> 1. wal_level is "logical"
> 2. There are no physical replication slots
> 3. WAL archiving is disabled

Not sure I get that correctly: I can still have a physical standby
without replication slots connected to such an instance. How would your
idea handle this situation?

    Bernd





Re: Optimization of vacuum for logical replication

From: Konstantin Knizhnik

On 21.08.2019 12:34, Bernd Helmle wrote:
> On Wednesday, 21.08.2019, 12:20 +0300, Konstantin Knizhnik wrote:
>> I wonder if we can check that
>>
>> 1. wal_level is "logical"
>> 2. There are no physical replication slots
>> 3. WAL archiving is disabled
> Not sure I get that correctly: I can still have a physical standby
> without replication slots connected to such an instance. How would your
> idea handle this situation?

Yes, it is possible to have a physical replica without a replication slot.
But it is not safe, because there is always a risk that the lag between
master and replica becomes larger than the amount of WAL kept at the master.
Also, I can't believe that a DBA who explicitly sets wal_level to
logical will use streaming replication without an associated replication slot.

And certainly it is possible to add a GUC which controls this optimization.

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimization of vacuum for logical replication

From: Sergei Kornilov

Hello

> Also, I can't believe that a DBA who explicitly sets wal_level to
> logical will use streaming replication without an associated replication slot.

I am.

> Yes, it is possible to have a physical replica without a replication slot.
> But it is not safe, because there is always a risk that the lag between
> master and replica becomes larger than the amount of WAL kept at the master.

Just an example: a replica for manual queries, QA purposes, or something else that is not an important part of the
system.
If I use replication slots, my risk is running out of disk space on the primary and therefore a shutdown of the
primary, with downtime for the application.
If I use wal_keep_segments instead, I have a limited (and usually stable) amount of WAL, but risk having an outdated
replica.

I prefer to have an outdated replica and a safer primary. It's OK for me to just take a fresh pg_basebackup from
another replica.

And the application wants to use logical replication, so wal_level = logical.

If we do not want to support such a use case, we need to explicitly forbid replication without replication slots.
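
The trade-off above can be put in numbers with a deliberately simplified model. The function names are hypothetical, the 16 MB segment size is only the default, and WAL kept for checkpoints is ignored, so treat this as a sketch of the reasoning rather than of the server's actual retention logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_BYTES (16ULL * 1024 * 1024)     /* default WAL segment size */

/*
 * With a replication slot, the primary keeps everything the replica has
 * not consumed yet: retention grows with the lag and has no upper bound.
 */
static uint64_t
retained_with_slot(uint64_t replica_lag_bytes)
{
    return replica_lag_bytes;
}

/*
 * Without a slot, wal_keep_segments gives a fixed budget that does not
 * depend on how far any replica has fallen behind.
 */
static uint64_t
retained_with_keep_segments(uint32_t wal_keep_segments)
{
    return (uint64_t) wal_keep_segments * SEGMENT_BYTES;
}

/*
 * A slot-less replica can resume streaming only while its lag still fits
 * in that budget; beyond it, a fresh base backup is needed.
 */
static bool
replica_can_catch_up(uint64_t replica_lag_bytes, uint32_t wal_keep_segments)
{
    return replica_lag_bytes <= retained_with_keep_segments(wal_keep_segments);
}
```

For example, wal_keep_segments = 64 caps the budget at 1 GiB in this model: a replica lagging 2 GiB is lost and has to be rebuilt, whereas with a slot the same lag would pin 2 GiB of WAL on the primary and keep growing.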

regards, Sergei



Re: Optimization of vacuum for logical replication

From: Bernd Helmle

On Wednesday, 21.08.2019, 13:26 +0300, Konstantin Knizhnik wrote:
> Yes, it is possible to have a physical replica without a replication
> slot.
> But it is not safe, because there is always a risk that the lag between
> master and replica becomes larger than the amount of WAL kept at the master.

Sure, but that doesn't mean use cases for this aren't real.

> Also, I can't believe that a DBA who explicitly sets wal_level
> to logical will use streaming replication without an associated
> replication slot.

Well, I know people doing exactly this, for various reasons (short-lived
replicas, logically replicated table sets for reports, ...). The
fact that they can have loosely coupled replicas with either physical
or logical replication is a feature they'd really miss.

    Bernd




Re: Optimization of vacuum for logical replication

From: Konstantin Knizhnik

On 21.08.2019 14:45, Bernd Helmle wrote:
> On Wednesday, 21.08.2019, 13:26 +0300, Konstantin Knizhnik wrote:
>> Yes, it is possible to have a physical replica without a replication
>> slot.
>> But it is not safe, because there is always a risk that the lag between
>> master and replica becomes larger than the amount of WAL kept at the master.
> Sure, but that doesn't mean use cases for this aren't real.
>
>> Also, I can't believe that a DBA who explicitly sets wal_level
>> to logical will use streaming replication without an associated
>> replication slot.
> Well, I know people doing exactly this, for various reasons (short-lived
> replicas, logically replicated table sets for reports, ...). The
> fact that they can have loosely coupled replicas with either physical
> or logical replication is a feature they'd really miss.
>
>     Bernd
>

OK, you convinced me that there are cases where people want to combine
logical replication with streaming replication without a slot.
But would it be acceptable to have a GUC variable (disabled by default) that
enables this optimization?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimization of vacuum for logical replication

From: Kyotaro Horiguchi

Hello.

At Wed, 21 Aug 2019 18:06:52 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
<968fc591-51d3-fd74-8a55-40aa770baa3a@postgrespro.ru>
> OK, you convinced me that there are cases where people want to combine
> logical replication with streaming replication without a slot.
> But would it be acceptable to have a GUC variable (disabled by default) that
> enables this optimization?

The odds are quite high. Couldn't we introduce a new wal_level
value instead?

wal_level = logical_only


I think this thread shows that logical replication is no longer a
superset(?) of physical replication.  I thought that we might be
able to change wal_level from a scalar to a bitmap, but that would break
backward compatibility.
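
A rough sketch of the scalar-versus-bitmap distinction, assuming hypothetical names throughout (none of these identifiers exist in PostgreSQL): with an ordered scalar, "logical" necessarily implies everything "replica" provides, while independent bits could express logical without physical.

```c
#include <stdbool.h>

/* Scalar model: levels are ordered, so a higher level implies the lower ones. */
typedef enum
{
    SCALAR_MINIMAL = 0,
    SCALAR_REPLICA = 1,
    SCALAR_LOGICAL = 2
} ScalarWalLevel;

static bool
scalar_allows_physical(ScalarWalLevel level)
{
    return level >= SCALAR_REPLICA;     /* true for SCALAR_LOGICAL as well */
}

static bool
scalar_allows_logical(ScalarWalLevel level)
{
    return level >= SCALAR_LOGICAL;
}

/* Bitmap model (hypothetical): each kind of WAL consumer gets its own bit. */
#define WAL_FOR_PHYSICAL 0x01u  /* physical standbys and archiving */
#define WAL_FOR_LOGICAL  0x02u  /* logical decoding */

static bool
bitmap_allows_physical(unsigned int flags)
{
    return (flags & WAL_FOR_PHYSICAL) != 0;
}

static bool
bitmap_allows_logical(unsigned int flags)
{
    return (flags & WAL_FOR_LOGICAL) != 0;
}
```

In this model, logical_only would correspond to WAL_FOR_LOGICAL alone, a combination the ordered scalar cannot express without adding a new value.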

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Optimization of vacuum for logical replication

From: Konstantin Knizhnik

On 22.08.2019 6:13, Kyotaro Horiguchi wrote:
> Hello.
>
> At Wed, 21 Aug 2019 18:06:52 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
> <968fc591-51d3-fd74-8a55-40aa770baa3a@postgrespro.ru>
>> OK, you convinced me that there are cases where people want to combine
>> logical replication with streaming replication without a slot.
>> But would it be acceptable to have a GUC variable (disabled by default) that
>> enables this optimization?
> The odds are quite high. Couldn't we introduce a new wal_level
> value instead?
>
> wal_level = logical_only
>
>
> I think this thread shows that logical replication is no longer a
> superset(?) of physical replication.  I thought that we might be
> able to change wal_level from a scalar to a bitmap, but that would break
> backward compatibility.
>
> regards.
>
I think that introducing a new wal_level is a good idea.
There are a lot of other places (besides vacuum) where we insert
information into the log that is not needed for logical decoding.
Instead of changing every place in the code where this information is
inserted, we can filter it at the xlog level (xlog.c).
My only concern is how many incompatibilities will be caused by
introducing a new WAL level.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimization of vacuum for logical replication

From: Konstantin Knizhnik

On 22.08.2019 6:13, Kyotaro Horiguchi wrote:
> Hello.
>
> At Wed, 21 Aug 2019 18:06:52 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
> <968fc591-51d3-fd74-8a55-40aa770baa3a@postgrespro.ru>
>> OK, you convinced me that there are cases where people want to combine
>> logical replication with streaming replication without a slot.
>> But would it be acceptable to have a GUC variable (disabled by default) that
>> enables this optimization?
> The odds are quite high. Couldn't we introduce a new wal_level
> value instead?
>
> wal_level = logical_only
>
>
> I think this thread shows that logical replication is no longer a
> superset(?) of physical replication.  I thought that we might be
> able to change wal_level from a scalar to a bitmap, but that would break
> backward compatibility.
>
> regards.
>

I can propose the following patch introducing a new level, logical_only.
I will be pleased to receive comments about adding a new wal_level and
the possible problems caused by it.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

