Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1+xQb06NGs6Y7OzwMtKYYixEqR8tdWV5THAVE4SAqNrDg@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Mon, Jul 12, 2021 at 1:09 AM Euler Taveira <euler@eulerto.com> wrote:
>
> I did another measure using as baseline the previous patch (v16).
>
> without cache (v16)
> ---------------------------
>
> mean:           1.46 us
> stddev:         2.13 us
> median:         1.39 us
> min-max:        [0.69 .. 1456.69] us
> percentile(99): 3.15 us
> mode:           0.91 us
>
> with cache (v18)
> -----------------------
>
> mean:           0.63 us
> stddev:         1.07 us
> median:         0.55 us
> min-max:        [0.29 .. 844.87] us
> percentile(99): 1.38 us
> mode:           0.41 us
>
> It represents -57%. It is a really good optimization for just a few extra lines
> of code.
>

Good improvement but I think it is better to measure the performance
by using synchronous_replication by setting the subscriber as
standby_synchronous_names, which will provide the overall saving of
time. We can probably see when the timings when no rows are filtered,
when 10% rows are filtered when 30% are filtered and so on.

I think the way caching has been done in the patch is a bit
inefficient. Basically, it always invalidates and rebuilds the
expressions even though some unrelated operation has happened on
publication. For example, say publication has initially table t1 with
rowfilter r1 for which we have cached the state. Now you altered
publication and added table t2, it will invalidate the entire state of
t1 as well. I think we can avoid that if we invalidate the rowfilter
related state only on relcache invalidation i.e in
rel_sync_cache_relation_cb and save it the very first time we prepare
the expression. In that case, we don't need to do it in advance when
preparing relsyncentry, this will have the additional advantage that
we won't spend cycles on preparing state unless it is required (for
truncate we won't require row_filtering, so it won't be prepared).

Few other things, I have noticed:
1.
I am seeing tupledesc leak by following below steps:
ERROR:  tupdesc reference 00000000008D7D18 is not owned by resource
owner TopTransaction
CONTEXT:  slot "tap_sub", output plugin "pgoutput", in the change
callback, associated LSN 0/170BD50

Publisher
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 WHERE (a > 1000
AND b <> 'filtered');

Subscriber
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE SUBSCRIPTION tap_sub
         CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION tap_pub_1;

Publisher
INSERT INTO tab_rowfilter_1 (a, b) VALUES (1980, 'not filtered');
Alter table tab_rowfilter_1 drop column b cascade;
INSERT INTO tab_rowfilter_1 (a) VALUES (1982);

2.
postgres=# Alter table tab_rowfilter_1 alter column b set data type varchar;
ERROR:  unexpected object depending on column: publication of table
tab_rowfilter_1 in publication tap_pub_1

I think for this you need to change ATExecAlterColumnType to handle
the publication case.


-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Support kerberos authentication for postgres_fdw
Next
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.