From: Peter Smith
Subject: Re: row filtering for logical replication
Msg-id: CAHut+PvTVTXa+HGK-=78ugAS+c+jQPxEhs2FTdTNJzgi1TuJSw@mail.gmail.com
In response to: Re: row filtering for logical replication (Peter Smith <smithpb2250@gmail.com>)
List: pgsql-hackers
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
>

We have collected performance results for all of the different
workloads [*].

The test strategy now uses pg_recvlogical, following the steps Andres
suggested [1].
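
The measurement for each test was roughly along these lines (just a
sketch - the slot name and the <LSN> placeholder are illustrative;
see [1] for the exact steps):

# create a logical replication slot using the pgoutput plugin
pg_recvlogical -d postgres --slot=test_slot --create-slot --plugin=pgoutput

# run the workload script for the test case, then note the end LSN
psql -d postgres -f workload.sql
psql -d postgres -c "SELECT pg_current_wal_lsn();"

# time how long it takes to decode/filter/stream all of the changes
time pg_recvlogical -d postgres --slot=test_slot --start --endpos=<LSN> \
     -o proto_version=1 -o publication_names='pub_1' -f /dev/null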

Note - "Allow 0%" and "Allow 100%" are included as tests cases, but in
practice, a user is unlikely to deliberately use a filter that allows
nothing to pass through it, or allows everything to pass through it.

PSA the bar charts of the results. All other details are below.

~~~~~

RESULTS - workload "a" (v76)
============================
HEAD        18.40
No Filters  18.86
Allow 100%  17.96
Allow 75%   16.39
Allow 50%   14.60
Allow 25%   11.23
Allow 0%    9.41

Observations for "a":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As more % data is filtered out (less is replicated) then the times decrease

RESULTS - workload "b" (v76)
============================
HEAD        2.30
No Filters  1.96
Allow 100%  1.99
Allow 75%   1.65
Allow 50%   1.35
Allow 25%   1.17
Allow 0%    0.84

Observations for "b":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As more % data is filtered out (less is replicated) then the times decrease


RESULTS - workload "c" (v76)
============================
HEAD        20.40
No Filters  19.85
Allow 100%  20.94
Allow 75%   17.26
Allow 50%   16.13
Allow 25%   13.32
Allow 0%    10.33

Observations for "c":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As more % data is filtered out (less is replicated) then the times decrease

RESULTS - workload "d" (v80)
============================
HEAD        6.81
No Filters  6.85
Allow 100%  7.61
Allow 75%   7.80
Allow 50%   6.46
Allow 25%   6.35
Allow 0%    6.46

Observations for "d":
- As more % data is filtered out (less is replicated) then the times
became less than HEAD, but not much.
- Improvements due to row filtering are less noticeable (e.g. HEAD
versus "Allow 0%") for this workload; we attribute this to the fact
that for this script there are fewer rows getting replicated in the
1st place so we are only comparing 1000 x INSERT/UPDATE against 0 x
INSERT/UPDATE.

~~~~~~
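
Note - in each "Details" section below, the six CREATE PUBLICATION
statements are alternatives (one per test case). Only one of them is
created for any given run; they all reuse the name pub_1, so creating
them together would fail.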

Details - workload "a"
=======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM
generate_series(1,1000001)i;
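
A quick sanity check of each filter's intended selectivity (the 50%
case shown here) is:

SELECT count(*) FROM test WHERE key > 500000; -- expect ~500000 of 1000001 rows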


Details - workload "b"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
    FOR i IN 0..1000001 BY 10 LOOP
        INSERT INTO test VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;
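
(This script generates ~1000 transactions, each containing 100
INSERTs and 100 UPDATEs, i.e. "many transactions doing small
modifications".)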


Details - workload "c"
======================

CREATE TABLE test1 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test2 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test3 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test4 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test5 (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test1, test2, test3, test4, test5;
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 0), test2 WHERE
(key > 0), test3 WHERE (key > 0), test4 WHERE (key > 0), test5 WHERE
(key > 0);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 250000), test2
WHERE (key > 250000), test3 WHERE (key > 250000), test4 WHERE (key >
250000), test5 WHERE (key > 250000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 500000), test2
WHERE (key > 500000), test3 WHERE (key > 500000), test4 WHERE (key >
500000), test5 WHERE (key > 500000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 750000), test2
WHERE (key > 750000), test3 WHERE (key > 750000), test4 WHERE (key >
750000), test5 WHERE (key > 750000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 1000000), test2
WHERE (key > 1000000), test3 WHERE (key > 1000000), test4 WHERE (key >
1000000), test5 WHERE (key > 1000000);

DO
$do$
BEGIN
    FOR i IN 0..1000001 BY 10 LOOP
        -- test1
        INSERT INTO test1 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test1 SET value = 'FOO' WHERE key = i;
        -- test2
        INSERT INTO test2 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test2 SET value = 'FOO' WHERE key = i;
        -- test3
        INSERT INTO test3 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test3 SET value = 'FOO' WHERE key = i;
        -- test4
        INSERT INTO test4 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test4 SET value = 'FOO' WHERE key = i;
        -- test5
        INSERT INTO test5 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test5 SET value = 'FOO' WHERE key = i;

        IF i % 1000 = 0 THEN
            -- raise notice 'commit: %', i;
            COMMIT;
        END IF;
    END LOOP;
END
$do$;

Details - workload "d"
======================

-- note: value1 is needed by the ALTERs and the 4-column INSERT below
CREATE TABLE test (key int, value text, value1 text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
    FOR i IN 0..1000000 BY 1000 LOOP
        ALTER TABLE test ALTER COLUMN value1 TYPE varchar(30);
        INSERT INTO test VALUES (i, 'BAH', 'BAH', row_to_json(row(i)));
        ALTER TABLE test ALTER COLUMN value1 TYPE text;
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 10000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;

------
[*] This post repeats some results already sent for workloads
"a", "b", "c", so that the complete set is all here in one place.
[1] https://www.postgresql.org/message-id/20220203182922.344fhhqzjp2ah6yp%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia

