Re: row filtering for logical replication - Mailing list pgsql-hackers

From:           Peter Smith
Subject:        Re: row filtering for logical replication
Date:
Msg-id:         CAHut+PvTVTXa+HGK-=78ugAS+c+jQPxEhs2FTdTNJzgi1TuJSw@mail.gmail.com
In response to: Re: row filtering for logical replication (Peter Smith <smithpb2250@gmail.com>)
List:           pgsql-hackers
> Are there any recent performance evaluations of the overhead of row
> filters? I think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targetting many different tables
> d) Interspersed DDL + small changes to a table
>

We have collected the performance data results for all the different
workloads [*]. The test strategy now uses pg_recvlogical, with the steps
Andres suggested [1].

Note - "Allow 0%" and "Allow 100%" are included as test cases, but in
practice a user is unlikely to deliberately use a filter that allows
nothing to pass through it, or one that allows everything to pass
through it.

PSA the bar charts of the results. All other details are below.
~~~~~

RESULTS - workload "a" (v76)
======================

HEAD        18.40
No Filters  18.86
Allow 100%  17.96
Allow 75%   16.39
Allow 50%   14.60
Allow 25%   11.23
Allow 0%     9.41

Observations for "a":
- Using row filters has minimal overhead even in the worst case
  (compare HEAD versus "Allow 100%").
- As a greater percentage of the data is filtered out (so less is
  replicated), the times decrease.

RESULTS - workload "b" (v76)
======================

HEAD        2.30
No Filters  1.96
Allow 100%  1.99
Allow 75%   1.65
Allow 50%   1.35
Allow 25%   1.17
Allow 0%    0.84

Observations for "b":
- Using row filters has minimal overhead even in the worst case
  (compare HEAD versus "Allow 100%").
- As a greater percentage of the data is filtered out (so less is
  replicated), the times decrease.

RESULTS - workload "c" (v76)
======================

HEAD        20.40
No Filters  19.85
Allow 100%  20.94
Allow 75%   17.26
Allow 50%   16.13
Allow 25%   13.32
Allow 0%    10.33

Observations for "c":
- Using row filters has minimal overhead even in the worst case
  (compare HEAD versus "Allow 100%").
- As a greater percentage of the data is filtered out (so less is
  replicated), the times decrease.

RESULTS - workload "d" (v80)
======================

HEAD        6.81
No Filters  6.85
Allow 100%  7.61
Allow 75%   7.80
Allow 50%   6.46
Allow 25%   6.35
Allow 0%    6.46

Observations for "d":
- As more data is filtered out (so less is replicated), the times fall
  below HEAD, but not by much.
- Improvements due to row filtering are less noticeable for this
  workload (e.g. HEAD versus "Allow 0%"); we attribute this to the fact
  that this script replicates far fewer rows in the first place, so we
  are only comparing 1000 x INSERT/UPDATE against 0 x INSERT/UPDATE.
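To make the comparisons easier to read, the worst-case filter overhead
(HEAD versus "Allow 100%") and the best-case saving (HEAD versus
"Allow 0%") can be derived directly from the timings above. This is
just an illustrative summary script; the numbers are copied from the
results reported above:

```python
# Timings copied from the results above (per workload).
# Only the ratios matter here, so the time unit is irrelevant.
results = {
    "a": {"HEAD": 18.40, "Allow 100%": 17.96, "Allow 0%": 9.41},
    "b": {"HEAD": 2.30,  "Allow 100%": 1.99,  "Allow 0%": 0.84},
    "c": {"HEAD": 20.40, "Allow 100%": 20.94, "Allow 0%": 10.33},
    "d": {"HEAD": 6.81,  "Allow 100%": 7.61,  "Allow 0%": 6.46},
}

for wl, t in results.items():
    # Worst case: a row filter that allows every row through.
    overhead_pct = (t["Allow 100%"] - t["HEAD"]) / t["HEAD"] * 100
    # Best case: a row filter that allows no rows through.
    saving_pct = (t["HEAD"] - t["Allow 0%"]) / t["HEAD"] * 100
    print(f'workload "{wl}": worst-case overhead {overhead_pct:+.1f}%, '
          f'best-case saving {saving_pct:.1f}%')
```

Note that for workloads "a" and "b" the "Allow 100%" run was actually
slightly faster than HEAD, which is consistent with the "minimal
overhead" observation above.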
~~~~~~

Details - workload "a"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0);       -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000);  -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000);  -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000);  -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

INSERT INTO test SELECT i, i::text, row_to_json(row(i))
FROM generate_series(1,1000001)i;

Details - workload "b"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0);       -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000);  -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000);  -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000);  -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
    INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test SET value = 'FOO' WHERE key = i;
    IF i % 1000 = 0 THEN
        COMMIT;
    END IF;
END LOOP;
END
$do$;

Details - workload "c"
======================

CREATE TABLE test1 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test2 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test3 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test4 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test5 (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test1, test2, test3, test4, test5;
CREATE PUBLICATION pub_1 FOR TABLE
    test1 WHERE (key > 0), test2 WHERE (key > 0), test3 WHERE (key > 0),
    test4 WHERE (key > 0), test5 WHERE (key > 0);              -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE
    test1 WHERE (key > 250000), test2 WHERE (key > 250000), test3 WHERE (key > 250000),
    test4 WHERE (key > 250000), test5 WHERE (key > 250000);    -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE
    test1 WHERE (key > 500000), test2 WHERE (key > 500000), test3 WHERE (key > 500000),
    test4 WHERE (key > 500000), test5 WHERE (key > 500000);    -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE
    test1 WHERE (key > 750000), test2 WHERE (key > 750000), test3 WHERE (key > 750000),
    test4 WHERE (key > 750000), test5 WHERE (key > 750000);    -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE
    test1 WHERE (key > 1000000), test2 WHERE (key > 1000000), test3 WHERE (key > 1000000),
    test4 WHERE (key > 1000000), test5 WHERE (key > 1000000);  -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
    -- test1
    INSERT INTO test1 VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test1 SET value = 'FOO' WHERE key = i;
    -- test2
    INSERT INTO test2 VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test2 SET value = 'FOO' WHERE key = i;
    -- test3
    INSERT INTO test3 VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test3 SET value = 'FOO' WHERE key = i;
    -- test4
    INSERT INTO test4 VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test4 SET value = 'FOO' WHERE key = i;
    -- test5
    INSERT INTO test5 VALUES(i,'BAH', row_to_json(row(i)));
    UPDATE test5 SET value = 'FOO' WHERE key = i;
    IF i % 1000 = 0 THEN
        -- raise notice 'commit: %', i;
        COMMIT;
    END IF;
END LOOP;
END
$do$;

Details - workload "d"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0);       -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000);  -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000);  -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000);  -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000000 BY 1000 LOOP
    ALTER TABLE test ALTER COLUMN value TYPE varchar(30);
    INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
    ALTER TABLE test ALTER COLUMN value TYPE text;
    UPDATE test SET value = 'FOO' WHERE key = i;
    IF i % 10000 = 0 THEN
        COMMIT;
    END IF;
END LOOP;
END
$do$;

------

[*] This post repeats some results already sent for workloads "a", "b",
and "c", so that the complete set is now all here in one place.

[1] https://www.postgresql.org/message-id/20220203182922.344fhhqzjp2ah6yp%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia
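For reference, the "% allowed" labels on the publications follow
directly from the key range used by the scripts: the keys run up to
roughly 1,000,001, so a filter such as (key > 500000) passes about half
of the generated rows. A quick illustrative check, using the workload
"a" key range of 1..1000001:

```python
# Keys generated by the workload "a" INSERT: 1 .. 1000001 inclusive.
TOTAL = 1_000_001

def allowed_fraction(threshold: int) -> float:
    """Fraction of generated keys satisfying the filter (key > threshold)."""
    allowed = max(0, TOTAL - threshold)  # keys threshold+1 .. 1000001
    return allowed / TOTAL

for threshold in (0, 250000, 500000, 750000, 1000000):
    print(f"key > {threshold}: {allowed_fraction(threshold):.2%} allowed")
```

(The "0% allowed" case is approximate: (key > 1000000) still passes the
single key 1000001, i.e. essentially nothing.)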