Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Column Filtering in Logical Replication |
Date | |
Msg-id | 78b8d385-57cb-9a3f-cfe5-9f300f15eced@enterprisedb.com Whole thread Raw |
In response to | Re: Column Filtering in Logical Replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Column Filtering in Logical Replication
|
List | pgsql-hackers |
On 3/10/22 19:17, Tomas Vondra wrote: > On 3/9/22 11:12, houzj.fnst@fujitsu.com wrote: >> Hi, >> >> Here are some tests and results about the table sync query of >> column filter patch and row filter. >> >> 1) multiple publications which publish schema of parent table and partition. >> ----pub >> create schema s1; >> create table s1.t (a int, b int, c int) partition by range (a); >> create table t_1 partition of s1.t for values from (1) to (10); >> create publication pub1 for all tables in schema s1; >> create publication pub2 for table t_1(b); >> >> ----sub >> - prepare tables >> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub1, pub2; >> >> When doing table sync for 't_1', the column list will be (b). I think it should >> be no filter because table t_1 is also published via ALL TABLES IN SCHMEA >> publication. >> >> For Row Filter, it will use no filter for this case. >> >> >> 2) one publication publishes both parent and child >> ----pub >> create table t (a int, b int, c int) partition by range (a); >> create table t_1 partition of t for values from (1) to (10) >> partition by range (a); >> create table t_2 partition of t_1 for values from (1) to (10); >> >> create publication pub2 for table t_1(a), t_2 >> with (PUBLISH_VIA_PARTITION_ROOT); >> >> ----sub >> - prepare tables >> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2; >> >> When doing table sync for table 't_1', it has no column list. I think the >> expected column list is (a). >> >> For Row Filter, it will use the row filter of the top most parent table(t_1) in >> this case. >> >> >> 3) one publication publishes both parent and child >> ----pub >> create table t (a int, b int, c int) partition by range (a); >> create table t_1 partition of t for values from (1) to (10) >> partition by range (a); >> create table t_2 partition of t_1 for values from (1) to (10); >> >> create publication pub2 for table t_1(a), t_2(b) >> with (PUBLISH_VIA_PARTITION_ROOT); >> >> ----sub >> - prepare tables >> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2; >> >> When doing table sync for table 't_1', the column list would be (a, b). I think >> the expected column list is (a). >> >> For Row Filter, it will use the row filter of the top most parent table(t_1) in >> this case. >> > > Attached is an updated patch version, addressing all of those issues. > > 0001 is a bugfix, reworking how we calculate publish_as_relid. The old > approach was unstable with multiple publications, giving different > results depending on order of the publications. This should be > backpatched into PG13 where publish_via_partition_root was introduced, I > think. > > 0002 is the main patch, merging the changes proposed by Peter and fixing > the issues reported here. In most cases this means adopting the code > used for row filters, and perhaps simplifying it a bit. > > > But I also tried to implement a row-filter test for 0001, and I'm not > sure I understand the behavior I observe. Consider this: > > -- a chain of 3 partitions (on both publisher and subscriber) > CREATE TABLE test_part_rf (a int primary key, b int, c int) > PARTITION BY LIST (a); > > CREATE TABLE test_part_rf_1 > PARTITION OF test_part_rf FOR VALUES IN (1,2,3,4,5) > PARTITION BY LIST (a); > > CREATE TABLE test_part_rf_2 > PARTITION OF test_part_rf_1 FOR VALUES IN (1,2,3,4,5); > > -- initial data > INSERT INTO test_part_rf VALUES (1, 5, 100); > INSERT INTO test_part_rf VALUES (2, 15, 200); > > -- two publications, each adding a different partition > CREATE PUBLICATION test_pub_part_1 FOR TABLE test_part_rf_1 > WHERE (b < 10) WITH (publish_via_partition_root); > > CREATE PUBLICATION test_pub_part_2 FOR TABLE test_part_rf_2 > WHERE (b > 10) WITH (publish_via_partition_root); > > -- now create the subscription (also try opposite ordering) > CREATE SUBSCRIPTION test_part_sub CONNECTION '...' > PUBLICATION test_pub_part_1, test_pub_part_2; > > -- wait for sync > > -- inert some more data > INSERT INTO test_part_rf VALUES (3, 6, 300); > INSERT INTO test_part_rf VALUES (4, 16, 400); > > -- wait for catchup > > Now, based on the discussion here, my expectation is that we'll use the > row filter from the top-most ancestor in any publication, which in this > case is test_part_rf_1. Hence the filter should be (b < 10). > > So I'd expect these rows to be replicated: > > 1,5,100 > 3,6,300 > > But that's not what I get, unfortunately. I get different results, > depending on the order of publications: > > 1) test_pub_part_1, test_pub_part_2 > > 1|5|100 > 2|15|200 > 3|6|300 > 4|16|400 > > 2) test_pub_part_2, test_pub_part_1 > > 3|6|300 > 4|16|400 > > That seems pretty bizarre, because it either means we're not enforcing > any filter or some strange combination of filters (notice that for (2) > we skip/replicate rows matching either filter). > > I have to be missing something important, but this seems confusing. > There's a patch adding a simple test case to 028_row_filter.sql (named > .txt, so as not to confuse cfbot). > FWIW I think the reason is pretty simple - pgoutput_row_filter_init is broken. It assumes you can just do this rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(entry->publish_as_relid), ObjectIdGetDatum(pub->oid)); if (HeapTupleIsValid(rftuple)) { /* Null indicates no filter. */ rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple, Anum_pg_publication_rel_prqual, &pub_no_filter); } else { pub_no_filter = true; } and pub_no_filter=true means there's no filter at all. Which is nonsense, because we're using publish_as_relid here - the publication may not include this particular ancestor, in which case we need to just ignore this publication. So yeah, this needs to be reworked. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: