Thread: Logical replication without a Primary Key
-Hackers, In the docs it says: " If the table does not have any suitable key, then it can be set to replica identity“full”, which means the entire row becomes the key. " How does that work? Is it using one of the hidden columns on a row? Thanks, JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.org ***** Unless otherwise stated, opinions are my own. *****
On 12/6/17 19:03, Joshua D. Drake wrote: > -Hackers, > > In the docs it says: > > " > If the table does not have any suitable key, then it can be set to > replica identity“full”, which means the entire row becomes the key. > > " > > How does that work? Is it using one of the hidden columns on a row? It means that for example if an update record is produced, the entire row is included in the record as the key. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/07/2017 05:30 AM, Peter Eisentraut wrote: > >> How does that work? Is it using one of the hidden columns on a row? > It means that for example if an update record is produced, the entire > row is included in the record as the key. Thanks Peter, Craig also responded, The confusion I have is what if we have two rows that are identical and now that I think about it we would just update both rows, yes? That would make sense because it would produce two updated rows. Thanks, JD > -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.org ***** Unless otherwise stated, opinions are my own. *****
On Thursday, December 7, 2017, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/6/17 19:03, Joshua D. Drake wrote:
>
> How does that work? Is it using one of the hidden columns on a row?
It means that for example if an update record is produced, the entire
row is included in the record as the key.
IOW, IIUC, whether defined or not the user data portion of the table must, as a whole, provide a natural unique key if you are going to use logical replication. If two records only differ in their OID (or maybe ctid?) value you will have problem.
David J.
On 7 December 2017 at 22:32, Joshua D. Drake <jd@commandprompt.com> wrote:
The confusion I have is what if we have two rows that are identical and now that I think about it we would just update both rows, yes? That would make sense because it would produce two updated rows.
I expect so, but honestly, maybe it's easier to just add a pg_regress test to check and demonstrate the behaviour, or do a manual test?
(a pg_regress test would make sense to add anyway, though, and might be easier)
The only ways you could update only one of two identical rows would be if you did so by ctid (since if it were a table with oids, the rows wouldn't be identical anymore). It'd make no sense to send the ctid to the downstream since it'd bear no relationship to the downstream table. I expect it'd probably update both rows. If it doesn't, it probably should.
On 07/12/17 15:32, Joshua D. Drake wrote: > On 12/07/2017 05:30 AM, Peter Eisentraut wrote: >> >>> How does that work? Is it using one of the hidden columns on a row? >> It means that for example if an update record is produced, the entire >> row is included in the record as the key. > > Thanks Peter, Craig also responded, > > The confusion I have is what if we have two rows that are identical and > now that I think about it we would just update both rows, yes? That > would make sense because it would produce two updated rows. > No it won't, it will update only one row, it does not try to find multiple matching rows. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > No it won't, it will update only one row, it does not try to find > multiple matching rows. Good, because that's exactly what it should do. I mean, if you have on the master two tuples that are identical, and you update one of them, then the replica had better update exactly one of them as well. Since they are identical, it doesn't matter *which* one gets updated on the replica, but if you update *both* of them on the replica, then things are out of sync. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/07/2017 10:49 AM, Robert Haas wrote: > On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> No it won't, it will update only one row, it does not try to find >> multiple matching rows. > Good, because that's exactly what it should do. I mean, if you have > on the master two tuples that are identical, and you update one of > them, then the replica had better update exactly one of them as well. > Since they are identical, it doesn't matter *which* one gets updated > on the replica, but if you update *both* of them on the replica, then > things are out of sync. Well I think that is a problem actually. If I have: A B C foo,bar,baz foo,bar,baz And then I say: UPDATE test set A = 1 where C = baz I have updated two rows because there is no primary key to identify the differences. Both of those rows should be updated and thus replicated otherwise, logical replication (of this specific table) provides inaccurate data on the subscriber. Thanks, JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.org ***** Unless otherwise stated, opinions are my own. *****
On 12/07/2017 02:38 PM, Joshua D. Drake wrote: > A B C > foo,bar,baz > foo,bar,baz > > And then I say: > > UPDATE test set A = 1 where C = baz > > I have updated two rows because there is no primary key to identify the > differences. Both of those rows should be updated and thus replicated Would the subscriber see two records reporting update of a foo,bar,baz row to 1, so it would do that to (arbitrarily) one of them the first time, and (necessarily) the other, the second time? Or is that not the way it would work? -Chap
On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <chap@anastigmatix.net> wrote: > On 12/07/2017 02:38 PM, Joshua D. Drake wrote: >> A B C >> foo,bar,baz >> foo,bar,baz >> >> And then I say: >> >> UPDATE test set A = 1 where C = baz >> >> I have updated two rows because there is no primary key to identify the >> differences. Both of those rows should be updated and thus replicated > > Would the subscriber see two records reporting update of a > foo,bar,baz row to 1, so it would do that to (arbitrarily) > one of them the first time, and (necessarily) the other, the > second time? Exactly. (I think.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017-12-07 11:38:51 -0800, Joshua D. Drake wrote: > On 12/07/2017 10:49 AM, Robert Haas wrote: > > On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek > > <petr.jelinek@2ndquadrant.com> wrote: > > > No it won't, it will update only one row, it does not try to find > > > multiple matching rows. > > Good, because that's exactly what it should do. I mean, if you have > > on the master two tuples that are identical, and you update one of > > them, then the replica had better update exactly one of them as well. > > Since they are identical, it doesn't matter *which* one gets updated > > on the replica, but if you update *both* of them on the replica, then > > things are out of sync. > > Well I think that is a problem actually. If I have: > > A B C > foo,bar,baz > foo,bar,baz > > And then I say: > > UPDATE test set A = 1 where C = baz > > I have updated two rows because there is no primary key to identify the > differences. Both of those rows should be updated and thus replicated > otherwise, logical replication (of this specific table) provides inaccurate > data on the subscriber. Not a problem. If you updated both rows, then there's two cases: a) the update actually changed the column values. In which case the first per-row change that's replicated updates the first row, but the second one won't again find it as matching in all columns. b) the update didn't actually change anything. In which case the same row gets updated twice, but because the column values didn't change, that doesn't matter. Greetings, Andres Freund
On 07/12/17 21:19, Robert Haas wrote: > On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <chap@anastigmatix.net> wrote: >> On 12/07/2017 02:38 PM, Joshua D. Drake wrote: >>> A B C >>> foo,bar,baz >>> foo,bar,baz >>> >>> And then I say: >>> >>> UPDATE test set A = 1 where C = baz >>> >>> I have updated two rows because there is no primary key to identify the >>> differences. Both of those rows should be updated and thus replicated >> >> Would the subscriber see two records reporting update of a >> foo,bar,baz row to 1, so it would do that to (arbitrarily) >> one of them the first time, and (necessarily) the other, the >> second time? > > Exactly. > > (I think.) > Yes, that how it was designed to work. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 12/07/2017 12:39 PM, Andres Freund wrote: > > Not a problem. If you updated both rows, then there's two cases: > a) the update actually changed the column values. In which case the first per-row > change that's replicated updates the first row, but the second one won't > again find it as matching in all columns. > b) the update didn't actually change anything. In which case the same > row gets updated twice, but because the column values didn't change, > that doesn't matter. I may be misunderstanding what is said above but if I ran a test: Publisher: reptest=# \d foorep Table "public.foorep" Column | Type | Collation | Nullable | Default --------+------+-----------+----------+--------- one | text | | | two | text | | | Publications: "reptestpub" reptest=# select * from foorep; one | two -----+----- c | b c | b c | b (3 rows) reptest=# update foorep set one = 'd'; UPDATE 3 reptest=# select * from foorep; one | two -----+----- d | b d | b d | b (3 rows) Subscriber before Publisher update: reptest=# select * from foorep ; one | two -----+----- c | b c | b c | b (3 rows) Subscriber after Publisher update: reptest=# select * from foorep ; one | two -----+----- d | b d | b d | b (3 rows) This is the behavior I was expecting. As I said, I may have misunderstood the responses but it is acting as I would expect. Thanks! JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://postgresconf.org ***** Unless otherwise stated, opinions are my own. *****
On 2017-12-18 12:43:24 -0800, Joshua D. Drake wrote: > This is the behavior I was expecting. As I said, I may have misunderstood > the responses but it is acting as I would expect. Just ot make sure: You're saying there's no problem here, and that logical rep is behaving correctly, right? FWIW, I wonder if we need to add a warning somewhere about FULL replication, given it's essentially O(#changes * #rows) -> O(n^2) for updating the whole table. Greetings, Andres Freund
On 12/18/2017 12:52 PM, Andres Freund wrote: > > Just ot make sure: You're saying there's no problem here, and that > logical rep is behaving correctly, right? Correct. I am not sure where the miscommunication was (fully willing to accept it was on my side) but if I update multiple rows in a single statement, all the rows that were modified get replicated. That is the behavior I would have expected. > FWIW, I wonder if we need to add a warning somewhere about FULL > replication, given it's essentially O(#changes * #rows) -> O(n^2) for > updating the whole table. The docs do mention it is a performance hit but something a little more "IF YOU DO THIS, BEWARE" may be good. Thanks, JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://postgresconf.org ***** Unless otherwise stated, opinions are my own. *****
On 18/12/17 21:57, Joshua D. Drake wrote: > On 12/18/2017 12:52 PM, Andres Freund wrote: >> >> Just ot make sure: You're saying there's no problem here, and that >> logical rep is behaving correctly, right? > > Correct. I am not sure where the miscommunication was (fully willing to > accept it was on my side) but if I update multiple rows in a single > statement, all the rows that were modified get replicated. That is the > behavior I would have expected. > I think it's because we said if you update single row on upstream which may be confusing in case of multiple rows. It will update single row on downstream even though there is 4 same rows on downstream. That's still true. In your test however you have 4 same rows on downstream but you also have same 4 rows on upstream which you all updated. So you got 4 row updates which were replicated and each of those 4 updates changed one row. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services