Thread: Logical replication without a Primary Key

Logical replication without a Primary Key

From

"Joshua D. Drake"

Date:

07 December 2017, 03:03:14

-Hackers,

In the docs it says:

"
If the table does not have any suitable key, then it can be set to 
replica identity“full”, which means the entire row becomes the key.

"

How does that work? Is it using one of the hidden columns on a row?


Thanks,


JD


-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****

Re: Logical replication without a Primary Key

From

Peter Eisentraut

Date:

07 December 2017, 16:30:16

On 12/6/17 19:03, Joshua D. Drake wrote:
> -Hackers,
> 
> In the docs it says:
> 
> "
> If the table does not have any suitable key, then it can be set to 
> replica identity“full”, which means the entire row becomes the key.
> 
> "
> 
> How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logical replication without a Primary Key

From

"Joshua D. Drake"

Date:

07 December 2017, 17:32:58

On 12/07/2017 05:30 AM, Peter Eisentraut wrote:
>
>> How does that work? Is it using one of the hidden columns on a row?
> It means that for example if an update record is produced, the entire
> row is included in the record as the key.

Thanks Peter, Craig also responded,

The confusion I have is what if we have two rows that are identical and 
now that I think about it we would just update both rows, yes? That 
would make sense because it would produce two updated rows.

Thanks,

JD

>

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****

Re: Logical replication without a Primary Key

From

"David G. Johnston"

Date:

07 December 2017, 17:37:17

On Thursday, December 7, 2017, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 12/6/17 19:03, Joshua D. Drake wrote:

>
> How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.

IOW, IIUC, whether defined or not the user data portion of the table must, as a whole, provide a natural unique key if you are going to use logical replication. If two records only differ in their OID (or maybe ctid?) value you will have problem.

David J.

Re: Logical replication without a Primary Key

From

Craig Ringer

Date:

07 December 2017, 17:38:41

On 7 December 2017 at 22:32, Joshua D. Drake <jd@commandprompt.com> wrote:

The confusion I have is what if we have two rows that are identical and now that I think about it we would just update both rows, yes? That would make sense because it would produce two updated rows.

I expect so, but honestly, maybe it's easier to just add a pg_regress test to check and demonstrate the behaviour, or do a manual test?

(a pg_regress test would make sense to add anyway, though, and might be easier)

The only ways you could update only one of two identical rows would be if you did so by ctid (since if it were a table with oids, the rows wouldn't be identical anymore). It'd make no sense to send the ctid to the downstream since it'd bear no relationship to the downstream table. I expect it'd probably update both rows. If it doesn't, it probably should.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical replication without a Primary Key

From

Petr Jelinek

Date:

07 December 2017, 17:43:08

On 07/12/17 15:32, Joshua D. Drake wrote:
> On 12/07/2017 05:30 AM, Peter Eisentraut wrote:
>>
>>> How does that work? Is it using one of the hidden columns on a row?
>> It means that for example if an update record is produced, the entire
>> row is included in the record as the key.
> 
> Thanks Peter, Craig also responded,
> 
> The confusion I have is what if we have two rows that are identical and
> now that I think about it we would just update both rows, yes? That
> would make sense because it would produce two updated rows.
> 

No it won't, it will update only one row, it does not try to find
multiple matching rows.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical replication without a Primary Key

From

Robert Haas

Date:

07 December 2017, 21:49:37

On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> No it won't, it will update only one row, it does not try to find
> multiple matching rows.

Good, because that's exactly what it should do.  I mean, if you have
on the master two tuples that are identical, and you update one of
them, then the replica had better update exactly one of them as well.
Since they are identical, it doesn't matter *which* one gets updated
on the replica, but if you update *both* of them on the replica, then
things are out of sync.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Logical replication without a Primary Key

From

"Joshua D. Drake"

Date:

07 December 2017, 22:38:51

On 12/07/2017 10:49 AM, Robert Haas wrote:
> On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> No it won't, it will update only one row, it does not try to find
>> multiple matching rows.
> Good, because that's exactly what it should do.  I mean, if you have
> on the master two tuples that are identical, and you update one of
> them, then the replica had better update exactly one of them as well.
> Since they are identical, it doesn't matter *which* one gets updated
> on the replica, but if you update *both* of them on the replica, then
> things are out of sync.

Well I think that is a problem actually. If I have:

A    B   C
foo,bar,baz
foo,bar,baz

And then I say:

UPDATE test set A = 1 where C = baz

I have updated two rows because there is no primary key to identify the 
differences. Both of those rows should be updated and thus replicated 
otherwise, logical replication (of this specific table) provides 
inaccurate data on the subscriber.

Thanks,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****

Re: Logical replication without a Primary Key

From

Chapman Flack

Date:

07 December 2017, 22:53:21

On 12/07/2017 02:38 PM, Joshua D. Drake wrote:

> A    B   C
> foo,bar,baz
> foo,bar,baz
> 
> And then I say:
> 
> UPDATE test set A = 1 where C = baz
> 
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated

Would the subscriber see two records reporting update of a
foo,bar,baz row to 1, so it would do that to (arbitrarily)
one of them the first time, and (necessarily) the other, the
second time?

Or is that not the way it would work?

-Chap

Re: Logical replication without a Primary Key

From

Robert Haas

Date:

07 December 2017, 23:19:56

On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <chap@anastigmatix.net> wrote:
> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>> A    B   C
>> foo,bar,baz
>> foo,bar,baz
>>
>> And then I say:
>>
>> UPDATE test set A = 1 where C = baz
>>
>> I have updated two rows because there is no primary key to identify the
>> differences. Both of those rows should be updated and thus replicated
>
> Would the subscriber see two records reporting update of a
> foo,bar,baz row to 1, so it would do that to (arbitrarily)
> one of them the first time, and (necessarily) the other, the
> second time?

Exactly.

(I think.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Logical replication without a Primary Key

From

Andres Freund

Date:

07 December 2017, 23:39:36

On 2017-12-07 11:38:51 -0800, Joshua D. Drake wrote:
> On 12/07/2017 10:49 AM, Robert Haas wrote:
> > On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
> > <petr.jelinek@2ndquadrant.com> wrote:
> > > No it won't, it will update only one row, it does not try to find
> > > multiple matching rows.
> > Good, because that's exactly what it should do.  I mean, if you have
> > on the master two tuples that are identical, and you update one of
> > them, then the replica had better update exactly one of them as well.
> > Since they are identical, it doesn't matter *which* one gets updated
> > on the replica, but if you update *both* of them on the replica, then
> > things are out of sync.
> 
> Well I think that is a problem actually. If I have:
> 
> A    B   C
> foo,bar,baz
> foo,bar,baz
> 
> And then I say:
> 
> UPDATE test set A = 1 where C = baz
> 
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated
> otherwise, logical replication (of this specific table) provides inaccurate
> data on the subscriber.

Not a problem. If you updated both rows, then there's two cases:
a) the update actually changed the column values. In which case the first per-row
   change that's replicated updates the first row, but the second one won't
   again find it as matching in all columns.
b) the update didn't actually change anything. In which case the same
   row gets updated twice, but because the column values didn't change,
   that doesn't matter.

Greetings,

Andres Freund

Re: Logical replication without a Primary Key

From

Petr Jelinek

Date:

08 December 2017, 05:29:22

On 07/12/17 21:19, Robert Haas wrote:
> On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <chap@anastigmatix.net> wrote:
>> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>>> A    B   C
>>> foo,bar,baz
>>> foo,bar,baz
>>>
>>> And then I say:
>>>
>>> UPDATE test set A = 1 where C = baz
>>>
>>> I have updated two rows because there is no primary key to identify the
>>> differences. Both of those rows should be updated and thus replicated
>>
>> Would the subscriber see two records reporting update of a
>> foo,bar,baz row to 1, so it would do that to (arbitrarily)
>> one of them the first time, and (necessarily) the other, the
>> second time?
> 
> Exactly.
> 
> (I think.)
> 

Yes, that how it was designed to work.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical replication without a Primary Key

From

"Joshua D. Drake"

Date:

18 December 2017, 23:43:24

On 12/07/2017 12:39 PM, Andres Freund wrote:
>
> Not a problem. If you updated both rows, then there's two cases:
> a) the update actually changed the column values. In which case the first per-row
>     change that's replicated updates the first row, but the second one won't
>     again find it as matching in all columns.
> b) the update didn't actually change anything. In which case the same
>     row gets updated twice, but because the column values didn't change,
>     that doesn't matter.

I may be misunderstanding what is said above but if I ran a test:

Publisher:
reptest=# \d foorep
              Table "public.foorep"
  Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
  one    | text |           |          |
  two    | text |           |          |
Publications:
     "reptestpub"

reptest=# select * from foorep;
  one | two
-----+-----
  c   | b
  c   | b
  c   | b
(3 rows)

reptest=# update foorep set one = 'd';
UPDATE 3
reptest=# select * from foorep;
  one | two
-----+-----
  d   | b
  d   | b
  d   | b
(3 rows)

Subscriber before Publisher update:
reptest=# select * from foorep ;
  one | two
-----+-----
  c   | b
  c   | b
  c   | b
(3 rows)

Subscriber after Publisher update:
reptest=# select * from foorep ;
  one | two
-----+-----
  d   | b
  d   | b
  d   | b
(3 rows)

This is the behavior I was expecting. As I said, I may have 
misunderstood the responses but it is acting as I would expect.

Thanks!

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****

Re: Logical replication without a Primary Key

From

Andres Freund

Date:

18 December 2017, 23:52:55

On 2017-12-18 12:43:24 -0800, Joshua D. Drake wrote:
> This is the behavior I was expecting. As I said, I may have misunderstood
> the responses but it is acting as I would expect.

Just ot make sure: You're saying there's no problem here, and that
logical rep is behaving correctly, right?

FWIW, I wonder if we need to add a warning somewhere about FULL
replication, given it's essentially O(#changes * #rows) -> O(n^2) for
updating the whole table.

Greetings,

Andres Freund

Re: Logical replication without a Primary Key

From

"Joshua D. Drake"

Date:

18 December 2017, 23:57:02

On 12/18/2017 12:52 PM, Andres Freund wrote:
>
> Just ot make sure: You're saying there's no problem here, and that
> logical rep is behaving correctly, right?

Correct. I am not sure where the miscommunication was (fully willing to 
accept it was on my side) but if I update multiple rows in a single 
statement, all the rows that were modified get replicated. That is the 
behavior I would have expected.

> FWIW, I wonder if we need to add a warning somewhere about FULL
> replication, given it's essentially O(#changes * #rows) -> O(n^2) for
> updating the whole table.

The docs do mention it is a performance hit but something a little more 
"IF YOU DO THIS, BEWARE" may be good.

Thanks,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****

Re: Logical replication without a Primary Key

From

Petr Jelinek

Date:

19 December 2017, 00:23:12

On 18/12/17 21:57, Joshua D. Drake wrote:
> On 12/18/2017 12:52 PM, Andres Freund wrote:
>>
>> Just ot make sure: You're saying there's no problem here, and that
>> logical rep is behaving correctly, right?
> 
> Correct. I am not sure where the miscommunication was (fully willing to
> accept it was on my side) but if I update multiple rows in a single
> statement, all the rows that were modified get replicated. That is the
> behavior I would have expected.
> 

I think it's because we said if you update single row on upstream which
may be confusing in case of multiple rows. It will update single row on
downstream even though there is 4 same rows on downstream. That's still
true. In your test however you have 4 same rows on downstream but you
also have same 4 rows on upstream which you all updated. So you got 4
row updates which were replicated and each of those 4 updates changed
one row.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services