Re: logical changeset generation v6.4 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: logical changeset generation v6.4
Date
Msg-id 20131021150602.GD2968@awork2.anarazel.de
Whole thread Raw
In response to Re: logical changeset generation v6.4  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: logical changeset generation v6.4
List pgsql-hackers
On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote:
> On 10/18/2013 08:50 PM, Andres Freund wrote:
> > On 2013-10-18 08:11:29 -0400, Robert Haas wrote:
> ...
> >> 2. If that seems too complicated, how about just logging the whole old
> >> tuple for version 1?
> > I think that'd make the patch much less useful because it bloats WAL
> > unnecessarily for the primary user (replication) of it. I'd rather go
> > for primary keys only if that proves to be the contentious point.
> >
> > How about modifying the selection to go from:
> > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;
> > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname
> > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb)
> > * primary key
> > * candidate key with the smallest oid
> >
> > Including the candidate key will help people using changeset extration
> > for auditing that do not have primary key. That really isn't an
> > infrequent usecase.

> As I understand it for a table with *no* unique index,
> the "candidate key" is the full tuple, so if we get an UPDATE for
> it then this should be replicated as
> "UPDATE first row matching (NOT DISTINCT FROM) all columns"
> which on replay side will be equivalent to
> CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...'

No, it's not a candidate key since it's not uniquely identifying a
row. You can play tricks as you describe, but that still doesn't make
the whole row a candidate key.

But anyway,  I suggest allowing for logging all columns above...

> I know that this will slow down replication, as you can not use direct
> index updates internally - at least not easily - but need to let postgreSQL
> actually plan this, but such single row update is no faster on origin
> either.

That's not actually true. Consider somebody doing something like:
UPDATE big_table_without_indexes SET column = ...;
On the source side that's essentialy O(n). If you replicate on a
row-by-row basis it will be O(n^2) on the replay side.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Commitfest II CLosed
Next
From: Tom Lane
Date:
Subject: Re: Commitfest II CLosed