Re: lcr v5 - primary/candidate key in relcache - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: lcr v5 - primary/candidate key in relcache
Date
Msg-id 1379013664.28056.YahooMailNeo@web162906.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: lcr v5 - primary/candidate key in relcache  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Andres Freund <andres@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>> 0007 wal_decoding: Add information about a tables primary key to
>>>  struct RelationData
>>> * Could be used in the matview refresh code
>
>> I think you and Kevin should discuss whether this is actually the
>> right way to do this.  ISTM that if logical replication and
>> materialized views end up selecting different approaches to this
>> problem, everybody loses.
>
> The patch we're discussion here adds a new struct RelationData field
> called 'rd_primary' (should possibly be renamed) which contains
> information about the "best" candidate key available for a table.
>
> From the header comments:
>     /*
>     * The 'best' primary or candidate key that has been found, only set
>     * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
>     *
>     * Indexes are chosen in the following order:
>     * * Primary Key
>     * * oid index
>     * * the first (OID order) unique, immediate, non-partial and
>     *  non-expression index over one or more NOT NULL'ed columns
>     */
>     Oid rd_primary;
>
> I thought we could use that in matview.c:refresh_by_match_merge() to
> select a more efficient diff if rd_primary has a valid index. In that
> case you only'd need to compare that index's fields which should result
> in an more efficient plan.
>
> Maybe it's also useful in other cases for you?
>
> If it's relevant at all, would you like to have a different priority
> list than the one above?

My first thought was that it was necessary to use all unique,
immediate, non-partial, non-expression indexes to avoid getting
errors on the UPDATE phase of the concurrent refresh for transient
duplicates; but then I remembered that I had to give up on that and
do it all with DELETE followed by INSERT, which eliminates that
risk.  As things now stand the *existence* of any unique,
non-partial, non-expression index (note that immediate is not
needed) is sufficient for correctness.  We could now even drop that,
I think, if we added a duplicate check at the end in the absence of
such an index.

The reason I left it comparing columns from *all* such indexes is
that it gives the optimizer the chance to pick the one that looks
fastest.  With the upcoming patch that can add some extra
"equality" comparisons in addition to the "identical" comparisons
the patch uses, so the mechanism you propose might be a worthwhile
optimization for some cases as long as it does a good job of
picking *the fastest* such index.  The above method of choosing an
index doesn't seem to necessarily ensure that.

Also, if you need to include the "immediate" test, it could not be
used for RMVC without "fallback" code if this mechanism didn't find
an appropriate index.  Of course, that would satisfy those who
would like to relax the requirement for a unique index on the MV to
be able to use RMVC.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Patch for fail-back without fresh backup
Next
From: Peter Eisentraut
Date:
Subject: Re: ENABLE/DISABLE CONSTRAINT NAME