Re: lcr v5 - primary/candidate key in relcache - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: lcr v5 - primary/candidate key in relcache |
Date | |
Msg-id | 1379013664.28056.YahooMailNeo@web162906.mail.bf1.yahoo.com Whole thread Raw |
In response to | Re: lcr v5 - primary/candidate key in relcache (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
Andres Freund <andres@2ndquadrant.com> wrote: > Robert Haas wrote: >> Andres Freund <andres@2ndquadrant.com> wrote: >>> 0007 wal_decoding: Add information about a tables primary key to >>> struct RelationData >>> * Could be used in the matview refresh code > >> I think you and Kevin should discuss whether this is actually the >> right way to do this. ISTM that if logical replication and >> materialized views end up selecting different approaches to this >> problem, everybody loses. > > The patch we're discussion here adds a new struct RelationData field > called 'rd_primary' (should possibly be renamed) which contains > information about the "best" candidate key available for a table. > > From the header comments: > /* > * The 'best' primary or candidate key that has been found, only set > * correctly if RelationGetIndexList has been called/rd_indexvalid > 0. > * > * Indexes are chosen in the following order: > * * Primary Key > * * oid index > * * the first (OID order) unique, immediate, non-partial and > * non-expression index over one or more NOT NULL'ed columns > */ > Oid rd_primary; > > I thought we could use that in matview.c:refresh_by_match_merge() to > select a more efficient diff if rd_primary has a valid index. In that > case you only'd need to compare that index's fields which should result > in an more efficient plan. > > Maybe it's also useful in other cases for you? > > If it's relevant at all, would you like to have a different priority > list than the one above? My first thought was that it was necessary to use all unique, immediate, non-partial, non-expression indexes to avoid getting errors on the UPDATE phase of the concurrent refresh for transient duplicates; but then I remembered that I had to give up on that and do it all with DELETE followed by INSERT, which eliminates that risk. As things now stand the *existence* of any unique, non-partial, non-expression index (note that immediate is not needed) is sufficient for correctness. We could now even drop that, I think, if we added a duplicate check at the end in the absence of such an index. The reason I left it comparing columns from *all* such indexes is that it gives the optimizer the chance to pick the one that looks fastest. With the upcoming patch that can add some extra "equality" comparisons in addition to the "identical" comparisons the patch uses, so the mechanism you propose might be a worthwhile optimization for some cases as long as it does a good job of picking *the fastest* such index. The above method of choosing an index doesn't seem to necessarily ensure that. Also, if you need to include the "immediate" test, it could not be used for RMVC without "fallback" code if this mechanism didn't find an appropriate index. Of course, that would satisfy those who would like to relax the requirement for a unique index on the MV to be able to use RMVC. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: