matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD) - Mailing list pgsql-hackers

From Noah Misch
Subject matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Date
Msg-id 20130404215245.GA26736@tornado.leadboat.com
Whole thread Raw
In response to Re: Drastic performance loss in assert-enabled build in HEAD  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
List pgsql-hackers
Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what
tables are for.  Users wishing to do that should keep using current methods:
 CREATE VIEW mv_impl AS SELECT ...; CREATE TABLE mv AS SELECT * FROM mv_impl; -- REFRESH analogue: fancier approaches
existBEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT;
 

If anything, I'd help these users by introducing mechanisms for obtaining a
TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW.
Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
> 
> No.  This is an absolute disaster.  It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors.  We *need* to get rid of that aspect of
> things.  If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.

In large part, this ended up outside the catalogs due to key limitations of
the startup process.  This isn't the first time we've arranged an unusual
dance for this reason, and it probably won't be the last.  Sure, we could take
out unlogged MVs to evade the problem, but re-adding them will mean either
restoring relfilenode-based bookkeeping or attacking the startup process
limitation directly.  There exist fundamental challenges to a direct attack,
like the potential inconsistency of system catalogs themselves.  We could
teach pg_relation_is_scannable() that unlogged MVs are always non-scannable
during recovery, then somehow update system catalogs in all databases at the
end of recovery.  Not a project for 9.3, and I wouldn't be surprised to see it
mushroom in complexity.  The currently-committed approach is a good one given
the applicable constraints.

A slight variation on the committed approach would be to add a "_scannable"
relation fork.  The fork would either be absent (not scannable if an MV) or
empty (possibly-scannable).  Create it in CREATE MATERIALIZED VIEW sans WITH
NO DATA and in REFRESH MATERIALIZED VIEW.  Remove it in TRUNCATE.  When the
startup process reinitializes an unlogged relation, it would also remove any
_scannable fork.  This has a few advantages over the current approach: VACUUM
won't need a special case, and pg_upgrade will be in a better position to blow
away all traces if we introduce a better approach.  The disadvantage is an
extra inode per healthy MV.  (Though it does not lead to a 9.3 solution, I'll
note that an always-present relation metapage would help here.)

Note that I said "possibly-scannable" -- the relfilenode-based indicator
(whether the committed approach or something else) doesn't need to remain the
only input to the question of scannability.  If 9.5 introduces the concept of
age-based scannability expiration, the applicable timestamp could go in
pg_class, and pg_relation_is_scannable() could check both that and the
relfilenode-based indicator.


Concerning the original $SUBJECT, I would look at fixing the performance
problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache.  If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry.  Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the
fast-exit logic in the function rather than in its caller(s).

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Clang compiler warning on 9.3 HEAD
Next
From: Noah Misch
Date:
Subject: Re: Drastic performance loss in assert-enabled build in HEAD