matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD) - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD) |
Date | |
Msg-id | 20130404215245.GA26736@tornado.leadboat.com Whole thread Raw |
In response to | Re: Drastic performance loss in assert-enabled build in HEAD (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
|
List | pgsql-hackers |
Subject updated to account for the wider topics now appearing. On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote: > What I'd actually rather see us spending time on right now is making > some provision for incremental updates, which I will boldly propose > could be supported by user-written triggers on the underlying tables > if we only diked out the prohibitions against INSERT/UPDATE/DELETE on > matviews, and allowed them to operate on a matview's contents just like > it was a table. Now admittedly that would foreclose allowing matviews > to be updatable in the updatable-view sense, but that's a feature I > would readily give up if it meant users could build incremental update > mechanisms this year and not two years down the road. I can't see taking MVs in the direction of allowing arbitrary DML; that's what tables are for. Users wishing to do that should keep using current methods: CREATE VIEW mv_impl AS SELECT ...; CREATE TABLE mv AS SELECT * FROM mv_impl; -- REFRESH analogue: fancier approaches existBEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT; If anything, I'd help these users by introducing mechanisms for obtaining a TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW. Namely, bulk index rebuilds, skipping WAL, and frozen tuples. > > ... Making sure that > > the heap has at least one page if data has been generated seems > > like a not-entirely-unreasonable way to do that, although there > > remains at least one vacuum bug to fix if we keep it, in addition > > to Tom's concerns. > > No. This is an absolute disaster. It's taking something we have always > considered to be an irrelevant implementation detail and making it into > user-visible DDL state, despite the fact that it doesn't begin to satisfy > basic transactional behaviors. We *need* to get rid of that aspect of > things. If you must have scannability state in version 0, okay, but > it has to be a catalog property not this. In large part, this ended up outside the catalogs due to key limitations of the startup process. This isn't the first time we've arranged an unusual dance for this reason, and it probably won't be the last. Sure, we could take out unlogged MVs to evade the problem, but re-adding them will mean either restoring relfilenode-based bookkeeping or attacking the startup process limitation directly. There exist fundamental challenges to a direct attack, like the potential inconsistency of system catalogs themselves. We could teach pg_relation_is_scannable() that unlogged MVs are always non-scannable during recovery, then somehow update system catalogs in all databases at the end of recovery. Not a project for 9.3, and I wouldn't be surprised to see it mushroom in complexity. The currently-committed approach is a good one given the applicable constraints. A slight variation on the committed approach would be to add a "_scannable" relation fork. The fork would either be absent (not scannable if an MV) or empty (possibly-scannable). Create it in CREATE MATERIALIZED VIEW sans WITH NO DATA and in REFRESH MATERIALIZED VIEW. Remove it in TRUNCATE. When the startup process reinitializes an unlogged relation, it would also remove any _scannable fork. This has a few advantages over the current approach: VACUUM won't need a special case, and pg_upgrade will be in a better position to blow away all traces if we introduce a better approach. The disadvantage is an extra inode per healthy MV. (Though it does not lead to a 9.3 solution, I'll note that an always-present relation metapage would help here.) Note that I said "possibly-scannable" -- the relfilenode-based indicator (whether the committed approach or something else) doesn't need to remain the only input to the question of scannability. If 9.5 introduces the concept of age-based scannability expiration, the applicable timestamp could go in pg_class, and pg_relation_is_scannable() could check both that and the relfilenode-based indicator. Concerning the original $SUBJECT, I would look at fixing the performance problem by having pg_relation_is_scannable() use an algorithm like this: 1. Grab the pg_class entry from syscache. If it's not found, return NULL. 2. If it's not a matview, return false. 3. Lock the matview and try to open a relcache entry. Return NULL on failure. 4. Return the scannability as reported by the relcache. This boils down to the CASE statement you noted upthread, except putting the fast-exit logic in the function rather than in its caller(s). nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: