Drastic performance loss in assert-enabled build in HEAD

From: Tom Lane
Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5 seconds.
On the other hand, running the same executable against the regression
database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
broke something performance-wise.

A quick check with oprofile says it's all AllocSetCheck's fault:

samples  %        image name               symbol name
87777    83.6059  postgres                 AllocSetCheck
1140      1.0858  postgres                 base_yyparse
918       0.8744  postgres                 AllocSetAlloc
778       0.7410  postgres                 SearchCatCache
406       0.3867  postgres                 pg_strtok
394       0.3753  postgres                 hash_search_with_hash_value
387       0.3686  postgres                 core_yylex
373       0.3553  postgres                 MemoryContextCheck
256       0.2438  postgres                 nocachegetattr
231       0.2200  postgres                 ScanKeywordLookup
207       0.1972  postgres                 palloc

So maybe I'm nuts to care about the performance of an assert-enabled
backend, but I don't really find a 4X runtime degradation acceptable,
even for development work.  Does anyone want to fess up to having caused
this, or do I need to start tracking down what changed?
        regards, tom lane



Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5
> seconds.
> On the other hand, running the same executable against the regression
> database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
> broke something performance-wise.
>
> A quick check with oprofile says it's all AllocSetCheck's fault:
>
> samples  %        image name              symbol name
> 87777    83.6059  postgres                AllocSetCheck
> 1140      1.0858  postgres                base_yyparse
> 918      0.8744  postgres                AllocSetAlloc
> 778      0.7410  postgres                SearchCatCache
> 406      0.3867  postgres                pg_strtok
> 394      0.3753  postgres                hash_search_with_hash_value
> 387      0.3686  postgres                core_yylex
> 373      0.3553  postgres                MemoryContextCheck
> 256      0.2438  postgres                nocachegetattr
> 231      0.2200  postgres                ScanKeywordLookup
> 207      0.1972  postgres                palloc
>
> So maybe I'm nuts to care about the performance of an assert-enabled
> backend, but I don't really find a 4X runtime degradation acceptable,
> even for development work.  Does anyone want to fess up to having caused
> this, or do I need to start tracking down what changed?

I checked master HEAD for a dump of regression and got about 4
seconds.  I checked right after my initial push of matview code and
got 2.5 seconds.  I checked just before that and got 1 second.
There was some additional pg_dump work for matviews after the
initial push which may or may not account for the rest of the time.

Investigating now.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drastic performance loss in assert-enabled build in HEAD

From: Tom Lane
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So maybe I'm nuts to care about the performance of an assert-enabled
>> backend, but I don't really find a 4X runtime degradation acceptable,
>> even for development work.  Does anyone want to fess up to having caused
>> this, or do I need to start tracking down what changed?

> I checked master HEAD for a dump of regression and got about 4
> seconds.� I checked right after my initial push of matview code and
> got 2.5 seconds.� I checked just before that and got 1 second. 
> There was some additional pg_dump work for matviews after the
> initial push which may or may not account for the rest of the time.

I poked at this a bit, and eventually found that the performance
differential goes away if I dike out the pg_relation_is_scannable() call
in getTables()'s table-collection query.  What appears to be happening
is that those calls cause a great many more relcache entries to be made
than used to happen, plus many entries are made earlier in the run than
they used to be.  Many of those entries have subsidiary memory contexts,
which results in an O(N^2) growth in the time spent in AllocSetCheck,
since postgres.c does MemoryContextCheck(TopMemoryContext) once per
received command, and pg_dump will presumably issue O(N) commands in an
N-table database.

So one question that brings up is whether we need to dial back the
amount of work done in memory context checking, but I'm loath to do so
in development builds.  That code has fingered an awful lot of bugs.
OTOH, if the number of tables in the regression DB continues to grow,
we may have little choice.

Anyway, the immediate takeaway is that this represents a horribly
expensive way for pg_dump to find out which matviews aren't scannable.
The cheap way for it to find out would be if we had a boolean flag for
that in pg_class.  Would you review the bidding as to why things were
done the way they are?  Because in general, having to ask the kernel
something is going to suck in any case, so basing it on the size of the
disk file doesn't seem to me to be basically a good thing.

We could alleviate the pain by changing pg_dump's query to something
like
(case when c.relkind = 'm'
 then pg_relation_is_scannable(c.oid)
 else false end)

but TBH this feels like bandaging a bad design.

Another reason why I don't like this code is that
pg_relation_is_scannable is broken by design:
    relid = PG_GETARG_OID(0);
    relation = RelationIdGetRelation(relid);
    result = relation->rd_isscannable;
    RelationClose(relation);

You can't do that: if the relcache entry doesn't already exist, this
will try to construct one while not holding any lock on the relation,
which is subject to all sorts of race conditions.  (In general, direct
calls on RelationIdGetRelation are probably broken.  I see you
introduced more than one such in the matviews patch, and I'm willing to
bet they are all unsafe.)
        regards, tom lane



Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So maybe I'm nuts to care about the performance of an
>>> assert-enabled backend, but I don't really find a 4X runtime
>>> degradation acceptable, even for development work.  Does anyone
>>> want to fess up to having caused this, or do I need to start
>>> tracking down what changed?
>
>> I checked master HEAD for a dump of regression and got about 4
>> seconds.  I checked right after my initial push of matview code
>> and got 2.5 seconds.  I checked just before that and got 1
>> second.  There was some additional pg_dump work for matviews
>> after the initial push which may or may not account for the rest
>> of the time.
>
> I poked at this a bit, and eventually found that the performance
> differential goes away if I dike out the
> pg_relation_is_scannable() call in getTables()'s table-collection
> query.  What appears to be happening is that those calls cause a
> great many more relcache entries to be made than used to happen,
> plus many entries are made earlier in the run than they used to
> be.  Many of those entries have subsidiary memory contexts, which
> results in an O(N^2) growth in the time spent in AllocSetCheck,
> since postgres.c does MemoryContextCheck(TopMemoryContext) once
> per received command, and pg_dump will presumably issue O(N)
> commands in an N-table database.
>
> So one question that brings up is whether we need to dial back
> the amount of work done in memory context checking, but I'm loath
> to do so in development builds.  That code has fingered an awful
> lot of bugs.  OTOH, if the number of tables in the regression DB
> continues to grow, we may have little choice.
>
> Anyway, the immediate takeaway is that this represents a horribly
> expensive way for pg_dump to find out which matviews aren't
> scannable.  The cheap way for it to find out would be if we had a
> boolean flag for that in pg_class.  Would you review the bidding
> as to why things were done the way they are?  Because in general,
> having to ask the kernel something is going to suck in any case,
> so basing it on the size of the disk file doesn't seem to me to
> be basically a good thing.

The early versions of the patch had a boolean in pg_class to track
this, but I got complaints from Robert and Noah (and possibly
others?) that this got too ugly in combination with the support for
unlogged matviews, and they suggested the current approach.  For an
unlogged matview we need to replace the heap (main fork) with the
init fork before the database is up and running, so there would
need to be some way for that to result in forcing the flag in
pg_class.  I was attempting to do that when a matview which was
flagged as scannable in pg_class was opened, but I gotta agree that
it wasn't pretty.  Noah suggested a function based on testing the
heap to see if it looked like the init fork, and the current state
of affairs is my attempt to make it work that way.

We could definitely minimize the overhead by only testing this for
matviews.  It seemed potentially useful for the function to have
some purpose for other types of relations, so I was trying to avoid
that.  Maybe that was a bad idea.

> We could alleviate the pain by changing pg_dump's query to
> something like
>
>     (case when c.relkind = 'm'
>      then pg_relation_is_scannable(c.oid)
>      else false end)

Yeah, that's the sort of thing I was thinking of.  If we're going
to go that way, we may want to touch one or two other spots.

> but TBH this feels like bandaging a bad design.

So far nobody has been able to suggest a better way to support both
unlogged matviews and some way to prevent a matview from being used
if it does not contain materialized data for its query from *some*
point in time.  Suggestions welcome.

> Another reason why I don't like this code is that
> pg_relation_is_scannable is broken by design:
>
>     relid = PG_GETARG_OID(0);
>     relation = RelationIdGetRelation(relid);
>     result = relation->rd_isscannable;
>     RelationClose(relation);
>
> You can't do that: if the relcache entry doesn't already exist,
> this will try to construct one while not holding any lock on the
> relation, which is subject to all sorts of race conditions.

Hmm.  I think I had that covered earlier but messed up in
rearranging to respond to review comments.  Will review both new
calling locations.

> In general, direct calls on RelationIdGetRelation are probably
> broken.

If valid contexts for use of the function are that limited, it
might merit a comment where the function is defined.  I'm not sure
what a good alternative to this would be.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner
Kevin Grittner <kgrittn@ymail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Another reason why I don't like this code is that
>> pg_relation_is_scannable is broken by design:
>>
>>      relid = PG_GETARG_OID(0);
>>      relation = RelationIdGetRelation(relid);
>>      result = relation->rd_isscannable;
>>      RelationClose(relation);
>>
>> You can't do that: if the relcache entry doesn't already exist,
>> this will try to construct one while not holding any lock on the
>> relation, which is subject to all sorts of race conditions.
>
> Hmm.  I think I had that covered earlier but messed up in
> rearranging to respond to review comments.  Will review both new
> calling locations.

For the SQL-level function, does this look OK?:

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..94e55f0 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -850,9 +850,13 @@ pg_relation_is_scannable(PG_FUNCTION_ARGS)
    bool        result;
 
    relid = PG_GETARG_OID(0);
-   relation = RelationIdGetRelation(relid);
+   relation = try_relation_open(relid, AccessShareLock);
+
+   if (relation == NULL)
+       PG_RETURN_BOOL(false);
+
    result = relation->rd_isscannable;
-   RelationClose(relation);
+   relation_close(relation, AccessShareLock);
 
    PG_RETURN_BOOL(result);
 }

I think the call in ExecCheckRelationsScannable() is safe because
it comes after the tables are all already locked.  I put it there
so that the appropriate lock strength should be used based on
whether it was locked by ExecInitNode() or something before it.  Am
I missing something?  Can I not count on the lock being held at
that point?  Would the right level of API here be relation_open()
with NoLock rather than RelationIdGetRelation()?  Or is there some
other call which is more appropriate there?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drastic performance loss in assert-enabled build in HEAD

From: Tom Lane
[ sorry for being slow to respond, things are crazy this week ]

Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, the immediate takeaway is that this represents a horribly
>> expensive way for pg_dump to find out which matviews aren't
>> scannable.  The cheap way for it to find out would be if we had a
>> boolean flag for that in pg_class.  Would you review the bidding
>> as to why things were done the way they are?  Because in general,
>> having to ask the kernel something is going to suck in any case,
>> so basing it on the size of the disk file doesn't seem to me to
>> be basically a good thing.

> The early versions of the patch had a boolean in pg_class to track
> this, but I got complaints from Robert and Noah (and possibly
> others?) that this got too ugly in combination with the support for
> unlogged matviews, and they suggested the current approach.

Meh.  I don't think that we should allow that corner case to drive the
design like this.  I would *far* rather not have unlogged matviews at
all than this boondoggle.  I'm not even convinced that the existing
semantics are desirable: who's to say that having an unlogged matview
revert to empty on crash renders it invalid?  If it's a summary of
underlying unlogged tables, then that's actually the right thing.
(If someone is desperately unhappy with such a behavior, why are they
insisting on it being unlogged, anyway?)

If we go with this implementation, I think we are going to be painting
ourselves into a corner that will be difficult or impossible to get out
of, because the scannability state of a matview is not just a matter of
catalog contents but of on-disk files.  That will make it difficult to
impossible for pg_upgrade to cope reasonably with any future rethinking
of scannability status.

In fact, I'm going to go further and say that I do not like the entire
concept of scannability, either as to design or implementation, and
I think we should just plain rip it out.  We can rethink that for 9.4
or later, but what we've got right now is a kluge, and I don't want
to find ourselves required to be bug-compatible with it forevermore.
To take just the most salient point: assuming that we've somehow
determined that some matview is too out-of-date to be usable, why is the
appropriate response to that to cause queries on it to fail altogether?
That seems like about the least useful feature that could be devised.
Why not, say, have queries fall back to expanding the view definition as
though it were a regular view?

I think matviews without any scannability control are already a pretty
useful feature, and one I'd not be ashamed to ship just like that for
9.3.  But the scannability stuff is clearly going to need to be
revisited.  IMO, driving a stake in the ground at the exact spot that
this stake is placed is not a good plan.  If we simply remove that
aspect altogether for now, I think we'll have more room to maneuver in
future releases.

I apologize for not having griped about this earlier, but I've really
paid no attention to the matviews work up to now; there are only so
many cycles in a day.
        regards, tom lane



Re: Drastic performance loss in assert-enabled build in HEAD

From: Robert Haas
On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In fact, I'm going to go further and say that I do not like the entire
> concept of scannability, either as to design or implementation, and
> I think we should just plain rip it out.

This has been my feeling from the beginning, so I'm happy to support
this position.  I think the current version - where scan-ability is
tracked in just one way - is an improvement over the previous version
of the patch - where it was tracked in two different ways with a
confusing shuffle of information from one place to the other.  But my
favorite number of places to track it would be zero.

...Robert



Re: Drastic performance loss in assert-enabled build in HEAD

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In fact, I'm going to go further and say that I do not like the entire
>> concept of scannability, either as to design or implementation, and
>> I think we should just plain rip it out.

> This has been my feeling from the beginning, so I'm happy to support
> this position.  I think the current version - where scan-ability is
> tracked in just one way - is an improvement over the previous version
> of the patch - where it was tracked in two different ways with a
> confusing shuffle of information from one place to the other.  But my
> favorite number of places to track it would be zero.

To be clear, I think we'll end up tracking some notion of scannability
eventually.  I just don't think the current notion is sufficiently baked
to want to promise to be upward-compatible with it in future.
        regards, tom lane



Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:

>>> In fact, I'm going to go further and say that I do not like the
>>> entire concept of scannability, either as to design or
>>> implementation, and I think we should just plain rip it out.
>
>> This has been my feeling from the beginning, so I'm happy to
>> support this position.  I think the current version - where
>> scan-ability is tracked in just one way - is an improvement over
>> the previous version of the patch - where it was tracked in two
>> different ways with a confusing shuffle of information from one
>> place to the other. But my favorite number of places to track it
>> would be zero.
>
> To be clear, I think we'll end up tracking some notion of
> scannability eventually.  I just don't think the current notion
> is sufficiently baked to want to promise to be upward-compatible
> with it in future.

To be honest, I don't think I've personally seen a single use case
for matviews where they could be used if you couldn't count on an
error if attempting to use them without the contents reflecting a
materialization of the associated query at *some* point in time.
There probably are such, but I think removing this entirely takes
the percentage of use cases covered by the implementation in this
release down from 10% to 2%.

Consider where the Wisconsin Courts use "home-grown" materialized
views currently:

(1) On the public web site for circuit court data, visibility is
based on supreme court rules and the advice of a committee
consisting of judges, representatives of the press, defense
attorneys, prosecuting attorneys, etc.  There are cases in the
database which, for one reason or another, should not show up on
the public web site.  On a weekly basis, where monitoring shows the
lowest usage, the table of cases which are "too old" to be shown
according to the rules thus determined is regenerated.  If there
was the possibility that a dump and load could fail to fill this,
and the queries would run without error, they could not move from
ad hoc matview techniques to the new feature without excessive
risk.

(2) Individual judges have a "dashboard" showing such things as the
number of court cases which have gone beyond certain thresholds
without action.  They can "drill down" to detail so that cases
which have "slipped through the cracks" can be scheduled for some
appropriate action.  "Justice delayed..." and all of that.  It
would be much better to get an error which would result in
"information currently unavailable" than to give the impression
that there are no such cases.

Since the main point of this patch is to get the basis for a more
complete matview feature into place while still supporting *some*
use cases, and a high priority needs to be placed on not painting
use cases, and a high priority needs to be place on not painting
ourselves into a corner, I agree we should rip this out if we think
it does so.  I have spent some time looking at what we will want to
add in future releases, and a more sophisticated concept of what is
"fresh" enough to allow use of the materialized data is certainly
on the list, and falling back to running the underlying query like
a "normal" view would be a reasonable option to be able to choose
in some cases; but I think that will *always* need to start with
information about whether data *has* been generated, and an empty
set has to be considered valid data if it has been.  If we come up
with a way to track that which isn't too fragile, I'm confident
that it will remain useful as the feature evolves. Making sure that
the heap has at least one page if data has been generated seems
like a not-entirely-unreasonable way to do that, although there
remains at least one vacuum bug to fix if we keep it, in addition
to Tom's concerns.  It has the advantage of playing nicely with
unlogged tables.  If this is not going to be what we use long term,
do we have a clue what is?

Personally, I think it would be sad to reduce the use cases for
which matviews in 9.3 can be used to those which can tolerate an
error-free reference to a matview for which data has not been
generated, but I'll rip out support for that distinction if that is
the consensus.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drastic performance loss in assert-enabled build in HEAD

From: Tom Lane
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In fact, I'm going to go further and say that I do not like the
>>> entire concept of scannability, either as to design or
>>> implementation, and I think we should just plain rip it out.

> To be honest, I don't think I've personally seen a single use case
> for matviews where they could be used if you couldn't count on an
> error if attempting to use them without the contents reflecting a
> materialization of the associated query at *some* point in time.

Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED
VIEW, that minimum requirement is satisfied no?

I wouldn't actually want to remove that option, because pg_dump will
need it to avoid circular-reference problems.  But if you simply don't
use it then you have the minimum guarantee.  And I do not see where
the current implementation is giving you any more guarantees.

What it *is* doing is setting a rather dubious behavioral precedent that
we will no doubt hear Robert complaining about when (not if) we decide
we don't want to be backward-compatible with it anymore.

Granting that throwing an error is actually of some use to some people,
I would not think that people would want to turn it on via a command
that throws away the existing view contents altogether, nor turn it off
with a full-throated REFRESH.  There are going to need to be ways to
incrementally update matviews, and ways to disable/enable access that
are not tied to a complete rebuild, not to mention being based on
user-determined rather than hard-wired criteria for what's too stale.
So I don't think this is a useful base to build on.

If you feel that scannability disable is an absolute must for version 0,
let's invent a matview reloption or some such to implement it and let
users turn it on and off as they wish.  That seems a lot more likely
to still be useful two years from now.  And if you're absolutely
convinced that unlogged matviews mustn't work as I suggest, we can
lose those from 9.3, too.

What I'd actually rather see us spending time on right now is making
some provision for incremental updates, which I will boldly propose
could be supported by user-written triggers on the underlying tables
if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
matviews, and allowed them to operate on a matview's contents just like
it was a table.  Now admittedly that would foreclose allowing matviews
to be updatable in the updatable-view sense, but that's a feature I
would readily give up if it meant users could build incremental update
mechanisms this year and not two years down the road.

> (1) On the public web site for circuit court data, visibility is
> based on supreme court rules and the advice of a committee
> consisting of judges, representatives of the press, defense
> attorneys, prosecuting attorneys, etc.  There are cases in the
> database which, for one reason or another, should not show up on
> the public web site.  On a weekly basis, where monitoring shows the
> lowest usage, the table of cases which are "too old" to be shown
> according to the rules thus determined is regenerated.  If there
> was the possibility that a dump and load could fail to fill this,
> and the queries would run without error, they could not move from
> ad hoc matview techniques to the new feature without excessive
> risk.

Why exactly do you think that what I'm suggesting would cause a dump and
reload to not regenerate the data?  (Personally, I think pg_dump ought
to have an option selecting whether or not to repopulate matviews, but
again, if that's not what you want *don't use it*.)

> (2) Individual judges have a "dashboard" showing such things as the
> number of court cases which have gone beyond certain thresholds
> without action.  They can "drill down" to detail so that cases
> which have "slipped through the cracks" can be scheduled for some
> appropriate action.  "Justice delayed..." and all of that.  It
> would be much better to get an error which would result in
> "information currently unavailable" than to give the impression
> that there are no such cases.

If you need 100% accuracy, which is what this scenario appears to be
demanding, matviews are not for you (yet).  The existing implementation
certainly is nowhere near satisfying this scenario.

> ... Making sure that
> the heap has at least one page if data has been generated seems
> like a not-entirely-unreasonable way to do that, although there
> remains at least one vacuum bug to fix if we keep it, in addition
> to Tom's concerns.

No.  This is an absolute disaster.  It's taking something we have always
considered to be an irrelevant implementation detail and making it into
user-visible DDL state, despite the fact that it doesn't begin to satisfy
basic transactional behaviors.  We *need* to get rid of that aspect of
things.  If you must have scannability state in version 0, okay, but
it has to be a catalog property not this.

> It has the advantage of playing nicely with
> unlogged tables.  If this is not going to be what we use long term,
> do we have a clue what is?

I don't, but I don't think I'm required to propose something, given
that (a) we can certainly ship 9.3 without unlogged matviews, and
(b) you still haven't convinced me that the current semantics for
unlogged matviews are a requirement anyway.  Again, somebody who
doesn't want his matviews to go to empty on crash simply shouldn't
use an unlogged matview.
        regards, tom lane



Re: Drastic performance loss in assert-enabled build in HEAD

From: Nicolas Barbier
2013/4/3 Tom Lane <tgl@sss.pgh.pa.us>:

> Kevin Grittner <kgrittn@ymail.com> writes:
>
>> To be honest, I don't think I've personally seen a single use case
>> for matviews where they could be used if you couldn't count on an
>> error if attempting to use them without the contents reflecting a
>> materialization of the associated query at *some* point in time.
>
> Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED
> VIEW, that minimum requirement is satisfied no?

An argument against that is that computing the contents may be very expensive.

> Granting that throwing an error is actually of some use to some people,
> I would not think that people would want to turn it on via a command
> that throws away the existing view contents altogether, nor turn it off
> with a full-throated REFRESH.  There are going to need to be ways to
> incrementally update matviews, and ways to disable/enable access that
> are not tied to a complete rebuild, not to mention being based on
> user-determined rather than hard-wired criteria for what's too stale.
> So I don't think this is a useful base to build on.

Am I correct when I think that you are saying here, that the “zero
pages == unscannable” logic is not very future-proof? In that case I
concur, and I also think that this knowledge leaks in way too many
other places (the VACUUM bug mentioned by Kevin is a good example).

> If you feel that scannability disable is an absolute must for version 0,
> let's invent a matview reloption or some such to implement it and let
> users turn it on and off as they wish.  That seems a lot more likely
> to still be useful two years from now.

(In the context of making an unlogged matview unscannable after a crash:)

Is it imaginable that such a reloption could (in a future
implementation) be changed during or right after crash recovery? For
example, by storing the set of “truncated by crash recovery” relations
in a shared catalog table, which is then inspected when connecting to
a database to continue the truncation (in the case of a matview by
making it unscannable)?

> And if you're absolutely convinced that unlogged matviews mustn't work as I
> suggest, we can lose those from 9.3, too.

+1. Having unlogged matviews without having incremental updates yet
isn’t super useful anyway.

> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

Please make the syntax for updating the “extent” (physical
representation) of a matview different from updating the view’s
logical contents. Examples:

(1) Require to use a special function to update the extent:

SELECT pg_mv_maintain('INSERT INTO example_matview ...');

While parsing the INSERT, the parser would know that it must interpret
“example_matview” as the matview’s extent; As currently the extent and
the view are the same, nothing must be done except for only allowing
the INSERT when it is parsed in the context of pg_mv_maintain, and
otherwise saying that matviews aren’t updatable yet (“NOTICE: did you
mean to update the extent? in that case use pg_mv_maintain”).

(2) Use a different schema (cf. TOAST) for the extent, e.g., view
“public.example_matview” vs. extent “pg_mv_extent.example_matview”. I
imagine future implementations to possibly require multiple extents
anyway, e.g., for storing the “not yet applied changesets” or other
intermediate things.

> Why exactly do you think that what I'm suggesting would cause a dump and
> reload to not regenerate the data?

Expensiveness: Matviews are used in cases where the data is expensive
to compute.

> We *need* to get rid of that aspect of things.  If you must have
> scannability state in version 0, okay, but it has to be a catalog property
> not this.

+1

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Drastic performance loss in assert-enabled build in HEAD

From
Kevin Grittner
Date:
Early versions of the matview patch had a relisvalid flag in
pg_class to determine whether the relation was scannable.  The name
was chosen based on a similarity to the purpose of indisvalid,
although the proliferation of new bools for related issues left me
wondering if a "char" would be a better idea.  Based on on-list
reviews I removed that in favor of basing the state on a
zero-length heap file, in an attempt to work better with unlogged
matviews.  I can easily look back through my development branch to
find the commits which made this change and revert them if this
approach is preferred.

I realize this would need to be Real Soon Now, but before reverting
to the earlier code I want to allow a *little* more time for
opinions.

Responses to particular points below.


Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Granting that throwing an error is actually of some use to some people,
> I would not think that people would want to turn it on via a command
> that throws away the existing view contents altogether, nor turn it off
> with a full-throated REFRESH.

There is no doubt that in future versions we will want to be able
to disallow scans based on other criteria than just whether the
data is valid as of *some* point in time.  I can't imagine a
circumstance under which we would want to allow scans if it
isn't.  So, at least for this release and as a default for future
releases, I think it makes sense that a matview last CREATEd or
REFRESHed WITH NO DATA should not be scannable.  Additional knobs
for the users to control this can and should be added in future
releases.

> There are going to need to be ways to incrementally update
> matviews, and ways to disable/enable access that are not tied to
> a complete rebuild, not to mention being based on user-determined
> rather than hard-wired criteria for what's too stale.

Absolutely.  So far as I know, nobody has ever suggested or
expected otherwise.  This paper provides a useful summary of
techniques for incremental updates, with references to more
detailed papers:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf

I expect to be working on implementing the most obvious special
cases and a "catch-all" general solution for 9.4.  Efficient
incremental update seems to depend on the ability to have at least
one new "hidden" system column, so once we get to a good point for
discussing 9.4 issues, I'll be on about that.

> If you feel that scannability disable is an absolute must for version 0,
> let's invent a matview reloption or some such to implement it and let
> users turn it on and off as they wish.  That seems a lot more likely
> to still be useful two years from now.

What about the idea of a relisvalid bool or some "char" column in
pg_class?

> And if you're absolutely convinced that unlogged matviews mustn't
> work as I suggest, we can lose those from 9.3, too.

I was loath to do that, but as Nicolas points out, they really
aren't that interesting without incremental update.  Perhaps it is
better to drop those until next release and have a more
sophisticated way to deal with invalidation of those -- or as you
suggested, just punt them to be empty.  (I would hate to do that,
but it might be the lesser of evils.)

> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I think that should be the last resort, after we have a more
automated declarative way of maintaining the majority of cases.
Since using that technique with matviews would give you little
more than you can already do right now with tables and triggers,
hand-coding the incremental maintenance is
very low on my list of priorities.  In any event, I don't want to
rush into any such thing this close to release; that seems to me to
be clearly 9.4 material.

>> Individual judges have a "dashboard" showing such things as the
>> number of court cases which have gone beyond certain thresholds
>> without action

> If you need 100% accuracy, which is what this scenario appears to be
> demanding, matviews are not for you (yet).  The existing implementation
> certainly is nowhere near satisfying this scenario.

No, we're talking about timelines measured in weeks or months.  A
nightly update is acceptable, although there are occasional gripes
by a judge that they would like to see the results of cleaning up
such cases sooner than the next morning.  An hour latency would
probably make them happy.  If async incremental update could give a
new view of things in five or ten minutes they would be overjoyed.

>> ... Making sure that the heap has at least one page if data has
>> been generated seems like a not-entirely-unreasonable way to do
>> that, although there remains at least one vacuum bug to fix if
>> we keep it, in addition to Tom's concerns.
>
> No.  This is an absolute disaster.  It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors.  We *need* to get rid of that aspect of
> things.

OK, I'm convinced.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what
tables are for.  Users wishing to do that should keep using current methods:
 CREATE VIEW mv_impl AS SELECT ...;
 CREATE TABLE mv AS SELECT * FROM mv_impl;
 -- REFRESH analogue: fancier approaches exist
 BEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT;

If anything, I'd help these users by introducing mechanisms for obtaining a
TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW.
Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
> 
> No.  This is an absolute disaster.  It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors.  We *need* to get rid of that aspect of
> things.  If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.

In large part, this ended up outside the catalogs due to key limitations of
the startup process.  This isn't the first time we've arranged an unusual
dance for this reason, and it probably won't be the last.  Sure, we could take
out unlogged MVs to evade the problem, but re-adding them will mean either
restoring relfilenode-based bookkeeping or attacking the startup process
limitation directly.  There exist fundamental challenges to a direct attack,
like the potential inconsistency of system catalogs themselves.  We could
teach pg_relation_is_scannable() that unlogged MVs are always non-scannable
during recovery, then somehow update system catalogs in all databases at the
end of recovery.  Not a project for 9.3, and I wouldn't be surprised to see it
mushroom in complexity.  The currently-committed approach is a good one given
the applicable constraints.

A slight variation on the committed approach would be to add a "_scannable"
relation fork.  The fork would either be absent (not scannable if an MV) or
empty (possibly-scannable).  Create it in CREATE MATERIALIZED VIEW sans WITH
NO DATA and in REFRESH MATERIALIZED VIEW.  Remove it in TRUNCATE.  When the
startup process reinitializes an unlogged relation, it would also remove any
_scannable fork.  This has a few advantages over the current approach: VACUUM
won't need a special case, and pg_upgrade will be in a better position to blow
away all traces if we introduce a better approach.  The disadvantage is an
extra inode per healthy MV.  (Though it does not lead to a 9.3 solution, I'll
note that an always-present relation metapage would help here.)
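To visualize the proposal, here is a minimal Python sketch of the presence/absence protocol for a "_scannable" fork. The file paths and function names are purely hypothetical illustrations, not PostgreSQL internals; the point is only that the fork's existence, never its contents, carries the state:

```python
# Hypothetical sketch of the "_scannable" fork idea: the fork file is
# either absent (matview not scannable) or empty (possibly scannable).
import os
import tempfile

def mark_scannable(rel_path):
    # CREATE MATERIALIZED VIEW sans WITH NO DATA, or REFRESH: create the fork.
    open(rel_path + "_scannable", "w").close()

def mark_unscannable(rel_path):
    # TRUNCATE, or startup-process reinit of an unlogged rel: remove the fork.
    try:
        os.unlink(rel_path + "_scannable")
    except FileNotFoundError:
        pass

def fork_says_scannable(rel_path):
    # Only presence matters; the fork is always zero bytes.
    return os.path.exists(rel_path + "_scannable")

d = tempfile.mkdtemp()
rel = os.path.join(d, "12345")          # stand-in for a relfilenode path
assert not fork_says_scannable(rel)     # WITH NO DATA: fork absent
mark_scannable(rel)
assert fork_says_scannable(rel)         # after REFRESH: fork present
mark_unscannable(rel)
assert not fork_says_scannable(rel)     # after crash reinit: fork gone
```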

Note that I said "possibly-scannable" -- the relfilenode-based indicator
(whether the committed approach or something else) doesn't need to remain the
only input to the question of scannability.  If 9.5 introduces the concept of
age-based scannability expiration, the applicable timestamp could go in
pg_class, and pg_relation_is_scannable() could check both that and the
relfilenode-based indicator.


Concerning the original $SUBJECT, I would look at fixing the performance
problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache.  If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry.  Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the
fast-exit logic in the function rather than in its caller(s).
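The four steps above can be sketched in Python as follows; the dict-based "syscache"/"relcache" and the field names are hypothetical stand-ins for the backend data structures, with None playing the role of SQL NULL:

```python
# Sketch of the proposed fast-exit algorithm for pg_relation_is_scannable().
RELKIND_MATVIEW = "m"   # pg_class.relkind value for materialized views

def relation_is_scannable(syscache, relcache, oid):
    """Return True/False, or None to represent SQL NULL."""
    entry = syscache.get(oid)                  # 1. grab the pg_class entry
    if entry is None:
        return None                            #    not found: return NULL
    if entry["relkind"] != RELKIND_MATVIEW:
        return False                           # 2. not a matview: false
    rel = relcache.get(oid)                    # 3. lock + open relcache entry
    if rel is None:
        return None                           #    dropped concurrently: NULL
    return rel["is_populated"]                 # 4. scannability per relcache

syscache = {1: {"relkind": "r"}, 2: {"relkind": "m"}}
relcache = {1: {"is_populated": True}, 2: {"is_populated": False}}
assert relation_is_scannable(syscache, relcache, 1) is False  # plain table
assert relation_is_scannable(syscache, relcache, 2) is False  # empty matview
assert relation_is_scannable(syscache, relcache, 99) is None  # no such rel
```

The fast exits in steps 1 and 2 are what spare pg_dump a relcache open for every non-matview relation.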

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Drastic performance loss in assert-enabled build in HEAD

From
Noah Misch
Date:
On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:
> 2013/4/3 Tom Lane <tgl@sss.pgh.pa.us>:
> > And if you're absolutely convinced that unlogged matviews mustn't work as I
> > suggest, we can lose those from 9.3, too.
> 
> +1. Having unlogged matviews without having incremental updates yet,
> isn't super useful anyway.

I would have surmised the opposite: since an unlogged MV requires a full
refresh at unpredictable moments, logged MVs will be preferred where a refresh
is prohibitively expensive.  Why might unlogged-MV applications desire
incremental updates more acutely than logged-MV applications?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Noah Misch <noah@leadboat.com> writes:
> On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
>> No.  This is an absolute disaster.  It's taking something we have always
>> considered to be an irrelevant implementation detail and making it into
>> user-visible DDL state, despite the fact that it doesn't begin to satisfy
>> basic transactional behaviors.  We *need* to get rid of that aspect of
>> things.  If you must have scannability state in version 0, okay, but
>> it has to be a catalog property not this.

> In large part, this ended up outside the catalogs due to key limitations of
> the startup process.

Yeah, I realize that there's no other (easy) way to make unlogged
matviews reset to an invalid state on crash, but that doesn't make this
design choice less of a disaster.  It boxes us into something that's
entirely unable to support transitions between scannable and unscannable
states by any means short of a complete rewrite of the matview contents;
which seems fundamentally incompatible with any sort of incremental
update scenario.  And I remain of the opinion that it's going to box us
into not being able to fix the problems because of pg_upgrade
on-disk-compatibility issues.  We will be far better off to drop
unlogged matviews until we can come up with a better design.  If that's
so far off that no one can see it happening, well, that's tough.
Leaving the door open for incremental maintenance is more important.

> A slight variation on the committed approach would be to add a "_scannable"
> relation fork.

Not very transaction-safe, I think (consider crash midway through a
transaction that adds or removes the fork), and in any case orders of
magnitude more expensive than looking at a pg_class field.  This really
needs to be catalog state, not filesystem state.
        regards, tom lane



Tom Lane escribió:
> Noah Misch <noah@leadboat.com> writes:

> > A slight variation on the committed approach would be to add a "_scannable"
> > relation fork.
>
> Not very transaction-safe, I think (consider crash midway through a
> transaction that adds or removes the fork), and in any case orders of
> magnitude more expensive than looking at a pg_class field.  This really
> needs to be catalog state, not filesystem state.

We could revive the pg_class_nt patch proposed a decade ago, perhaps ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Drastic performance loss in assert-enabled build in HEAD

From
Nicolas Barbier
Date:
2013/4/5 Noah Misch <noah@leadboat.com>:

> On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:
>
>> +1. Having unlogged matviews without having incremental updates yet,
>> isn't super useful anyway.
>
> I would have surmised the opposite: since an unlogged MV requires a full
> refresh at unpredictable moments, logged MVs will be preferred where a refresh
> is prohibitively expensive.

That sounds like a good reason to choose your matview to be logged indeed.

I.e., very expensive to rebuild → choose logged

The opposite is also true: If your matview is not so expensive to
rebuild, why would it matter that much if it is logged? (Rebuilding
would be a tad slower, but it is not that slow to start with, so who
cares?)

I.e., not so expensive to rebuild → logged or unlogged are fine

This would mean “always choose logged,” except for the restricted case
of “incremental updates of a matview that is not so expensive to
rebuild” that I describe next:

> Why might unlogged-MV applications desire incremental updates more acutely
> than logged-MV applications?

My reasoning was more like: If you have incremental updates, there
will probably be some overhead linked to executing any transaction
that updates the base tables, namely for storing the changesets
somewhere. I imagined it could at least be this storing of changesets
that one would want to be unlogged, lest it slow down the commit of
most transactions that don’t even touch the matview.

(Note that losing any changeset is the same as losing the contents of
the whole matview in the “always guaranteed to be 100% up-to-date when
read” case.)

Of course, because of the previous reasoning, we should always make a
matview logged if it is very expensive to rebuild. That leaves the
case of a not-so-expensive matview: It is smart to make it unlogged
when the changesets are stored in a way similar to what I just
described.

Anyway, I conclude that (especially as we don’t have incremental
updates yet), unlogged matviews are, as things stand now, not very
useful.


As a sidenote, I see two ways to avoid storing changesets as part of a
commit that changes the base tables. Choosing any of those would
invalidate my previous logic, and even more diminish the need for
unlogged matviews:

(1) WAL is used as the source for the changesets; Andres’ logical
replication work comes to mind. Probably mostly useful for the “always
guaranteed to be 100% up-to-date when read” case.

(2) The base tables are scanned and the xmin/xmax values are used to
determine what changed since last time. This probably requires keeping
VACUUM from removing rows that are still needed to determine how to
change any matviews that depend on that base table. Probably mostly
useful for the case where bulk-incremental updates are performed only
sporadically, and where the base tables are not so big that they
cannot usefully be scanned in full.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



On Thu, Apr 04, 2013 at 06:07:17PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> >> No.  This is an absolute disaster.  It's taking something we have always
> >> considered to be an irrelevant implementation detail and making it into
> >> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> >> basic transactional behaviors.  We *need* to get rid of that aspect of
> >> things.  If you must have scannability state in version 0, okay, but
> >> it has to be a catalog property not this.
> 
> > In large part, this ended up outside the catalogs due to key limitations of
> > the startup process.
> 
> Yeah, I realize that there's no other (easy) way to make unlogged
> matviews reset to an invalid state on crash, but that doesn't make this
> design choice less of a disaster.  It boxes us into something that's
> entirely unable to support transitions between scannable and unscannable
> states by any means short of a complete rewrite of the matview contents;
> which seems fundamentally incompatible with any sort of incremental
> update scenario.

I said:
> > [...] the relfilenode-based indicator
> > (whether the committed approach or something else) doesn't need to remain the
> > only input to the question of scannability.  If 9.5 introduces the concept of
> > age-based scannability expiration, the applicable timestamp could go in
> > pg_class, and pg_relation_is_scannable() could check both that and the
> > relfilenode-based indicator.

To make that concrete, suppose we implement something you suggested upthread,
"ALTER MATERIALIZED VIEW foo SET (scannable = false)".  Implementing that by
replacing the relfilenode with a single empty page or removing a _scannable
fork would indeed be all wrong.  The former blows away data you may wish to
make reappear later, and the latter is non-transactional.  But nothing forces
us to do it that way: store the new indicator in reloptions, where it belongs,
and
make pg_relation_is_scannable() test "relfilenode_says_scannable &&
reloptions_say_scannable".
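As a Python sketch of that combined test; the names mirror the hypothetical identifiers above and are not actual PostgreSQL code:

```python
# Hypothetical sketch: the relfilenode-based indicator and a reloption
# combine to answer pg_relation_is_scannable().
def relation_is_scannable(relfilenode_says_scannable: bool,
                          reloptions: dict) -> bool:
    # A reloption set by ALTER MATERIALIZED VIEW ... SET (scannable = false)
    # vetoes scanning without touching the stored data, so the contents can
    # "reappear" when the option is flipped back.
    reloptions_say_scannable = reloptions.get("scannable", True)
    return relfilenode_says_scannable and reloptions_say_scannable

assert relation_is_scannable(True, {}) is True
assert relation_is_scannable(True, {"scannable": False}) is False
assert relation_is_scannable(False, {}) is False  # never-populated matview
```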

> And I remain of the opinion that it's going to box us
> into not being able to fix the problems because of pg_upgrade
> on-disk-compatibility issues.

I said:
> > [a "_scannable" fork] has a few advantages over the current approach: VACUUM
> > won't need a special case, and pg_upgrade will be in a better position to blow
> > away all traces if we introduce a better approach.

pg_upgrade looks for _scannable forks, then translates their presence into the
new representation.  The committed single-empty-page strategy doesn't have an
acute problem here, either, though.

> > A slight variation on the committed approach would be to add a "_scannable"
> > relation fork.
> 
> Not very transaction-safe, I think (consider crash midway through a
> transaction that adds or removes the fork)

The SQL commands I cited as responsible for creating or removing the fork all
make a new relfilenode anyway.  Thus, "add" actually means creating the fork
with the new relfilenode, and "remove" actually means omitting the fork from
the new relfilenode.  The association between relfilenodes and relations is,
of course, transactional.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I realize that there's no other (easy) way to make unlogged
> matviews reset to an invalid state on crash, but that doesn't
> make this design choice less of a disaster.  It boxes us into
> something that's entirely unable to support transitions between
> scannable and unscannable states by any means short of a complete
> rewrite of the matview contents; which seems fundamentally
> incompatible with any sort of incremental update scenario.

That assertion makes absolutely no sense.  Once we design (in a
future release cycle) a way for users to declare how current a view
must be to be usable, there is no reason the
pg_relation_is_scannable() function cannot be modified to use those
mechanisms.  I continue to believe that it is a bad idea to ever
allow a matview to be scanned when it is not a materialization of
its query, but that does *not* mean that all matviews that are a
materialization of the query would need to be considered scannable.
My working assumption was that the isscannable field in relcache
would be the first of many tests once we get there; if the matview
actually does represent some materialization of data, the function
would then proceed to check whether it is current enough to use,

It does mean that you could not start incremental maintenance on a
matview without first generating a base by running the query to
create initial data (as part of CREATE or REFRESH), but that seems
like a *good* thing to me.  To do otherwise would make no more
sense than to try to recover a database using just WAL files
without a base backup to apply them to.

> And I remain of the opinion that it's going to box us into not
> being able to fix the problems because of pg_upgrade on-disk-
> compatibility issues.

This argument also seems bogus to me.  Since it is a valid heap
either way, the *worst case* would be recommending that after
upgrading users take some special action on any materialized views
which were not scannable; and I doubt that we would need to do
that.  If you see some risk I'm missing, please elaborate.

> We will be far better off to drop unlogged matviews until we can
> come up with a better design.  If that's so far off that no one
> can see it happening, well, that's tough.  Leaving the door open
> for incremental maintenance is more important.

I've been looking at what is needed for incremental maintenance,
and I'm not seeing the problem.  Since you can't incrementally
update a view which has never been populated, the only way in which
it could create a problem in terms of transactional behavior, I
think, is that this would make it harder to not hold an
AccessExclusiveLock when transitioning between not having
materialized data and having it (or vice versa), and I'm dubious we
can avoid such a lock anyway.  I don't see where it creates any
problems for performing incremental updates once we are in a state
which can allow them.

> This really needs to be catalog state, not filesystem state.

That may be true, but the arguments in this post are so off-base
that I'm wondering whether it really is.  When I read some earlier
posts I was convinced, but now I think I need to review the whole
thread again to make sure I wasn't too quick to concede the point.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drastic performance loss in assert-enabled build in HEAD

From
Kevin Grittner
Date:
Noah Misch <noah@leadboat.com> wrote:
> On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:

>> +1. Having unlogged matviews without having incremental updates
>> yet, isn't super useful anyway.
>
> I would have surmised the opposite

Hmm.  I was thinking about the fact that a full refresh can be
unlogged anyway, but that's only true if you're not using WAL for
replication.  If you need a matview on the master but not on the
replica(s), then there is still benefit to declaring it to be
unlogged in the absence of incremental maintenance.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Noah Misch <noah@leadboat.com> wrote:

> The SQL commands I cited as responsible for creating or removing
> the fork all make a new relfilenode anyway.  Thus, "add" actually
> means creating the fork with the new relfilenode, and "remove"
> actually means omitting the fork from the new relfilenode.  The
> association between relfilenodes and relations is, of course,
> transactional.

The same argument applies to the currently-committed code.  The
goal is to not change a matview between zero-length and non-zero
length once the heap exists; but to only have this state change
when REFRESH replaces the heap in the style of CLUSTER, TRUNCATE,
ALTER TABLE with heap rewrite, or the new VACUUM FULL.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Apr 05, 2013 at 07:00:38AM -0700, Kevin Grittner wrote:
> Noah Misch <noah@leadboat.com> wrote:
> 
> > The SQL commands I cited as responsible for creating or removing
> > the fork all make a new relfilenode anyway.  Thus, "add" actually
> > means creating the fork with the new relfilenode, and "remove"
> > actually means omitting the fork from the new relfilenode.  The
> > association between relfilenodes and relations is, of course,
> > transactional.
> 
> The same argument applies to the currently-committed code.

True.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Kevin Grittner <kgrittn@ymail.com> wrote:

> I think I need to review the whole thread again to make sure I
> wasn't too quick to concede the point.

On a fresh reading of this, I think a large part of what is at
issue here stems from a bad name for the new bool field I added to
the RelationData structure -- instead of rd_isscannable it should
probably be called something like rd_ispopulated.  The current name
led to some fuzzy thinking on my part when it was referenced, and
both the name and a couple ill-considered uses of it probably
contributed to concerns about how it was generated and used.

I will post a draft patch today to see whether concerns abate.
Basically, the name change should help make clear that this is not
intended to be the only way to determine whether a matview is
scannable.  Second, there is at least one (and probably more) direct
test of this field which should use a function for a scannability
test.  For 9.3, that will just wrap a test of this bool, but it
makes clear what the longer-term intent is, and helps ensure that
things don't get missed when patches are written in later releases.
Third, some comments need to be corrected and added.

Hopefully it can help get us all onto the same page.  If not, it
should at least better focus the discussion.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Kevin Grittner <kgrittn@ymail.com> wrote:

> I think a large part of what is at issue here stems from a bad
> name for the new bool field I added to the RelationData structure
> -- instead of rd_isscannable it should probably be called
> something like rd_ispopulated.  The current name led to some
> fuzzy thinking on my part when it was referenced, and both the
> name and a couple ill-considered uses of it probably contributed
> to concerns about how it was generated and used.
>
> I will post a draft patch today to see whether concerns abate.
> Basically, the name change should help make clear that this is
> not intended to be the only way to determine whether a matview is
> scannable.  Second, there is at least one (and probably more)
> direct test of this field which should use a function for a
> scannability test.  For 9.3, that will just wrap a test of this
> bool, but it makes clear what the longer-term intent is, and helps
> ensure that things don't get missed when patches are written in
> later releases.  Third, some comments need to be corrected and
> added.
>
> Hopefully it can help get us all onto the same page.  If not, it
> should at least better focus the discussion.

Attached is a first cut at drawing a bright line between the notion
of whether a matview has been *populated* and whether it is
*scannable*.  In 9.3 one is true if and only if the other is, but
I plan on doing work such that this will no longer be true in the
next release, and not making a clear distinction now has been
confusing everyone, including me.

This patch is light on functional changes, and heavier on name and
comment changes.  It does fix the lack of locking for the
user-visible pg_relation_is_scannable() function, and it does
correct the performance regression in pg_dump, but those should be
the only user-visible changes.  Internally I also tried to correct
a modularity violation complained of by Tom.

Vacuum still needs to be taught not to truncate away the first page
of a matview, but that seemed like it was better left for a
separate patch, and if we decide not to use the current technique
for identifying a non-populated matview, it would be one more thing
to undo.

It passed `make check-world` and `make installcheck-world`, and a
dump and load of the regression database works.  I got sidetracked
with some support issues, so I didn't have time to dig around for
other weak comments, but I think I'm close on all other changes.

Comments?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Drastic performance loss in assert-enabled build in HEAD

From
Noah Misch
Date:
On Fri, Apr 05, 2013 at 11:17:30AM +0200, Nicolas Barbier wrote:
> 2013/4/5 Noah Misch <noah@leadboat.com>:
> > On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:
> >> +1. Having unlogged matviews without having incremental updates yet,
> >> isn't super useful anyway.
> >
> > I would have surmised the opposite: since an unlogged MV requires a full
> > refresh at unpredictable moments, logged MVs will be preferred where a refresh
> > is prohibitively expensive.
> 
> That sounds like a good reason to choose your matview to be logged indeed.
> 
> I.e., very expensive to rebuild → choose logged
> 
> The opposite is also true: If your matview is not so expensive to
> rebuild, why would it matter that much if it is logged? (Rebuilding
> would be a tad slower, but it is not that slow to start with, so who
> cares?)
> 
> I.e., not so expensive to rebuild → logged or unlogged are fine
> 
> This would mean “always choose logged,” except for the restricted case
> of “incremental updates of a matview that is not so expensive to
> rebuild” that I describe next:

Cheap is good, but cheaper is better.  Since a refresh locks out readers, mild
wins do count.

> > Why might unlogged-MV applications desire incremental updates more acutely
> > than logged-MV applications?
> 
> My reasoning was more like: If you have incremental updates, there
> will probably be some overhead linked to executing any transaction
> that updates the base tables, namely for storing the changesets
> somewhere. I imagined it could at least be this storing of changesets
> that one would want to be unlogged, lest it slow down the commit of
> most transactions that don’t even touch the matview.

That's a good point.  However, the same holds when refreshing a logged MV,
especially at wal_level > minimal.  The refresh exerts pressure on the WAL
stream, likely delaying other WAL-using transactions.

> As a sidenote, I see two ways to avoid storing changesets as part of a
> commit that changes the base tables. Choosing any of those would
> invalidate my previous logic, and even more diminish the need for
> unlogged matviews:

Regardless of the change storage method, actually applying incremental changes
to a logged MV will require WAL.  UNLOGGED MVs will stay relevant.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com