Re: Drastic performance loss in assert-enabled build in HEAD - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Drastic performance loss in assert-enabled build in HEAD
Date:
Msg-id: 24977.1364571052@sss.pgh.pa.us
In response to: Re: Drastic performance loss in assert-enabled build in HEAD (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: Drastic performance loss in assert-enabled build in HEAD (Kevin Grittner <kgrittn@ymail.com>)
List: pgsql-hackers
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So maybe I'm nuts to care about the performance of an assert-enabled
>> backend, but I don't really find a 4X runtime degradation acceptable,
>> even for development work.  Does anyone want to fess up to having caused
>> this, or do I need to start tracking down what changed?

> I checked master HEAD for a dump of regression and got about 4
> seconds.  I checked right after my initial push of matview code and
> got 2.5 seconds.  I checked just before that and got 1 second.
> There was some additional pg_dump work for matviews after the
> initial push which may or may not account for the rest of the time.

I poked at this a bit, and eventually found that the performance
differential goes away if I dike out the pg_relation_is_scannable() call
in getTables()'s table-collection query.  What appears to be happening
is that those calls cause a great many more relcache entries to be made
than used to happen, plus many entries are made earlier in the run than
they used to be.  Many of those entries have subsidiary memory contexts,
which results in an O(N^2) growth in the time spent in AllocSetCheck,
since postgres.c does MemoryContextCheck(TopMemoryContext) once per
received command, and pg_dump will presumably issue O(N) commands in an
N-table database.
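
Spelling the arithmetic out (a rough model only, assuming the number of
contexts alive when the i-th command arrives grows roughly linearly with i,
and each context costs a constant c to walk):

    \sum_{i=1}^{N} c \cdot i \;=\; \frac{c\,N(N+1)}{2} \;=\; O(N^2)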

So one question that brings up is whether we need to dial back the
amount of work done in memory context checking, but I'm loath to do so
in development builds.  That code has fingered an awful lot of bugs.
OTOH, if the number of tables in the regression DB continues to grow,
we may have little choice.

Anyway, the immediate takeaway is that this represents a horribly
expensive way for pg_dump to find out which matviews aren't scannable.
The cheap way for it to find out would be if we had a boolean flag for
that in pg_class.  Would you review the bidding as to why things were
done the way they are?  In general, having to ask the kernel anything is
going to suck, so basing scannability on the size of the disk file
doesn't seem to me like a sound design in the first place.
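
Just to illustrate the contrast, the cheap lookup could be as simple as the
following; relisscannable is a hypothetical column name here, standing in
for whatever flag we might add:

-- Hypothetical: no such pg_class column exists today.  With a catalog flag,
-- pg_dump reads scannability from the row it already fetches, touching
-- neither the relcache nor the kernel.
SELECT c.relname, c.relisscannable
FROM pg_class c
WHERE c.relkind = 'm';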

We could alleviate the pain by changing pg_dump's query to something
like
(case when c.relkind = 'm' then pg_relation_is_scannable(c.oid) else false end)

but TBH this feels like bandaging a bad design.
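
For concreteness, here is roughly how that guard could sit in a
getTables()-style query; the column list is abbreviated and the alias is
assumed, so this is a sketch rather than pg_dump's actual query text:

-- Sketch only: pay the pg_relation_is_scannable() call, and the relcache
-- entry it forces, only for materialized views.
SELECT c.oid, c.relname, c.relkind,
       CASE WHEN c.relkind = 'm'
            THEN pg_relation_is_scannable(c.oid)
            ELSE false
       END AS isscannable
FROM pg_class c
WHERE c.relkind IN ('r', 'm');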

Another reason why I don't like this code is that
pg_relation_is_scannable is broken by design:
    relid = PG_GETARG_OID(0);
    relation = RelationIdGetRelation(relid);
    result = relation->rd_isscannable;
    RelationClose(relation);

You can't do that: if the relcache entry doesn't already exist, this
will try to construct one while not holding any lock on the relation,
which is subject to all sorts of race conditions.  (In general, direct
calls on RelationIdGetRelation are probably broken.  I see you
introduced more than one such in the matviews patch, and I'm willing to
bet they are all unsafe.)
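
For comparison, here is a minimal lock-safe sketch of the same function,
assuming AccessShareLock is sufficient for this purpose; this only
illustrates the locking pattern, it is not a worked-out fix:

#include "postgres.h"
#include "fmgr.h"
#include "access/heapam.h"      /* relation_open()/relation_close() */
#include "storage/lock.h"       /* AccessShareLock */
#include "utils/rel.h"

Datum
pg_relation_is_scannable(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    Relation    relation;
    bool        result;

    /* Take a real lock so the entry can't be built or dropped under us. */
    relation = relation_open(relid, AccessShareLock);
    result = relation->rd_isscannable;
    relation_close(relation, AccessShareLock);

    PG_RETURN_BOOL(result);
}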
        regards, tom lane


