Re: Drastic performance loss in assert-enabled build in HEAD - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Drastic performance loss in assert-enabled build in HEAD
Msg-id: 24977.1364571052@sss.pgh.pa.us
In response to: Re: Drastic performance loss in assert-enabled build in HEAD (Kevin Grittner <kgrittn@ymail.com>)
List: pgsql-hackers
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So maybe I'm nuts to care about the performance of an assert-enabled
>> backend, but I don't really find a 4X runtime degradation acceptable,
>> even for development work. Does anyone want to fess up to having caused
>> this, or do I need to start tracking down what changed?

> I checked master HEAD for a dump of regression and got about 4
> seconds. I checked right after my initial push of matview code and
> got 2.5 seconds. I checked just before that and got 1 second.
> There was some additional pg_dump work for matviews after the
> initial push which may or may not account for the rest of the time.

I poked at this a bit, and eventually found that the performance
differential goes away if I dike out the pg_relation_is_scannable()
call in getTables()'s table-collection query.

What appears to be happening is that those calls cause a great many more
relcache entries to be made than used to happen, plus many entries are
made earlier in the run than they used to be. Many of those entries have
subsidiary memory contexts, which results in O(N^2) growth in the time
spent in AllocSetCheck, since postgres.c does
MemoryContextCheck(TopMemoryContext) once per received command, and
pg_dump will presumably issue O(N) commands in an N-table database.

So one question that brings up is whether we need to dial back the
amount of work done in memory context checking, but I'm loath to do so
in development builds. That code has fingered an awful lot of bugs.
OTOH, if the number of tables in the regression DB continues to grow,
we may have little choice.

Anyway, the immediate takeaway is that this represents a horribly
expensive way for pg_dump to find out which matviews aren't scannable.
The cheap way for it to find out would be if we had a boolean flag for
that in pg_class. Would you review the bidding as to why things were
done the way they are? In general, having to ask the kernel something
is going to suck in any case, so basing this on the size of the disk
file doesn't seem to me to be a good thing in the first place.

We could alleviate the pain by changing pg_dump's query to something
like

    (case when c.relkind = 'm'
          then pg_relation_is_scannable(c.oid)
          else false end)

but TBH this feels like bandaging a bad design.

Another reason why I don't like this code is that
pg_relation_is_scannable is broken by design:

    relid = PG_GETARG_OID(0);
    relation = RelationIdGetRelation(relid);
    result = relation->rd_isscannable;
    RelationClose(relation);

You can't do that: if the relcache entry doesn't already exist, this
will try to construct one while not holding any lock on the relation,
which is subject to all sorts of race conditions. (In general, direct
calls on RelationIdGetRelation are probably broken. I see you
introduced more than one such in the matviews patch, and I'm willing
to bet they are all unsafe.)

            regards, tom lane
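[For concreteness, a lock-safe shape for the function might look
something like the sketch below. This is only an illustration of the
locking point being made above, assuming the rd_isscannable field
quoted in the message and the standard relation_open()/relation_close()
calls with AccessShareLock; it is not necessarily the fix that was
actually adopted.]

    #include "postgres.h"

    #include "access/heapam.h"
    #include "fmgr.h"
    #include "utils/rel.h"

    Datum
    pg_relation_is_scannable(PG_FUNCTION_ARGS)
    {
        Oid         relid = PG_GETARG_OID(0);
        Relation    relation;
        bool        result;

        /*
         * Take a lock before touching the relcache entry, so the entry
         * can't be built or consulted while the relation is concurrently
         * being dropped or rewritten underneath us.
         */
        relation = relation_open(relid, AccessShareLock);
        result = relation->rd_isscannable;
        relation_close(relation, AccessShareLock);

        PG_RETURN_BOOL(result);
    }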