Re: performance of IN (subquery) - Mailing list pgsql-general

From Greg Stark
Subject Re: performance of IN (subquery)
Date
Msg-id 87r7psmlnp.fsf@stark.xeocode.com
In response to performance of IN (subquery)  (Kevin Murphy <murphy@genome.chop.edu>)
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I'm not about to run analyze in the middle of the data generation
> > (which wouldn't work anyways since it's in a transaction).
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).

It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have a problem. I didn't realize you could run ANALYZE in a
transaction, but even so I wouldn't want to have to run it repeatedly
during the job.
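For concreteness, here is a minimal sketch of what that EXECUTE workaround might look like (the table, column, and function names are hypothetical, not from my actual job):

```sql
-- Hypothetical bulk-load step. A static INSERT in plpgsql would be
-- planned once and the plan cached, so the plan chosen against the
-- near-empty table would stick for the whole job. Building the
-- statement as a string and running it with EXECUTE forces a fresh
-- plan on every call.
CREATE OR REPLACE FUNCTION load_batch(batch integer) RETURNS void AS '
BEGIN
    EXECUTE ''INSERT INTO work_table
              SELECT * FROM staging WHERE batch_id = '' || batch;

    -- Legal inside the surrounding transaction since 7.3/7.4, but
    -- doing this after every batch is exactly the overhead I would
    -- rather avoid:
    ANALYZE work_table;
END;
' LANGUAGE plpgsql;
```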

This new approach would actually complete the fix; a Perl or plpgsql EXECUTE
implementation would gradually shift the statistics during the job.

Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.

> but separate ANALYZE could definitely make an estimate of the fraction of
> dead tuples.

With ANALYZE in a transaction I'm not clear what the semantics should be,
though. I suppose it should only count tuples visible to the transaction
running the ANALYZE?

> Nope, you aren't.  The above seems to me to be a recipe for degradation
> of performance over time, precisely because the plans wouldn't change in
> the face of changes in the situation.

A gradual degradation is OK. A gradual degradation means I can schedule a
nightly ANALYZE, report on any changed plans, and either automatically
accept them or approve them individually.
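The nightly check I have in mind could be as simple as diffing saved EXPLAIN output for a fixed set of critical queries against a previously approved copy (the database name and file paths here are hypothetical):

```shell
# Refresh statistics off-hours, then capture the current plans for a
# fixed set of EXPLAIN statements kept in a file.
psql -c "ANALYZE" mydb
psql -Atf critical_queries_explain.sql mydb > plans.new

# Report any plan changes for review; an empty diff means tonight's
# ANALYZE changed nothing, so the new snapshot is accepted as-is.
diff -u plans.approved plans.new && mv plans.new plans.approved
```

If the diff is non-empty, the `&&` short-circuits and `plans.new` is left in place for manual approval, which is the "approve them individually" path.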

A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.

As I said, it doesn't matter to me if every query is 10% slower than it could
be, as long as no query takes 1000% as long as necessary, even if that's only
a 1-in-1000 occurrence.

> I've resisted adding "planner hints" to the language for this reason, and
> I'm certainly not eager to offer any hard guarantees about plans not
> changing.

I just want to control _when_ they change. Eventually you'll come around. I
think it'll be a slow, gradual change in thinking as the user base changes,
though, not something I'll change with a single argument in one day.

--
greg
