Re: performance of IN (subquery) - Mailing list pgsql-general

From Tom Lane
Subject Re: performance of IN (subquery)
Date
Msg-id 5862.1093578586@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance of IN (subquery)  (Paul Tillotson <pntil@shentel.net>)
Responses Re: performance of IN (subquery)  (Greg Stark <gsstark@mit.edu>)
Re: performance of IN (subquery)  (Gaetano Mendola <mendola@bigfoot.com>)
Re: performance of IN (subquery)  (Jon Lapham <lapham@jandr.org>)
List pgsql-general
Paul Tillotson <pntil@shentel.net> writes:
> The planner thinks that the sequential scan on elements will return 1000
> rows, but it actually returned 185000. Did you ANALYZE this table recently?

Or either of the other ones?  All those scan costs look like defaults
:-(

> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've thought about this before.  One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning.  In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.

The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
   every planning operation.  I'm not sure this would be significant,
   but I'm not sure it wouldn't be, either.

2. Instability of plans.  Right now, the planner will not change plans
   underneath you --- you have to issue an explicit VACUUM or ANALYZE
   to change the terms of discussion.  That would stop being true if
   physical file size were always taken into account.  Maybe this is a
   problem, or maybe it isn't ... as someone who likes to be able to
   debug planner behavior without actually creating umpteen-gig test
   tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

            regards, tom lane

pgsql-general by date:

Previous
From: Greg Stark
Date:
Subject: Re: performance of IN (subquery)
Next
From: Tom Lane
Date:
Subject: Re: owner orphaned databases