Thread: performance of IN (subquery)

performance of IN (subquery)

From
Kevin Murphy
Date:
I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'.  PG always seems to want to do a sequential
scan of the bar table.  I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results".   Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case).  Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
       7644
       7644
(2 rows)
Time: 518.242 ms


The subquery is fast:

SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1;
------------
       41209
       25047
(2 rows)
Time: 3.268 ms


And using indexes on the main table is fast:

select bundle_id from build.elements
where elementid in (41209, 25047);
-----------
       7644
       7644
(2 rows)
Time: 2.468 ms

The plan for the slow query:

egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1);
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
    Hash Cond: ("outer".element_id = "inner".elementid)
    ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
          ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
                Hash Cond: ("outer".superloc_id = "inner".superloc_id)
                ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
                ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                      ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                            Filter: (protobundle_id = 1)
    ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
          ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
  Total runtime: 593.843 ms
(12 rows)
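
For readers following along, the UNION workaround Kevin describes in his
opening paragraph might look roughly like the sketch below, using the two
element_id values his subquery happens to return (the exact SQL he generates
is not shown in the thread):

select bundle_id from build.elements where elementid = 41209
union    -- or UNION ALL, to keep duplicate bundle_ids
select bundle_id from build.elements where elementid = 25047;
-- each branch is a plain equality on elementid, so each can use the index
-- instead of the sequential scan over elements chosen for IN (subquery)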


Re: performance of IN (subquery)

From
"Marc G. Fournier"
Date:
On Thu, 26 Aug 2004, Kevin Murphy wrote:

> I'm using PG 7.4.3 on Mac OS X.
>
> I am disappointed with the performance of queries like 'select foo from bar
> where baz in (subquery)', or updates like 'update bar set foo = 2 where baz
> in (subquery)'.  PG always seems to want to do a sequential scan of the bar
> table.  I wish there were a way of telling PG, "use the index on baz in your
> plan, because I know that the subquery will return very few results".   Where
> it really matters, I have been constructing dynamic queries by looping over
> the values for baz and building a separate query for each one and combining
> with a UNION (or just directly updating, in the update case).  Depending on
> the size of the bar table, I can get speedups of hundreds or even more than a
> thousand times, but it is a big pain to have to do this.
>
> Any tips?
>
> Thanks,
> Kevin Murphy
>
> Illustrated:
>
> The query I want to do is very slow:
>
> select bundle_id from build.elements
> where elementid in (
> SELECT superlocs_2.element_id
>           FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
>           WHERE bundle_superlocs_2.protobundle_id = 1);
> -----------
>      7644
>      7644
> (2 rows)
> Time: 518.242 ms

what field type is protobundle_id?  if you typecast the '1' to be the
same, does the index get used?
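
For example (sketch only; the actual type of protobundle_id isn't shown in
the thread, so the smallint cast here is just an assumption):

SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1::smallint;
            -- cast the literal to match the column's declared type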

Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: performance of IN (subquery)

From
Paul Tillotson
Date:
Kevin Murphy wrote:

> ----------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
>    Hash Cond: ("outer".element_id = "inner".elementid)
>    ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
>          ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
>                Hash Cond: ("outer".superloc_id = "inner".superloc_id)
>                ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
>                ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
>                      ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
>                            Filter: (protobundle_id = 1)
>    ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
>          ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)

The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned 185000. Did you ANALYZE this table recently?
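
If not, a quick check would be something along these lines (untested sketch):

ANALYZE build.elements;
EXPLAIN ANALYZE
SELECT bundle_id FROM build.elements
WHERE elementid IN (SELECT superlocs_2.element_id
                    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
                    WHERE bundle_superlocs_2.protobundle_id = 1);
-- after the ANALYZE, the row estimate for the seq scan on elements should be
-- close to 185535, and the planner may choose the index instead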

Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.

Paul

> Total runtime: 593.843 ms
> (12 rows)


Re: performance of IN (subquery)

From
"Arthur Ward"
Date:
> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've wondered on several occasions if there is any good reason for PG not
to automatically perform an analyze concurrently with a seq scan as it's
happening. That way, no extra disk IO is needed and the stats could stay
up-to-date almost for free.

Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)

Re: performance of IN (subquery)

From
Greg Stark
Date:
"Arthur Ward" <award-postgresql@dominionsciences.com> writes:

> Any hackers around who can say why this might be a bad idea, or is it one
> of those things that just needs a volunteer? (I'm not; at least not now.)

a) that would make plans change spontaneously. I hate being paged in the
middle of the night because some query is suddenly being slow when it had been
performing fine before.

b) Not all sequential scans will actually complete the scan. There could be a
limit imposed, or the sequential scan could be inside an EXISTS. In that case
the scan could be aborted at any point.

What I do think would be easy to do would be to keep statistics on the expense
of the various components of the cost estimates. cpu_*_cost, random_page_cost,
and effective_cache_size ought to be values that can be solved for empirically
from the timing results.
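
(For reference, the knobs named above are ordinary settings; the values shown
below are just the 7.4-era defaults, given for illustration rather than as
recommendations:)

SET cpu_tuple_cost = 0.01;
SET cpu_index_tuple_cost = 0.001;
SET cpu_operator_cost = 0.0025;
SET random_page_cost = 4.0;
SET effective_cache_size = 1000;   -- measured in 8 kB disk pages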

But that still doesn't have to be done on every query. There's a trade-off
between the planning work done on every query and the benefit it buys. Gathering
statistics and storing them on every sequential scan is way too much work,
slowing down every query for minimal gain.

--
greg

Re: performance of IN (subquery)

From
Tom Lane
Date:
Paul Tillotson <pntil@shentel.net> writes:
> The planner thinks that the sequential scan on elements will return 1000
> rows, but it actually returned 185000. Did you ANALYZE this table recently?

Or either of the other ones?  All those scan costs look like defaults
:-(

> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've thought about this before.  One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning.  In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.
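
(Roughly speaking, the density in question is what you can already compute by
hand from pg_class today; a sketch of the arithmetic, not of the proposed
implementation:)

SELECT relname,
       reltuples,
       relpages,
       reltuples / NULLIF(relpages, 0) AS est_tuples_per_page
FROM pg_class
WHERE relname = 'elements';
-- the proposal would, in effect, plan with
-- est_tuples_per_page * RelationGetNumberOfBlocks()
-- rather than trusting the possibly stale reltuples value itself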

The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
   every planning operation.  I'm not sure this would be significant,
   but I'm not sure it wouldn't be, either.

2. Instability of plans.  Right now, the planner will not change plans
   underneath you --- you have to issue an explicit VACUUM or ANALYZE
   to change the terms of discussion.  That would stop being true if
   physical file size were always taken into account.  Maybe this is a
   problem, or maybe it isn't ... as someone who likes to be able to
   debug planner behavior without actually creating umpteen-gig test
   tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

            regards, tom lane

Re: performance of IN (subquery)

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I've thought about this before. One simple trick would be to get rid of the
> current pg_class reltuples/relpages fields in favor of a tuples-per-page
> estimate, which could be multiplied by RelationGetNumberOfBlocks() during
> planning.

This would do something interesting to one of the problem cases I have now. I
have trouble testing a particular batch job that generates a large amount of
precalculated denormalized data.

That's because the table is empty when the job starts, so the plpgsql function
that handles the job caches plans based on an empty table. But as the job
proceeds the data grows, and I'm afraid the cached plan may start performing
poorly.

In order to test it I need to run it once, analyze, then reload the function,
truncate the data and run it another time. And hope I generated good
representative test data.

I'm thinking of changing to a non-plpgsql implementation. But that's only half
the issue. I'm not about to run analyze in the middle of the data generation
(which wouldn't work anyways since it's in a transaction). So I can't really
get good statistics for this job, not until we're actually in a steady state
in production.

Sometimes I wonder whether I wouldn't rather have a more predictable system with
less focus on statistics. The resulting plans would be more predictable, and
predictability is a good thing in production systems...

> In the absence of any ANALYZE data the tuples-per-page estimate might be
> pretty bogus, but it couldn't be off by more than an order of magnitude or
> so either way.

I don't see why it couldn't. If you have a table badly in need of vacuuming
(or had one at the time of the last analyze) it could be off by way more than
an order of magnitude.

For that matter, a table that had undergone many deletes and then been
vacuumed would not change length for a long time afterward even as many new
inserts are performed. Until the table is analyzed the estimated table size
could be off by an arbitrary factor.

> The objections that could be raised to this are (AFAICS) two:
>
> 1. Adding at least an lseek() kernel call per table, and per index, to
>    every planning operation.  I'm not sure this would be significant,
>    but I'm not sure it wouldn't be, either.

That seems like something that could be addressed with enough time. It's a single
value per table; couldn't it be stored in shared memory and only updated
whenever the heap is extended or truncated? Even if you have thousands of
tables, that would only be a few kilobytes of shared memory.

> 2. Instability of plans.  Right now, the planner will not change plans
>    underneath you --- you have to issue an explicit VACUUM or ANALYZE
>    to change the terms of discussion.  That would stop being true if
>    physical file size were always taken into account.  Maybe this is a
>    problem, or maybe it isn't ... as someone who likes to be able to
>    debug planner behavior without actually creating umpteen-gig test
>    tables, my world view may be a bit skewed ...

This is what I'm afraid of. As an OLTP application programmer -- web sites, I
admit -- I care a lot more about plan stability than finding optimal plans.

A well-written OLTP application will only have a fixed number of queries that
are executed repeatedly with different parameters. I don't care how long the
queries take as long as they're always "fast enough".

Every time a new plan is tried there's a chance that it will be wrong. It only
takes one wrong plan out of the hundreds of queries to bring down the entire
application.

Ideally I would want a guarantee that every query would *always* result in the
same plan. Once I've tested them and approved the plans I want to know that
only those approved plans will ever run, and I want to be present and be able
to verify new plans before they go into production.

I doubt I'm going to convince anyone today, but I think there will be a
gradual change in mindset as the new binary protocol becomes more popular and
postgres takes over some of mysql's web mindshare.

--
greg

Re: performance of IN (subquery)

From
Gaetano Mendola
Date:
Tom Lane wrote:


> 2. Instability of plans.  Right now, the planner will not change plans
>    underneath you --- you have to issue an explicit VACUUM or ANALYZE
>    to change the terms of discussion.  That would stop being true if
>    physical file size were always taken into account.  Maybe this is a
>    problem, or maybe it isn't ... as someone who likes to be able to
>    debug planner behavior without actually creating umpteen-gig test
>    tables, my world view may be a bit skewed ...

Did you forget the autovacuum daemon? I haven't seen anyone bitten or
paged during the night because of the autovacuum daemon's work.


Regards
Gaetano Mendola



Re: performance of IN (subquery)

From
Jon Lapham
Date:
Tom Lane wrote:
> I've thought about this before.  One simple trick would be to get rid
> of the current pg_class reltuples/relpages fields in favor of a
> tuples-per-page estimate, which could be multiplied by
> RelationGetNumberOfBlocks() during planning.  In the absence of any
> ANALYZE data the tuples-per-page estimate might be pretty bogus, but
> it couldn't be off by more than an order of magnitude or so either way.
> And in any case we'd have a guaranteed up-to-date number of blocks.
>
> The objections that could be raised to this are (AFAICS) two:
> [snip]
> 2. Instability of plans.  Right now, the planner will not change plans
>    underneath you --- you have to issue an explicit VACUUM or ANALYZE
>    to change the terms of discussion.  That would stop being true if
>    physical file size were always taken into account.  Maybe this is a
>    problem, or maybe it isn't ... as someone who likes to be able to
>    debug planner behavior without actually creating umpteen-gig test
>    tables, my world view may be a bit skewed ...
>
> It's certainly doable if we decide the pluses outweigh the minuses.
> Thoughts?

My first reaction is to wonder whether this would give performance exactly
equal to running a true ANALYZE in every situation.  If not, then you
would end up with an automated pseudo-ANALYZE (performance-wise).

In my opinion, it is almost a feature that non-ANALYZEd tables give
such horrendous performance: it kicks you in the butt to do some
thinking about when to deal with ANALYZE correctly.

So, in short, I think it would be a huge win if we could have automatic
ANALYZE with true ANALYZE performance, but a huge loss if the automatic
ANALYZE performance is not exactly as good as a true ANALYZE.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------


Re: performance of IN (subquery)

From
Kevin Murphy
Date:
Thanks all for the reminders about analyzing, and I apologize for wasting
everyone's time.  The main table did indeed need to be analyzed (I had
truncated it and repopulated it with "insert ... select" but forgotten to
analyze).  The other tables are very small temporary tables, and I assumed,
whether correctly or not, that analyzing would not be helpful for them.

All this is happening in the context of an algorithm in a PL/PGSQL function.
The three temporary tables are reused thousands of times.  I wasn't sure if
it would be better to truncate them between uses to keep them small or just
allow them to grow.  Unfortunately for the Tree of Knowledge, performance is
now more than adequate, so I may not do this experiment.

Thanks,
Kevin Murphy

Re: performance of IN (subquery)

From
Paul Tillotson
Date:
>2. Instability of plans.  Right now, the planner will not change plans
>   underneath you --- you have to issue an explicit VACUUM or ANALYZE
>   to change the terms of discussion.  That would stop being true if
>   physical file size were always taken into account.  Maybe this is a
>   problem, or maybe it isn't ... as someone who likes to be able to
>   debug planner behavior without actually creating umpteen-gig test
>   tables, my world view may be a bit skewed ...
>
>It's certainly doable if we decide the pluses outweigh the minuses.
>Thoughts?
>
>
>
Even though I just proposed it, now I think that the idea of having
postgres automatically gather statistics only when an estimate is
grossly wrong is not a good one. This will just obscure the need to run
analyze, and many people will then always be using moderately bad
statistics without realizing it.

(I think I used postgres in a production environment for about six
months without even knowing about the ANALYZE command--with small data
sets it's not obvious--and it was still blazingly fast compared to foxpro.)

If pgautovacuum does ANALYZE and we can get most people to use
pgautovacuum, I think this problem will go away.

Another possibility is simply to make postgres generate a warning, saying:

WARNING: statistics for table xxxxxx are off by a factor of y. You
should ANALYZE this table.

Paul

Re: performance of IN (subquery)

From
Tom Lane
Date:
Jon Lapham <lapham@jandr.org> writes:
> Tom Lane wrote:
>> I've thought about this before.  One simple trick would be to get rid
>> of the current pg_class reltuples/relpages fields in favor of a
>> tuples-per-page estimate, which could be multiplied by
>> RelationGetNumberOfBlocks() during planning.

> My first reaction is to wonder if this would give performance exactly
> equal to running a true ANALYZE in every situation?

No, this is quite orthogonal to ANALYZE (and several orders of magnitude
cheaper...)  It would fix a lot of the simpler cases, though, since the
first-order effects like table size would be right.

            regards, tom lane

Re: performance of IN (subquery)

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I'm not about to run analyze in the middle of the data generation
> (which wouldn't work anyways since it's in a transaction).

Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
The cached-plan business is a problem, I agree, but I think it's
orthogonal to this particular discussion (and you can always use EXECUTE
if you have to).
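
(In other words, something like the following is legal on 7.4; sketch only,
with a hypothetical table name:)

BEGIN;
-- ... bulk data generation into work_table ...
ANALYZE work_table;   -- allowed inside the transaction
-- ... statements planned after this point in the session see the new stats
COMMIT;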

>> In the absence of any ANALYZE data the tuples-per-page estimate might be
>> pretty bogus, but it couldn't be off by more than an order of magnitude or
>> so either way.

> I don't see why it couldn't. If you have a table badly in need of vacuuming
> (or had one at the time of the last analyze) it could be off by way more than
> an order of magnitude.

Well, I was actually thinking of the physical tuples-per-page stat
(perhaps better expressed as an average tuple size), but you are right
that the fraction of dead tuples is also something to think about.
We don't model that explicitly ATM but maybe we should.  The original
VACUUM-based stats code couldn't really do much with it, since VACUUM
would leave no dead tuples behind in the first place; but separate
ANALYZE could definitely make an estimate of the fraction of dead tuples.

> Ideally I would want a guarantee that every query would *always*
> result in the same plan. Once I've tested them and approved the plans
> I want to know that only those approved plans will ever run, and I
> want to be present and be able to verify new plans before they go into
> production.

> I doubt I'm going to convince anyone today,

Nope, you aren't.  The above seems to me to be a recipe for degradation
of performance over time, precisely because the plans wouldn't change in
the face of changes in the situation.  I've resisted adding "planner
hints" to the language for this reason, and I'm certainly not eager to
offer any hard guarantees about plans not changing.

            regards, tom lane

Re: performance of IN (subquery)

From
Bruce Momjian
Date:
Is there anything for the TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > I'm not about to run analyze in the middle of the data generation
> > (which wouldn't work anyways since it's in a transaction).
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).
>
> >> In the absence of any ANALYZE data the tuples-per-page estimate might be
> >> pretty bogus, but it couldn't be off by more than an order of magnitude or
> >> so either way.
>
> > I don't see why it couldn't. If you have a table badly in need of vacuuming
> > (or had one at the time of the last analyze) it could be off by way more than
> > an order of magnitude.
>
> Well, I was actually thinking of the physical tuples-per-page stat
> (perhaps better expressed as an average tuple size), but you are right
> that the fraction of dead tuples is also something to think about.
> We don't model that explicitly ATM but maybe we should.  The original
> VACUUM-based stats code couldn't really do much with it, since VACUUM
> would leave no dead tuples behind in the first place; but separate
> ANALYZE could definitely make an estimate of the fraction of dead tuples.
>
> > Ideally I would want a guarantee that every query would *always*
> > result in the same plan. Once I've tested them and approved the plans
> > I want to know that only those approved plans will ever run, and I
> > want to be present and be able to verify new plans before they go into
> > production.
>
> > I doubt I'm going to convince anyone today,
>
> Nope, you aren't.  The above seems to me to be a recipe for degradation
> of performance over time, precisely because the plans wouldn't change in
> the face of changes in the situation.  I've resisted adding "planner
> hints" to the language for this reason, and I'm certainly not eager to
> offer any hard guarantees about plans not changing.
>
>             regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: performance of IN (subquery)

From
Joseph Shraibman
Date:

Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>
>>I'm not about to run analyze in the middle of the data generation
>>(which wouldn't work anyways since it's in a transaction).
>
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).
>

  How does EXECUTE solve the cached-plan business?

Re: performance of IN (subquery)

From
Markus Bertheau
Date:
On Fri, 27 Aug 2004 11:09:26 -0400, Joseph Shraibman
<jks@selectacast.net> wrote:

>   How does EXECUTE solve the cached-plan business?

It re-plans the query at every run.

--
Markus Bertheau <mbertheau@gmail.com>

Re: performance of IN (subquery)

From
Duane Lee - EGOVX
Date:

Have you thought about using existence checking:  WHERE EXISTS (SELECT '1' FROM FOO2 WHERE BAZ = BAZ2)

If the index exists on BAZ2 you might get away with a quick index only check.

Duane
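
Applied to the query from the start of the thread, that style would look
roughly like this (untested sketch):

SELECT bundle_id
FROM build.elements e
WHERE EXISTS (SELECT 1
              FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
              WHERE bundle_superlocs_2.protobundle_id = 1
                AND superlocs_2.element_id = e.elementid);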

-----Original Message-----
From: Kevin Murphy [mailto:murphy@genome.chop.edu]
Sent: Thursday, August 26, 2004 3:24 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] performance of IN (subquery)

I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'.  PG always seems to want to do a sequential
scan of the bar table.  I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results".   Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case).  Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy


Re: performance of IN (subquery)

From
Joseph Shraibman
Date:
According to the docs it specifically doesn't.

http://www.postgresql.org/docs/7.4/static/sql-prepare.html

When the PREPARE statement is executed, the specified statement is
parsed, rewritten, and planned. When an EXECUTE command is subsequently
issued, the prepared statement need only be executed. Thus, the parsing,
rewriting, and planning stages are only performed once, instead of every
time the statement is executed.



Markus Bertheau wrote:
> On Fri, 27 Aug 2004 11:09:26 -0400, Joseph Shraibman
> <jks@selectacast.net> wrote:
>
>
>>  How does EXECUTE solve the cached-plan business?
>
>
> It re-plans the query at every run.
>

Re: performance of IN (subquery)

From
Tom Lane
Date:
Joseph Shraibman <jks@selectacast.net> writes:
> Markus Bertheau wrote:
>> It re-plans the query at every run.

> According to the docs is specifically doesn't.

You're confusing the SQL EXECUTE statement with the quite different plpgsql
EXECUTE statement ...

            regards, tom lane
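
For concreteness, a minimal sketch of the plpgsql EXECUTE pattern in question
(a hypothetical function using the thread's bundle_superlocs_2 table; the
result is collected with a FOR ... IN EXECUTE loop, which works on 7.4):

CREATE OR REPLACE FUNCTION count_protobundle(integer) RETURNS bigint AS '
DECLARE
    rec RECORD;
    n   bigint := 0;
BEGIN
    -- the query string is planned each time it is executed, so it does not
    -- get stuck with a plan cached when the table looked very different
    FOR rec IN EXECUTE ''SELECT count(*) AS c FROM bundle_superlocs_2 ''
                    || ''WHERE protobundle_id = '' || quote_literal($1::text)
    LOOP
        n := rec.c;
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;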

Re: performance of IN (subquery)

From
"Matthew T. O'Connor"
Date:
Paul Tillotson wrote:
> If pgautovacuum does ANALYZE and we can get most people to use
> pgautovacuum, I think this problem will go away.

It does.  We won't get most people to use it for 8.0 since it is still a
contrib project, but hopefully it will be built-in for 8.1.


Re: performance of IN (subquery)

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I'm not about to run analyze in the middle of the data generation
> > (which wouldn't work anyways since it's in a transaction).
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).

It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have a problem. I didn't realize you could run analyze in a
transaction, but even being able to, I wouldn't really want to have to do that
repeatedly during the job.

This new approach would actually complete the fix: a perl or plpgsql EXECUTE
implementation would gradually shift statistics during the job.

Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.

> but separate ANALYZE could definitely make an estimate of the fraction of
> dead tuples.

With analyze in a transaction I'm not clear what the semantics should be,
though. I suppose it should only count tuples visible to the transaction
running the analyze?

> Nope, you aren't.  The above seems to me to be a recipe for degradation
> of performance over time, precisely because the plans wouldn't change in
> the face of changes in the situation.

A gradual degradation is ok. A gradual degradation means I can schedule a
nightly analyze and report on any changed plans and either automatically
accept them or manually approve them individually.

A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.

As I said, it doesn't matter to me if every query is 10% slower than possible,
as long as no query takes 1000% as long as necessary even if it's a 1 in 1000
occurrence.

> I've resisted adding "planner hints" to the language for this reason, and
> I'm certainly not eager to offer any hard guarantees about plans not
> changing.

I just want to control _when_ they change. Eventually you'll come around. I
think it'll be a slow gradual change in thinking as the user-base changes
though. Not something I'll change with a single argument in one day.

--
greg

Re: performance of IN (subquery)

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> It's orthogonal. My point was that I have a bigger problem, but even if I
> address it by switching away from plpgsql, or I guess by using EXECUTE, I
> would still have a problem. I didn't realize you could run analyze in a
> transaction, but even being able to I wouldn't really want to have to do that
> repeatedly during the job.

Why not?  Given the sampling behavior that's been in there for a release
or two, ANALYZE is pretty cheap on large tables; certainly much cheaper
than any processing you might be doing that's going to grovel over the
whole table.

> Except that the first thing the job does is delete all the old records. This
> is inside a transaction. So an estimate based on the heap size would be off by
> a factor of two by the time the job is done.

Could you use TRUNCATE?  I dunno if locking the table is okay for you.
It is transaction safe though.

> With analyze in a transaction I'm not clear what the semantics should be
> though. I suppose it should only count tuples visible to the transaction
> analyze?

It currently uses SnapshotNow, so would see committed tuples of other
transactions plus uncommitted ones of the present transaction.  This is
not exactly the same thing as the transaction's snapshot, but close.

> A sudden degradation is much more dangerous. Even if it's rare, a sudden
> degradation means an outage in prime time.

[ shrug ]  You can get a sudden degradation with fixed plans, too.  All
it takes is an addition of a lot of rows in some table that had been
small.

            regards, tom lane

Re: performance of IN (subquery)

From
Gaetano Mendola
Date:
Greg Stark wrote:

> Ideally I would want a guarantee that every query would *always* result in the
> same plan. Once I've tested them and approved the plans I want to know that
> only those approved plans will ever run, and I want to be present and be able
> to verify new plans before they go into production.

What you are saying is "never run an ANALYZE", or if you do, you have to re-test
all your plans. "*Always*" the same plan is nonsense, because the plan depends on the
data distribution; do you test your plans for each histogram slice?

> I doubt I'm going to convince anyone today...

For sure not me.


Regards
Gaetano Mendola






Re: performance of IN (subquery)

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > Except that the first thing the job does is delete all the old records. This
> > is inside a transaction. So an estimate based on the heap size would be off by
> > a factor of two by the time the job is done.
>
> Could you use TRUNCATE?  I dunno if locking the table is okay for you.
> It is transaction safe though.

Well, if necessary I could, but if I can do it without downtime all the
better. In any case I think I'll be ok with a factor of 2 misestimation. I was
just giving an example use case for you to chew on when analyzing this new
proposal.

I'm not sure where I stand on the idea. I like the idea that table size estimates
would always be fairly reasonable even without statistics. But I also have a
really strong desire for plan stability.

> [ shrug ]  You can get a sudden degradation with fixed plans, too.  All
> it takes is an addition of a lot of rows in some table that had been
> small.

Well, presumably I should be aware if my data distribution is changing
drastically. That's under my control. At least the performance change will be
proportionate to the distribution change.

With plans changing on the fly I could have a query that degrades 1% for every
row added and then suddenly becomes 10x slower when I add a 17th extra row. Of
course such a system isn't perfectly tuned, or the optimizer issue should be
found and fixed. But I would rather find out about it without having my
application fail.

--
greg