Thread: Optimizing a huge_table/tiny_table join

Optimizing a huge_table/tiny_table join

From
kynn@panix.com
Date:



I want to optimize this simple join:

SELECT * FROM huge_table h, tiny_table t WHERE UPPER( h.id ) = UPPER( t.id )

huge_table has about 2.5 million records, can be assumed as fixed, and
has the following index:

CREATE INDEX huge_table_index ON huge_table( UPPER( id ) );

...while tiny_table changes with each user request, and typically will
contain on the order of 100-1000 records.  For this analysis, I put
300 records in tiny_table, resulting in 505 records in the join.

I tried several approaches.  In order of increasing speed of
execution:

1. executed as shown above, with enable_seqscan on: about 100 s.

2. executed as shown above, with enable_seqscan off: about 10 s.

3. executed with a LIMIT 6000 clause added to the SELECT statement, and
   enable_seqscan on: about 5 s.

4. executed with a LIMIT 600 clause added to the SELECT statement, and
   enable_seqscan on: less than 1 s.



Clearly, using LIMIT is the way to go.  Unfortunately I *do* want all
the records that would have been produced without the LIMIT clause,
and I don't have a formula for the limit that will guarantee this.  I
could use a very large value (e.g. 20x the size of tiny_table, as in
approach 3 above) which would make the probability of hitting the
limit very small, but unfortunately, the query plan in this case is
different from the query plan when the limit is just above the
expected number of results (approach 4 above).
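
For reference, the LIMIT variants are simply the original query with a cap
appended; only the value differs between approaches 3 and 4:

SELECT * FROM huge_table h, tiny_table t
WHERE UPPER( h.id ) = UPPER( t.id )
LIMIT 600;   -- 6000 for approach 3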

The query plan for the fastest approach is this:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..2338.75 rows=600 width=84)
   ->  Nested Loop  (cost=0.01..14766453.89 rows=3788315 width=84)
         ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
         ->  Index Scan using huge_table_index on huge_table h  (cost=0.01..48871.80 rows=12628 width=46)
               Index Cond: (upper(("outer".id)::text) = upper((h.id)::text))



How can I *force* this query plan even with a higher limit value?

I found, by dumb trial and error, that in this case the switch happens
at LIMIT 5432, which, FWIW, is about 0.2% of the size of huge_table.
Is there a simpler way to determine this limit (hopefully
programmatically)?


Alternatively, I could compute the value for LIMIT as 2x the number of
records in tiny_table, and if the number of records found is *exactly*
this number, I would know that (most likely) some records were left
out.  In this case, I could use the fact that, according to the query
plan above, the scan of tiny_table is sequential to infer which
records in tiny_table were disregarded when the limit was reached, and
then repeat the query with only these left over records in tiny_table.
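
In rough outline (just a sketch; the temp table and column alias are only
for illustration, and the tiny_table row being scanned when the limit fires
may have produced only part of its matches, which this naive version does
not catch):

-- Pass 1: the capped query (600 = 2x the 300 rows in tiny_table).
CREATE TEMP TABLE join_pass1 AS
SELECT h.*, UPPER( t.id ) AS t_key
FROM huge_table h, tiny_table t
WHERE UPPER( h.id ) = UPPER( t.id )
LIMIT 600;

-- Pass 2, only if pass 1 returned exactly 600 rows: redo the join for the
-- tiny_table records that produced no output in pass 1.
SELECT h.*, UPPER( t.id ) AS t_key
FROM huge_table h, tiny_table t
WHERE UPPER( h.id ) = UPPER( t.id )
  AND UPPER( t.id ) NOT IN ( SELECT t_key FROM join_pass1 );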

What's your opinion of this strategy?  Is there a good way to improve
it?

Many thanks in advance!

kj

PS:  FWIW, the query plan for the query with LIMIT 6000 is this:

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=19676.75..21327.99 rows=6000 width=84)
   ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
         Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
         ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
         ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
               ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)



Re: Optimizing a huge_table/tiny_table join

From
"Joshua D. Drake"
Date:
> kj
>
> PS:  FWIW, the query plan for the query with LIMIT 6000 is this:

What is the explain analyze?

>
>                                      QUERY PLAN
> -------------------------------------------------------------------------------------
>  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
>    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
>          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
>          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
>          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
>                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)


--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: Optimizing a huge_table/tiny_table join

From
Tom Lane
Date:
<kynn@panix.com> writes:
>  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
>    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
>          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
>          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
>          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
>                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)

Um, if huge_table is so much bigger than tiny_table, why are the cost
estimates for seqscanning them only about 2.5x different?  There's
something wacko about your statistics, methinks.
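
A quick sanity check on what the planner thinks it is dealing with would be
something like this (just a sketch; relpages/reltuples are only as fresh as
the last VACUUM/ANALYZE):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('huge_table', 'tiny_table');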

            regards, tom lane

Re: Optimizing a huge_table/tiny_table join

From
kynn@panix.com
Date:
On 5/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> <kynn@panix.com> writes:
> >  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
> >    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
> >          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
> >          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
> >          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
> >                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different?  There's
> something wacko about your statistics, methinks.



Well, they're not my statistics; they're explain's.  You mean there's
a bug in explain?  I agree that it makes no sense that the costs don't
differ as much as one would expect, but you can see right there the
numbers of rows for the two tables.  At any rate, how would one go
about finding an explanation for these strange stats?

More bewildering still (and infuriating as hell--because it means that
all of my work for yesterday has been wasted) is that I can no longer
reproduce the best query plan I posted earlier, even though the tables
have not changed at all.  (Hence I can't post the explain analyze for
the best query plan, which Josh Drake asked for.)  No matter what
value I use for LIMIT, the query planner now insists on sequentially
scanning huge_table and ignoring the available index.  (If I turn off
enable_seqscan, I get the second worst query plan I posted yesterday.)

Anyway, I take it that there is no way to bypass the optimizer and
instruct PostgreSQL exactly how one wants the search performed?

Thanks!

kj

Re: Optimizing a huge_table/tiny_table join

From
Andrew Sullivan
Date:
On Thu, May 25, 2006 at 12:31:04PM -0400, kynn@panix.com wrote:
> Well, they're not my statistics; they're explain's.  You mean there's

Explain doesn't get them from nowhere.  How often is the table being
ANALYSEd?

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all.  (Hence I can't post the explain analyze for

I find that very hard to believe.  Didn't change _at all_?  Are you
sure no VACUUMs or anything are happening automatically?

> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?

No, there isn't.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
        --Philip Greenspun

Re: Optimizing a huge_table/tiny_table join

From
"Dawid Kuroczko"
Date:
On 5/25/06, kynn@panix.com <kynn@panix.com> wrote:
> Well, they're not my statistics; they're explain's.  You mean there's
> a bug in explain?  I agree that it makes no sense that the costs don't
> differ as much as one would expect, but you can see right there the
> numbers of rows for the two tables.  At any rate, how would one go
> about finding an explanation for these strange stats?

Well, the query planner uses statistics to deduce the best plan
possible.  Explain includes this statistical data in its output.
See:
http://www.postgresql.org/docs/8.1/interactive/planner-stats.html
...for information about what it is all about.

The idea is that your statistics are probably not detailed enough
to help the planner.  See ALTER TABLE SET STATISTICS to change
that.
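
For example (a minimal sketch; the column name is taken from your query and
1000 is just an illustrative statistics target):

ALTER TABLE tiny_table ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE huge_table ALTER COLUMN id SET STATISTICS 1000;
ANALYZE tiny_table;
ANALYZE huge_table;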

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all.  (Hence I can't post the explain analyze for
> the best query plan, which Josh Drake asked for.)  No matter what
> value I use for LIMIT, the query planner now insists on sequentially
> scanning huge_table and ignoring the available index.  (If I turn off
> enable_seqscan, I get the second worst query plan I posted yesterday.)
>
> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?

There is no way to bypass.  But there are many ways to tune it.
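
For instance (a sketch only; the values are illustrative, not
recommendations):

-- Steer this one query away from the seq scan; SET LOCAL reverts at commit.
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT * FROM huge_table h, tiny_table t
WHERE UPPER( h.id ) = UPPER( t.id );
COMMIT;

-- A gentler knob: make index access look cheaper to the planner.
-- SET random_page_cost = 2;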



Hmm, there is a chance (though the statistics issue above is the more
probable cause) that you're using some older version of PostgreSQL and
hitting the same problem as I did:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php

Tom has provided back then a patch, which fixed it:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php

...but I don't remember when it made it into a release.

   Regards,
      Dawid

Re: Optimizing a huge_table/tiny_table join

From
Jim Nasby
Date:
On May 25, 2006, at 12:07 PM, Dawid Kuroczko wrote:
> On 5/25/06, kynn@panix.com <kynn@panix.com> wrote:
>> Well, they're not my statistics; they're explain's.  You mean there's
>> a bug in explain?  I agree that it makes no sense that the costs don't
>> differ as much as one would expect, but you can see right there the
>> numbers of rows for the two tables.  At any rate, how would one go
>> about finding an explanation for these strange stats?
>
> Well, the query planner uses statistics to deduce the best plan
> possible.  Explain includes this statistical data in its output.
> See:
> http://www.postgresql.org/docs/8.1/interactive/planner-stats.html
> ...for information about what it is all about.
>
> The idea is that your statistics are probably not detailed enough
> to help the planner.  See ALTER TABLE SET STATISTICS to change
> that.

http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Mar.asp#4
might also be worth your time to read.

> Hmm, there is a chance (though the statistics issue above is the more
> probable cause) that you're using some older version of PostgreSQL and
> hitting the same problem as I did:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php
>
> Tom has provided back then a patch, which fixed it:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php
>
> ...but I don't remember when it made it into a release.

According to cvs, it's been in since 8.1 and 8.0.4:

Revision 1.111.4.2
Fri Jul 22 19:12:33 2005 UTC (10 months ago) by tgl
Branches: REL8_0_STABLE
CVS tags: REL8_0_8, REL8_0_7, REL8_0_6, REL8_0_5, REL8_0_4
Changes since revision 1.111.4.1: +18 -37 lines

Fix compare_fuzzy_path_costs() to behave a bit more sanely.  The original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost becomes
the name of the game.  Instead, compare startup and total costs fuzzily
but independently.  This changes the plan selected for two queries in the
regression tests; adjust expected-output files for resulting changes in
row order.  Per reports from Dawid Kuroczko and Sam Mason.

Revision 1.124
Fri Jul 22 19:12:01 2005 UTC (10 months ago) by tgl
Branches: MAIN
CVS tags: REL8_1_0BETA3, REL8_1_0BETA2, REL8_1_0BETA1
Changes since revision 1.123: +18 -37 lines

Fix compare_fuzzy_path_costs() to behave a bit more sanely.  The original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost becomes
the name of the game.  Instead, compare startup and total costs fuzzily
but independently.  This changes the plan selected for two queries in the
regression tests; adjust expected-output files for resulting changes in
row order.  Per reports from Dawid Kuroczko and Sam Mason.

--
Jim C. Nasby, Database Architect                decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: Optimizing a huge_table/tiny_table join

From
Mark Kirkwood
Date:
Tom Lane wrote:
> <kynn@panix.com> writes:
>>  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
>>    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
>>          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
>>          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
>>          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
>>                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different?  There's
> something wacko about your statistics, methinks.
>

This suggests that tiny_table is very wide (i.e. it has a lot of columns
compared to huge_table), or else has thousands of dead tuples.

Do you want to post the descriptions for these tables?

If you are running 8.1.x, then the output of 'ANALYZE VERBOSE
tiny_table' is of interest too.

If you are running a pre-8.1 release, then let's see 'VACUUM VERBOSE
tiny_table'.

Note that after either of these, your plans may be altered (as ANALYZE
will recompute your stats for tiny_table, and VACUUM may truncate pages
full of dead tuples at the end of it)!
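
Roughly, that amounts to the following, run from psql (a sketch):

\d tiny_table
\d huge_table

-- on 8.1.x:
ANALYZE VERBOSE tiny_table;

-- on a pre-8.1 release:
VACUUM VERBOSE tiny_table;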

