question about executing JOINs

From: Josh Burdick
      We're using Postgres 7.2.1 in a biology lab.  Some of our joins
involve selecting from two tables where the fields being compared aren't
equal, they're just nearby:

select "tName" as "chrom",
    [...stuff deleted...]
       "tStart" - "snp_start" as "distance_from_start",
       "snp_start" - "tEnd" as "distance_from_end"
from   ucsc_ref_seq_ali_hg12 join ucsc_snp_tsc_hg12
       on ucsc_snp_tsc_hg12."snp_chrom" = ucsc_ref_seq_ali_hg12."tName"
where "snp_start" >= "tStart" - 1000000
      and "snp_start" <= "tEnd" + 1000000;
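    (The SNP table has an index on the chromosome and start columns; its
definition, inferred from the index name that shows up in the plans, is
presumably along these lines, though the exact definition may differ:

```sql
-- presumed definition, inferred from the index name in the EXPLAIN output
CREATE INDEX ucsc_snp_tsc_hg12_chrom_start
    ON ucsc_snp_tsc_hg12 ("snp_chrom", "snp_start");
```
)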

    The problem is that the planner comes up with:

Merge Join  (cost=201234.34..17463319.54 rows=85230539 width=51)
  ->  Sort  (cost=1870.74..1870.74 rows=15165 width=29)
        ->  Seq Scan on ucsc_ref_seq_ali_hg12  (cost=0.00..817.65 rows=15165 width=29)
  ->  Sort  (cost=199363.59..199363.59 rows=1145280 width=22)
        ->  Seq Scan on ucsc_snp_tsc_hg12  (cost=0.00..20996.80 rows=1145280 width=22)

    which hadn't finished after 45 minutes, and presumably would take
quite a while longer.  However, by temporarily doing

set enable_mergejoin=off;
set enable_hashjoin=off;

    the planner says

Nested Loop  (cost=0.00..198183643.06 rows=85230539 width=51)
  ->  Seq Scan on ucsc_ref_seq_ali_hg12  (cost=0.00..817.65 rows=15165 width=29)
  ->  Index Scan using ucsc_snp_tsc_hg12_chrom_start on ucsc_snp_tsc_hg12  (cost=0.00..12962.39 rows=4713 width=22)

    which takes four minutes.
    So, in practice, we just save the output of that in a table (using
CREATE TABLE AS), which works.  It's just a bit awkward to have to
fiddle with those switches; it would be nice if the planner were a bit
cleverer in this case.
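    In full, the workaround looks something like this (snp_near_refseq is
just a placeholder name for the result table; RESET restores the default
settings afterward):

```sql
-- Steer the planner away from merge/hash joins for this one statement,
-- materialize the result, then restore the defaults.
SET enable_mergejoin = off;
SET enable_hashjoin = off;

CREATE TABLE snp_near_refseq AS    -- placeholder table name
SELECT "tName" AS "chrom",
       "tStart" - "snp_start" AS "distance_from_start",
       "snp_start" - "tEnd"   AS "distance_from_end"
FROM   ucsc_ref_seq_ali_hg12 JOIN ucsc_snp_tsc_hg12
       ON ucsc_snp_tsc_hg12."snp_chrom" = ucsc_ref_seq_ali_hg12."tName"
WHERE  "snp_start" >= "tStart" - 1000000
  AND  "snp_start" <= "tEnd" + 1000000;

RESET enable_mergejoin;
RESET enable_hashjoin;
```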
    Is there a runtime parameter that seems likely to fix this
(perhaps cpu_index_tuple_cost)?
    The planner is assuming that it's a cross join: that we need all
pairs of records.  It's not taking into account the WHERE clause which
restricts to a tiny fraction of the records.  Perhaps the planner should
assume that a nested loop over an index scan only looks at 1% of its
records?
    This would make cross joins more expensive.  But most of my cross
joins have been accidental, when I've left out a join condition, so I
don't really mind them taking longer, because I'll just stop them after
20 minutes anyway :)
    Note that this isn't a question of join order (I think); it's just a
question of which way to execute the join.
    Thanks,
    Josh

--
Josh Burdick
jburdick@gradient.cis.upenn.edu
http://www.cis.upenn.edu/~jburdick



Re: question about executing JOINs

From: Tom Lane
Josh Burdick <jburdick@gradient.cis.upenn.edu> writes:
> Nested Loop  (cost=0.00..198183643.06 rows=85230539 width=51)
>   ->  Seq Scan on ucsc_ref_seq_ali_hg12  (cost=0.00..817.65 rows=15165 width=29)
>   ->  Index Scan using ucsc_snp_tsc_hg12_chrom_start on ucsc_snp_tsc_hg12  (cost=0.00..12962.39 rows=4713 width=22)

What are the actual numbers of rows involved?  (EXPLAIN ANALYZE output
would be far more useful than plain EXPLAIN.)

Do you have a feeling for the number of rows that would be produced by
just the JOIN/ON condition (no constraint on snp_start)?  How does that
compare to EXPLAIN's estimate of that number of rows?

>     The planner is assuming that it's a cross join: that we need all
> pairs of records.  It's not taking into account the WHERE clause which
> restricts to a tiny fraction of the records.

No it isn't, and yes it is, but it evidently is making a bad estimate of
the fraction of rows eliminated by those clauses.  I'd like to find out
just what its estimate of that fraction is and what the correct value
would be.

> Perhaps the planner should assume that a nested loop over an index
> scan only looks at 1% of its records?

Arbitrary assumptions designed to fix one example tend to break other
examples ...

            regards, tom lane

Re: question about executing JOINs

From: Josh Burdick
Tom Lane wrote:

>Josh Burdick <jburdick@gradient.cis.upenn.edu> writes:
>
>>Nested Loop  (cost=0.00..198183643.06 rows=85230539 width=51)
>>  ->  Seq Scan on ucsc_ref_seq_ali_hg12  (cost=0.00..817.65 rows=15165 width=29)
>>  ->  Index Scan using ucsc_snp_tsc_hg12_chrom_start on ucsc_snp_tsc_hg12  (cost=0.00..12962.39 rows=4713 width=22)
>
>What are the actual numbers of rows involved?  (EXPLAIN ANALYZE output
>would be far more useful than plain EXPLAIN.)
    The two tables have 15485 and 1095497 rows, respectively.
    The join returns 11135953 records, which is humongous, but is still
only 0.066% of the number of pairs of rows.  (Which is why I stopped
EXPLAIN ANALYZE after 45 minutes.)

>Do you have a feeling for the number of rows that would be produced by
>just the JOIN/ON condition (no constraint on snp_start)?  How does that
>compare to EXPLAIN's estimate of that number of rows?
    The JOIN/ON condition is on a column with 24 possible values.  I'm
not quite sure how close the planner's estimate would be in that case,
but it should be a lot closer than for the full query.
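    As a back-of-the-envelope check (this is not the planner's exact
formula, which also uses per-value statistics): with 24 equally likely
values in the join column, the join alone would produce roughly
N1 * N2 / 24 rows:

```sql
-- table sizes from above; 24 = number of distinct values in the join column
SELECT (15485::numeric * 1095497) / 24 AS rough_join_estimate;
-- roughly 7e8 rows
```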

>>    The planner is assuming that it's a cross join: that we need all
>>pairs of records.  It's not taking into account the WHERE clause which
>>restricts to a tiny fraction of the records.
>
>No it isn't, and yes it is, but it evidently is making a bad estimate of
>the fraction of rows eliminated by those clauses.  I'd like to find out
>just what its estimate of that fraction is and what the correct value
>would be.

    The planner's original estimates of how many rows are needed from
each table (15165 and 1145280) are quite close; pretty much all rows are
needed from each table.  It estimates that the MERGE JOIN will return
85230539 rows, and 85230539 / (15165 * 1145280) = 0.49%.  (As opposed to
0.066%.)
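    Those two fractions can be reproduced directly in psql:

```sql
-- planner's estimated join size over the full cross product (~0.49%)
SELECT 85230539::numeric / (15165::numeric * 1145280);
-- actual join size over the cross product of the true table sizes (~0.066%)
SELECT 11135953::numeric / (15485::numeric * 1095497);
```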

    For what it's worth, if you just multiply the estimated row counts
together for the original query, you get 17368171200, while for the
version using the index scan you get 71472645, which is a lot less, and
within a factor of 10 of the actual number of rows returned.  So perhaps
that number could be factored into the cost estimate better somehow.
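    Concretely, multiplying the estimated row counts from the two plans:

```sql
-- merge-join plan: product of the two input-row estimates
SELECT 15165::bigint * 1145280;   -- 17368171200
-- nested-loop plan: outer rows times estimated index-scan rows per probe
SELECT 15165::bigint * 4713;      -- 71472645
```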

>>Perhaps the planner should assume that a nested loop over an index
>>scan only looks at 1% of its records?
>
>Arbitrary assumptions designed to fix one example tend to break other
>examples ...

    Fair enough.  I think the current hack is good enough -- if you
really need to override the planner, you can.  And the eventual fix
should be to make the planner smarter.

>            regards, tom lane
    Hope this helps,
    Josh

--
Josh Burdick
jburdick@gradient.cis.upenn.edu
http://www.cis.upenn.edu/~jburdick