Re: question about executing JOINs - Mailing list pgsql-general

From Tom Lane
Subject Re: question about executing JOINs
Date
Msg-id 24975.1034469048@sss.pgh.pa.us
Whole thread Raw
In response to question about executing JOINs  (Josh Burdick <jburdick@gradient.cis.upenn.edu>)
List pgsql-general
Josh Burdick <jburdick@gradient.cis.upenn.edu> writes:
> Nested Loop  (cost=0.00..198183643.06 rows=85230539 width=51)
>   ->  Seq Scan on ucsc_ref_seq_ali_hg12  (cost=0.00..817.65 rows=15165
> width=29)
>   ->  Index Scan using ucsc_snp_tsc_hg12_chrom_start on
> ucsc_snp_tsc_hg12  (cost=0.00..12962.39 rows=4713 width=22)

What's the actual numbers of rows involved?  (EXPLAIN ANALYZE output
would be far more useful than plain EXPLAIN.)

Do you have a feeling for the number of rows that would be produced by
just the JOIN/ON condition (no constraint on snp_start)?  How does that
compare to EXPLAIN's estimate of that number of rows?

>     The planner is assuming that it's a cross join: that we need all
> pairs of records.  It's not taking into account the WHERE clause which
> restricts to a tiny fraction of the records.

No it isn't, and yes it is, but it evidently is making a bad estimate of
the fraction of rows eliminated by those clauses.  I'd like to find out
just what its estimate of that fraction is and what the correct value
would be.

> Perhaps the planner should assume that a nested loop over an index
> scan only looks at 1% of its records?

Arbitrary assumptions designed to fix one example tend to break other
examples ...

            regards, tom lane

pgsql-general by date:

Previous
From: "Robert Fitzpatrick"
Date:
Subject: Re: Removing {"="} privledges
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Removing {"="} privledges