Re: Inner join question - Mailing list pgsql-general

From Tom Lane
Subject Re: Inner join question
Date
Msg-id 23454.1077229582@sss.pgh.pa.us
Whole thread Raw
In response to Re: Inner join question  (Randall Skelton <skelton@brutus.uwaterloo.ca>)
List pgsql-general
Randall Skelton <skelton@brutus.uwaterloo.ca> writes:
> Nevertheless, it still takes longer than I would like.  As requested:

> Merge Join  (cost=517417.89..2795472.80 rows=177664640 width=32)
> (actual time=64878.04..64936.41 rows=142 loops=1)
>    ->  Index Scan using cal_quat_1_timestamp on cal_quat_1 tq1
> (cost=0.00..50549.03 rows=13329 width=16) (actual time=73.29..129.66
> rows=142 loops=1)
>    ->  Sort  (cost=517417.89..517417.89 rows=2665818 width=16) (actual
> time=62310.53..63727.33 rows=1020155 loops=1)
>          ->  Seq Scan on cal_quat_2 tq2  (cost=0.00..43638.18
> rows=2665818 width=16) (actual time=14.12..13462.19 rows=2665818
> loops=1)
> Total runtime: 65424.79 msec

I think the problem is the gross misestimation of the number of rows
involved --- first within the timestamp interval (13329 vs actual 142)
and then for the join result (177664640 is just silly).  With more
accurate estimates you would probably have gotten the double indexscan
plan that you really want.

The estimates look remarkably default-ish, however --- if I'm doing the
math correctly, the selectivity is being estimated as 0.005 at each
step, which just happens to be the default estimate in the absence of
any statistics.  Have you ANALYZEd these tables lately?  If you have,
try increasing the statistics target for the timestamp rows (see ALTER
TABLE) and analyze again.

            regards, tom lane

pgsql-general by date:

Previous
From: "Roopali Sharma"
Date:
Subject: pg_restore problem
Next
From: Tom Lane
Date:
Subject: Re: pg_restore problem