Thread: Nested loop join and date range query

Nested loop join and date range query

From

"Ian Burrell"

Date:

02 May 2006, 22:56:13

We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
having performance problems and running for very long times.  The
commonality seems to be PostgreSQL 8.1 is choosing to use a nested
loop join because it estimates there will be only be a single row.
There are really thousands of rows and the nested loop version takes
much longer.

Even with the bad plan, the test query runs quickly.  The real query
is much more complicated and we have had to kill it after running for
24 hours.

SELECT
    MAX(titles.name) AS title_name,
    MAX(providers.short_name) AS provider_short_name,
    SUM(x.xtns) AS xtns,
    SUM(x.rev) AS rev
FROM xtns_by_mso_title_wk x
INNER JOIN providers providers ON x.provider_no = providers.provider_no
INNER JOIN titles titles ON x.title_no = titles.title_no
WHERE x.mso_no = 50
    AND x.week BETWEEN '20060423 00:00:00' AND  '20060423 00:00:00'
GROUP BY x.title_no, x.provider_no

The EXPLAIN ANALYZE looks like:

 GroupAggregate  (cost=11.63..11.67 rows=1 width=61) (actual
time=1440.550..1467.602 rows=3459 loops=1)
   ->  Sort  (cost=11.63..11.64 rows=1 width=61) (actual
time=1440.515..1446.634 rows=3934 loops=1)
         Sort Key: x.title_no, x.provider_no
         ->  Nested Loop  (cost=0.00..11.62 rows=1 width=61) (actual
time=7.900..1422.686 rows=3934 loops=1)
               ->  Nested Loop  (cost=0.00..7.38 rows=1 width=49)
(actual time=7.877..1373.392 rows=3934 loops=1)
                     ->  Index Scan using unq_xtns_by_mso_title_wk on
xtns_by_mso_title_wk x  (cost=0.00..4.12 rows=1 width=26) (actual
time=7.827..1297.681 rows=3934 loops=1)
                           Index Cond: ((week >= '2006-04-23
00:00:00'::timestamp without time zone) AND (week <= '2006-04-23
00:00:00'::timestamp without time zone) AND (mso_no = 50))
                     ->  Index Scan using pk_titles on titles
(cost=0.00..3.25 rows=1 width=27) (actual time=0.010..0.012 rows=1
loops=3934)
                           Index Cond: ("outer".title_no = titles.title_no)
               ->  Index Scan using pk_providers on providers
(cost=0.00..4.23 rows=1 width=16) (actual time=0.004..0.005 rows=1
loops=3934)
                     Index Cond: ("outer".provider_no = providers.provider_no)

If it is searching over multiple weeks (week BETWEEN '20060417
00:00:00' AND  '20060423 00:00:00'), it estimates better and uses a
hash join.

 GroupAggregate  (cost=7848.20..7878.48 rows=156 width=61) (actual
time=117.761..145.910 rows=3459 loops=1)
   ->  Sort  (cost=7848.20..7852.08 rows=1552 width=61) (actual
time=117.735..123.823 rows=3934 loops=1)
         Sort Key: x.title_no, x.provider_no
         ->  Hash Join  (cost=5.95..7765.94 rows=1552 width=61)
(actual time=6.539..102.825 rows=3934 loops=1)
               Hash Cond: ("outer".provider_no = "inner".provider_no)
               ->  Nested Loop  (cost=0.00..7736.71 rows=1552
width=49) (actual time=5.117..86.980 rows=3934 loops=1)
     ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on
xtns_by_mso_title_wk x  (cost=0.00..2677.04 rows=1552 width=26)
(actual time=5.085..18.065 rows=3934 loops=1)
                           Index Cond: ((week >= '2006-04-17
00:00:00'::timestamp without time zone) AND (week <= '2006-04-23
00:00:00'::timestamp without time zone) AND (mso_no = 50))
                     ->  Index Scan using pk_titles on titles
(cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.010 rows=1
loops=3934)
                           Index Cond: ("outer".title_no = titles.title_no)
               ->  Hash  (cost=5.16..5.16 rows=316 width=16) (actual
time=1.356..1.356 rows=325 loops=1)
                     ->  Seq Scan on providers  (cost=0.00..5.16
rows=316 width=16) (actual time=0.008..0.691 rows=325 loops=1)

If the week range is replace by an equals (week = '20060423
00:00:00'), it also uses a hash join.  Unforuntately, the queries are
automatically generated and changing them to use an equals could be
problematic.

 GroupAggregate  (cost=7828.75..7859.32 rows=157 width=61) (actual
time=98.330..125.370 rows=3459 loops=1)
   ->  Sort  (cost=7828.75..7832.67 rows=1567 width=61) (actual
time=98.303..104.055 rows=3934 loops=1)
         Sort Key: x.title_no, x.provider_no
         ->  Hash Join  (cost=5.95..7745.60 rows=1567 width=61)
(actual time=1.785..83.830 rows=3934 loops=1)
               Hash Cond: ("outer".provider_no = "inner".provider_no)
               ->  Nested Loop  (cost=0.00..7716.14 rows=1567
width=49) (actual time=0.170..68.338 rows=3934 loops=1)
     ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on
xtns_by_mso_title_wk x  (cost=0.00..2607.56 rows=1567 width=26)
(actual time=0.138..11.993 rows=3934 loops=1)
                           Index Cond: ((week = '2006-04-23
00:00:00'::timestamp without time zone) AND (mso_no = 50))
                     ->  Index Scan using pk_titles on titles
(cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.008 rows=1
loops=3934)
                           Index Cond: ("outer".title_no = titles.title_no)
               ->  Hash  (cost=5.16..5.16 rows=316 width=16) (actual
time=1.565..1.565 rows=325 loops=1)
                     ->  Seq Scan on providers  (cost=0.00..5.16
rows=316 width=16) (actual time=0.008..0.677 rows=325 loops=1)

Does anyone have some suggestions to try? The most worrying thing is
that when the statistics are off, it can do a  pathological query.

 - Ian

Re: Nested loop join and date range query

From

Tom Lane

Date:

03 May 2006, 03:04:13

"Ian Burrell" <ianburrell@gmail.com> writes:
> We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
> having performance problems and running for very long times.  The
> commonality seems to be PostgreSQL 8.1 is choosing to use a nested
> loop join because it estimates there will be only be a single row.

>                      ->  Index Scan using unq_xtns_by_mso_title_wk on
> xtns_by_mso_title_wk x  (cost=0.00..4.12 rows=1 width=26) (actual
> time=7.827..1297.681 rows=3934 loops=1)
>                            Index Cond: ((week >= '2006-04-23
> 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23
> 00:00:00'::timestamp without time zone) AND (mso_no = 50))

We've already noted that there's a problem with estimating zero-width
ranges (too lazy to search the archives, but this has come up at least
twice recently).  Can you modify your app to generate something like

    week >= x and week < x+1

instead of

    week >= x and week <= x

?  My recollection is that the fix will probably be complicated
enough to not get back-patched into 8.1.

BTW, AFAIK the same problem exists in 7.4.  What kind of estimates/plans
were you getting for this case in 7.4?

            regards, tom lane

Re: Nested loop join and date range query

From

"Ian Burrell"

Date:

03 May 2006, 17:54:12

On 5/2/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ian Burrell" <ianburrell@gmail.com> writes:
> > We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
> > having performance problems and running for very long times.  The
> > commonality seems to be PostgreSQL 8.1 is choosing to use a nested
> > loop join because it estimates there will be only be a single row.
>
> We've already noted that there's a problem with estimating zero-width
> ranges (too lazy to search the archives, but this has come up at least
> twice recently).  Can you modify your app to generate something like
>
>         week >= x and week < x+1
>
> instead of
>
>         week >= x and week <= x
>

I am working on modifying the SQL generation code to replace the
zero-width range with an equals.

Does BETWEEN have the same bug?

> ?  My recollection is that the fix will probably be complicated
> enough to not get back-patched into 8.1.
>
> BTW, AFAIK the same problem exists in 7.4.  What kind of estimates/plans
> were you getting for this case in 7.4?
>

We get similar rows=1 estimates on 7.4.  7.4 doesn't choose to use the
nested loop joins so it performs fine.

We have been getting similar rows=1 estimates and nested loop joins
with some other queries. But I think those are caused by not
frequently analyzing log type tables and then searching for recent
days which it doesn't think exist.

 - Ian