Thread: BUG #15717: Index not used when ordering by left joined table column

BUG #15717: Index not used when ordering by left joined table column

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15717
Logged by:          Alex
Email address:      cdalxndr@yahoo.com
PostgreSQL version: 10.0
Operating system:   Windows 10
Description:

The following query runs slow, as it seq scans tables, without using any
index:
 select
     wp.id as id
 from
     web_page wp
     left join web_url_path url on wp.id=url.page
 order by
     url.priority asc
 limit 1

Explain analyze:
"Limit  (cost=180140.55..180140.55 rows=1 width=8) (actual
time=2402.310..2402.310 rows=1 loops=1)"
"  ->  Sort  (cost=180140.55..185578.31 rows=2175106 width=8) (actual
time=2402.309..2402.309 rows=1 loops=1)"
"        Sort Key: url.priority"
"        Sort Method: top-N heapsort  Memory: 25kB"
"        ->  Hash Right Join  (cost=68981.25..169265.02 rows=2175106
width=8) (actual time=588.010..2232.158 rows=2159417 loops=1)"
"              Hash Cond: (url.page = wp.id)"
"              ->  Seq Scan on web_url_path url  (cost=0.00..47376.06
rows=2175106 width=8) (actual time=0.044..403.351 rows=2175106 loops=1)"
"              ->  Hash  (cost=33480.89..33480.89 rows=2163789 width=4)
(actual time=581.473..581.473 rows=2158888 loops=1)"
"                    Buckets: 131072  Batches: 32  Memory Usage: 3403kB"
"                    ->  Seq Scan on web_page wp  (cost=0.00..33480.89
rows=2163789 width=4) (actual time=0.076..251.148 rows=2158888 loops=1)"
"Planning time: 0.295 ms"
"Execution time: 2402.956 ms"

Note that I have indexes (btree) on columns "web_url_path.priority" and
"web_url_path.page".
I was expecting the "priority" column index would be used to retrieve first
url then do a reverse join to get the corresponding page, and continue if no
page is found.

Note that the following query uses the "priority" index:
select u.id
from web_url_path u
order by u.priority
limit 1

Explain analyze:
"Limit  (cost=0.43..0.50 rows=1 width=8) (actual time=0.014..0.014 rows=1
loops=1)"
"  ->  Index Scan using web_url_path_priority_idx on web_url_path u
(cost=0.43..144920.91 rows=2175106 width=8) (actual time=0.014..0.014 rows=1
loops=1)"
"Planning time: 0.114 ms"
"Execution time: 0.026 ms"


PG Bug reporting form <noreply@postgresql.org> writes:
> The following query runs slow, as it seq scans tables, without using any
> index:
>  select
>      wp.id as id
>  from
>      web_page wp
>      left join web_url_path url on wp.id=url.page
>  order by
>      url.priority asc
>  limit 1

> I was expecting the "priority" column index would be used to retrieve first
> url then do a reverse join to get the corresponding page, and continue if no
> page is found.

It would probably do that if you used a plain join, or a right join.
But a nestloop plan cannot be driven from the inside of an outer join,
and that's the case this query presents.

            regards, tom lane



Indeed, with an inner join the index is used.
Thanks for the assistance.



On Thursday, March 28, 2019, 3:19:30 AM GMT+2, Tom Lane <tgl@sss.pgh.pa.us> wrote:


PG Bug reporting form <noreply@postgresql.org> writes:
> The following query runs slow, as it seq scans tables, without using any
> index:
>  select
>      wp.id as id
>  from
>      web_page wp
>      left join web_url_path url on wp.id=url.page
>  order by
>      url.priority asc
>  limit 1

> I was expecting the "priority" column index would be used to retrieve first
> url then do a reverse join to get the corresponding page, and continue if no
> page is found.

It would probably do that if you used a plain join, or a right join.
But a nestloop plan cannot be driven from the inside of an outer join,
and that's the case this query presents.

            regards, tom lane