Thread: Need help with optimising simple query

Need help with optimising simple query

From
Nandakumar M
Date:
Hi,

I am having a query that has an order by and a limit clause. The
column on which I am doing order by is indexed (default b tree index).
However the index is not being used. On tweaking the query a bit I
found that when I use left join index is not used whereas when I use
inner join the index is used.

Unfortunately, the behaviour we expect is that of left join only. My
question is, is there any way to modify/improve the query to improve
the query speed or is this the best that is possible for this case.

Please find below a simplified version of the queries. I tried the
queries on 9.3 and 10 versions and both gave similar results.


Table structure

performance_test=# \d+ child
                                                Table "public.child"
 Column |  Type  | Collation | Nullable |              Default
     | Storage  | Stats target | Description
--------+--------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id     | bigint |           | not null |
nextval('child_id_seq'::regclass) | plain    |              |
 name   | text   |           | not null |
     | extended |              |
Indexes:
    "child_pkey" PRIMARY KEY, btree (id)
    "child_name_unique" UNIQUE CONSTRAINT, btree (name)
Referenced by:
    TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY
(child_id) REFERENCES child(id)


performance_test=# \d+ parent
                                                 Table "public.parent"
  Column  |  Type  | Collation | Nullable |              Default
        | Storage  | Stats target | Description

----------+--------+-----------+----------+------------------------------------+----------+--------------+-------------
 id       | bigint |           | not null |
nextval('parent_id_seq'::regclass) | plain    |              |
 name     | text   |           | not null |
        | extended |              |
 child_id | bigint |           |          |
        | plain    |              |
Indexes:
    "parent_pkey" PRIMARY KEY, btree (id)
    "parent_name_unique" UNIQUE CONSTRAINT, btree (name)
    "parent_child_id_idx" btree (child_id)
Foreign-key constraints:
    "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)



Query used to populate data

performance_test=# insert into child(name) select concat('child ',
gen.id) as name  from (select generate_series(1,100000) as id) as gen;

performance_test=# insert into parent(name, child_id) select
concat('parent ', gen.id) as name, (id%100000) + 1  from (select
generate_series(1,1000000) as id) as gen;


Left join with order by using child name

performance_test=# explain analyze select * from parent left join
child on parent.child_id = child.id order by child.name limit 10;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=69318.55..69318.58 rows=10 width=59) (actual
time=790.708..790.709 rows=10 loops=1)
   ->  Sort  (cost=69318.55..71818.55 rows=1000000 width=59) (actual
time=790.705..790.706 rows=10 loops=1)
         Sort Key: child.name
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Hash Left Join  (cost=3473.00..47708.91 rows=1000000
width=59) (actual time=51.066..401.028 rows=1000000 loops=1)
               Hash Cond: (parent.child_id = child.id)
               ->  Seq Scan on parent  (cost=0.00..17353.00
rows=1000000 width=29) (actual time=0.026..67.848 rows=1000000
loops=1)
               ->  Hash  (cost=1637.00..1637.00 rows=100000 width=19)
(actual time=50.879..50.879 rows=100000 loops=1)
                     Buckets: 65536  Batches: 2  Memory Usage: 3053kB
                     ->  Seq Scan on child  (cost=0.00..1637.00
rows=100000 width=19) (actual time=0.018..17.281 rows=100000 loops=1)
 Planning time: 1.191 ms
 Execution time: 790.797 ms
(12 rows)



Inner join with sorting according to child name

performance_test=# explain analyze select * from parent inner join
child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193
rows=10 loops=1)
   ->  Nested Loop  (cost=0.84..119132.56 rows=1000000 width=59)
(actual time=0.154..0.186 rows=10 loops=1)
         ->  Index Scan using child_name_unique on child
(cost=0.42..5448.56 rows=100000 width=19) (actual time=0.126..0.126
rows=1 loops=1)
         ->  Index Scan using parent_child_id_idx on parent
(cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10
loops=1)
               Index Cond: (child_id = child.id)
 Planning time: 0.941 ms
 Execution time: 0.283 ms
(7 rows)




Version

performance_test=# select version();
                                                              version

-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)


Any help from Postgres experts would be great.

Thanks,
Nanda


Re: Need help with optimising simple query

From
Tom Lane
Date:
Nandakumar M <m.nanda92@gmail.com> writes:
> I am having a query that has an order by and a limit clause. The
> column on which I am doing order by is indexed (default b tree index).
> However the index is not being used. On tweaking the query a bit I
> found that when I use left join index is not used whereas when I use
> inner join the index is used.

The reason the index isn't being used is that the sort order the query
requests isn't the same as the order provided by the index.  Here:

> performance_test=# explain analyze select * from parent left join
> child on parent.child_id = child.id order by child.name limit 10;

you're asking to sort by a column that will include null values for
child.name anywhere that there's a parent row without a match for
child_id.  Those rows aren't even represented in the index on child.name,
much less placed in the right order.

            regards, tom lane


Re: Need help with optimising simple query

From
Nandakumar M
Date:
Hi Tom,

Is there something that I can do to improve the performance of such
queries (where ordering is done based on child table column and join
is left join)? Maybe a combined index or something like that? Or is it
possible to modify the query to get same result but execute faster.
One ad-hoc optimisation (which gives somewhat better performance) that
came to mind is to have a sub query for child table like

performance_test=# explain analyze select * from parent left join
(select * from child order by name limit 10) as child on
parent.child_id = child.id order by child.name limit 10;


    QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=42714.84..42714.86 rows=10 width=59) (actual
time=311.623..311.624 rows=10 loops=1)
   ->  Sort  (cost=42714.84..45214.84 rows=1000000 width=59) (actual
time=311.622..311.622 rows=10 loops=1)
         Sort Key: child.name
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Hash Left Join  (cost=1.19..21105.20 rows=1000000
width=59) (actual time=0.120..204.386 rows=1000000 loops=1)
               Hash Cond: (parent.child_id = child.id)
               ->  Seq Scan on parent  (cost=0.00..17353.00
rows=1000000 width=29) (actual time=0.073..73.052 rows=1000000
loops=1)
               ->  Hash  (cost=1.06..1.06 rows=10 width=19) (actual
time=0.035..0.035 rows=10 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Limit  (cost=0.42..0.96 rows=10 width=19)
(actual time=0.014..0.027 rows=10 loops=1)
                           ->  Index Scan using child_name_unique on
child  (cost=0.42..5448.56 rows=100000 width=19) (actual
time=0.013..0.024 rows=10 loops=1)
 Planning time: 0.505 ms
 Execution time: 311.682 ms
(13 rows)

Time: 312.673 ms

Is there something I can do that will improve the query performance
much more than this?

Thanks.

Regards,
Nanda

On Mon, 9 Jul 2018, 19:53 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
>
> Nandakumar M <m.nanda92@gmail.com> writes:
> > I am having a query that has an order by and a limit clause. The
> > column on which I am doing order by is indexed (default b tree index).
> > However the index is not being used. On tweaking the query a bit I
> > found that when I use left join index is not used whereas when I use
> > inner join the index is used.
>
> The reason the index isn't being used is that the sort order the query
> requests isn't the same as the order provided by the index.  Here:
>
> > performance_test=# explain analyze select * from parent left join
> > child on parent.child_id = child.id order by child.name limit 10;
>
> you're asking to sort by a column that will include null values for
> child.name anywhere that there's a parent row without a match for
> child_id.  Those rows aren't even represented in the index on child.name,
> much less placed in the right order.
>
>                         regards, tom lane


Re: Need help with optimising simple query

From
Jim Finnerty
Date:
 I didn't find any CREATE TABLE's in your description, or else I would have
tried it with the sequences and all that, but I think this ought to work. 

postgres=# explain select * from ((select * from parent inner join child on
parent.child_id = child.id limit 10) union all (select * from parent left
outer join child on parent.child_id = child.id where child.id is null limit
10)) as v limit 10;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..3.72 rows=10 width=88)
   ->  Append  (cost=0.15..7.29 rows=20 width=88)
         ->  Limit  (cost=0.15..2.46 rows=10 width=88)
               ->  Nested Loop  (cost=0.15..246.54 rows=1070 width=88)
                     ->  Seq Scan on parent  (cost=0.00..20.70 rows=1070
width=48)
                     ->  Index Scan using child_pkey on child 
(cost=0.15..0.21 rows=1 width=40)
                           Index Cond: (id = parent.child_id)
         ->  Limit  (cost=0.15..4.63 rows=10 width=88)
               ->  Nested Loop Anti Join  (cost=0.15..239.71 rows=535
width=88)
                     ->  Seq Scan on parent parent_1  (cost=0.00..20.70
rows=1070 width=48)
                     ->  Index Scan using child_pkey on child child_1 
(cost=0.15..0.21 rows=1 width=40)
                           Index Cond: (parent_1.child_id = id)
(12 rows)





-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Re: Need help with optimising simple query

From
Jim Finnerty
Date:
...on second thought, the placement of the IS NULL predicate isn't quite
right, but if you futz with it a bit I think you can make it work



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html