Need help with optimising simple query - Mailing list pgsql-performance

From Nandakumar M
Subject Need help with optimising simple query
Date
Msg-id CANcFUu5a9W_MORQa5Tv9vsofN_w_irU9-OBQtN34YmL7p4gP_g@mail.gmail.com
Whole thread Raw
Responses Re: Need help with optimising simple query  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Need help with optimising simple query  (Jim Finnerty <jfinnert@amazon.com>)
List pgsql-performance
Hi,

I am having a query that has an order by and a limit clause. The
column on which I am doing order by is indexed (default b tree index).
However the index is not being used. On tweaking the query a bit I
found that when I use left join index is not used whereas when I use
inner join the index is used.

Unfortunately, the behaviour we expect is that of left join only. My
question is, is there any way to modify/improve the query to improve
the query speed or is this the best that is possible for this case.

Please find below a simplified version of the queries. I tried the
queries on 9.3 and 10 versions and both gave similar results.


Table structure

performance_test=# \d+ child
                                                Table "public.child"
 Column |  Type  | Collation | Nullable |              Default
     | Storage  | Stats target | Description
--------+--------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id     | bigint |           | not null |
nextval('child_id_seq'::regclass) | plain    |              |
 name   | text   |           | not null |
     | extended |              |
Indexes:
    "child_pkey" PRIMARY KEY, btree (id)
    "child_name_unique" UNIQUE CONSTRAINT, btree (name)
Referenced by:
    TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY
(child_id) REFERENCES child(id)


performance_test=# \d+ parent
                                                 Table "public.parent"
  Column  |  Type  | Collation | Nullable |              Default
        | Storage  | Stats target | Description

----------+--------+-----------+----------+------------------------------------+----------+--------------+-------------
 id       | bigint |           | not null |
nextval('parent_id_seq'::regclass) | plain    |              |
 name     | text   |           | not null |
        | extended |              |
 child_id | bigint |           |          |
        | plain    |              |
Indexes:
    "parent_pkey" PRIMARY KEY, btree (id)
    "parent_name_unique" UNIQUE CONSTRAINT, btree (name)
    "parent_child_id_idx" btree (child_id)
Foreign-key constraints:
    "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)



Query used to populate data

performance_test=# insert into child(name) select concat('child ',
gen.id) as name  from (select generate_series(1,100000) as id) as gen;

performance_test=# insert into parent(name, child_id) select
concat('parent ', gen.id) as name, (id%100000) + 1  from (select
generate_series(1,1000000) as id) as gen;


Left join with order by using child name

performance_test=# explain analyze select * from parent left join
child on parent.child_id = child.id order by child.name limit 10;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=69318.55..69318.58 rows=10 width=59) (actual
time=790.708..790.709 rows=10 loops=1)
   ->  Sort  (cost=69318.55..71818.55 rows=1000000 width=59) (actual
time=790.705..790.706 rows=10 loops=1)
         Sort Key: child.name
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Hash Left Join  (cost=3473.00..47708.91 rows=1000000
width=59) (actual time=51.066..401.028 rows=1000000 loops=1)
               Hash Cond: (parent.child_id = child.id)
               ->  Seq Scan on parent  (cost=0.00..17353.00
rows=1000000 width=29) (actual time=0.026..67.848 rows=1000000
loops=1)
               ->  Hash  (cost=1637.00..1637.00 rows=100000 width=19)
(actual time=50.879..50.879 rows=100000 loops=1)
                     Buckets: 65536  Batches: 2  Memory Usage: 3053kB
                     ->  Seq Scan on child  (cost=0.00..1637.00
rows=100000 width=19) (actual time=0.018..17.281 rows=100000 loops=1)
 Planning time: 1.191 ms
 Execution time: 790.797 ms
(12 rows)



Inner join with sorting according to child name

performance_test=# explain analyze select * from parent inner join
child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193
rows=10 loops=1)
   ->  Nested Loop  (cost=0.84..119132.56 rows=1000000 width=59)
(actual time=0.154..0.186 rows=10 loops=1)
         ->  Index Scan using child_name_unique on child
(cost=0.42..5448.56 rows=100000 width=19) (actual time=0.126..0.126
rows=1 loops=1)
         ->  Index Scan using parent_child_id_idx on parent
(cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10
loops=1)
               Index Cond: (child_id = child.id)
 Planning time: 0.941 ms
 Execution time: 0.283 ms
(7 rows)




Version

performance_test=# select version();
                                                              version

-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)


Any help from Postgres experts would be great.

Thanks,
Nanda


pgsql-performance by date:

Previous
From: Mariel Cherkassky
Date:
Subject: Re: where can I download the binaries of plpython extension
Next
From: Tom Lane
Date:
Subject: Re: Need help with optimising simple query