Thread: Need help with optimising simple query
Hi, I am having a query that has an order by and a limit clause. The column on which I am doing order by is indexed (default b tree index). However the index is not being used. On tweaking the query a bit I found that when I use left join index is not used whereas when I use inner join the index is used. Unfortunately, the behaviour we expect is that of left join only. My question is, is there any way to modify/improve the query to improve the query speed or is this the best that is possible for this case. Please find below a simplified version of the queries. I tried the queries on 9.3 and 10 versions and both gave similar results. Table structure performance_test=# \d+ child Table "public.child" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+--------+-----------+----------+-----------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('child_id_seq'::regclass) | plain | | name | text | | not null | | extended | | Indexes: "child_pkey" PRIMARY KEY, btree (id) "child_name_unique" UNIQUE CONSTRAINT, btree (name) Referenced by: TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id) performance_test=# \d+ parent Table "public.parent" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ----------+--------+-----------+----------+------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('parent_id_seq'::regclass) | plain | | name | text | | not null | | extended | | child_id | bigint | | | | plain | | Indexes: "parent_pkey" PRIMARY KEY, btree (id) "parent_name_unique" UNIQUE CONSTRAINT, btree (name) "parent_child_id_idx" btree (child_id) Foreign-key constraints: "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id) Query used to populate data performance_test=# insert into child(name) select concat('child ', gen.id) as name from (select generate_series(1,100000) as id) as gen; performance_test=# insert into parent(name, child_id) select concat('parent ', gen.id) as name, (id%100000) + 1 from (select generate_series(1,1000000) as id) as gen; Left join with order by using child name performance_test=# explain analyze select * from parent left join child on parent.child_id = child.id order by child.name limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=69318.55..69318.58 rows=10 width=59) (actual time=790.708..790.709 rows=10 loops=1) -> Sort (cost=69318.55..71818.55 rows=1000000 width=59) (actual time=790.705..790.706 rows=10 loops=1) Sort Key: child.name Sort Method: top-N heapsort Memory: 27kB -> Hash Left Join (cost=3473.00..47708.91 rows=1000000 width=59) (actual time=51.066..401.028 rows=1000000 loops=1) Hash Cond: (parent.child_id = child.id) -> Seq Scan on parent (cost=0.00..17353.00 rows=1000000 width=29) (actual time=0.026..67.848 rows=1000000 loops=1) -> Hash (cost=1637.00..1637.00 rows=100000 width=19) (actual time=50.879..50.879 rows=100000 loops=1) Buckets: 65536 Batches: 2 Memory Usage: 3053kB -> Seq Scan on child (cost=0.00..1637.00 rows=100000 width=19) (actual time=0.018..17.281 rows=100000 loops=1) Planning time: 1.191 ms Execution time: 790.797 ms (12 rows) Inner join with sorting according to child name performance_test=# explain analyze select * from parent inner join child on parent.child_id = child.id order by child.name limit 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193 rows=10 loops=1) -> Nested Loop (cost=0.84..119132.56 rows=1000000 width=59) (actual time=0.154..0.186 rows=10 loops=1) -> Index Scan using child_name_unique on child (cost=0.42..5448.56 rows=100000 width=19) (actual time=0.126..0.126 rows=1 loops=1) -> Index Scan using parent_child_id_idx on parent (cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10 loops=1) Index Cond: (child_id = child.id) Planning time: 0.941 ms Execution time: 0.283 ms (7 rows) Version performance_test=# select version(); version ----------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit (1 row) Any help from Postgres experts would be great. Thanks, Nanda
Nandakumar M <m.nanda92@gmail.com> writes: > I am having a query that has an order by and a limit clause. The > column on which I am doing order by is indexed (default b tree index). > However the index is not being used. On tweaking the query a bit I > found that when I use left join index is not used whereas when I use > inner join the index is used. The reason the index isn't being used is that the sort order the query requests isn't the same as the order provided by the index. Here: > performance_test=# explain analyze select * from parent left join > child on parent.child_id = child.id order by child.name limit 10; you're asking to sort by a column that will include null values for child.name anywhere that there's a parent row without a match for child_id. Those rows aren't even represented in the index on child.name, much less placed in the right order. regards, tom lane
Hi Tom, Is there something that I can do to improve the performance of such queries (where ordering is done based on child table column and join is left join)? Maybe a combined index or something like that? Or is it possible to modify the query to get same result but execute faster. One ad-hoc optimisation (which gives somewhat better performance) that came to mind is to have a sub query for child table like performance_test=# explain analyze select * from parent left join (select * from child order by name limit 10) as child on parent.child_id = child.id order by child.name limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=42714.84..42714.86 rows=10 width=59) (actual time=311.623..311.624 rows=10 loops=1) -> Sort (cost=42714.84..45214.84 rows=1000000 width=59) (actual time=311.622..311.622 rows=10 loops=1) Sort Key: child.name Sort Method: top-N heapsort Memory: 26kB -> Hash Left Join (cost=1.19..21105.20 rows=1000000 width=59) (actual time=0.120..204.386 rows=1000000 loops=1) Hash Cond: (parent.child_id = child.id) -> Seq Scan on parent (cost=0.00..17353.00 rows=1000000 width=29) (actual time=0.073..73.052 rows=1000000 loops=1) -> Hash (cost=1.06..1.06 rows=10 width=19) (actual time=0.035..0.035 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Limit (cost=0.42..0.96 rows=10 width=19) (actual time=0.014..0.027 rows=10 loops=1) -> Index Scan using child_name_unique on child (cost=0.42..5448.56 rows=100000 width=19) (actual time=0.013..0.024 rows=10 loops=1) Planning time: 0.505 ms Execution time: 311.682 ms (13 rows) Time: 312.673 ms Is there something I can do that will improve the query performance much more than this? Thanks. Regards, Nanda On Mon, 9 Jul 2018, 19:53 Tom Lane, <tgl@sss.pgh.pa.us> wrote: > > Nandakumar M <m.nanda92@gmail.com> writes: > > I am having a query that has an order by and a limit clause. The > > column on which I am doing order by is indexed (default b tree index). > > However the index is not being used. On tweaking the query a bit I > > found that when I use left join index is not used whereas when I use > > inner join the index is used. > > The reason the index isn't being used is that the sort order the query > requests isn't the same as the order provided by the index. Here: > > > performance_test=# explain analyze select * from parent left join > > child on parent.child_id = child.id order by child.name limit 10; > > you're asking to sort by a column that will include null values for > child.name anywhere that there's a parent row without a match for > child_id. Those rows aren't even represented in the index on child.name, > much less placed in the right order. > > regards, tom lane
I didn't find any CREATE TABLE's in your description, or else I would have tried it with the sequences and all that, but I think this ought to work. postgres=# explain select * from ((select * from parent inner join child on parent.child_id = child.id limit 10) union all (select * from parent left outer join child on parent.child_id = child.id where child.id is null limit 10)) as v limit 10; QUERY PLAN --------------------------------------------------------------------------------------------------------- Limit (cost=0.15..3.72 rows=10 width=88) -> Append (cost=0.15..7.29 rows=20 width=88) -> Limit (cost=0.15..2.46 rows=10 width=88) -> Nested Loop (cost=0.15..246.54 rows=1070 width=88) -> Seq Scan on parent (cost=0.00..20.70 rows=1070 width=48) -> Index Scan using child_pkey on child (cost=0.15..0.21 rows=1 width=40) Index Cond: (id = parent.child_id) -> Limit (cost=0.15..4.63 rows=10 width=88) -> Nested Loop Anti Join (cost=0.15..239.71 rows=535 width=88) -> Seq Scan on parent parent_1 (cost=0.00..20.70 rows=1070 width=48) -> Index Scan using child_pkey on child child_1 (cost=0.15..0.21 rows=1 width=40) Index Cond: (parent_1.child_id = id) (12 rows) ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
...on second thought, the placement of the IS NULL predicate isn't quite right, but if you futz with it a bit I think you can make it work ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html