[BUGS] BUG #14496: Cost of comparator is not taken into account in sorting - Mailing list pgsql-bugs
From | zszabo@chemaxon.com |
---|---|
Subject | [BUGS] BUG #14496: Cost of comparator is not taken into account in sorting |
Date | |
Msg-id | 20170116111616.1431.40748@wrigleys.postgresql.org Whole thread Raw |
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 14496 Logged by: Zsuzsanna Szabo Email address: zszabo@chemaxon.com PostgreSQL version: 9.5.4 Operating system: ubuntu linux Description: Dear PostgreSQL Support, We are developing a chemical extension to PostgreSQL with a user defined type Molecule. We define a btree operator class for this Molecule type to enable sorting by that type: CREATE OR REPLACE FUNCTION molecule_relevance_compare(Molecule, Molecule) RETURNS integer AS 'MODULE_PATHNAME' LANGUAGE C RETURNS NULL ON NULL INPUT IMMUTABLE COST 40000; CREATE OPERATOR CLASS molecule_btree_ops DEFAULT FOR TYPE Molecule USING btree AS OPERATOR 1 &%<%&, OPERATOR 2 &%<=%&, OPERATOR 3 &%=%&, OPERATOR 4 &%>=%&, OPERATOR 5 &%>%&, FUNCTION 1 molecule_relevance_compare(Molecule, Molecule); Sorting is performed correctly by PostgreSQL if ORDER BY Molecule is used, our function molecule_relevance_compare() is called for performing the sort. There is no problem with that. Our problem is that the cost of sorting by the Molecule column does not take into account the cost of the comparison by molecule_relevance_compare() at all! We have defined a pre-sorted index on the Molecule type called sortedchemindex: CREATE OPERATOR CLASS sortedchemindex_int_ops DEFAULT FOR TYPE Molecule USING sortedchemindex AS OPERATOR 1 &%<%&, OPERATOR 2 &%<=%&, OPERATOR 3 &%=%&, OPERATOR 4 &%>=%&, OPERATOR 5 &%>%&, OPERATOR 6 |>|, OPERATOR 7 |<|, OPERATOR 8 |=|, OPERATOR 9 |~>| (Molecule, sim_filter), OPERATOR 10 |<~| (sim_filter, Molecule), OPERATOR 11 |<~| (Molecule, sim_filter), OPERATOR 12 |~>| (sim_filter, Molecule), OPERATOR 13 |~>| (Molecule, sim_order), OPERATOR 14 |<~| (sim_order, Molecule), OPERATOR 15 |<~| (Molecule, sim_order), OPERATOR 16 |~>| (sim_order, Molecule); We also set amcanorderby to true in sortedchemindex enrty of pg_am. The index is used nicely for simple sorted queries, like in the example below. The table is defined as: postgres=# \d pbch Table "public.pbch" Column | Type | Modifiers --------+---------------------+--------------------------------------------------- mol | molecule('default') | id | integer | not null default nextval('pbch_id_seq'::regclass) Indexes: "pbch_id_idx" PRIMARY KEY, btree (id) "pbch_index" sortedchemindex (mol) And we get the explain plan: postgres=# explain select mol from pbch where 'Br' |<| mol order by mol; QUERY PLAN ----------------------------------------------------------------------------------- Index Scan using pbch_index on pbch (cost=102.60..17729.49 rows=2349 width=1146) Index Cond: ('Br'::molecule |<| mol) (2 rows) No sorting is performed, great! However on joint queries the plan optimizer decides to use hash join and sorting although it is much more expensive than performing a nested loop and rely on the sorting in the index: postgres=# explain select * from pbch p, real_table r where 'Br' |<| p.mol and r.val < 0.8 and p.id = r.id order by mol; QUERY PLAN ------------------------------------------------------------------------------------------------- Sort (cost=18531.27..18535.95 rows=1872 width=1158) Sort Key: p.mol -> Hash Join (cost=775.10..18429.52 rows=1872 width=1158) Hash Cond: (p.id = r.id) -> Index Scan using pbch_index on pbch p (cost=102.60..17729.49 rows=2349 width=1150) Index Cond: ('Br'::molecule |<| mol) -> Hash (cost=423.50..423.50 rows=19920 width=8) -> Seq Scan on real_table r (cost=0.00..423.50 rows=19920 width=8) Filter: (val < '0.8'::double precision) (9 rows) postgres=# select * from pbch p, real_table r where 'Br' |<| p.mol and r.val < 0.8 and p.id = r.id order by mol; Time: 5881,363 ms If I set not to use sorting explicitely: postgres=# set enable_sort to off; SET Then the plan becomes: postgres=# explain select * from pbch p, real_table r where 'Br' |<| p.mol and r.val < 0.8 and p.id = r.id order by mol; QUERY PLAN -------------------------------------------------------------------------------------------- Nested Loop (cost=102.89..19203.30 rows=1872 width=1158) -> Index Scan using pbch_index on pbch p (cost=102.60..17729.49 rows=2349 width=1150) Index Cond: ('Br'::molecule |<| mol) -> Index Scan using real_table_id_idx on real_table r (cost=0.29..0.62 rows=1 width=8) Index Cond: (id = p.id) Filter: (val < '0.8'::double precision) (6 rows) And the execution is much faster because the expensive comparisons performed by molecule_relevance_compare() do not need to be performed: postgres=# select * from pbch p, real_table r where 'Br' |<| p.mol and r.val < 0.8 and p.id = r.id order by mol; Time: 74,368 ms It can be seen that the cost of molecule_relevance_compare() has no effect on the sorting cost because if I reset it to another value the hash join execution plan remains the same with the same costs. The below examle illustrates it: 1. molecule_relevance_compare() cost set to 40000: postgres=# explain select molecule_relevance_compare(mol, 'C') from pbch; QUERY PLAN ----------------------------------------------------------------- Seq Scan on pbch (cost=0.00..2324917.10 rows=23210 width=1146) (1 row) postgres=# explain select * from pbch p join real_table r on p.id = r.id where 'Br' |<| p.mol and r.val < 0.8 order by mol; QUERY PLAN ------------------------------------------------------------------------------------------------- Sort (cost=18531.27..18535.95 rows=1872 width=1153) Sort Key: p.mol -> Hash Join (cost=775.10..18429.52 rows=1872 width=1153) Hash Cond: (p.id = r.id) -> Index Scan using pbch_index on pbch p (cost=102.60..17729.49 rows=2349 width=1145) Index Cond: ('Br'::molecule |<| mol) -> Hash (cost=423.50..423.50 rows=19920 width=8) -> Seq Scan on real_table r (cost=0.00..423.50 rows=19920 width=8) Filter: (val < '0.8'::double precision) (9 rows) 2. molecule_relevance_compare() cost set to 80000: postgres=# explain select molecule_relevance_compare(mol, 'C') from pbch; QUERY PLAN ----------------------------------------------------------------- Seq Scan on pbch (cost=0.00..4645917.10 rows=23210 width=1141) (1 row) Cost increased from 2324917.10 to 4645917.10! postgres=# explain select * from pbch p join real_table r on p.id = r.id where 'Br' |<| p.mol and r.val < 0.8 order by mol; QUERY PLAN ------------------------------------------------------------------------------------------------- Sort (cost=18531.27..18535.95 rows=1872 width=1153) Sort Key: p.mol -> Hash Join (cost=775.10..18429.52 rows=1872 width=1153) Hash Cond: (p.id = r.id) -> Index Scan using pbch_index on pbch p (cost=102.60..17729.49 rows=2349 width=1145) Index Cond: ('Br'::molecule |<| mol) -> Hash (cost=423.50..423.50 rows=19920 width=8) -> Seq Scan on real_table r (cost=0.00..423.50 rows=19920 width=8) Filter: (val < '0.8'::double precision) (9 rows) But there is no change in the cost here!!! Please reply and let me know when this problem can be fixed. Best regards, Zsuzsanna Szabo Developer at chemaxon.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: