Home > mailing lists

Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY - Mailing list pgsql-hackers

From	Andrew Gierth
Subject	Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Date	July 5, 2014 23:56:10
Msg-id	87egxzbn01.fsf@news-spur.riddles.org.uk Whole thread Raw
Responses	Re: Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

Spent some time analyzing a severe performance regression on 9.1->9.3
upgrade for a user on IRC. Narrowed it down to this:

commit 807a40c5 fixed a bug in handling of (new in 9.2) functionality
of ScalarArrayOpExpr in btree index quals, forcing the results of
scans including such a qual to be treated as unordered (because the
order can in fact be wrong).

However, this kills any chance of using the same index _without_ the
SAOP to get the benefit of its ordering.

For example:

regression=# create index on tenk1 (ten,unique1,thousand);
CREATE INDEX
regression=# set enable_sort = false;
SET
regression=# explain analyze select * from tenk1 where thousand in (19,29,39,49,57,66,77,8,90,12,22,32) and (ten >= 5)
and(ten > 5 or unique1 > 5000) order by ten,unique1 limit 1; 
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10000000294.71..10000000294.71 rows=1 width=244) (actual time=0.889..0.891 rows=1 loops=1)
   ->  Sort  (cost=10000000294.71..10000000294.81 rows=42 width=244) (actual time=0.884..0.884 rows=1 loops=1)
         Sort Key: ten, unique1
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on tenk1  (cost=44.34..294.50 rows=42 width=244) (actual time=0.194..0.687 rows=80
loops=1)
               Recheck Cond: (thousand = ANY ('{19,29,39,49,57,66,77,8,90,12,22,32}'::integer[]))
               Filter: ((ten >= 5) AND ((ten > 5) OR (unique1 > 5000)))
               Rows Removed by Filter: 40
               Heap Blocks: exact=100
               ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..44.33 rows=120 width=0) (actual
time=0.148..0.148rows=120 loops=1) 
                     Index Cond: (thousand = ANY ('{19,29,39,49,57,66,77,8,90,12,22,32}'::integer[]))
 Planning time: 0.491 ms
 Execution time: 1.023 ms
(13 rows)

(real case had larger rowcounts of course, but showed the same effect
of using a Sort plan even when enable_sort=false.)

Obviously one can create a (ten,unique1) index (which would otherwise
be almost completely redundant with the 3-column one), but that wastes
space and eliminates some index-only-scan opportunities. Similarly one
can hack the query, e.g. using WHERE (thousand+0) IN (...) but that is
as ugly as all sin.

I've experimented with the attached patch, which detects when this
situation might have occurred and does another pass to try and build
ordered scans without the SAOP condition. However, the results may not
be quite ideal, because at least in some test queries (not all) the
scan with the SAOP included in the indexquals is being costed higher
than the same scan with the SAOP moved to a Filter, which seems
unreasonable.  (One of the affected queries is the regression test for
the original bug, which this patch does not yet try and fix and
therefore currently fails regression on.)

Thoughts?

--
Andrew (irc:RhodiumToad)

Attachment

saop.patch

pgsql-hackers by date:

From: Christoph Moench-Tegeder
Date: 05 July 2014, 20:41:35
Subject: 9.4 documentation: duplicate paragraph in logical decoding example

From: David Rowley
Date: 06 July 2014, 00:16:33
Subject: Re: Allowing join removals for more join types

Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY - Mailing list pgsql-hackers

Attachment

Previous

Next