Home > mailing lists

Thread: [HACKERS] Improve perfomance for index search ANY(ARRAY[]) condition withsingle item

[HACKERS] Improve perfomance for index search ANY(ARRAY[]) condition withsingle item

From

Dima Pavlov

Date:

23 July 2017, 21:22:11

Hello,

The problems I tried to solve here:

1. Improve perfomance for index search ANY(ARRAY[...]) condition with single item

2. I saw tons of users code like: if len(array) == 1: sql += '{}'.format(array[0]) else: sql += 'ANY(ARRAY[{}])'.format(array)

So there will be less lines of code and it will be clearer.

3. Confusing moment that "IN" works well with single item, and "ANY(ARRAY[])" doesn't without any real reason.

The problem was discussed on stackoverflow:

https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

That's my first patch so I will be grateful for constructive criticism.

---------------------------------------------------------------------------

CREATE TABLE public.t (id serial, a integer, b integer);

INSERT INTO t(a, b)

SELECT round(random()*1000), round(random()*1000)

FROM generate_series(1, 1000000);

CREATE INDEX "i_1" ON public.t USING btree (a, b);

CREATE INDEX "i_2" ON public.t USING btree (b);

---------------------------------------------------------------------------

If "a = 50" in the first query, everything is ok, appropriate index "i_1" is used:

SELECT * FROM t WHERE a = 50 ORDER BY b LIMIT 1

"Limit (cost=0.42..4.03 rows=1 width=12) (actual time=0.085..0.085 rows=1 loops=1)"

" Buffers: shared hit=1 read=3"

" -> Index Scan using i_1 on t (cost=0.42..4683.12 rows=1300 width=12) (actual time=0.084..0.084 rows=1 loops=1)"

" Index Cond: (a = 50)"

" Buffers: shared hit=1 read=3"

"Planning time: 0.637 ms"

"Execution time: 0.114 ms"

---------------------------------------------------------------------------

With "a IN (50)" result is the same:

SELECT * FROM t WHERE a IN (50) ORDER BY b LIMIT 1

"Limit (cost=0.42..4.03 rows=1 width=12) (actual time=0.058..0.058 rows=1 loops=1)"

" Buffers: shared hit=4"

" -> Index Scan using i_1 on t (cost=0.42..4683.12 rows=1300 width=12) (actual time=0.056..0.056 rows=1 loops=1)"

" Index Cond: (a = 50)"

" Buffers: shared hit=4"

"Planning time: 0.287 ms"

"Execution time: 0.105 ms"

---------------------------------------------------------------------------

The problem is when I try to use "a = ANY(ARRAY[50])". Wrong index "i_2" is used instead of "i_1" and execution time becomes x25 longer:

SELECT * FROM t WHERE a = ANY(ARRAY[50]) ORDER BY b LIMIT 1

"Limit (cost=0.42..38.00 rows=1 width=12) (actual time=2.591..2.591 rows=1 loops=1)"

" Buffers: shared hit=491 read=4"

" -> Index Scan using i_2 on t (cost=0.42..48853.65 rows=1300 width=12) (actual time=2.588..2.588 rows=1 loops=1)"

" Filter: (a = ANY ('{50}'::integer[]))"

" Rows Removed by Filter: 520"

" Buffers: shared hit=491 read=4"

"Planning time: 0.251 ms"

"Execution time: 2.627 ms"

Attachment

improve-single-item-array.patch

Re: [HACKERS] Improve perfomance for index search ANY(ARRAY[]) condition with single item

From

Tom Lane

Date:

23 July 2017, 22:55:47

Dima Pavlov <imyfess@gmail.com> writes:
> The problem was discussed on stackoverflow:
> https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

Usually, we're not very happy with submissions that reference external
pages for justification; we like to have all the relevant material in our
mail archives.

> That's my first patch so I will be grateful for constructive criticism.

I think it would be better to do this in the planner, see specifically
eval_const_expressions.  That would allow the transformation to succeed
in more cases, like where the array is coming from a parameter rather
than an ARRAY[] construct.  That is, you should be able to do this not
only for ARRAY[x] but for any single-element constant array.
        regards, tom lane

Re: [HACKERS] Improve perfomance for index search ANY(ARRAY[]) condition with single item

From

Tom Lane

Date:

23 July 2017, 23:07:30

I wrote:
> Dima Pavlov <imyfess@gmail.com> writes:
>> That's my first patch so I will be grateful for constructive criticism.

> I think it would be better to do this in the planner, see specifically
> eval_const_expressions.

BTW, currently eval_const_expressions doesn't know anything about
ScalarArrayOpExpr, but there's a patch pending to improve that:

https://commitfest.postgresql.org/14/1136/

You should probably build your revised patch as a follow-on to that
one, else we're going to have merge conflicts.
        regards, tom lane