Thread: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized

where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized

From

Grzegorz Jaśkiewicz

Date:

26 January 2009, 10:41:29

Hey folks,

I have question really for all mighty developers, but don't want to
spam -hackers with it.

why :
select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search.
or select * from foo where (x,y) in ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2));

never gets optimized by planner, etc ?

Is it just not worth optimizing from pg side? I am sure, it would make
sense to actually reorder these values, so that index/whatnot could
pick it up faster.

Just another one of those, 'why' (not) questions from my side.

thanks.

--
GJ

Re: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized

From

Filip Rembiałkowski

Date:

26 January 2009, 12:59:00

2009/1/26 Grzegorz Jaśkiewicz <gryzman@gmail.com>

Hey folks,

I have question really for all mighty developers, but don't want to
spam -hackers with it.

why :
select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search.
or select * from foo where (x,y) in ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2));

never gets optimized by planner, etc ?

I would guess that optimizing silly-written queries was always a low-priority task...

IMHO this is good topic for -hackers list.. and probably not so hard to implement :)

BTW, test on CVS HEAD:

CREATE TABLE atest(id integer primary key);
insert into atest select x from generate_series(1,100000) x(x);
ANALYZE atest;
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,5);

shows that second query is 2.5 times faster than the first ( 0.170 ms / 0.070 ms).

Is it just not worth optimizing from pg side? I am sure, it would make
sense to actually reorder these values, so that index/whatnot could
pick it up faster.

Just another one of those, 'why' (not) questions from my side.

thanks.

--
GJ

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Filip Rembiałkowski

Re: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized

From

Grzegorz Jaśkiewicz

Date:

26 January 2009, 13:12:17

On Mon, Jan 26, 2009 at 1:58 PM, Filip Rembiałkowski
<plk.zuber@gmail.com> wrote:
> 2009/1/26 Grzegorz Jaśkiewicz <gryzman@gmail.com>
>>
>> Hey folks,
>>
>> I have question really for all mighty developers, but don't want to
>> spam -hackers with it.
>>
>> why :
>> select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search.
>> or select * from foo where (x,y) in
>> ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
>>
>> never gets optimized by planner, etc ?
>
> I would guess that optimizing silly-written queries was always a
> low-priority task...

> EXPLAIN ANALYZE SELECT * FROM atest where id in
> (1,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
> EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,5);
>
> shows that second query is 2.5 times faster than the first ( 0.170 ms /
> 0.070 ms).

the difference isn't so small than :)
silly or not, sometimes you end up with such collection passed on in
some silly languages.

--
GJ

Re: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized

From

Tom Lane

Date:

26 January 2009, 14:36:49

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> why :
> select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search.
> never gets optimized by planner, etc ?

We actually did that, before 8.0 or thereabouts.  It was removed because
it cost a lot of planner cycles for *every* query, and helped for only
a small number of stupidly-written ones.

In general, you can probably remove such duplicates on the client side
more efficiently than the database could anyway (realizing that whatever
it might do has to be datatype-independent).

            regards, tom lane