Re: Any disadvantages of using =ANY(ARRAY()) instead of IN? - Mailing list pgsql-performance

From geirB
Subject Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Date
Msg-id d4f2a2d4-0938-4b5c-8d4f-e65d8abf8bf6@q2g2000vbv.googlegroups.com
Whole thread Raw
In response to Any disadvantages of using =ANY(ARRAY()) instead of IN?  (Clemens Eisserer <linuxhippy@gmail.com>)
List pgsql-performance
I've encoutered similar issues myself (with UNION so far), so I tried
to build a simple test case, which may or may not cover Clemens's
case.

Test case 1 and 2 illustrates the issue, and case 3-9 are variations.

My observation: Looks like the optimizer cannot be close friends with
both UNION and IN/JOIN at the same time.
Actually - it looks like the UNION SELECT kids don't wanna share the
IN/JOIN toy we gave them, but are happy when they get their own toys
to play with ;)


DROP TABLE IF EXISTS table1;
CREATE TABLE table1 AS  SELECT i AS id FROM generate_series(1, 300000)
S(i);
CREATE INDEX ON table1(id);
ANALYZE table1;


-- Test 1: Slow. IN()
SELECT * FROM (
       SELECT * FROM table1
       UNION
       SELECT * FROM table1
) Q WHERE id IN (SELECT id FROM table1 LIMIT 10);


-- Test 2: Fast. ANY(ARRAY())
SELECT * FROM (
       SELECT * FROM table1
       UNION
       SELECT * FROM table1
) Q WHERE id = ANY(ARRAY(SELECT id FROM table1 LIMIT 10));






-- Test 3: Fast. Duplicate IN. Symptom fix? Or would you call it a
"better" query in terms of sql? -except for the unnecessary subquery,
which I kept for readability.
SELECT * FROM (
    SELECT * FROM table1
    WHERE id IN (SELECT id FROM table1 LIMIT 10)
    UNION
    SELECT * FROM table1
    WHERE id IN (SELECT id FROM table1 LIMIT 10)
) Q;


-- Test 4: Fast. Duplicate JOIN CTE.
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
     SELECT * FROM table1 JOIN id_list USING(id)
     UNION
     SELECT * FROM table1 JOIN id_list USING(id)
) Q;


-- Test 5: Slow. IN(CTE)
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
       SELECT * FROM table1
       UNION
       SELECT * FROM table1
) Q WHERE id IN (SELECT * FROM id_list);


-- Test 6: Slow. IN(explicit id list)
SELECT * FROM (
       SELECT * FROM table1
       UNION
       SELECT * FROM table1
) Q WHERE id IN (SELECT
UNNEST('{100001,100002,100003,100004,100005,100006,100007,100008,100009,10010}'::BIGINT[] )
AS id);


-- Test 7: Slow. IN(UNNEST(ARRAY())
SELECT * FROM (
       SELECT * FROM table1
       UNION
       SELECT * FROM table1
) Q WHERE id IN (SELECT UNNEST(ARRAY(SELECT id FROM table1 LIMIT 10))
AS id);


-- Test 8: Slow. JOIN CTE
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
      SELECT * FROM table1
      UNION
      SELECT * FROM table1
) Q JOIN id_list USING(id);


-- Test 9: Fast. JOIN CTE + UNION ALL/DISTINCT (not quite the same
query)
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT DISTINCT * FROM (
      SELECT * FROM table1
      UNION ALL
      SELECT * FROM table1
) Q JOIN id_list USING(id);


--
Geir Bostad
9.1.3(x64,win)

pgsql-performance by date:

Previous
From: geirB
Date:
Subject: Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Next
From: Robert Haas
Date:
Subject: Re: heavly load system spec