Re: [SQL] OFFSET impact on Performance??? - Mailing list pgsql-performance

From PFC
Subject Re: [SQL] OFFSET impact on Performance???
Date
Msg-id opsk9iwzf2th1vuj@musicbox
Whole thread Raw
In response to Re: [SQL] OFFSET impact on Performance???  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: [SQL] OFFSET impact on Performance???
List pgsql-performance

> for example,
> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
> see OPERATIONS and EXAMPLE USAGE:
>

    Thanks, I already know this documentation and have used intarray before
(I find it absolutely fabulous in the right application, it has a great
potential for getting out of tight situations which would involve huge
unmanageable pivot or attributes tables). Its only drawback is that the
gist index creation time is slow and sometimes just... takes forever until
the disk is full.
    However, it seems that integer && integer[] does not exist :

> SELECT * FROM table WHERE id && int[]

explain analyze select * from temp t where id &&
( '{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[] );
ERREUR:  L'operateur n'existe pas : integer && integer[]
ASTUCE : Aucun operateur correspond au nom donne et aux types d'arguments.
Vous devez ajouter des conversions explicites de type.

    I have already used this type of intarray indexes, but you have to create
a special gist index with the gist__int_ops on the column, and the column
has to be an array. In my case the column is just a SERIAL PRIMARY KEY,
and should stay this way, and I don't want to create a functional index in
array[id] just for this feature ; so I guess I can't use the && operator.
Am I mistaken ? My index is the standard btree here.

    It would be nice if the =ANY() could use the index just like IN does ;
besides at planning time the length of the array is known which makes it
behave quite just like IN().

    So I'll use either an EXECUTE'd plpgsql-generated query (IN (....)) ,
which I don't like because it's a kludge ; or this other solution which I
find more elegant :

CREATE OR REPLACE FUNCTION tools.array_srf( INTEGER[] )
         RETURNS SETOF INTEGER        RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql        AS
$$
DECLARE
    _data    ALIAS FOR $1;
    _i        INTEGER;
BEGIN
    FOR _i IN 1..icount(_data) LOOP
        RETURN NEXT _data[_i];
    END LOOP;
    RETURN;
END;
$$;

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
=ANY( '{1,2,3,4,5,6,7,8,9,10,11,12}' );
  Seq Scan on "temp" t  (cost=0.00..5165.52 rows=65536 width=8) (actual
time=0.030..173.319 rows=12 loops=1)
    Filter: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))
  Total runtime: 173.391 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
IN( 1,2,3,4,5,6,7,8,9,10,11,12 );
  Index Scan using temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey on "temp" t  (cost=0.00..36.49 rows=12 width=8) (actual
time=0.046..0.137 rows=12 loops=1)
    Index Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5)
OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11)
OR (id = 12))
  Total runtime: 0.292 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id in (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}'));
  Nested Loop  (cost=15.00..620.20 rows=200 width=8) (actual
time=0.211..0.368 rows=12 loops=1)
    ->  HashAggregate  (cost=15.00..15.00 rows=200 width=4) (actual
time=0.160..0.173 rows=12 loops=1)
          ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
    ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..3.01 rows=1
width=8) (actual time=0.010..0.012 rows=1 loops=12)
          Index Cond: (t.id = "outer".array_srf)
  Total runtime: 0.494 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t, (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}')) foo where foo.array_srf =
t.id;

  Merge Join  (cost=62.33..2824.80 rows=1000 width=12) (actual
time=0.215..0.286 rows=12 loops=1)
    Merge Cond: ("outer".id = "inner".array_srf)
    ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..2419.79
rows=131072 width=8) (actual time=0.032..0.056 rows=13 loops=1)
    ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=0.169..0.173 rows=12 loops=1)
          Sort Key: array_srf.array_srf
          ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
  Total runtime: 0.391 ms

-----------------------------------------------------------------------------------
Note that the meaning is different ; the IN removes duplicates in the
array but the join does not.



pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: [SQL] OFFSET impact on Performance???
Next
From: Oleg Bartunov
Date:
Subject: Re: [SQL] OFFSET impact on Performance???