Thread: Querying with arrays

Querying with arrays

From
Tim Dudgeon
Date:
I'm considering using arrays to handle managing "lists" of rows (I know this may not be the best approach, but bear with me).

I create a table for my lists like this:

create table lists (
  id SERIAL PRIMARY KEY,
  hits INTEGER[] NOT NULL
);

Then I can insert the results of a query into that table as a new list of hits

INSERT INTO lists (hits)
SELECT array_agg(id)
FROM some_table
WHERE ...;

Now the problem part. How to best use that array of primary key values to restore the data at a later stage. Conceptually I'm wanting this:

SELECT * from some_table
WHERE id <is in the list of ids in the array in the lists table>;

These both work but are really slow:

SELECT t1.*
FROM some_table t1
WHERE t1.id IN (SELECT unnest(hits) from lists WHERE id = 2);

SELECT t1.*
FROM some_table t1
JOIN lists l ON t1.id = any(l.hits)
WHERE l.id = 2;

Is there an efficient way to do this, or is this a dead end?

Thanks
Tim

Re: Querying with arrays

From
Tom Lane
Date:
Tim Dudgeon <tdudgeon.ml@gmail.com> writes:
> I'm considering using arrays to handle managing "lists" of rows (I know 
> this may not be the best approach, but bear with me).

> I create a table for my lists like this:

> create table lists (
>   id SERIAL PRIMARY KEY,
>   hits INTEGER[] NOT NULL
> );

> Then I can insert the results of a query into that table as a new list 
> of hits

> INSERT INTO lists (hits)
> SELECT array_agg(id)
> FROM some_table
> WHERE ...;

> Now the problem part. How to best use that array of primary key values 
> to restore the data at a later stage. Conceptually I'm wanting this:

> SELECT * from some_table
> WHERE id <is in the list of ids in the array in the lists table>;

> These both work but are really slow:

> SELECT t1.*
> FROM some_table t1
> WHERE t1.id IN (SELECT unnest(hits) from lists WHERE id = 2);

> SELECT t1.*
> FROM some_table t1
> JOIN lists l ON t1.id = any(l.hits)
> WHERE l.id = 2;

> Is there an efficient way to do this, or is this a dead end?

You could create a GIN index on lists.hits and then do

SELECT t1.*
FROM some_table t1
JOIN lists l ON array[t1.id] <@ l.hits
WHERE l.id = 2;

How efficient that will be remains to be determined though;
if the l.id condition will eliminate a lot of matches it
could still be kind of slow.
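
For reference, the index itself would look roughly like this (a sketch; the index name is just illustrative, assuming the "lists" table definition quoted above):

CREATE INDEX idx_lists_hits ON lists USING gin (hits);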

(ISTR some talk of teaching the planner to convert =ANY(array)
conditions to this form automatically when there's a suitable
index, but for now you'd have to write it out like this.)
        regards, tom lane



Re: Querying with arrays

From
Tim Dudgeon
Date:
On 27/11/2014 14:54, Tom Lane wrote:
> You could create a GIN index on lists.hits and then do
>
> SELECT t1.*
> FROM some_table t1
> JOIN lists l ON array[t1.id] <@ l.hits
> WHERE l.id = 2;
>
> How efficient that will be remains to be determined though;
> if the l.id condition will eliminate a lot of matches it
> could still be kind of slow.
Afraid that was *much* slower.

Tim



Re: Querying with arrays

From
Tim Dudgeon
Date:
Looking into this further I don't seem able to get the index used.
I created this simple example:

create table lists (
  id SERIAL PRIMARY KEY,
  name VARCHAR(32) NOT NULL,
  hits INTEGER[] NOT NULL
);

CREATE INDEX idx_lists_hits ON lists USING gin (hits);

INSERT INTO lists (name, hits) VALUES ('list1-10', ARRAY[1,2,3,4,5,6,7,8,9,10]);

explain analyze SELECT id, name FROM lists
WHERE hits @> array[7];


The plan for the query is this:

"Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual 
time=0.006..0.008 rows=1 loops=1)"
"  Filter: (hits @> '{7}'::integer[])"
"Planning time: 0.058 ms"
"Execution time: 0.025 ms"

What am I doing wrong?

Tim








Re: Querying with arrays

From
Achilleas Mantzios
Date:
On 04/12/2014 15:42, Tim Dudgeon wrote:
> Looking into this further I don't seem able to get the index used.
> I created this simple example:
>
> create table lists (
>   id SERIAL PRIMARY KEY,
>   name VARCHAR(32) NOT NULL,
>   hits INTEGER[] NOT NULL
> );
>
> CREATE INDEX idx_lists_hits ON lists USING gin (hits);
>
> INSERT INTO lists (name, hits) VALUES ('list1-10', ARRAY[1,2,3,4,5,6,7,8,9,10]);
>
> explain analyze SELECT id, name FROM lists
> WHERE hits @> array[7];
>
>
> The plan for the query is this:
>
> "Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual time=0.006..0.008 rows=1 loops=1)"
> "  Filter: (hits @> '{7}'::integer[])"
> "Planning time: 0.058 ms"
> "Execution time: 0.025 ms"
>
> What am I doing wrong?
>

Maybe your test table is tiny?
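
With a single 10-element row a sequential scan will always win, so the index never gets a chance. Something along these lines (just a sketch; the row counts are arbitrary) should give the planner a reason to use the GIN index:

-- fabricate 100k rows, each with its own 10-element array
INSERT INTO lists (name, hits)
SELECT 'list' || g,
       (SELECT array_agg(i) FROM generate_series(g, g + 9) AS i)
FROM generate_series(1, 100000) AS g;

ANALYZE lists;

-- only a handful of rows match now, so an index scan is expected
EXPLAIN ANALYZE SELECT id, name FROM lists WHERE hits @> ARRAY[7];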



-- 
Achilleas Mantzios
Head of IT DEV
IT DEPT
Dynacom Tankers Mgmt




Re: Querying with arrays

From
Gerardo Herzig
Date:
> 
> The plan for the query is this:
> 
> "Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual
> time=0.006..0.008 rows=1 loops=1)"
> "  Filter: (hits @> '{7}'::integer[])"
> "Planning time: 0.058 ms"
> "Execution time: 0.025 ms"
> 
> What am I doing wrong?
> 
> Tim
> 

With so few rows to read, it is actually slower to do an index scan followed by fetching the rows from the table (in order to actually read the data).

That's why it is using a seq scan.
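
If you just want to confirm that the index can be used at all, you can temporarily discourage sequential scans for your session (for testing only, something like):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT id, name FROM lists WHERE hits @> ARRAY[7];
RESET enable_seqscan;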

Gerardo