Thread: Querying with arrays
I'm considering using arrays to handle managing "lists" of rows (I know this may not be the best approach, but bear with me).

I create a table for my lists like this:

create table lists (
    id SERIAL PRIMARY KEY,
    hits INTEGER[] NOT NULL
);

Then I can insert the results of a query into that table as a new list of hits:

INSERT INTO lists (hits)
SELECT array_agg(id)
FROM some_table
WHERE ...;

Now the problem part: how best to use that array of primary key values to restore the data at a later stage. Conceptually I'm wanting this:

SELECT * from some_table
WHERE id <is in the list of ids in the array in the lists table>;

These both work but are really slow:

SELECT t1.*
FROM some_table t1
WHERE t1.id IN (SELECT unnest(hits) FROM lists WHERE id = 2);

SELECT t1.*
FROM some_table t1
JOIN lists l ON t1.id = any(l.hits)
WHERE l.id = 2;

Is there an efficient way to do this, or is this a dead end?

Thanks
Tim
Tim Dudgeon <tdudgeon.ml@gmail.com> writes:
> I'm considering using arrays to handle managing "lists" of rows (I know
> this may not be the best approach, but bear with me).
> I create a table for my lists like this:
> create table lists (
>     id SERIAL PRIMARY KEY,
>     hits INTEGER[] NOT NULL
> );
> Then I can insert the results of a query into that table as a new list
> of hits
> INSERT INTO lists (hits)
> SELECT array_agg(id)
> FROM some_table
> WHERE ...;
> Now the problem part. How to best use that array of primary key values
> to restore the data at a later stage. Conceptually I'm wanting this:
> SELECT * from some_table
> WHERE id <is in the list of ids in the array in the lists table>;
> These both work but are really slow:
> SELECT t1.*
> FROM some_table t1
> WHERE t1.id IN (SELECT unnest(hits) from lists WHERE id = 2);
> SELECT t1.*
> FROM some_table t1
> JOIN lists l ON t1.id = any(l.hits)
> WHERE l.id = 2;
> Is there an efficient way to do this, or is this a dead end?

You could create a GIN index on lists.hits and then do

SELECT t1.*
FROM some_table t1
JOIN lists l ON array[t1.id] <@ l.hits
WHERE l.id = 2;

How efficient that will be remains to be determined though; if the
l.id condition will eliminate a lot of matches it could still be
kind of slow.

(ISTR some talk of teaching the planner to convert =ANY(array)
conditions to this form automatically when there's a suitable index,
but for now you'd have to write it out like this.)

			regards, tom lane
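For reference, a minimal sketch of the suggested index, using the table and column names from the original post (the index name is an assumption):

```sql
-- GIN index over the integer-array column, so the array containment
-- operators (<@ and @>) can use it.
CREATE INDEX idx_lists_hits ON lists USING gin (hits);
```

The rewritten join condition `array[t1.id] <@ l.hits` is what makes the query indexable; as noted above, the planner does not currently rewrite `t1.id = any(l.hits)` into that form by itself.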
On 27/11/2014 14:54, Tom Lane wrote:
> You could create a GIN index on lists.hits and then do
>
> SELECT t1.*
> FROM some_table t1
> JOIN lists l ON array[t1.id] <@ l.hits
> WHERE l.id = 2;
>
> How efficient that will be remains to be determined though;
> if the l.id condition will eliminate a lot of matches it
> could still be kind of slow.

Afraid that was *much* slower.

Tim
Looking into this further, I don't seem able to get the index used.
I created this simple example:

create table lists (
    id SERIAL PRIMARY KEY,
    name VARCHAR(32) NOT NULL,
    hits INTEGER[] NOT NULL
);

CREATE INDEX idx_lists_hits ON lists USING gin (hits);

INSERT INTO lists (name, hits) VALUES ('list1-10', ARRAY[1,2,3,4,5,6,7,8,9,10]);

explain analyze SELECT id, name FROM lists
WHERE hits @> array[7];

The plan for the query is this:

"Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual time=0.006..0.008 rows=1 loops=1)"
"  Filter: (hits @> '{7}'::integer[])"
"Planning time: 0.058 ms"
"Execution time: 0.025 ms"

What am I doing wrong?

Tim

On 27/11/2014 11:54, Tom Lane wrote:
> You could create a GIN index on lists.hits and then do
>
> SELECT t1.*
> FROM some_table t1
> JOIN lists l ON array[t1.id] <@ l.hits
> WHERE l.id = 2;
>
> How efficient that will be remains to be determined though;
> if the l.id condition will eliminate a lot of matches it
> could still be kind of slow.
>
> (ISTR some talk of teaching the planner to convert =ANY(array)
> conditions to this form automatically when there's a suitable
> index, but for now you'd have to write it out like this.)
>
> 			regards, tom lane
On 04/12/2014 15:42, Tim Dudgeon wrote:
> Looking into this further I don't seem able to get the index used.
> I created this simple example:
>
> create table lists (
>     id SERIAL PRIMARY KEY,
>     name VARCHAR(32) NOT NULL,
>     hits INTEGER[] NOT NULL
> );
>
> CREATE INDEX idx_lists_hits ON lists USING gin (hits);
>
> INSERT INTO lists (name, hits) VALUES ('list1-10', ARRAY[1,2,3,4,5,6,7,8,9,10]);
>
> explain analyze SELECT id, name FROM lists
> WHERE hits @> array[7];
>
> The plan for the query is this:
>
> "Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual time=0.006..0.008 rows=1 loops=1)"
> "  Filter: (hits @> '{7}'::integer[])"
> "Planning time: 0.058 ms"
> "Execution time: 0.025 ms"
>
> What am I doing wrong?

Maybe your test table is tiny?

-- 
Achilleas Mantzios
Head of IT DEV
IT DEPT
Dynacom Tankers Mgmt
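A quick way to check whether the index *can* be used at all is to take the sequential-scan path off the table for the session. This is a diagnostic sketch only, not a production setting:

```sql
-- Discourage the planner from seq scans, then re-run the test query.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT id, name FROM lists WHERE hits @> ARRAY[7];
SET enable_seqscan = on;
```

If the plan now shows a bitmap scan on the GIN index, the index is fine and the earlier seq scan was simply the cheaper plan for a one-row table.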
> The plan for the query is this:
>
> "Seq Scan on lists  (cost=0.00..16.88 rows=3 width=86) (actual
> time=0.006..0.008 rows=1 loops=1)"
> "  Filter: (hits @> '{7}'::integer[])"
> "Planning time: 0.058 ms"
> "Execution time: 0.025 ms"
>
> What am I doing wrong?
>
> Tim

With so few rows to read, it is actually slower to do an index scan
followed by the table scan (in order to actually read the data).
That's why it is using a seq scan.

Gerardo
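To see the planner actually choose the index, the test table needs enough rows that a sequential scan stops being the cheapest option. A sketch of scaling up the one-row example (the row count and array contents are arbitrary assumptions):

```sql
-- Bulk-load many lists so the GIN index becomes worth using.
INSERT INTO lists (name, hits)
SELECT 'list' || g, ARRAY[g, g + 1, g + 2]
FROM generate_series(1, 100000) AS g;

-- Refresh planner statistics so it knows the table is no longer tiny.
ANALYZE lists;

EXPLAIN ANALYZE
SELECT id, name FROM lists WHERE hits @> ARRAY[7];
-- With up-to-date statistics the plan should switch from a Seq Scan
-- to a Bitmap Index Scan on idx_lists_hits.
```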