Thread: integer[] indexing.

integer[] indexing.

From
Dawid Kuroczko
Date:
I have a large table with a column:
ids integer[] not null

most of these entries (over 95%) contain only one array element, some
can contain up to 10 array elements.  seqscan is naturally slow.  GIST
on int_array works nice, but GIST isn't exactly a speed daemon when it
comes to updating.

So I thought, why not create partial indexes?

CREATE INDEX one_element_array_index ON table ((ids[1])) WHERE icount(ids) <= 1;
CREATE INDEX many_element_array_index ON table USING GIST (ids) WHERE
icount(ids) > 1;

Now, if I select WHERE icount(ids) <= 1 AND ids[1] = 33 I get
lightning fast results.
If I select WHERE icount(ids) > 1 AND ids && '{33}' -- I get them even faster.

But when I phrase the query:

SELECT * FROM table WHERE (icount(ids) <= 1 AND ids[1] = 33) OR
(icount(ids) > 1 AND ids && '{33}');

Planner insists on using seqscan.  Even with enable_seqscan = off;

Any hints, comments? :)  [ I think thsese partial indexes take best of
two worlds, only if planner wanted to take advantage of it... :) ]

   Regards,
       Dawid

Re: integer[] indexing.

From
Pierre-Frédéric Caillaud
Date:
    disclaimer : brainless proposition

(SELECT * FROM table WHERE (icount(ids) <= 1 AND ids[1] = 33)
UNION ALL
(SELECT * FROM table WHERE (icount(ids) > 1 AND ids && '{33}'));



Re: integer[] indexing.

From
Harald Fuchs
Date:
In article <opsfjonlc0cq72hf@musicbox>,
=?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?= <lists@boutiquenumerique.com> writes:

>     disclaimer : brainless proposition

> (SELECT * FROM table WHERE (icount(ids) <= 1 AND ids[1] = 33)
> UNION ALL
> (SELECT * FROM table WHERE (icount(ids) > 1 AND ids && '{33}'));

I guess my proposition is even more brainless :-)

If 95% of all records have only one value, how about putting the first
(and most often only) value into a separate column with a btree index
on it?  Something like that:

  CREATE TABLE tbl (
    -- other columns
    id1 INT NOT NULL,
    idN INT[] NULL
  );

  CREATE INDEX tbl_id1_ix ON tbl (id1);

If id1 is selective enough, you probably don't need another index on idn.

Re: integer[] indexing.

From
Tom Lane
Date:
Dawid Kuroczko <qnex42@gmail.com> writes:
> But when I phrase the query:

> SELECT * FROM table WHERE (icount(ids) <= 1 AND ids[1] = 33) OR
> (icount(ids) > 1 AND ids && '{33}');

> Planner insists on using seqscan.  Even with enable_seqscan = off;

The OR-index-scan mechanism isn't currently smart enough to use partial
indexes that are only valid for some of the OR'd clauses rather than all
of them.  Feel free to fix it ;-).  (This might not even be very hard;
I haven't looked.)

            regards, tom lane