Thread: limit over attribute size if index over it exists

From
"pajai"
Date:
Hi everybody,

I have run into an issue with PostgreSQL: there is a limit on the size
of a column value if an index exists over that column. In the project I
am currently working on, I face exactly this situation. I have an
attribute that I search on (so I need an index over it), but in some
cases this attribute can be very large (100KB+).

The error message I get from Postgres when I try to insert a tuple with
such a big attribute (e.g. 10K) is the following:

ERROR:   index row requires 15704 bytes, maximum size is 8191

(PostgreSQL 8.0.7 under Linux. The index is a btree index.)

I have thought of a possible workaround and would like to know whether
it seems reasonable. The idea is to build a hash, on the client side,
over the problematic column (say column a). I then store in the
database both the attribute a (without an index) and hash(a) (with an
index). When selecting, I first use a sub-select to pick all tuples
with the right hash (fast, using the index), and then an outer select
to pick the tuple with the right attribute a (slow, sequential scan,
but normally over only a few tuples, since collisions are rare).
Something like this:

SELECT b
FROM (
  SELECT a, b
  FROM foo
  WHERE hash='<hash(a)>'
) as bar
WHERE bar.a='<a>'

(Actually, my situation is slightly more complicated because I have not
just one such attribute but two or more, which rules out some index
types. The principle is the same, though.)
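The schema for this workaround might look like the following sketch
(table and column names are assumed; the hash column has to be kept in
sync with a by the client on every insert and update):

```sql
CREATE TABLE foo (
    a    text,    -- large attribute: searched on, but not directly indexable
    b    text,
    hash text     -- e.g. an MD5 hex digest of a, computed client-side
);

-- Only the short, fixed-size hash is indexed.
CREATE INDEX foo_hash_idx ON foo (hash);
```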

Does this solution seem reasonable, or are there other (more elegant)
ways to do it?

Thank you in advance.
Cheers,

Pat


Re: limit over attribute size if index over it exists

From
Martijn van Oosterhout
Date:
On Mon, Jun 26, 2006 at 02:52:56AM -0700, pajai wrote:
> I have thought of a possible workaround. I would like to know if it
> seems reasonable. The idea would be to build a hash, on the client
> side, over the problematic column (let's say column a). I then store in
> the db the attribute a (without index) and the hash(a) (with an index).
> Then when I am doing a select, I use firstly a sub-select to choose all
> tuples with the right hash (quick, with index), and then an outer
> select to choose the tuple with the right attribute a (slow, sequential
> scan, but normally few tuples, because few collisions). Something like
> that:

Perhaps you should look into functional indexes, i.e. indexes over the
result of a function:

CREATE INDEX foo_index ON foo( hash(a) );

This index will automatically be used if you make a query like this:

... WHERE hash(a) = 'blah';
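A concrete sketch of this approach (table and column names assumed;
md5() is built into PostgreSQL, so no client-side hashing is needed):

```sql
-- Hypothetical table matching the original post.
CREATE TABLE foo (a text, b text);

-- Functional (expression) index over the hash of the large column.
CREATE INDEX foo_a_md5_idx ON foo (md5(a));

-- The planner can use the index for the hash comparison; the equality
-- check on a itself filters out any hash collisions.
SELECT b
FROM foo
WHERE md5(a) = md5('...the large value...')
  AND a = '...the large value...';
```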

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: limit over attribute size if index over it exists

From
Tom Lane
Date:
"pajai" <patrick.jayet@gmail.com> writes:
> I have got an issue with PostgreSQL. There is a limitation on the
> column length of a tuple, in case there is an index over it. In the
> actual project I am working on, I meet such a situation. I have got an
> attribute over which I am doing a search (that is, I need an index over
> it), but this attribute can be in some cases very large (100KB+).

It sounds to me a lot like you may be in need of full-text-index code ---
see contrib/tsearch2.  If you were not trying to roll your own text
searcher, please give more details.  I can hardly imagine a situation in
which it is useful to make a btree index on 100KB values.

            regards, tom lane
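For reference, the functionality of contrib/tsearch2 that Tom suggests
was later built into PostgreSQL core (as of 8.3). The equivalent modern
sketch, with table and column names assumed, looks like this:

```sql
-- GIN index over the text-search vector of the large column.
CREATE INDEX foo_a_fts_idx ON foo USING gin (to_tsvector('english', a));

-- Matches rows whose column a contains the word 'keyword'.
SELECT b
FROM foo
WHERE to_tsvector('english', a) @@ to_tsquery('english', 'keyword');
```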