Thread: limit over attribute size if index over it exists
Hi everybody,

I have run into an issue with PostgreSQL: there is a limit on the size of a column value if an index exists over that column. In the project I am currently working on, I have hit exactly this situation. I have an attribute over which I do searches (that is, I need an index over it), but this attribute can in some cases be very large (100 KB+). If I try to insert a tuple with such a big attribute (e.g. 10 KB), Postgres logs the following:

    ERROR: index row requires 15704 bytes, maximum size is 8191

(PostgreSQL 8.0.7 under Linux; the index is a btree index.)

I have thought of a possible workaround and would like to know whether it seems reasonable. The idea is to build a hash, on the client side, over the problematic column (let's say column a). I then store in the database both the attribute a (without an index) and hash(a) (with an index). When selecting, I first use a sub-select to fetch all tuples with the right hash (fast, using the index), and then an outer select to pick the tuple with the right attribute a (slow, sequential scan, but normally over only a few tuples, since collisions are rare). Something like this:

    SELECT b FROM (
        SELECT a, b FROM foo WHERE hash='<hash(a)>'
    ) AS bar
    WHERE bar.a='<a>'

(Actually, my case is slightly more complicated because I have not just one attribute but two or more, which rules out some index types. The principle is the same, though.)

Does this solution seem reasonable, or are there other (more elegant) ways to do it? Thank you in advance.

Cheers,
Pat
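[For readers following along, the workaround above can be sketched end-to-end in Python. SQLite's in-memory database stands in here for PostgreSQL (the SQL shape is the same), and MD5 stands in for whichever client-side hash the application picks; both choices, like the table and column names, mirror the post's example and are assumptions, not a fixed recipe.]

```python
import hashlib
import sqlite3

def md5hex(value: str) -> str:
    """Client-side hash of the oversized attribute."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# In-memory stand-in for the real PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a TEXT, b TEXT, hash TEXT)")
# Index only the short, fixed-size hash, never the 100 KB+ column itself.
conn.execute("CREATE INDEX foo_hash_idx ON foo (hash)")

big_value = "x" * 100_000  # an attribute far too large to index directly
conn.execute("INSERT INTO foo VALUES (?, ?, ?)",
             (big_value, "payload", md5hex(big_value)))

# The inner select narrows the candidates by hash via the index; the outer
# comparison on `a` weeds out the (rare) hash collisions.
row = conn.execute(
    """SELECT b FROM (SELECT a, b FROM foo WHERE hash = ?) AS bar
       WHERE bar.a = ?""",
    (md5hex(big_value), big_value),
).fetchone()
print(row[0])  # -> payload
```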
On Mon, Jun 26, 2006 at 02:52:56AM -0700, pajai wrote:
> I have thought of a possible workaround. I would like to know if it
> seems reasonable. The idea would be to build a hash, on the client
> side, over the problematic column (let's say column a). I then store in
> the db the attribute a (without index) and the hash(a) (with an index).
> Then when I am doing a select, I use firstly a sub-select to choose all
> tuples with the right hash (quick, with index), and then an outer
> select to choose the tuple with the right attribute a (slow, sequential
> scan, but normally few tuples, because few collisions). Something like
> that:

Perhaps you should look into functional indexes, i.e. indexes over a function:

    CREATE INDEX foo_index ON foo( hash(a) );

This index will automatically be used if you make a query like this:

    ... WHERE hash(a) = 'blah';

Hope this helps,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
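[A note on this suggestion: in PostgreSQL the function used in a functional (expression) index must be IMMUTABLE, and PostgreSQL's built-in md5(text) qualifies, so `CREATE INDEX foo_index ON foo (md5(a))` works out of the box. The runnable sketch below demonstrates the same idea with SQLite through Python's stdlib as a stand-in; `md5hex` is a hypothetical user-defined hash registered as deterministic, which is SQLite's precondition for using a function inside an index expression.]

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a deterministic function so SQLite permits it in an index
# expression (the deterministic flag needs Python 3.8+ / SQLite 3.8.3+).
conn.create_function(
    "md5hex", 1,
    lambda v: hashlib.md5(v.encode("utf-8")).hexdigest(),
    deterministic=True,
)
conn.execute("CREATE TABLE foo (a TEXT, b TEXT)")
# The functional index: built over md5hex(a), not over the wide column a.
conn.execute("CREATE INDEX foo_index ON foo (md5hex(a))")

big_value = "y" * 100_000
conn.execute("INSERT INTO foo VALUES (?, ?)", (big_value, "payload"))

# A WHERE clause using the same expression can be served by the index;
# the extra "AND a = ?" guards against hash collisions.
needle = hashlib.md5(big_value.encode("utf-8")).hexdigest()
row = conn.execute(
    "SELECT b FROM foo WHERE md5hex(a) = ? AND a = ?",
    (needle, big_value),
).fetchone()
```

Compared with the hand-rolled hash column, this keeps the hashing inside the database, so inserts need no client-side cooperation and the stored row carries no redundant column.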
"pajai" <patrick.jayet@gmail.com> writes:
> I have got an issue with PostgreSQL. There is a limitation on the
> column length of a tuple, in case there is an index over it. In the
> actual project I am working on, I meet such a situation. I have got an
> attribute over which I am doing a search (that is, I need an index over
> it), but this attribute can be in some cases very large (100KB+).

It sounds to me a lot like you may be in need of full-text-index code --- see contrib/tsearch2. If you are not trying to roll your own text searcher, please give more details, because I can hardly imagine a situation in which it is useful to make a btree index on 100KB values.

			regards, tom lane