Thread: Values larger than 1/3 of a buffer page cannot be indexed.

Values larger than 1/3 of a buffer page cannot be indexed.

From
Viktor Nagy
Date:
hi,

when trying to insert a long-long value, I get the following error:

 index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

is there a way to generate this recommended function index of an md5 hash on an already existing database and tables (filled with data)?

thanks,
Viktor

Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Dmitriy Igrishin
Date:
Hey Viktor,

2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
hi,

when trying to insert a long-long value, I get the following error:

 index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

is there a way to generate this recommended function index of an md5 hash on an already existing database and tables (filled with data)?
Just create the index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
where "tab" and "col" are the table and column on which you want
to create the btree index.
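
CREATE INDEX works fine on a table that already contains data; it simply
scans the existing rows while it builds. If the table is big and you need
to avoid blocking writes during the build, something like this should also
work (same placeholder names as above):

CREATE INDEX CONCURRENTLY ir_translation_ltns ON tab ((md5(col)));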

thanks,
Viktor



--
// Dmitriy.


Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Brian Hirt
Date:
On Mar 13, 2011, at 12:05 PM, Dmitriy Igrishin wrote:

Hey Viktor,

2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
hi,

when trying to insert a long-long value, I get the following error:

 index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

is there a way to generate this recommended function index of an md5 hash on an already existing database and tables (filled with data)?
Just create index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
where "tab" and "col" are table and column of which you want
to create btree index.

This probably goes without saying, but you'll have to use md5(col) = md5('blahblahblahblah') in your qualifiers to get the benefit of the index.
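
Something along these lines (placeholder names again), so that the planner sees the indexed expression:

SELECT * FROM tab WHERE md5(col) = md5('some very long value');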

--brian

Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Rob Sargent
Date:

Brian Hirt wrote:
> On Mar 13, 2011, at 12:05 PM, Dmitriy Igrishin wrote:
>
>> Hey Viktor,
>>
>> 2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
>>
>>     hi,
>>
>>     when trying to insert a long-long value, I get the following error:
>>
>>      index row size 3120 exceeds maximum 2712 for index
>>     "ir_translation_ltns"
>>     HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
>>     Consider a function index of an MD5 hash of the value, or use
>>     full text indexing.
>>
>>     is there a way to generate this recommended function index of an
>>     md5 hash on an already existing database and tables (filled with
>>     data)?
>>
>> Just create index this way, e.g.
>> CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
>> where "tab" and "col" are table and column of which you want
>> to create btree index.
>
> This probably goes without saying, but you'll have to use md5(col) =
> md5('blahblahblahblah') in your qualifiers to get the benefit of the
> index.
>
> --brian

Unless the point is to guarantee uniqueness of the "long-long value"s.
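
I.e. something like this (reusing the placeholder names):

CREATE UNIQUE INDEX ir_translation_ltns ON tab ((md5(col)));

though note that an md5 collision between two distinct values would then
show up as a spurious uniqueness violation.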


Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Merlin Moncure
Date:
On Sun, Mar 13, 2011 at 4:37 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>
>
> Brian Hirt wrote:
>>
>> On Mar 13, 2011, at 12:05 PM, Dmitriy Igrishin wrote:
>>
>>> Hey Viktor,
>>>
>>> 2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
>>>
>>>    hi,
>>>
>>>    when trying to insert a long-long value, I get the following error:
>>>
>>>     index row size 3120 exceeds maximum 2712 for index
>>>    "ir_translation_ltns"
>>>    HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
>>>    Consider a function index of an MD5 hash of the value, or use
>>>    full text indexing.
>>>
>>>    is there a way to generate this recommended function index of an
>>>    md5 hash on an already existing database and tables (filled with
>>>    data)?
>>>
>>> Just create index this way, e.g.
>>> CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
>>> where "tab" and "col" are table and column of which you want
>>> to create btree index.
>>
>> This probably goes without saying, but you'll have to use md5(col) =
>> md5('blahblahblahblah') in your qualifiers to get the benefit of the index.
>>
>> --brian
>
> Unless the point is to guarantee uniqueness of the "long-long value"s.

md5 will do that too: the main thing you lose going to hash indexing
is ordering.
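
For example, queries like this (placeholder names as before) can't use the
md5() index; only equality lookups written against md5(col) can:

SELECT * FROM tab WHERE col > 'abc' ORDER BY col;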

merlin

Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Viktor Nagy
Date:
thanks, this worked.

On Sun, Mar 13, 2011 at 7:05 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
Hey Viktor,

2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
hi,

when trying to insert a long-long value, I get the following error:

 index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

is there a way to generate this recommended function index of an md5 hash on an already existing database and tables (filled with data)?
Just create index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
where "tab" and "col" are table and column of which you want
to create btree index.

thanks,
Viktor



--
// Dmitriy.



Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Craig Ringer
Date:
On 03/14/2011 09:25 PM, Merlin Moncure wrote:

>> Unless the point is to guarantee uniqueness of the "long-long value"s.
>
> md5 will do that too: the main thing you lose going to hash indexing
> is ordering.

MD5 will *probably* guarantee the uniqueness of the values. Personally
I'm not a big fan of betting on that, and tend to like to test for
equality against the hash first and then, if the hashes are equal,
against the values.
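
In query form that belt-and-braces check might look something like this
(placeholder names as before):

SELECT * FROM tab WHERE md5(col) = md5('some long value') AND col = 'some long value';

The first clause lets the expression index do the lookup; the second weeds
out any rows that merely collide on the hash.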

I have several md5 collisions in one of my tables, so it's far from
impossible. In fact, it's much more likely than you'd expect thanks to
the birthday paradox.

--
Craig Ringer

Re: Values larger than 1/3 of a buffer page cannot be indexed.

From
Merlin Moncure
Date:
On Mon, Mar 28, 2011 at 10:28 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 03/14/2011 09:25 PM, Merlin Moncure wrote:
>
>>> Unless the point is to guarantee uniqueness of the "long-long value"s.
>>
>> md5 will do that too: the main thing you lose going to hash indexing
>> is ordering.
>
> MD5 will *probably* guarantee the uniqueness of the values. Personally I'm
> not a big fan of betting on that, and tend to like to test for equality
> against the hash first and then, if the hashes are equal, against the
> values.
>
> I have several md5 collisions in one of my tables, so it's far from
> impossible. In fact, it's much more likely than you'd expect thanks to the
> birthday paradox.

I'll agree that a 128-bit key is just on the cusp of discomfort if you
have a large data set and/or want to be really safe from random
collisions, but with any larger key it's just worry for the sake of
worrying.

You mentioned the birthday paradox -- let's consult the chart.  For MD5
(128 bits), if you are ok with a 10^-15 chance of *one* collision (odds
of 1 in 1,000,000,000,000,000), you can safely hash up to 100,000,000,000
unique values.  Of course, md5 is not perfectly distributed, so the
numbers are not in fact quite that good, but those are pretty tall odds.
If you are concerned about safety, you are much better off jumping to
sha-1 (or sha-256 if you're really nervous) than comparing the base
string, since your chance of a random collision is then well below a
number of other low-probability events, such as a drive bit error
(which would make your base string comparison useless anyway) or getting
hit by a comet.
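
As a very rough sanity check of that figure, the birthday bound
n ~= sqrt(2 * 2^128 * p) can be evaluated right in psql:

SELECT sqrt(2 * pow(2, 128) * 1e-15);  -- about 8.2e11, so the 100 billion figure above is comfortably conservative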

Now, if you're worried about intentionally forged data, that's
different, but it doesn't sound like that's the case here.  When the
NIST hash competition resolves (my money is on Skein), you'll likely be
able to just pick an output size that matches your requirements.

merlin