Thread: index row requires 10040 bytes, maximum size is 8191
Hi all -
I am trying to create an index on a character varying field. The column is just character varying with no limit. I am getting the following error: "index row requires 10040 bytes, maximum size is 8191"
What can I change in postgresql.conf to handle this error? Appreciate your help.
Regards
On Fri, 2010-11-12 at 15:47 -0500, akp geek wrote:
> Hi all -
>
> I am trying to create an index on a character varying field. The
> column is just character varying with no limit. I am getting the
> following error: "index row requires 10040 bytes, maximum size is 8191"
> What can I change in postgresql.conf to handle this error? Appreciate
> your help

You can't. You could create an index on "part" of the data or use full
text.

JD

> Regards

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
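A minimal sketch of the "index part of the data" route, assuming a hypothetical table docs with a text column body (both names are placeholders): build an expression index on a fixed-length prefix, and have queries repeat that expression so the planner can use it.

-- Index only the first 100 characters; the prefix always fits in an index row.
CREATE INDEX docs_body_prefix_idx ON docs (substr(body, 1, 100));

-- Filter on the indexed prefix first, then recheck the full value.
SELECT *
  FROM docs
 WHERE substr(body, 1, 100) = substr('the search value', 1, 100)
   AND body = 'the search value';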
On 11/12/2010 12:52 PM, Joshua D. Drake wrote:
> On Fri, 2010-11-12 at 15:47 -0500, akp geek wrote:
>> Hi all -
>>
>> I am trying to create an index on a character varying field. The
>> column is just character varying with no limit. I am getting the
>> following error: "index row requires 10040 bytes, maximum size is 8191"
>> What can I change in postgresql.conf to handle this error? Appreciate
>> your help
>
> You can't. You could create an index on "part" of the data or use full
> text.
>
> JD

In some cases, depending on needs, indexing on the md5 hash of such data
can be useful.

Cheers,
Steve
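A sketch of the md5 variant under the same assumed docs/body schema: index the fixed-size hash instead of the value itself, and recheck the raw text to guard against collisions. Declaring the index UNIQUE also gives an approximate uniqueness constraint on the full text.

-- md5() yields a 32-character hex string, so the index row is always small.
CREATE INDEX docs_body_md5_idx ON docs (md5(body));

SELECT *
  FROM docs
 WHERE md5(body) = md5('the search value')
   AND body = 'the search value';  -- recheck in case of a hash collision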
On 13/11/2010 4:52 AM, Joshua D. Drake wrote:
> On Fri, 2010-11-12 at 15:47 -0500, akp geek wrote:
>> Hi all -
>>
>> I am trying to create an index on a character varying field. The
>> column is just character varying with no limit. I am getting the
>> following error: "index row requires 10040 bytes, maximum size is 8191"
>> What can I change in postgresql.conf to handle this error? Appreciate
>> your help
>
> You can't. You could create an index on "part" of the data or use full
> text.

Ouch, really? I'd always assumed that btree indexes of big TOASTed values
would do a prefix match check, then recheck against the heap if there's a
match. More fool me for making such an assumption.

This doesn't seem like a problem when dealing with fields that're meant
to contain big blobs of text, but it's a strong contraindication for the
advice usually given on this list to avoid varchar(n) in favour of
"text". If a "text" value >8kb in an indexed column will be rejected
because it cannot be indexed, that's a reason to set an explicit limit.
Additionally, not having such constraints would make it much harder to
*add* indexes to "text" columns not already indexed.

craig=> create table test ( x text );
craig=> create index test_x on test(x);
craig=> insert into test(x) values ( repeat('x', 9000) );
INSERT 0 1
craig=> insert into test(x) values ( repeat('x', 90000) );
INSERT 0 1
craig=> insert into test(x) values ( repeat('x', 900000) );
ERROR:  index row requires 10324 bytes, maximum size is 8191

It seems like an index method that's smart enough to do prefix-and-hash
comparison, then a heap recheck, would be ideal. It's not really a common
enough issue to be a TODO item, though, as this is the first time I've
seen it come up on the list.

Thoughts, folks? Does this matter in practice, since anything you'd want
to index will in practice be small enough or a candidate for full-text
indexing?

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
On Sat, 2010-11-13 at 09:48 +0800, Craig Ringer wrote:
> Thoughts, folks? Does this matter in practice, since anything you'd want
> to index will in practice be small enough or a candidate for full-text
> indexing?

I have run into this problem maybe 3 times in my whole career, precisely
because if you are dealing with text that big, you move to full text
search.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Sat, 2010-11-13 at 09:48 +0800, Craig Ringer wrote: >> Thoughts, folks? Does this matter in practice, since anything you'd want >> to index will in practice be small enough or a candidate for full-text >> indexing? > I have run into this problem maybe 3 times in my whole career, precisely > because if you are dealing with text that big, you move to full text > search. Yeah, the real question here is exactly what do you think a btree index on a large text column will get you? It seems fairly unlikely that either simple equality or simple range checks are very useful for such data. I guess there's some use case for uniqueness checks, which we've seen people approximate by unique-indexing the md5 hash of the column value. BTW, the 8K limit applies after possible in-line compression, so the actual data value causing the failure was likely considerably longer than 10K. regards, tom lane
On 11/13/2010 11:15 AM, Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> On Sat, 2010-11-13 at 09:48 +0800, Craig Ringer wrote:
>>> Thoughts, folks? Does this matter in practice, since anything you'd want
>>> to index will in practice be small enough or a candidate for full-text
>>> indexing?
>
>> I have run into this problem maybe 3 times in my whole career, precisely
>> because if you are dealing with text that big, you move to full text
>> search.
>
> Yeah, the real question here is exactly what do you think a btree index
> on a large text column will get you?

About the only useful case I can see is with text data of very irregular
size. The vast majority is small, but there are a few massively bigger
items. It'd be nice if the index method had a fallback for items too big
to index in this case, such as a prefix match and heap recheck.

Of course, I've never run into this in practice, and if I did I'd be
wondering if I had my schema design quite right. I can't imagine that
the mostly aesthetic improvement of eliminating this indexing limitation
would be worth the effort. I'd never ask or want anyone to waste their
time on it, and don't intend to myself. Most of the interesting "big
text" indexing problems are solved by tsearch and/or functional indexes.

--
Craig Ringer
Thanks for all your valuable thoughts. It took me a while to read all
your suggestions, and I am still trying to understand them fully.

What we do with the text I mentioned is provide search functionality on
it. The user enters some keywords, and the application should be able to
find all the text that matches those keywords.
On Sat, Nov 13, 2010 at 7:21 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 11/13/2010 11:15 AM, Tom Lane wrote:
>> "Joshua D. Drake" <jd@commandprompt.com> writes:
>>> On Sat, 2010-11-13 at 09:48 +0800, Craig Ringer wrote:
>>>> Thoughts, folks? Does this matter in practice, since anything you'd want
>>>> to index will in practice be small enough or a candidate for full-text
>>>> indexing?
>
>>> I have run into this problem maybe 3 times in my whole career, precisely
>>> because if you are dealing with text that big, you move to full text
>>> search.
>
>> Yeah, the real question here is exactly what do you think a btree index
>> on a large text column will get you?
>
> About the only useful case I can see is with text data of very irregular
> size. The vast majority is small, but there are a few massively bigger
> items. It'd be nice if the index method had a fallback for items too big
> to index in this case, such as a prefix match and heap recheck.
>
> Of course, I've never run into this in practice, and if I did I'd be
> wondering if I had my schema design quite right. I can't imagine that
> the mostly aesthetic improvement of eliminating this indexing limitation
> would be worth the effort. I'd never ask or want anyone to waste their
> time on it, and don't intend to myself. Most of the interesting "big
> text" indexing problems are solved by tsearch and/or functional indexes.
>
> --
> Craig Ringer
akp geek <akpgeek@gmail.com> writes:
> What we do with the text I mentioned is provide search functionality on
> it. The user enters some keywords, and the application should be able to
> find all the text that matches those keywords.

Well, a btree index on the entire string is approximately useless for
that anyway. Look into the text search functionality.

			regards, tom lane
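A minimal full-text-search sketch for that keyword use case, again assuming a hypothetical docs table with a text column body (tsearch is built in from PostgreSQL 8.3 on):

-- GIN index over the parsed form of the document.
CREATE INDEX docs_body_fts_idx
    ON docs USING gin (to_tsvector('english', body));

-- Matches rows containing all of the user's keywords.
SELECT *
  FROM docs
 WHERE to_tsvector('english', body)
       @@ plainto_tsquery('english', 'user keywords here');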
Craig Ringer writes:
> Thoughts, folks? Does this matter in practice, since anything you'd
> want to index will in practice be small enough or a candidate for
> full-text indexing?

Here's my case: the field is meant to hold a person's name, so it's
usually well under the 8191-byte limit. I want it indexed so that I can
sort on it quickly. I also want it to be robust against abuse, so if
someone does enter the text of _War and Peace_ as their name, I want to
handle that gracefully.

I can achieve that by my own programming, outside of PostgreSQL, and
that's fine if I'm the only one who has gotten into this fix. On the
other hand, if it's a common problem then there's a reason why removing
the limitation might matter in practice.

Regards,
--- Michael
On Sun, Sep 18, 2011 at 11:07 PM, Michael Shepanski
<michael.shepanski@netpage.com> wrote:
> Craig Ringer writes:
>> Thoughts, folks? Does this matter in practice, since anything you'd
>> want to index will in practice be small enough or a candidate for
>> full-text indexing?
>
> Here's my case: the field is meant to hold a person's name, so it's
> usually well under the 8191-byte limit. I want it indexed so that I can
> sort on it quickly. I also want it to be robust against abuse, so if
> someone does enter the text of _War and Peace_ as their name, I want to
> handle that gracefully.
>
> I can achieve that by my own programming, outside of PostgreSQL, and
> that's fine if I'm the only one who has gotten into this fix. On the
> other hand, if it's a common problem then there's a reason why removing
> the limitation might matter in practice.

What if someone enters the text of _War and Peace_ repeated a billion
times, or 10^26 times, or something? There has to be a limit where you
tell the user, usually sooner rather than later, that some size is just
too big. And in the case of a name field, 1,000 characters is probably a
realistically high enough max that it should never be a problem, unless
the name is one of the candidates for the Very Silly Party.

Allowing users to input some great big huge chunk of text is opening
yourself up to a denial-of-service scenario: someone writes a bot that
submits 100 MB names over and over, and you watch the system just fall
over from network, I/O, or memory overload.
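A sketch of such a cap, assuming a hypothetical people table with a name column (the names and the 1,000-character figure are illustrative); declaring the column varchar(1000) would work equally well:

-- A text column with an explicit length cap enforced by a CHECK constraint.
CREATE TABLE people (name text);

ALTER TABLE people
    ADD CONSTRAINT people_name_len_chk
    CHECK (char_length(name) <= 1000);

-- Abusive input is now rejected with a constraint violation,
-- long before it can trigger the index-row size error:
-- INSERT INTO people (name) VALUES (repeat('x', 1000000));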