Re: [HACKERS] Index greater than 8k - Mailing list pgsql-general

From Joshua D. Drake
Subject Re: [HACKERS] Index greater than 8k
Msg-id 454828A8.3020105@commandprompt.com
In response to Re: [HACKERS] Index greater than 8k  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: [HACKERS] Index greater than 8k
List pgsql-general
>> We are not storing bytea, a customer is. We are trying to work around
>> customer requirements. The data that is being stored is not always text,
>> sometimes it is binary (a flash file or jpeg). We are using escaped text
>> to be able to search the string contents of that file.
>
> Hmm, have you tried to create a functional trigram index on the
> equivalent of "strings(bytea_column)" or something like that?

I did consider that. I wonder what size we are going to deal with,
though. Part of the problem is that some of the data we are dealing
with is quite large.

>
> I imagine strings(bytea) would be a function that returns the
> concatenation of all pure (7 bit) ASCII strings in the byte sequence.
>
> On the other hand, based on Teodor's comment on pg_trgm, maybe this
> won't be possible at all.
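
For reference, the strings(bytea) idea described above could be sketched
roughly as follows. This is only an illustration in Python, not an existing
PostgreSQL function: the name strings, the minimum run length of 4 (borrowed
from the default of the Unix strings utility), and the single-space separator
are all assumptions, and "pure ASCII" is taken here to mean printable bytes
0x20 through 0x7e.

```python
import re


def strings(data: bytes, min_len: int = 4, sep: str = " ") -> str:
    """Concatenate the runs of printable 7-bit ASCII found in a byte sequence.

    Hypothetical helper mirroring the strings(bytea) idea from the thread.
    min_len and sep are assumptions, not something the thread specifies.
    """
    # Match runs of at least min_len printable ASCII bytes (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    runs = pattern.findall(data)
    # Every matched byte is below 0x80, so decoding as ASCII cannot fail.
    return sep.join(run.decode("ascii") for run in runs)


if __name__ == "__main__":
    sample = b"\x00\x01Hello World\xff\xfeab\x00PostScript\x89"
    print(strings(sample))
```

The extracted text is usually much smaller than the original binary, but
nothing guarantees that, so a very large text-heavy row could still exceed
the index size limit discussed in this thread.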
>> Yes we do (and can) expect to find text among the bytes. We have
>> searches running, we are just running into the maximum size issues for
>> certain rows.
>
> Do you mean you actually find stuff based on text attributes in JPEG
> images and the like?  I thought those were compressed ...

Well, a JPEG is probably a bad example, but yes, they do search JPEGs, I am
guessing mostly for header information. A better example would be
PostScript files, Flash files, and of course large amounts of text + HTML.

Sincerely,

Joshua D. Drake




--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

