Maximum document-size of text-search?

From: Andreas Joseph Krogh
Hi.
I'm trying to index the contents of Word documents (extracted text),
which sometimes leads to quite large documents. This results in the
following exception:
Caused by: org.postgresql.util.PSQLException: ERROR: index row requires
10376 bytes, maximum size is 8191

I have the following schema:
andreak=# \d origo_search_index
                       Table "public.origo_search_index"
          Column          |       Type        |                    Modifiers
--------------------------+-------------------+-----------------------------------------------------------------
 id                       | integer           | not null default nextval('origo_search_index_id_seq'::regclass)
 entity_id                | integer           | not null
 entity_type              | character varying | not null
 field                    | character varying | not null
 search_value             | character varying | not null
 textsearchable_index_col | tsvector          |
Indexes:
    "origo_search_index_fts_idx" gin (textsearchable_index_col)
Triggers:
    update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON origo_search_index FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('textsearchable_index_col', 'pg_catalog.english', 'search_value')

I store all the text extracted from the documents in "search_value" and
have the built-in trigger tsvector_update_trigger keep the tsvector
column up to date.
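
[Editor's note: the trigger above only fires on INSERT/UPDATE; rows that
existed before the trigger was created would need a one-off backfill. A
sketch of what that would look like, using the same text-search
configuration as the trigger:]

UPDATE origo_search_index
SET search_value = search_value;
-- or, computing the tsvector directly without firing the trigger:
UPDATE origo_search_index
SET textsearchable_index_col =
    to_tsvector('pg_catalog.english', search_value);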

Any hints on how to get around this issue to allow indexing large
documents? I don't see how "only index the first N bytes of the
document" would be of interest to anyone...

BTW: I'm using PG-9.0beta3

--
Andreas Joseph Krogh<andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                         |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+


RESOLVED: Re: Maximum document-size of text-search?

From: Andreas Joseph Krogh
On 07/22/2010 03:31 PM, Andreas Joseph Krogh wrote:
> Hi.
> I'm trying to index the contents of word-documents, extracted text,
> which leads to quite large documents sometimes. This results in the
> following Exception:
> Caused by: org.postgresql.util.PSQLException: ERROR: index row
> requires 10376 bytes, maximum size is 8191
>
> [...]

Never mind... I also had a btree index on search_value, which of course
was causing the problem.
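
[Editor's note: for anyone hitting the same error: a btree index entry
must fit in a single index page, so a plain btree index on a column
holding whole documents fails once the text gets large, while a GIN
index on the tsvector column is unaffected. The fix is to drop the
btree index on the raw text; if exact-match lookups on the text are
still needed, an expression index on a fixed-size hash works instead.
A sketch, with a hypothetical index name:]

-- Drop the offending btree index on the full document text
-- (index name here is hypothetical):
DROP INDEX IF EXISTS origo_search_index_search_value_idx;

-- If exact-match lookups on search_value are still needed,
-- index a fixed-size hash of the text instead:
CREATE INDEX origo_search_index_search_value_md5_idx
    ON origo_search_index (md5(search_value));

-- Queries must then compare hashes for the index to be used:
--   SELECT ... WHERE md5(search_value) = md5('some document text');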
