faster ts_headline - Mailing list pgsql-hackers

From	Marcin Mańk
Subject	faster ts_headline
Date	November 20, 2012 14:50:56
Msg-id	CAK61fk4nkzfSim=DdEAK_BV_zOU6jTV4_Txk12sgn1_F-2+zyw@mail.gmail.com
List	pgsql-hackers

Hello,
I've started implementing a system for faster headline generation. WIP
patch is attached.

The idea is to make a new type, currently called hltext (other
names welcome), that stores the text along with the result of lexizing
it. Conceptually, it stores an array of tuples like
(word text, type int, lexemes text[])
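
As a rough illustration of the idea (a Python sketch, not the C code in
the patch), the stored form can be modelled as a list of
(word, type, lexemes) records computed once, when the value is stored.
The names HlToken, parse_text and headline, the token type numbers, and
the toy "lexizer" (plain lowercasing) are all assumptions made for this
example. It also ignores fragment selection and simply marks matching
words.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Hypothetical token type codes; a real parser has its own numbering.
WORD, SPACE = 1, 2

@dataclass
class HlToken:
    word: str            # original text of the token
    type: int            # token type code
    lexemes: List[str]   # normalized forms; empty for non-word tokens

def parse_text(text: str) -> List[HlToken]:
    """Tokenize and 'lexize' once, at store time (toy stand-in: lowercase)."""
    tokens = []
    for piece in re.findall(r"\w+|\W+", text):
        if re.match(r"\w", piece):
            tokens.append(HlToken(piece, WORD, [piece.lower()]))
        else:
            tokens.append(HlToken(piece, SPACE, []))
    return tokens

def headline(tokens: List[HlToken], query: Set[str]) -> str:
    """Build a headline from the stored tokens without re-parsing the text."""
    out = []
    for tok in tokens:
        if any(lex in query for lex in tok.lexemes):
            out.append("<b>" + tok.word + "</b>")
        else:
            out.append(tok.word)
    return "".join(out)

stored = parse_text("Faster headlines for Long texts")
print(headline(stored, {"long", "headlines"}))
```

The point of the sketch is that headline() only compares stored lexemes
with the query terms; the expensive parsing step happens once in
parse_text() rather than on every headline request.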

A console log is also attached; it shows a 5x performance increase. The
problem is not academic: I have such long texts in an app, and making
20 headlines takes 3s+.

The patch lacks documentation, regression tests, and most auxiliary
functions (especially I/O functions).


I have a question about the I/O functions of the new type: what format
should I choose?

I could make the input function read something like 'english: the
text', where english is the name of the text search configuration. The
input function would do the lexizing.
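
To make this first option concrete, here is a minimal Python sketch of
how such an input string could be split into a configuration name and
the text. The function name split_hltext_input and the rule "split at
the first colon, drop whitespace after it" are assumptions for
illustration only; the patch does not define this syntax.

```python
from typing import Tuple

def split_hltext_input(raw: str) -> Tuple[str, str]:
    """Split 'config: text' at the first colon (hypothetical input syntax)."""
    config, sep, text = raw.partition(":")
    if not sep or not config.strip():
        raise ValueError("expected '<config name>: <text>'")
    # Later colons stay in the text, because partition() splits only once.
    return config.strip(), text.lstrip()

print(split_hltext_input("english: the text"))
```

One consequence of this rule is that the text cannot keep leading
whitespace, and the output function would have to emit the
configuration name again for a dump and restore to round-trip.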

I could make it read some custom format, which would contain the
tokens, token types and lexemes. Can I use flex/bison, or is there a
good reason not to, so that I should write the parser by hand?

Finally, I could make the type actually "create type
hltext_element(word text, type int, lexemes text[])", by manually
filling in the applicable catalogs, and have the user declare columns
as hltext_element[]. Is there a nice way to manipulate objects of such
a type from within the backend? Is there an example? I suppose that in
this case storage would not be as efficient as I made it.

Which one should I choose? Other ideas?

Regards
Marcin Mańk
