Re: Text search - Mailing list pgsql-general

From Chris Roffler
Subject Re: Text search
Date
Msg-id 3984722a1003160757t5dc1dc39nb72bdf99324a84b6@mail.gmail.com
In response to Re: Text search  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Richard

Thanks for the pointers. Unfortunately, it's not just attribute names.

Here is what I am thinking of doing:

As a first step, I run a query:

SELECT id  FROM time_series WHERE
 to_tsvector(xml_string)
 @@
 to_tsquery( anystring );

Then, for each id found, I load the actual XML string into memory and use XPath to search the document there. This will at least use my text index for the first pass.
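
A minimal sketch of that second step, assuming the candidate rows from the first query are already available as (id, xml_string) pairs. The function names, the sample documents and the ./Attribute/Name path are hypothetical, and Python's standard xml.etree.ElementTree only supports a subset of XPath (no text() node test, for example), so a fuller XPath library may be needed for real queries:

```python
import xml.etree.ElementTree as ET


def matches_xpath(xml_string, path, wanted):
    """Return True if any element selected by `path` has text equal to `wanted`."""
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        # Treat unparseable documents as non-matching rather than failing the batch.
        return False
    return any((el.text or "").strip() == wanted for el in root.findall(path))


def refine(candidates, path, wanted):
    """Keep only the ids whose XML document really matches the path.

    `candidates` is an iterable of (id, xml_string) rows, i.e. the rows
    that the full text query in the first step would have returned.
    """
    return [row_id for row_id, xml in candidates if matches_xpath(xml, path, wanted)]


# Hypothetical rows standing in for the result of the first query.
candidates = [
    (1, "<Series><Attribute><Name>attribute22</Name></Attribute></Series>"),
    (2, "<Series><Note>attribute22</Note></Series>"),
]

print(refine(candidates, "./Attribute/Name", "attribute22"))
```

Both sample rows contain the word the text search would match on, but only the first has it in the element the path selects, which is the kind of false positive the in-memory pass is meant to filter out.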

Thanks
Chris 



On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton <dev@archonet.com> wrote:
On 16/03/10 13:49, Richard Huxton wrote:
You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.

Actually, if it's only attribute names you're interested in, you could do it with xpath.

Something like (untested):

ALTER TABLE time_series ADD attr_names text;

UPDATE time_series SET attr_names = array_to_string(
   xpath('*/Attribute/Name/text()', external_attributes)
   ,' '
);

CREATE INDEX fti_attr_names ON time_series USING gin(
 to_tsvector('simple', attr_names)
);

SELECT * FROM time_series WHERE
 to_tsvector('simple', attr_names)
 @@
 to_tsquery('simple', 'attribute22');

I'd probably just store the tsvector rather than text unless the text is of some use in itself.

If you plan to do anything with the attributes it'd still be better to split them out into their own table though.


--
 Richard Huxton
 Archonet Ltd

