Thread: Full text search ts_heading strange result

Full text search ts_heading strange result

From
Johann Spies
Date:
I am beginning to use the full text search facilities in Postgresql
(9.0) and find the result of this query a bit strange:

query:

SELECT  ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
from akb_articles A
where A.tsv@@ 'kerkreg|(church & polity)'

Result

"Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en praktyk.(<b>Church</b> unity and <b>church</b> polity:
History,principle and practice.)" 

Why is 'polity' not highlighted?

Regards
Johann

--
Johann Spies                            Telefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

     "Look not every man on his own things, but every man
      also on the things of others."        Philippians 2:4
E-mail disclaimer

This e-mail may contain confidential information and may be legally privileged and is intended only for the person to
whom it is addressed. If you are not the intended recipient, you are notified that you may not use, distribute or copy
this document in any manner whatsoever. Kindly also notify the sender immediately by telephone, and delete the e-mail.
The University does not accept liability for any damage, loss or expense arising from this e-mail and/or accessing any
files attached to this e-mail.

Re: Full text search ts_heading strange result

From
Tom Lane
Date:
Johann Spies <jspies@sun.ac.za> writes:
> I am beginning to use the full text search facilities in Postgresql
> (9.0) and find the result of this query a bit strange:

> query:

> SELECT  ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
> from akb_articles A
> where A.tsv@@ 'kerkreg|(church & polity)'

> Result

> "Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en praktyk.(<b>Church</b> unity and <b>church</b> polity:
> History,principle and practice.)"

> Why is 'polity' not highlighted?

I believe the problem is that the one-argument form of to_tsquery() uses
the default TS configuration, which you have probably not got set to
"simple".  For me, the default TS configuration is "english", which will
stem "polity" as "politi":

regression=# select to_tsquery('(polity & church)');
     to_tsquery
---------------------
 'politi' & 'church'
(1 row)

However the "simple" configuration doesn't do anything to that lexeme:

regression=# select to_tsquery('simple', '(polity & church)');
     to_tsquery
---------------------
 'polity' & 'church'
(1 row)

So what you've got is ts_headline() parsing the given title against
the "simple" configuration and getting "polity", but the tsquery is
looking for "politi", hence no match.

In short: omit the 'simple' argument from the ts_headline call, and
things should play together better.  You could alternatively insert
to_tsquery('simple', '(polity & church)'), but that won't exactly
match what the @@ in WHERE is doing: that's going to use the default
configuration.

            regards, tom lane
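
For reference, a minimal sketch of the second alternative above, with the
configuration spelled out in every call so that nothing depends on the
default.  It assumes that akb_articles.tsv was itself built with
to_tsvector('simple', ...), which the thread does not confirm.

```sql
-- Sketch only: assumes akb_articles.tsv was populated with
-- to_tsvector('simple', ...).  The same configuration is named in
-- both the headline call and the match, so the lexemes line up.
SELECT ts_headline('simple',
                   title,
                   to_tsquery('simple', 'kerkreg|(church & polity)'))
FROM akb_articles A
WHERE A.tsv @@ to_tsquery('simple', 'kerkreg|(church & polity)');
```

With 'simple' on both sides, the query lexeme stays 'polity' and should
match the 'polity' that ts_headline() finds in the title.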

Re: Full text search ts_heading strange result

From
Johann Spies
Date:
Hallo Tom,

> I believe the problem is that the one-argument form of to_tsquery() uses
> the default TS configuration, which you have probably not got set to
> "simple".  For me, the default TS configuration is "english", which will
> stem "polity" as "politi":
>
> regression=# select to_tsquery('(polity & church)');
>      to_tsquery
> ---------------------
>  'politi' & 'church'
> (1 row)
>
> However the "simple" configuration doesn't do anything to that lexeme:

Thanks for the explanation.  I am working with a multi-language database
and that was the reason for using the 'simple' configuration.

I have asked, in an earlier message on this list, for advice on how to
handle full text searches in a multi-language database, but got no
reaction to it.  If there is a better way than using the 'simple'
configuration in this case, I would gladly try it.

Regards
Johann.

--
Johann Spies                            Telefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

     "If any of you lack wisdom, let him ask of God, that
      giveth to all men liberally, and upbraideth not; and
      it shall be given him."              James 1:5

Re: Full text search ts_heading strange result

From
Craig Ringer
Date:
On 07/26/2012 02:14 PM, Johann Spies wrote:
> Hallo Tom,
>
>> I believe the problem is that the one-argument form of to_tsquery() uses
>> the default TS configuration, which you have probably not got set to
>> "simple".  For me, the default TS configuration is "english", which will
>> stem "polity" as "politi":
>>
>> regression=# select to_tsquery('(polity & church)');
>>       to_tsquery
>> ---------------------
>>   'politi' & 'church'
>> (1 row)
>>
>> However the "simple" configuration doesn't do anything to that lexeme:
> Thanks for the explanation.  I am working with a multi-language database
> and that was the reason for using the 'simple' configuration.
>
> I have asked, in an earlier message on this list, for advice on how to
> handle full text searches in a multi-language database, but got no
> reaction to it.  If there is a better way than using the 'simple'
> configuration in this case, I would gladly try it.
You'll need to store language information alongside each text value if
you want to do anything more sophisticated. If you have mixed languages
within a single text value or if you don't store information about the
language a text value is in then you're largely out of luck.

--
Craig Ringer
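
One way to store the language alongside each text value is a column of
type regconfig, which to_tsvector() and to_tsquery() accept as their first
argument.  A rough sketch follows; the table articles_demo, its columns
and the index name are hypothetical and not taken from the poster's
schema.

```sql
-- Hypothetical table: each row records which text search
-- configuration applies to its title.
CREATE TABLE articles_demo (
    id    serial PRIMARY KEY,
    title text NOT NULL,
    lang  regconfig NOT NULL DEFAULT 'simple'
);

-- Expression index that uses the per-row configuration.
CREATE INDEX articles_demo_fts_idx
    ON articles_demo USING gin (to_tsvector(lang, title));

-- Search the English rows with English stemming; the document, the
-- query and the headline all go through the same configuration.
SELECT id,
       ts_headline(lang, title, to_tsquery('english', 'church & polity'))
FROM articles_demo
WHERE lang = 'english'::regconfig
  AND to_tsvector(lang, title) @@ to_tsquery('english', 'church & polity');
```

Rows whose language is unknown could keep 'simple' here, so they remain
searchable without stemming.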

Re: Full text search ts_heading strange result

From
Johann Spies
Date:
On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:

> You'll need to store language information alongside each text value
> if you want to do anything more sophisticated.

I was afraid that that will be the case :)

I will have to update more than 320000 entries which currently have
inconsistent language indications and some of them none at all.

Thanks for responding.

Regards
Johann


Re: Full text search ts_heading strange result

From
Tom Lane
Date:
Johann Spies <jspies@sun.ac.za> writes:
> On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
>> You'll need to store language information alongside each text value
>> if you want to do anything more sophisticated.

> I was afraid that that will be the case :)

I'm not sure that there's anything horribly wrong with the strategy
of using "simple" for everything.  You won't get language-aware stemming,
but maybe you don't need that.  The problem with what you originally
posted was not that "simple" was inadequate, but that you weren't
applying it consistently --- you didn't have
default_text_search_configuration set to match.

            regards, tom lane
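
A short sketch of making the default match, for anyone who wants to use
"simple" throughout.  The database name mydb is a placeholder, not the
poster's actual database.

```sql
-- Check what the one-argument functions currently use.
SHOW default_text_search_configuration;

-- Switch the current session to "simple".
SET default_text_search_configuration = 'simple';

-- The one-argument form now follows the session default, so this
-- should give the same result as to_tsquery('simple', ...) shown
-- earlier in the thread.
SELECT to_tsquery('(polity & church)');

-- To make it stick for new sessions (mydb is a placeholder name):
-- ALTER DATABASE mydb SET default_text_search_configuration = 'simple';
```

Any tsvector column or index built under a different configuration would
still need to be rebuilt to match.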