Thread: Full text search ts_heading strange result
I am beginning to use the full text search facilities in Postgresql (9.0) and find the result of this query a bit strange:

query:

    SELECT ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
    from akb_articles A
    where A.tsv@@ 'kerkreg|(church & polity)'

Result

    "Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en praktyk.(<b>Church</b> unity and <b>church</b> polity: History,principle and practice.)"

Why is 'polity' not highlighted?

Regards
Johann
--
Johann Spies                            Telephone: 021-808 4699
Data manager
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

"Look not every man on his own things, but every man also on the things of others." Philippians 2:4

E-mail disclaimer
This e-mail may contain confidential information and may be legally privileged and is intended only for the person to whom it is addressed. If you are not the intended recipient, you are notified that you may not use, distribute or copy this document in any manner whatsoever. Kindly also notify the sender immediately by telephone, and delete the e-mail. The University does not accept liability for any damage, loss or expense arising from this e-mail and/or accessing any files attached to this e-mail.
Johann Spies <jspies@sun.ac.za> writes:
> I am beginning to use the full text search facilities in Postgresql
> (9.0) and find the result of this query a bit strange:
> query:
> SELECT ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
> from akb_articles A
> where A.tsv@@ 'kerkreg|(church & polity)'
> Result
> "Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en praktyk.(<b>Church</b> unity and <b>church</b> polity: History,principle and practice.)"
> Why is 'polity' not highlighted?

I believe the problem is that the one-argument form of to_tsquery() uses the default TS configuration, which you have probably not got set to "simple". For me, the default TS configuration is "english", which will stem "polity" as "politi":

    regression=# select to_tsquery('(polity & church)');
         to_tsquery
    ---------------------
     'politi' & 'church'
    (1 row)

However the "simple" configuration doesn't do anything to that lexeme:

    regression=# select to_tsquery('simple', '(polity & church)');
         to_tsquery
    ---------------------
     'polity' & 'church'
    (1 row)

So what you've got is ts_headline() parsing the given title against the "simple" configuration and getting "polity", but the tsquery is looking for "politi", hence no match.

In short: omit the 'simple' argument from the ts_headline call, and things should play together better. You could alternatively insert to_tsquery('simple', '(polity & church)'), but that won't exactly match what the @@ in WHERE is doing: that's going to use the default configuration.

regards, tom lane
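A minimal sketch of the second option mentioned above, with 'simple' passed explicitly in every place so that the headline parser and the query agree. It assumes that the tsv column was itself built with to_tsvector('simple', ...), which the thread does not confirm; the aliases a and q are only for illustration.

```sql
-- Build the tsquery once with the 'simple' configuration and reuse it
-- for both the match and the headline, so neither side is stemmed.
-- Assumption: akb_articles.tsv was populated with to_tsvector('simple', ...).
SELECT ts_headline('simple', a.title, q)
FROM akb_articles AS a,
     to_tsquery('simple', 'kerkreg | (church & polity)') AS q
WHERE a.tsv @@ q;
```

Because the query is parsed with 'simple', the lexeme stays 'polity' (as in the second to_tsquery output above), which is the form ts_headline sees when it parses the title with the same configuration.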
Hallo Tom,

> I believe the problem is that the one-argument form of to_tsquery() uses
> the default TS configuration, which you have probably not got set to
> "simple". For me, the default TS configuration is "english", which will
> stem "polity" as "politi":
>
> regression=# select to_tsquery('(polity & church)');
>      to_tsquery
> ---------------------
>  'politi' & 'church'
> (1 row)
>
> However the "simple" configuration doesn't do anything to that lexeme:

Thanks for the explanation. I am working with a multi-language database and that was the reason for using the 'simple' configuration.

I have asked, in an earlier message on this list, advice on how to handle full text searches in a multi-language database, but got no reaction to it. If there is a better way than using the 'simple' configuration in this case, I would gladly try it.

Regards
Johann.
On 07/26/2012 02:14 PM, Johann Spies wrote:
> Hallo Tom,
>
>> I believe the problem is that the one-argument form of to_tsquery() uses
>> the default TS configuration, which you have probably not got set to
>> "simple". For me, the default TS configuration is "english", which will
>> stem "polity" as "politi":
>>
>> regression=# select to_tsquery('(polity & church)');
>>      to_tsquery
>> ---------------------
>>  'politi' & 'church'
>> (1 row)
>>
>> However the "simple" configuration doesn't do anything to that lexeme:
> Thanks for the explanation. I am working with a multi-language database
> and that was the reason for using the 'simple' configuration.
>
> I have asked, in an earlier message on this list, advice on how to
> handle full text searches in a multi-language database, but got no
> reaction to it. If there is a better way than using the 'simple'
> configuration in this case, I would gladly try it.

You'll need to store language information alongside each text value if you want to do anything more sophisticated.

If you have mixed languages within a single text value or if you don't store information about the language a text value is in then you're largely out of luck.

--
Craig Ringer
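One way to store language information alongside each text value is a per-row configuration column. The following is only a sketch under assumptions: the column names lang_config and language are hypothetical and do not come from the thread, and rows without a known language simply fall back to 'simple'.

```sql
-- Hypothetical column holding the text search configuration for each row.
ALTER TABLE akb_articles
    ADD COLUMN lang_config regconfig NOT NULL DEFAULT 'simple';

-- Hypothetical: mark the rows whose language is known to be English.
-- "language" stands for whatever language indication the table already has.
UPDATE akb_articles SET lang_config = 'english' WHERE language = 'en';

-- Rebuild the tsvector using each row's own configuration.
UPDATE akb_articles
SET tsv = to_tsvector(lang_config, coalesce(title, ''));

-- Search and highlight with the same per-row configuration.
SELECT ts_headline(a.lang_config, a.title,
                   to_tsquery(a.lang_config, 'church & polity'))
FROM akb_articles AS a
WHERE a.tsv @@ to_tsquery(a.lang_config, 'church & polity');
```

Because the query is parsed separately for every row's configuration, this form may not make good use of an index on tsv; filtering on lang_config first and using a constant configuration per query is a possible alternative.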
On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
> You'll need to store language information alongside each text value
> if you want to do anything more sophisticated.

I was afraid that that will be the case :)

I will have to update more than 320000 entries which currently have inconsistent language indications and some of them none at all.

Thanks for responding.

Regards
Johann
Johann Spies <jspies@sun.ac.za> writes:
> On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
>> You'll need to store language information alongside each text value
>> if you want to do anything more sophisticated.

> I was afraid that that will be the case :)

I'm not sure that there's anything horribly wrong with the strategy of using "simple" for everything. You won't get language-aware stemming, but maybe you don't need that. The problem with what you originally posted was not that "simple" was inadequate, but that you weren't applying it consistently: you didn't have default_text_search_config set to match.

regards, tom lane
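A short sketch of making the default configuration match, assuming "simple" is to be used everywhere. The database name mydb is a placeholder, not something taken from the thread.

```sql
-- Check which configuration the one-argument functions currently use.
SHOW default_text_search_config;

-- Change it for the current session only.
SET default_text_search_config = 'simple';

-- The one-argument form now parses with 'simple', so it should give the
-- same result as to_tsquery('simple', ...) shown earlier in the thread.
SELECT to_tsquery('polity & church');

-- Make the setting persistent for new sessions on one database
-- (mydb is a placeholder name).
ALTER DATABASE mydb SET default_text_search_config = 'simple';
```

With the default set this way, the one-argument to_tsquery() and an explicit 'simple' argument to ts_headline() use the same configuration, which removes the mismatch described earlier in the thread.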