Thread: Ellipses around result fragment of ts_headline
It would be very useful if ts_headline had an option to add ellipses before or after a result fragment based on the fragment's position in the source document. For instance, ts_headline(doc, query) correctly returns a fragment with the matching words highlighted, but there is no easy way to determine whether the returned fragment is at the beginning or end of the original doc and add the necessary ellipses. Search pages such as postgresql.org ALWAYS add ellipses before and after the fragment, regardless of whether they are warranted. In my opinion, always adding ellipses is deceptive to the user: in many of my search results the fragment is at the beginning of the doc, and always seeing ellipses would confuse the user. So you can see how the feature described above would improve the accuracy of the search result fragment.
I think we currently do that. We add ellipses only when we encounter a new fragment, so there should be no ellipses if we are at the end of the document or if that is the first fragment (which includes the beginning of the document). Here is the code in generateHeadline, ts_parse.c, that adds the ellipses:

    if (!infrag)
    {
        /* start of a new fragment */
        infrag = 1;
        numfragments++;
        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
            ptr += prs->fragdelimlen;
        }
    }

It is possible that there is a bug that needs to be fixed. Can you show me an example where you found that?

-Sushant.

On Sat, 2009-02-14 at 15:13 -0500, Asher Snyder wrote:
> It would be very useful if there were an option to have ts_headline append ellipses before or after a result fragment based on the position of the fragment in the source document. [...]
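To make the rule in the snippet above concrete, here is a minimal standalone sketch in C: the delimiter is written only before the second and later fragments, so nothing precedes the first fragment and nothing follows the last. The function join_fragments and its parameters are hypothetical names chosen for illustration. This is not the PostgreSQL generateHeadline code, which walks parsed words rather than ready-made strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): concatenate nfrags
 * fragments into out, writing delim before every fragment except the
 * first. Returns 0 on success, or -1 if out is too small, in which
 * case out is left as an empty string.
 */
int
join_fragments(char *out, size_t outlen,
               const char **frags, int nfrags, const char *delim)
{
    char   *ptr = out;
    size_t  delimlen;
    size_t  used = 0;           /* bytes written so far, without the NUL */
    int     numfragments = 0;
    int     i;

    assert(out != NULL && frags != NULL && delim != NULL);
    if (outlen == 0)
        return -1;
    delimlen = strlen(delim);

    for (i = 0; i < nfrags; i++)
    {
        size_t  fraglen = strlen(frags[i]);
        size_t  need = fraglen + (numfragments > 0 ? delimlen : 0);

        /* make sure the fragment, its delimiter and the NUL all fit */
        if (used + need + 1 > outlen)
        {
            out[0] = '\0';
            return -1;
        }

        /* start of a new fragment */
        numfragments++;
        /* add the delimiter only if this is after the first fragment */
        if (numfragments > 1)
        {
            memcpy(ptr, delim, delimlen);
            ptr += delimlen;
        }
        memcpy(ptr, frags[i], fraglen);
        ptr += fraglen;
        used += need;
    }
    *ptr = '\0';
    return 0;
}
```

Called with a single fragment, the sketch returns that fragment unchanged, which matches the behaviour described above: one fragment never gets a delimiter, whatever its position in the document.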
Interesting. It could be that you already do it, but the documentation makes no reference to a fragment delimiter, so I can see no way to add one. The documentation for ts_headline lists only StartSel, StopSel, MaxWords, MinWords, ShortWord, and HighlightAll; there appears to be no option for a fragment delimiter.

In my case I do:

    SELECT v1.id, v1.type_id, v1.title,
           ts_headline(v1.copy, query, 'MinWords = 17') as copy,
           ts_rank(v1.text_search, query) AS rank
    FROM (SELECT b1.*,
                 (setweight(to_tsvector(coalesce(b1.title,'')), 'A') ||
                  setweight(to_tsvector(coalesce(b1.copy,'')), 'B')) as text_search
          FROM search.v_searchable_content b1) v1,
         plainto_tsquery($1) query
    WHERE ($2 IS NULL OR (type_id = ANY($2)))
      AND query @@ v1.text_search
    ORDER BY rank DESC, title

Now, this use of ts_headline correctly returns highlighted, fragmented search results, but there will be no fragment delimiter for the headline. Some suggestions were to change ts_headline(v1.copy, query, 'MinWords = 17') to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as you can clearly see, this would always add the ellipses and would not be intelligent about the fragments. I hope that you're correct and that it is implemented, just not documented.

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:07 PM
>To: Asher Snyder
>Cc: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] Ellipses around result fragment of ts_headline
>
>I think we currently do that. We add ellipses only when we encounter a new fragment. [...]
Sushant Sinha <sushant354@gmail.com> writes:
> I think we currently do that.

... since about four months ago.

2008-10-17 14:05  teodor

    * doc/src/sgml/textsearch.sgml, src/backend/tsearch/ts_parse.c,
    src/backend/tsearch/wparser_def.c, src/include/tsearch/ts_public.h,
    src/test/regress/expected/tsearch.out,
    src/test/regress/sql/tsearch.sql: Improve headeline generation. Now
    headline can contain several fragments a-la Google.

    Sushant Sinha <sushant354@gmail.com>

regards, tom lane
The documentation in 8.4dev has information on FragmentDelimiter:
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html

If you do not specify MaxFragments > 0, then the default headline generator kicks in. The default headline generator does not have any fragment delimiter, so it is correct that you will not see any delimiter.

I think you are looking for the default headline generator to add ellipses as well, depending on where the fragment is. I do not know what other people's opinion on this is.

-Sushant.

On Sat, 2009-02-14 at 16:21 -0500, Asher Snyder wrote:
> Interesting, it could be that you already do it, but the documentation makes no reference to a fragment delimiter, so there's no way that I can see to add one. [...]
Yes, you are correct in your assumption: I'm looking for a single fragment to also have the option of a fragment delimiter based on its position in the document.

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:41 PM
>To: Asher Snyder
>Cc: pgsql-hackers@postgresql.org
>Subject: RE: [HACKERS] Ellipses around result fragment of ts_headline
>
>The documentation in 8.4dev has information on FragmentDelimiter
>http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
>[...]
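The behaviour being asked for here can be sketched in a few lines of C. This is only an illustration under stated assumptions: it assumes the caller already knows the byte offset and length of the fragment within the source document, which ts_headline does not report, and the name excerpt_with_ellipses and its parameters are hypothetical. It is not how ts_headline is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): copy the fragment
 * doc[start .. start+len) into out. The ellipsis is prepended only if
 * the fragment does not start at the beginning of doc, and appended
 * only if the fragment does not reach the end of doc.
 * Returns 0 on success, -1 on a bad range or a too-small buffer.
 */
int
excerpt_with_ellipses(char *out, size_t outlen, const char *doc,
                      size_t start, size_t len, const char *ellipsis)
{
    size_t  doclen;
    size_t  elen;
    size_t  need;
    int     lead;
    int     trail;

    assert(out != NULL && doc != NULL && ellipsis != NULL);
    doclen = strlen(doc);
    elen = strlen(ellipsis);

    /* the fragment must lie entirely inside the document */
    if (start > doclen || len > doclen - start)
        return -1;

    lead = (start > 0);                 /* text was cut before the fragment */
    trail = (start + len < doclen);     /* text was cut after the fragment */

    need = len + (lead ? elen : 0) + (trail ? elen : 0) + 1;
    if (need > outlen)
        return -1;

    snprintf(out, outlen, "%s%.*s%s",
             lead ? ellipsis : "",
             (int) len, doc + start,
             trail ? ellipsis : "");
    return 0;
}
```

A fragment taken from the very start of the document gets only a trailing ellipsis, one from the very end gets only a leading ellipsis, and a fragment covering the whole document gets none, which is the position-aware behaviour described in this thread.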
No worries. I'm going to start playing around with the dev branch now, but in any case your previous response still applies, and so does the question about a fragment delimiter for the first fragment. Without that, it seems I would still have the same problem with the first fragment.

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:47 PM
>To: Tom Lane
>Cc: Asher Snyder; pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] Ellipses around result fragment of ts_headline
>
>Sorry ... I thought you were running the development branch.
>
>-Sushant.
Sorry ... I thought you were running the development branch.

-Sushant.

On Sat, 2009-02-14 at 16:34 -0500, Tom Lane wrote:
> Sushant Sinha <sushant354@gmail.com> writes:
> > I think we currently do that.
>
> ... since about four months ago. [...]