Thread: Website mailing list search enhancement idea - one result per thread
Just an idea...
When searching the mailing lists in advanced search mode we have list filtering, timeframe filtering (pretty coarse though) and then sorting mode. For moderately long running threads a search term is going to return many messages from the same thread. It would desirable to ask that only threads be shown and maybe how many individual messages in that thread matched. Going into the first message and then choosing "whole thread" (or going directly to whole thread mode) would be the typical action by the searcher who is looking to pull up a conversation they recall happened (or assumed did) and can quickly skim the whole thing for context and information.
Thanks for reading!
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > When searching the mailing lists in advanced search mode we have list > filtering, timeframe filtering (pretty coarse though) and then sorting > mode. For moderately long running threads a search term is going to return > many messages from the same thread. It would desirable to ask that only > threads be shown and maybe how many individual messages in that thread > matched. +1, an option to do that would be handy. No idea how hard it is ... regards, tom lane
On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode. For moderately long running threads a search term is going to return
> many messages from the same thread. It would desirable to ask that only
> threads be shown and maybe how many individual messages in that thread
> matched.
+1, an option to do that would be handy. No idea how hard it is ...
Not having actually looked at it, probably not too hard. The hardest part might be bikeshedding about the color (aka user interface).
Should this replace the current interface, or do we want to make something like this a choice?
//Magnus
Re: Website mailing list search enhancement idea - one result per thread
From
"David G. Johnston"
Date:
On Thu, Apr 30, 2020 at 9:30 AM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:"David G. Johnston" <david.g.johnston@gmail.com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode. For moderately long running threads a search term is going to return
> many messages from the same thread. It would desirable to ask that only
> threads be shown and maybe how many individual messages in that thread
> matched.
+1, an option to do that would be handy. No idea how hard it is ...
Should this replace the current interface, or do we want to make something like this a choice?
I would have it be a choice under the advanced search options with possibly an option on the search results page to "See thread results only".
David J.
Magnus Hagander <magnus@hagander.net> writes: > On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "David G. Johnston" <david.g.johnston@gmail.com> writes: >>> It would desirable to ask that only >>> threads be shown and maybe how many individual messages in that thread >>> matched. >> +1, an option to do that would be handy. No idea how hard it is ... > Not having actually looked at it, probably not too hard. The hardest part > might be bikeshedding about the color (aka user interface). > Should this replace the current interface, or do we want to make something > like this a choice? No no, not replace. It should be an option, "aggregate into threads" or something like that. regards, tom lane
"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes: > When searching the mailing lists in advanced search mode we have list > filtering, timeframe filtering (pretty coarse though) and then sorting > mode. For moderately long running threads a search term is going to > return > many messages from the same thread. It would desirable to ask that > only > threads be shown and maybe how many individual messages in that thread > matched. +1. I always think so. On 2020-05-01 01:55, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> "David G. Johnston" <david.g.johnston@gmail.com> writes: >>>> It would desirable to ask that only >>>> threads be shown and maybe how many individual messages in that >>>> thread >>>> matched. > >>> +1, an option to do that would be handy. No idea how hard it is ... > >> Not having actually looked at it, probably not too hard. The hardest >> part >> might be bikeshedding about the color (aka user interface). > >> Should this replace the current interface, or do we want to make >> something >> like this a choice? > > No no, not replace. It should be an option, "aggregate into threads" > or something like that. Why don't you implement the following feature? (1) add an option on the advanced search page. I agree with the idea to add the option to aggregate into threads on the advanced search page. I attached the image("add_option_to_aggregate_into_threads.png"). What do you think? (2) change the message's URLs on the result pages if the option is checked. I think it's better to change the message's URLs to flatted ones on result pages if the messages are aggregated into threads because users want to access the discussion pages not per messages but per threads directly. ``` --- a/templates/search/listsearch.html +++ b/templates/search/listsearch.html @@ -53,7 +59,12 @@ <h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount == 1000%}more than 1000{%else%}{{hitcount}}{%endif%}.</h2> {%if pagelinks %}Result pages: {{pagelinks|safe}}<br/><br/>{%endif%} {%for hit in hits %} - {{forloop.counter0|add:firsthit}}. <a href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> [{{hit.rank|floatformat:2}}]<br/> + {{forloop.counter0|add:firsthit}}. + {%if aggregate%} + <a href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a> [{{hit.rank|floatformat:2}}]<br/> + {%else%} + <a href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> [{{hit.rank|floatformat:2}}]<br/> + {%endif%} From {{hit.author}} on {{hit.date}}.<br/> {{hit.abstract|safe}}<br/> <a href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/> ``` If someone already works for this, it's ok to ignore the following. Although I'm not familiar with Django, I made the patches for pgarchives and pgweb. - pgweb: "0001-wip-add-an-search-option-to-aggregate-into-threads.patch" This patch changes the user interface the above (1) and (2). - pgarchives: "0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch" This patch changes the search query to the Postgresql. The example query is the following. To aggregate into threads, it uses the window function to extract the highest-ranked messages in the same thread group. ``` SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg', 'wal writer')), ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal writer'),'StartSel="[[[[[[",StopSel="]]]]]]"') FROM (SELECT messageid, date, subject, _from, fti, bodytxt, -- add a sub query RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC, id) AS rank -- this line is main change FROM messages m WHERE fti @@ plainto_tsquery('public.pg', 'wal writer') AND EXISTS (SELECT 1 FROM list_threads lt WHERE lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2])) AND m.date > '2020-02-07T08:44:56.738979'::timestamp ) m1 WHERE m1.rank = 1 ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC LIMIT 1000; ``` If you have any good ideas, please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021-02-06 18:05, Masahiro Ikeda wrote: > "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes: >> When searching the mailing lists in advanced search mode we have list >> filtering, timeframe filtering (pretty coarse though) and then sorting >> mode. For moderately long running threads a search term is going to >> return >> many messages from the same thread. It would desirable to ask that >> only >> threads be shown and maybe how many individual messages in that thread >> matched. > > +1. I always think so. > > On 2020-05-01 01:55, Tom Lane wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> "David G. Johnston" <david.g.johnston@gmail.com> writes: >>>>> It would desirable to ask that only >>>>> threads be shown and maybe how many individual messages in that >>>>> thread >>>>> matched. >> >>>> +1, an option to do that would be handy. No idea how hard it is ... >> >>> Not having actually looked at it, probably not too hard. The hardest >>> part >>> might be bikeshedding about the color (aka user interface). >> >>> Should this replace the current interface, or do we want to make >>> something >>> like this a choice? >> >> No no, not replace. It should be an option, "aggregate into threads" >> or something like that. > > Why don't you implement the following feature? > > (1) add an option on the advanced search page. > > I agree with the idea to add the option to aggregate into threads > on the advanced search page. > > I attached the image("add_option_to_aggregate_into_threads.png"). > What do you think? > > > (2) change the message's URLs on the result pages if the option is > checked. > > I think it's better to change the message's URLs to flatted ones on > result pages > if the messages are aggregated into threads because users want to > access the > discussion pages not per messages but per threads directly. > > ``` > --- a/templates/search/listsearch.html > +++ b/templates/search/listsearch.html > @@ -53,7 +59,12 @@ > <h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount == > 1000%}more than 1000{%else%}{{hitcount}}{%endif%}.</h2> > {%if pagelinks %}Result pages: > {{pagelinks|safe}}<br/><br/>{%endif%} > {%for hit in hits %} > - {{forloop.counter0|add:firsthit}}. <a > href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> > [{{hit.rank|floatformat:2}}]<br/> > + {{forloop.counter0|add:firsthit}}. > + {%if aggregate%} > + <a > href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a> > [{{hit.rank|floatformat:2}}]<br/> > + {%else%} > + <a > href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> > [{{hit.rank|floatformat:2}}]<br/> > + {%endif%} > From {{hit.author}} on {{hit.date}}.<br/> > {{hit.abstract|safe}}<br/> > <a > href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/> > ``` > > > If someone already works for this, it's ok to ignore the following. > Although I'm not familiar with Django, I made the patches for > pgarchives and pgweb. > > - pgweb: > "0001-wip-add-an-search-option-to-aggregate-into-threads.patch" > > This patch changes the user interface the above (1) and (2). > > > - pgarchives: > "0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch" > > This patch changes the search query to the Postgresql. The example > query is the following. > To aggregate into threads, it uses the window function to extract the > highest-ranked > messages in the same thread group. > > > ``` > SELECT messageid, date, subject, _from, ts_rank_cd(fti, > plainto_tsquery('public.pg', 'wal writer')), > ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal > writer'),'StartSel="[[[[[[",StopSel="]]]]]]"') > FROM > (SELECT messageid, date, subject, _from, fti, bodytxt, > -- add a sub query > RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti, > plainto_tsquery('wal writer')) DESC, id) AS rank -- this line is > main change > FROM messages m > WHERE > fti @@ plainto_tsquery('public.pg', 'wal writer') > AND EXISTS (SELECT 1 FROM list_threads lt WHERE > lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2])) > AND m.date > '2020-02-07T08:44:56.738979'::timestamp > ) m1 > WHERE m1.rank = 1 > ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC > LIMIT 1000; > ``` > > If you have any good ideas, please let me know. I fixed hard-coded value in "0001-wip-add-an-search-parameter- to-aggregate-into-thread.patch" for pgarchives. Regards, -- Masahiro Ikeda NTT DATA CORPORATION