Thread: Website mailing list search enhancement idea - one result per thread

Website mailing list search enhancement idea - one result per thread

From
"David G. Johnston"
Date:
Just an idea...

When searching the mailing lists in advanced search mode we have list filtering, timeframe filtering (pretty coarse though) and then sorting mode.  For moderately long running threads a search term is going to return many messages from the same thread.  It would desirable to ask that only threads be shown and maybe how many individual messages in that thread matched.  Going into the first message and then choosing "whole thread" (or going directly to whole thread mode) would be the typical action by the searcher who is looking to pull up a conversation they recall happened (or assumed did) and can quickly skim the whole thing for context and information.

Thanks for reading!

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode.  For moderately long running threads a search term is going to return
> many messages from the same thread.  It would desirable to ask that only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1, an option to do that would be handy.  No idea how hard it is ...

            regards, tom lane



Re: Website mailing list search enhancement idea - one result per thread

From
Magnus Hagander
Date:
On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode.  For moderately long running threads a search term is going to return
> many messages from the same thread.  It would desirable to ask that only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1, an option to do that would be handy.  No idea how hard it is ...


Not having actually looked at it, probably not too hard. The hardest part might be bikeshedding about the color (aka user interface).

Should this replace the current interface, or do we want to make something like this a choice?

//Magnus

Re: Website mailing list search enhancement idea - one result per thread

From
"David G. Johnston"
Date:
On Thu, Apr 30, 2020 at 9:30 AM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode.  For moderately long running threads a search term is going to return
> many messages from the same thread.  It would desirable to ask that only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1, an option to do that would be handy.  No idea how hard it is ...

 
Should this replace the current interface, or do we want to make something like this a choice?

I would have it be a choice under the advanced search options with possibly an option on the search results page to "See thread results only".

David J.
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>>> It would desirable to ask that only
>>> threads be shown and maybe how many individual messages in that thread
>>> matched.

>> +1, an option to do that would be handy.  No idea how hard it is ...

> Not having actually looked at it, probably not too hard. The hardest part
> might be bikeshedding about the color (aka user interface).

> Should this replace the current interface, or do we want to make something
> like this a choice?

No no, not replace.  It should be an option, "aggregate into threads"
or something like that.

            regards, tom lane



Re: Website mailing list search enhancement idea - one result per thread

From
Masahiro Ikeda
Date:
"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode.  For moderately long running threads a search term is going to 
> return
> many messages from the same thread.  It would desirable to ask that 
> only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1. I always think so.

On 2020-05-01 01:55, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>>>> It would desirable to ask that only
>>>> threads be shown and maybe how many individual messages in that 
>>>> thread
>>>> matched.
> 
>>> +1, an option to do that would be handy.  No idea how hard it is ...
> 
>> Not having actually looked at it, probably not too hard. The hardest 
>> part
>> might be bikeshedding about the color (aka user interface).
> 
>> Should this replace the current interface, or do we want to make 
>> something
>> like this a choice?
> 
> No no, not replace.  It should be an option, "aggregate into threads"
> or something like that.

Why don't you implement the following feature?

(1) add an option on the advanced search page.

I agree with the idea to add the option to aggregate into threads
on the advanced search page.

I attached the image("add_option_to_aggregate_into_threads.png").
What do you think?


(2) change the message's URLs on the result pages if the option is 
checked.

I think it's better to change the message's URLs to flatted ones on 
result pages
if the messages are aggregated into threads because users want to access 
the
discussion pages not per messages but per threads directly.

```
--- a/templates/search/listsearch.html
+++ b/templates/search/listsearch.html
@@ -53,7 +59,12 @@
      <h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount == 1000%}more 
than 1000{%else%}{{hitcount}}{%endif%}.</h2>
      {%if pagelinks %}Result pages: 
{{pagelinks|safe}}<br/><br/>{%endif%}
      {%for hit in hits %}
-      {{forloop.counter0|add:firsthit}}. <a 
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+      {{forloop.counter0|add:firsthit}}.
+        {%if aggregate%}
+          <a 
href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+        {%else%}
+          <a 
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+        {%endif%}
        From {{hit.author}} on {{hit.date}}.<br/>
        {{hit.abstract|safe}}<br/>
        <a 

href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/>
```


If someone already works for this, it's ok to ignore the following.
Although I'm not familiar with Django, I made the patches for pgarchives 
and pgweb.

- pgweb: "0001-wip-add-an-search-option-to-aggregate-into-threads.patch"

This patch changes the user interface the above (1) and (2).


- pgarchives: 
"0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch"

This patch changes the search query to the Postgresql. The example query 
is the following.
To aggregate into threads, it uses the window function to extract the 
highest-ranked
messages in the same thread group.


```
SELECT messageid, date, subject, _from, ts_rank_cd(fti, 
plainto_tsquery('public.pg', 'wal writer')),
        ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal 
writer'),'StartSel="[[[[[[",StopSel="]]]]]]"')
FROM
   (SELECT messageid, date, subject, _from, fti, bodytxt,                 
                                                  -- add a sub query
           RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti, 
plainto_tsquery('wal writer')) DESC, id) AS rank    -- this line is main 
change
    FROM messages m
    WHERE
      fti @@ plainto_tsquery('public.pg', 'wal writer')
      AND EXISTS (SELECT 1 FROM list_threads lt WHERE 
lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2]))
      AND m.date > '2020-02-07T08:44:56.738979'::timestamp
   ) m1
WHERE m1.rank = 1
ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC
LIMIT 1000;
```

If you have any good ideas, please let me know.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Website mailing list search enhancement idea - one result per thread

From
Masahiro Ikeda
Date:
On 2021-02-06 18:05, Masahiro Ikeda wrote:
> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
>> When searching the mailing lists in advanced search mode we have list
>> filtering, timeframe filtering (pretty coarse though) and then sorting
>> mode.  For moderately long running threads a search term is going to 
>> return
>> many messages from the same thread.  It would desirable to ask that 
>> only
>> threads be shown and maybe how many individual messages in that thread
>> matched.
> 
> +1. I always think so.
> 
> On 2020-05-01 01:55, Tom Lane wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>>>>> It would desirable to ask that only
>>>>> threads be shown and maybe how many individual messages in that 
>>>>> thread
>>>>> matched.
>> 
>>>> +1, an option to do that would be handy.  No idea how hard it is ...
>> 
>>> Not having actually looked at it, probably not too hard. The hardest 
>>> part
>>> might be bikeshedding about the color (aka user interface).
>> 
>>> Should this replace the current interface, or do we want to make 
>>> something
>>> like this a choice?
>> 
>> No no, not replace.  It should be an option, "aggregate into threads"
>> or something like that.
> 
> Why don't you implement the following feature?
> 
> (1) add an option on the advanced search page.
> 
> I agree with the idea to add the option to aggregate into threads
> on the advanced search page.
> 
> I attached the image("add_option_to_aggregate_into_threads.png").
> What do you think?
> 
> 
> (2) change the message's URLs on the result pages if the option is 
> checked.
> 
> I think it's better to change the message's URLs to flatted ones on 
> result pages
> if the messages are aggregated into threads because users want to 
> access the
> discussion pages not per messages but per threads directly.
> 
> ```
> --- a/templates/search/listsearch.html
> +++ b/templates/search/listsearch.html
> @@ -53,7 +59,12 @@
>      <h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount ==
> 1000%}more than 1000{%else%}{{hitcount}}{%endif%}.</h2>
>      {%if pagelinks %}Result pages: 
> {{pagelinks|safe}}<br/><br/>{%endif%}
>      {%for hit in hits %}
> -      {{forloop.counter0|add:firsthit}}. <a
> href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a>
> [{{hit.rank|floatformat:2}}]<br/>
> +      {{forloop.counter0|add:firsthit}}.
> +        {%if aggregate%}
> +          <a
> href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a>
> [{{hit.rank|floatformat:2}}]<br/>
> +        {%else%}
> +          <a
> href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a>
> [{{hit.rank|floatformat:2}}]<br/>
> +        {%endif%}
>        From {{hit.author}} on {{hit.date}}.<br/>
>        {{hit.abstract|safe}}<br/>
>        <a
>
href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/>
> ```
> 
> 
> If someone already works for this, it's ok to ignore the following.
> Although I'm not familiar with Django, I made the patches for
> pgarchives and pgweb.
> 
> - pgweb: 
> "0001-wip-add-an-search-option-to-aggregate-into-threads.patch"
> 
> This patch changes the user interface the above (1) and (2).
> 
> 
> - pgarchives: 
> "0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch"
> 
> This patch changes the search query to the Postgresql. The example
> query is the following.
> To aggregate into threads, it uses the window function to extract the
> highest-ranked
> messages in the same thread group.
> 
> 
> ```
> SELECT messageid, date, subject, _from, ts_rank_cd(fti,
> plainto_tsquery('public.pg', 'wal writer')),
>        ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal
> writer'),'StartSel="[[[[[[",StopSel="]]]]]]"')
> FROM
>   (SELECT messageid, date, subject, _from, fti, bodytxt,
>                                                    -- add a sub query
>           RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti,
> plainto_tsquery('wal writer')) DESC, id) AS rank    -- this line is
> main change
>    FROM messages m
>    WHERE
>      fti @@ plainto_tsquery('public.pg', 'wal writer')
>      AND EXISTS (SELECT 1 FROM list_threads lt WHERE
> lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2]))
>      AND m.date > '2020-02-07T08:44:56.738979'::timestamp
>   ) m1
> WHERE m1.rank = 1
> ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC
> LIMIT 1000;
> ```
> 
> If you have any good ideas, please let me know.

I fixed hard-coded value in "0001-wip-add-an-search-parameter-
to-aggregate-into-thread.patch" for pgarchives.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment