Re: [PERFORM] select with max functions - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: [PERFORM] select with max functions
Date
Msg-id 002ad4b4-ec5f-4e9f-fa48-5c19405292ee@catalyst.net.nz
Whole thread Raw
In response to Re: [PERFORM] select with max functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
>> explain analyze   SELECT Ma.User_Id,
>>                        COUNT(*) COUNT
>>                 FROM   Manuim Ma
>>                 WHERE  Ma.Bb_Open_Date  =
>>                                    (SELECT Bb_Open_Date
>>                                     FROM   Manuim Man
>>                                     WHERE  Man.User_Id = Ma.User_Id order
>> by                                   bb_open_date desc limit 1
>>                                    )
>>                 GROUP  BY Ma.User_Id
>>                 HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
>         COUNT(*) COUNT
>         FROM   Manuim Ma,
>                (SELECT User_Id, max(Bb_Open_Date) as max
>                 FROM   Manuim Man
>                 GROUP BY User_Id) ss
>         WHERE  Ma.User_Id = ss.User_Id AND
>                Ma.Bb_Open_Date = ss.max
>         GROUP  BY Ma.User_Id
>         HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
>             regards, tom lane
>
>

Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking


SELECT User_Id, max(Bb_Open_Date) as max               FROM   Manuim Man               GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent  trigger based summary table (there are examples in the docs 
of how to do this).

Cheers

Mark


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Ben Nachtrieb
Date:
Subject: [PERFORM] How does max_parallel_workers_per_gather change load averages?
Next
From: David Rowley
Date:
Subject: Re: [PERFORM] How does max_parallel_workers_per_gather change load averages?