Thread: [PERFORM] select with max functions

[PERFORM] select with max functions

From
Mariel Cherkassky
Date:
Hi,
I need to use the max function in my query. I had very bad performance when I used the max  : 

               SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  = 
                                  (SELECT max(Bb_Open_Date)
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;


                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma  (cost=0.56..3250552295.59 rows=178324 width=10)
         Filter: (bb_open_date = (SubPlan 1))
         SubPlan 1
           ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
                 ->  Index Scan using manuim_i_user_id on manuim man  (cost=0.56..90.92 rows=22 width=8)
                       Index Cond: ((user_id)::text = (ma.user_id)::text)
(9 rows)



So I used the limit 1 option : 

               SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  = 
                                  (SELECT Bb_Open_Date
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id order by                                   bb_open_date desc limit 1
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;

and the performance are still the same : 

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma  (cost=0.56..3252246374.92 rows=178324 width=10)
         Filter: (bb_open_date = (SubPlan 1))
         SubPlan 1
           ->  Limit  (cost=91.03..91.03 rows=1 width=8)
                 ->  Sort  (cost=91.03..91.09 rows=22 width=8)
                       Sort Key: man.bb_open_date DESC
                       ->  Index Scan using manuim_i_user_id on manuim man  (cost=0.56..90.92 rows=22 width=8)
                             Index Cond: ((user_id)::text = (ma.user_id)::text)
(11 rows)



the reading on the table manuim takes a lot of effort, what else can I do ? the table`s size is 8G. 

select count(*) from manuim;
  count   
----------
 35664828
(1 row)

the indexes on the table : 
 "manuim_bb_open_date" btree (bb_open_date)
"manuim_i_user_id" btree (user_id)


Any idea how can I continue from here ? Thanks , Mariel.

Re: [PERFORM] select with max functions

From
Gerardo Herzig
Date:

----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: pgsql-performance@postgresql.org
> Enviados: Domingo, 1 de Octubre 2017 9:41:37
> Asunto: [PERFORM] select with max functions
> 
> Hi,
> I need to use the max function in my query. I had very bad performance when
> I used the max  :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> 
>                                                QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3250552295.59 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
>                  ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                        Index Cond: ((user_id)::text = (ma.user_id)::text)
> (9 rows)
> 
> 
> 
> So I used the limit 1 option :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> and the performance are still the same :
> 
>                                                   QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3252246374.92 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Limit  (cost=91.03..91.03 rows=1 width=8)
>                  ->  Sort  (cost=91.03..91.09 rows=22 width=8)
>                        Sort Key: man.bb_open_date DESC
>                        ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                              Index Cond: ((user_id)::text =
> (ma.user_id)::text)
> (11 rows)
> 
> 
> 
> the reading on the table manuim takes a lot of effort, what else can I do ?
> the table`s size is 8G.
> 
> select count(*) from manuim;
>   count
> ----------
>  35664828
> (1 row)
> 
> the indexes on the table :
>  "manuim_bb_open_date" btree (bb_open_date)
> "manuim_i_user_id" btree (user_id)
> 
> 
> Any idea how can I continue from here ? Thanks , Mariel.

Start by posting the results of "explain analyze" of that queries, so we can see some timming stuff.

Gerardo


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From
Andreas Kretschmer
Date:

Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
> Hi,
> I need to use the max function in my query. I had very bad performance 
> when I used the max  :
>
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM  Manuim Man
>                                    WHERE Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
>
>
> Any idea how can I continue from here ? Thanks , Mariel.


Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, 
count(1) from Manuim group by 1,2 having count(1) > 1;

maybe much cheaper, but untested! If not, please share more details, at 
least table-definition.

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From
Mariel Cherkassky
Date:
Andreas I tried to rewrite it with the function rank() but I failed. The query you wrote isnt the same as what I search. Moreover, I cant use explain analyze because it is taking to much time to run and I'm getting timeout..

2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:


Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
Hi,
I need to use the max function in my query. I had very bad performance when I used the max  :

               SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  =
                                  (SELECT max(Bb_Open_Date)
                                   FROM  Manuim Man
                                   WHERE Man.User_Id = Ma.User_Id
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;


Any idea how can I continue from here ? Thanks , Mariel.


Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1) from Manuim group by 1,2 having count(1) > 1;

maybe much cheaper, but untested! If not, please share more details, at least table-definition.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From
Gerardo Herzig
Date:

----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: "Andreas Kretschmer" <andreas@a-kretschmer.de>
> CC: pgsql-performance@postgresql.org
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
> 
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
> 
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:

Do a "set statement_timeout TO 0" prior to "explain analyze"


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From
Mariel Cherkassky
Date:
explain analyze   SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  = 
                                  (SELECT Bb_Open_Date
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id order by                                   bb_open_date desc limit 1
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;
                                                                           QUERY PLAN                                    
                                        
-------------------------------------------------------------------------------------------------------------------------
----------------------------------------
 GroupAggregate  (cost=0.56..2430770384.80 rows=128137 width=18) (actual time=55.823..2970443.757 rows=1213 loops=1)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 3693020
   ->  Index Scan using manuim_i_user_id on manuim ma  (cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249
..2966355.734 rows=3695461 loops=1)
         Filter: (bb_open_date = (SubPlan 1))
         Rows Removed by Filter: 31969367
         SubPlan 1
           ->  Limit  (cost=68.00..68.00 rows=1 width=8) (actual time=0.082..0.082 rows=0 loops=35664828)
                 ->  Sort  (cost=68.00..68.04 rows=16 width=8) (actual time=0.081..0.081 rows=0 loops=35664828)
                       Sort Key: man.bb_open_date DESC
                       Sort Method: quicksort  Memory: 25kB
                       ->  Index Scan using manuim_i_user_id on manuim man  (cost=0.56..67.92 rows=16 width=8) (actual ti
me=0.001..0.069 rows=85 loops=35664828)
                             Index Cond: ((user_id)::text = (ma.user_id)::text)
 Planning time: 0.414 ms
 Execution time: 2970444.732 ms
(16 rows)

2017-10-02 16:45 GMT+03:00 Gerardo Herzig <gherzig@fmed.uba.ar>:


----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: "Andreas Kretschmer" <andreas@a-kretschmer.de>
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
>
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
>
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:

Do a "set statement_timeout TO 0" prior to "explain analyze"

Re: [PERFORM] select with max functions

From
Tom Lane
Date:
Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
> explain analyze   SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,      COUNT(*) COUNT      FROM   Manuim Ma,             (SELECT User_Id, max(Bb_Open_Date) as max
        FROM   Manuim Man              GROUP BY User_Id) ss      WHERE  Ma.User_Id = ss.User_Id AND
Ma.Bb_Open_Date= ss.max      GROUP  BY Ma.User_Id      HAVING COUNT(*) > 1;
 

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.
        regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From
Mark Kirkwood
Date:
On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
>> explain analyze   SELECT Ma.User_Id,
>>                        COUNT(*) COUNT
>>                 FROM   Manuim Ma
>>                 WHERE  Ma.Bb_Open_Date  =
>>                                    (SELECT Bb_Open_Date
>>                                     FROM   Manuim Man
>>                                     WHERE  Man.User_Id = Ma.User_Id order
>> by                                   bb_open_date desc limit 1
>>                                    )
>>                 GROUP  BY Ma.User_Id
>>                 HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
>         COUNT(*) COUNT
>         FROM   Manuim Ma,
>                (SELECT User_Id, max(Bb_Open_Date) as max
>                 FROM   Manuim Man
>                 GROUP BY User_Id) ss
>         WHERE  Ma.User_Id = ss.User_Id AND
>                Ma.Bb_Open_Date = ss.max
>         GROUP  BY Ma.User_Id
>         HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
>             regards, tom lane
>
>

Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking


SELECT User_Id, max(Bb_Open_Date) as max               FROM   Manuim Man               GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent  trigger based summary table (there are examples in the docs 
of how to do this).

Cheers

Mark


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance