Home > mailing lists

Thread: [PERFORM] select with max functions

[PERFORM] select with max functions

From

Mariel Cherkassky

Date:

01 October 2017, 15:41:37

Hi,

I need to use the max function in my query. I had very bad performance when I used the max :

SELECT Ma.User_Id,

COUNT(*) COUNT

FROM Manuim Ma

WHERE Ma.Bb_Open_Date =

(SELECT max(Bb_Open_Date)

FROM Manuim Man

WHERE Man.User_Id = Ma.User_Id

)

GROUP BY Ma.User_Id

HAVING COUNT(*) > 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------------

GroupAggregate (cost=0.56..3250554784.13 rows=115111 width=18)

Group Key: ma.user_id

Filter: (count(*) > 1)

-> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..3250552295.59 rows=178324 width=10)

Filter: (bb_open_date = (SubPlan 1))

SubPlan 1

-> Aggregate (cost=90.98..90.99 rows=1 width=8)

-> Index Scan using manuim_i_user_id on manuim man (cost=0.56..90.92 rows=22 width=8)

Index Cond: ((user_id)::text = (ma.user_id)::text)

(9 rows)

So I used the limit 1 option :

SELECT Ma.User_Id,

COUNT(*) COUNT

FROM Manuim Ma

WHERE Ma.Bb_Open_Date =

(SELECT Bb_Open_Date

FROM Manuim Man

WHERE Man.User_Id = Ma.User_Id order by bb_open_date desc limit 1

)

GROUP BY Ma.User_Id

HAVING COUNT(*) > 1;

and the performance are still the same :

QUERY PLAN

---------------------------------------------------------------------------------------------------------------

GroupAggregate (cost=0.56..3252248863.46 rows=115111 width=18)

Group Key: ma.user_id

Filter: (count(*) > 1)

-> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..3252246374.92 rows=178324 width=10)

Filter: (bb_open_date = (SubPlan 1))

SubPlan 1

-> Limit (cost=91.03..91.03 rows=1 width=8)

-> Sort (cost=91.03..91.09 rows=22 width=8)

Sort Key: man.bb_open_date DESC

-> Index Scan using manuim_i_user_id on manuim man (cost=0.56..90.92 rows=22 width=8)

Index Cond: ((user_id)::text = (ma.user_id)::text)

(11 rows)

the reading on the table manuim takes a lot of effort, what else can I do ? the table`s size is 8G.

select count(*) from manuim;

count

----------

35664828

(1 row)

the indexes on the table :

"manuim_bb_open_date" btree (bb_open_date)

"manuim_i_user_id" btree (user_id)

Any idea how can I continue from here ? Thanks , Mariel.

Re: [PERFORM] select with max functions

From

Gerardo Herzig

Date:

01 October 2017, 16:35:47


----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: pgsql-performance@postgresql.org
> Enviados: Domingo, 1 de Octubre 2017 9:41:37
> Asunto: [PERFORM] select with max functions
> 
> Hi,
> I need to use the max function in my query. I had very bad performance when
> I used the max  :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> 
>                                                QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3250552295.59 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
>                  ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                        Index Cond: ((user_id)::text = (ma.user_id)::text)
> (9 rows)
> 
> 
> 
> So I used the limit 1 option :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> and the performance are still the same :
> 
>                                                   QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3252246374.92 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Limit  (cost=91.03..91.03 rows=1 width=8)
>                  ->  Sort  (cost=91.03..91.09 rows=22 width=8)
>                        Sort Key: man.bb_open_date DESC
>                        ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                              Index Cond: ((user_id)::text =
> (ma.user_id)::text)
> (11 rows)
> 
> 
> 
> the reading on the table manuim takes a lot of effort, what else can I do ?
> the table`s size is 8G.
> 
> select count(*) from manuim;
>   count
> ----------
>  35664828
> (1 row)
> 
> the indexes on the table :
>  "manuim_bb_open_date" btree (bb_open_date)
> "manuim_i_user_id" btree (user_id)
> 
> 
> Any idea how can I continue from here ? Thanks , Mariel.

Start by posting the results of "explain analyze" of that queries, so we can see some timming stuff.

Gerardo


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From

Andreas Kretschmer

Date:

01 October 2017, 21:48:45


Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
> Hi,
> I need to use the max function in my query. I had very bad performance 
> when I used the max  :
>
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM  Manuim Man
>                                    WHERE Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
>
>
> Any idea how can I continue from here ? Thanks , Mariel.


Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, 
count(1) from Manuim group by 1,2 having count(1) > 1;

maybe much cheaper, but untested! If not, please share more details, at 
least table-definition.

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From

Mariel Cherkassky

Date:

02 October 2017, 16:25:19

Andreas I tried to rewrite it with the function rank() but I failed. The query you wrote isnt the same as what I search. Moreover, I cant use explain analyze because it is taking to much time to run and I'm getting timeout..

2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:

Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
Hi,
I need to use the max function in my query. I had very bad performance when I used the max :

SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma
WHERE Ma.Bb_Open_Date =
(SELECT max(Bb_Open_Date)
FROM Manuim Man
WHERE Man.User_Id = Ma.User_Id
)
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;

Any idea how can I continue from here ? Thanks , Mariel.

Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1) from Manuim group by 1,2 having count(1) > 1;

maybe much cheaper, but untested! If not, please share more details, at least table-definition.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From

Gerardo Herzig

Date:

02 October 2017, 16:45:11


----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: "Andreas Kretschmer" <andreas@a-kretschmer.de>
> CC: pgsql-performance@postgresql.org
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
> 
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
> 
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:

Do a "set statement_timeout TO 0" prior to "explain analyze"


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From

Mariel Cherkassky

Date:

02 October 2017, 17:45:36

explain analyze SELECT Ma.User_Id,

COUNT(*) COUNT

FROM Manuim Ma

WHERE Ma.Bb_Open_Date =

(SELECT Bb_Open_Date

FROM Manuim Man

WHERE Man.User_Id = Ma.User_Id order by bb_open_date desc limit 1

)

GROUP BY Ma.User_Id

HAVING COUNT(*) > 1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------

----------------------------------------

GroupAggregate (cost=0.56..2430770384.80 rows=128137 width=18) (actual time=55.823..2970443.757 rows=1213 loops=1)

Group Key: ma.user_id

Filter: (count(*) > 1)

Rows Removed by Filter: 3693020

-> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249

..2966355.734 rows=3695461 loops=1)

Filter: (bb_open_date = (SubPlan 1))

Rows Removed by Filter: 31969367

SubPlan 1

-> Limit (cost=68.00..68.00 rows=1 width=8) (actual time=0.082..0.082 rows=0 loops=35664828)

-> Sort (cost=68.00..68.04 rows=16 width=8) (actual time=0.081..0.081 rows=0 loops=35664828)

Sort Key: man.bb_open_date DESC

Sort Method: quicksort Memory: 25kB

-> Index Scan using manuim_i_user_id on manuim man (cost=0.56..67.92 rows=16 width=8) (actual ti

me=0.001..0.069 rows=85 loops=35664828)

Index Cond: ((user_id)::text = (ma.user_id)::text)

Planning time: 0.414 ms

Execution time: 2970444.732 ms

(16 rows)

2017-10-02 16:45 GMT+03:00 Gerardo Herzig <gherzig@fmed.uba.ar>:

----- Mensaje original -----
> De: "Mariel Cherkassky" <mariel.cherkassky@gmail.com>
> Para: "Andreas Kretschmer" <andreas@a-kretschmer.de>
> CC: pgsql-performance@postgresql.org
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
>
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
>
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andreas@a-kretschmer.de>:

Do a "set statement_timeout TO 0" prior to "explain analyze"

Re: [PERFORM] select with max functions

From

Tom Lane

Date:

02 October 2017, 18:29:59

Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
> explain analyze   SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,      COUNT(*) COUNT      FROM   Manuim Ma,             (SELECT User_Id, max(Bb_Open_Date) as max
        FROM   Manuim Man              GROUP BY User_Id) ss      WHERE  Ma.User_Id = ss.User_Id AND
Ma.Bb_Open_Date= ss.max      GROUP  BY Ma.User_Id      HAVING COUNT(*) > 1;
 

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.
        regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] select with max functions

From

Mark Kirkwood

Date:

04 October 2017, 00:16:37

On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
>> explain analyze   SELECT Ma.User_Id,
>>                        COUNT(*) COUNT
>>                 FROM   Manuim Ma
>>                 WHERE  Ma.Bb_Open_Date  =
>>                                    (SELECT Bb_Open_Date
>>                                     FROM   Manuim Man
>>                                     WHERE  Man.User_Id = Ma.User_Id order
>> by                                   bb_open_date desc limit 1
>>                                    )
>>                 GROUP  BY Ma.User_Id
>>                 HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
>         COUNT(*) COUNT
>         FROM   Manuim Ma,
>                (SELECT User_Id, max(Bb_Open_Date) as max
>                 FROM   Manuim Man
>                 GROUP BY User_Id) ss
>         WHERE  Ma.User_Id = ss.User_Id AND
>                Ma.Bb_Open_Date = ss.max
>         GROUP  BY Ma.User_Id
>         HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
>             regards, tom lane
>
>

Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking


SELECT User_Id, max(Bb_Open_Date) as max               FROM   Manuim Man               GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent  trigger based summary table (there are examples in the docs 
of how to do this).

Cheers

Mark


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance