Thread: How to see memory usage using explain analyze?

How to see memory usage using explain analyze?

From: hyelluas
Hello,

I need to compare query execution: I have 2 tables partitioned by datex
(daily):

summary_daily (
 counter          | bigint
 datasource_id    | integer
 application_id   | integer
 action           | character(1)
 srcreporter_id   | integer
 destreporter_id  | integer
 bytes            | bigint
 srcusergroup_id  | integer
 datex            | timestamp with time zone
 root_cause_id    | integer
 rule_id            | integer
 srcgeo_id        | integer
 destgeo_id       | integer
 mlapp_id         | bigint
)

app (
 counter        | bigint
 bytes          | bigint
 action         | character(1)
 datex          | timestamp with time zone
 datasource_id  | integer
 application_id | integer
 mlapp_id       | bigint
 root_cause_id  | integer
)


The second table has been created from the first by aggregation.

 table Summary has 9 million records per partition;
 table App has 7 million records per partition.

The execution plans look the same, except the actual times differ hugely.

work_mem = 10MB

days/partitions    query from Summary    query from App

1                  2.5 sec               1 sec
3                  5.5 sec               1.5 sec
7                  60 sec                8 sec

When I set session work_mem = 60MB, the 7-day query takes 8.5 sec instead of 60 sec.

How can I see where/when it is using disk or memory?
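One way to see this, assuming PostgreSQL 9.0 or later for EXPLAIN's BUFFERS
option (log_temp_files is available from 8.3), is a sketch along these lines:

SET log_temp_files = 0;     -- log every temp file spilled to disk, any size
SET work_mem = '60MB';      -- per-session setting, as in the test above
EXPLAIN (ANALYZE, BUFFERS)  -- adds per-node buffer and temp-file counters
SELECT summary_app.action, sum(summary_app.counter), summary_app.mlapp_id,
       summary_app.application_id, sum(summary_app.bytes), summary_app.root_cause_id
FROM summary_app
WHERE summary_app.datasource_id = 10
  AND summary_app.datex >= '2011-08-03 00:00:00+00'::timestamp with time zone
  AND summary_app.datex <  '2011-08-06 00:00:00+00'::timestamp with time zone
GROUP BY mlapp_id, application_id, action, root_cause_id;

Non-zero "temp read/written" counters on a node mean it went to disk, and the
server log will record each temporary file created.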

EXPLAIN ANALYZE
SELECT summary_app.action, sum(summary_app.counter), summary_app.mlapp_id,
       summary_app.application_id, sum(summary_app.bytes), summary_app.root_cause_id
FROM summary_app
WHERE summary_app.datasource_id = 10
  AND summary_app.datex >= '2011-08-03 00:00:00+00'::timestamp with time zone
  AND summary_app.datex <  '2011-08-06 00:00:00+00'::timestamp with time zone
GROUP BY mlapp_id, application_id, action, root_cause_id;



HashAggregate  (cost=8223.97..8226.97 rows=200 width=37) (actual
time=4505.607..4506.806 rows=3126 loops=1)
  ->  Append  (cost=0.00..8213.42 rows=703 width=37) (actual
time=1071.043..4046.780 rows=283968 loops=1)
        ->  Seq Scan on summary_daily_data summary_app  (cost=0.00..23.83
rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with
time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone)
AND (datasource_id = 10))
        ->  Bitmap Heap Scan on summ_daily_15191 summary_app
(cost=1854.89..2764.60 rows=234 width=37) (actual time=1071.041..1343.235
rows=94656 loops=1)
              Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03
00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06
00:00:00+00'::timestamp with time zone))
              ->  BitmapAnd  (cost=1854.89..1854.89 rows=234 width=0)
(actual time=1054.310..1054.310 rows=0 loops=1)
                    ->  Bitmap Index Scan on ind_fw_15191
(cost=0.00..868.69 rows=46855 width=0) (actual time=17.896..17.896
rows=94656 loops=1)
                          Index Cond: (datasource_id = 10)
                    ->  Bitmap Index Scan on ind_datex_15191
(cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834
rows=9370944 loops=1)
                          Index Cond: ((datex >= '2011-08-03
00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06
00:00:00+00'::timestamp with time zone))


The same query against the smaller table:


HashAggregate  (cost=252859.36..253209.94 rows=23372 width=34) (actual
time=371.164..372.153 rows=3126 loops=1)
  ->  Append  (cost=0.00..249353.62 rows=233716 width=34) (actual
time=11.028..115.915 rows=225072 loops=1)
        ->  Seq Scan on summary_app  (cost=0.00..28.03 rows=1 width=37)
(actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with
time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone)
AND (datasource_id = 10))
        ->  Bitmap Heap Scan on summ_app_15191 summary_app
(cost=2299.40..82014.85 rows=72293 width=34) (actual time=11.027..31.341
rows=75024 loops=1)
              Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03
00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06
00:00:00+00'::timestamp with time zone))
              ->  Bitmap Index Scan on summ_app_fw_datex_15191
(cost=0.00..2281.32 rows=72293 width=0) (actual time=10.910..10.910
rows=75024 loops=1)
                    Index Cond: ((datasource_id = 10) AND (datex >=
'2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06
00:00:00+00'::timestamp with time zone))


Why is the difference so large? How can I tune this query?

thank you.

Helen







Re: How to see memory usage using explain analyze?

From: "Igor Neyman"

> -----Original Message-----
> From: hyelluas [mailto:helen_yelluas@mcafee.com]
> Sent: Friday, August 12, 2011 5:30 PM
> To: pgsql-performance@postgresql.org
> Subject: How to see memory usage using explain analyze?
>
> [full message quoted above; trimmed here]

Helen,

I'm probably a bit late answering your question.
But, just in case...

It looks like one table has a "combined" index, summ_app_fw_datex_15191, on
both datasource_id and datex, which works better than the 2 separate indexes
ind_datex_15191 (datex) and ind_fw_15191 (datasource_id) that you have on the
other table.
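
For reference, a minimal sketch of such a combined index on the summary
partition named in the plan above (the index name here is made up):

CREATE INDEX ind_fw_datex_15191
    ON summ_daily_15191 (datasource_id, datex);
-- Equality column first, range column second, so a single index scan
-- can satisfy both WHERE conditions.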
Besides, this:

->  Bitmap Index Scan on ind_datex_15191
(cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834
rows=9370944 loops=1)

shows that the statistics on ind_datex_15191 are completely "out of whack"
(estimated rows=46855, actual rows=9370944).
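
If the statistics really are stale, a manual ANALYZE of the partition named
in the plan is a cheap first test:

ANALYZE summ_daily_15191;
-- Re-run EXPLAIN ANALYZE afterwards and compare estimated vs. actual rows.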

HTH,
Igor Neyman


Re: How to see memory usage using explain analyze?

From: hyelluas
Igor,

Thank you. My tests showed better performance against the larger summary
tables when I split the index for datasource_id & datex in two; I used to
have a composite index.

Regarding the index statistics - should I ANALYZE the tables? I thought
autovacuum takes care of it.
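
One way to check, assuming pg_stat_user_tables has the last_analyze columns
(PostgreSQL 8.2 and later), is to look at when the partitions were last
analyzed:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'summ%';
-- A NULL last_autoanalyze means autovacuum has never analyzed that partition.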

helen


Re: How to see memory usage using explain analyze?

From: "Igor Neyman"
> -----Original Message-----
> From: hyelluas [mailto:helen_yelluas@mcafee.com]
> Sent: Monday, August 15, 2011 2:33 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: How to see memory usage using explain analyze?
>
> [message quoted above; trimmed here]


But with different sets of indexes you can't compare the execution plans.
As for the statistics, you could try to ANALYZE the table manually, maybe
after increasing "default_statistics_target".
From the docs:

"default_statistics_target (integer)

    Sets the default statistics target for table columns that have not
had a column-specific target set via ALTER TABLE SET STATISTICS. Larger
values increase the time needed to do ANALYZE, but might improve the
quality of the planner's estimates. The default is 10. For more
information on the use of statistics by the PostgreSQL query planner,
refer to Section 14.2."
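
A sketch of both routes, with an arbitrary target of 100 and the partition
name from the plans above:

-- Session-wide: raise the default target, then re-collect statistics.
SET default_statistics_target = 100;
ANALYZE summ_daily_15191;

-- Or per column, which is stored persistently on the table:
ALTER TABLE summ_daily_15191 ALTER COLUMN datex SET STATISTICS 100;
ANALYZE summ_daily_15191;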

HTH,
Igor

Re: How to see memory usage using explain analyze?

From: hyelluas
Igor,

Thank you for the hint. I read about the planner and added "VACUUM ANALYZE"
to my procedures.

There is no join in my query, but the GROUP BY is taking all the time and I
don't know how to tune it.
It gets executed by a procedure; the execution time requirement is < 4 sec,
but it takes 8-11 sec against 3 partitions of 9 million records each, and
goes up to 22 sec for 5 partitions.
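
Given that the earlier test cut the 7-day query from 60 sec to 8.5 sec with
work_mem = 60MB, one option is to attach that setting to the function that
runs the report, so it applies only there. A sketch, with a hypothetical
function name and signature:

ALTER FUNCTION report_summary_app(timestamptz, timestamptz)
    SET work_mem = '60MB';
-- Per-function SET (PostgreSQL 8.3+) takes effect inside the function
-- and reverts automatically when it returns.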


I've been testing PostgreSQL performance for the last 2 months, comparing it
with MySQL; PostgreSQL performance with 5+ million records in a 14-column
table is worse.
Are 14 columns a big table for Postgres, or are 5 million records a big table?

The whole picture is that there are 2 databases, OLTP & "OLAP", which used to
be on different machines and in different databases.
The new project requires putting them on one database & machine.

I preferred Postgres (a poorly designed OLTP would suffer even more on
MySQL), and now I'm trying to tune the OLAP db.

thank you.

Helen

