Thread: How to see memory usage using explain analyze?
Hello,

I need to compare query execution: I have 2 tables partitioned by datex (daily):

    summary_daily (
        counter         | bigint
        datasource_id   | integer
        application_id  | integer
        action          | character(1)
        srcreporter_id  | integer
        destreporter_id | integer
        bytes           | bigint
        srcusergroup_id | integer
        datex           | timestamp with time zone
        root_cause_id   | integer
        rule_id         | integer
        srcgeo_id       | integer
        destgeo_id      | integer
        mlapp_id        | bigint
    )

    app (
        counter         | bigint
        bytes           | bigint
        action          | character(1)
        datex           | timestamp with time zone
        datasource_id   | integer
        application_id  | integer
        mlapp_id        | bigint
        root_cause_id   | integer
    )

The second table has been created from the first by aggregation. Table Summary has 9 million rows per partition; table App has 7 million rows per partition. The execution plans look the same, but the actual times are hugely different.

With work_mem = 10MB:

    days/partitions   query from Summary   query from App
    1                 2.5 sec              1 sec
    3                 5.5 sec              1.5 sec
    7                 60 sec               8 sec

When I set session work_mem = 60MB, the query for 7 days takes 8.5 sec instead of 60 sec.

How can I see where/when it is using disk or memory?

    explain analyze
    SELECT summary_app.action, sum(summary_app.counter), summary_app.mlapp_id,
           summary_app.application_id, sum(summary_app.bytes),
           summary_app.root_cause_id
    FROM summary_app
    WHERE summary_app.datasource_id = 10
      AND summary_app.datex >= '2011-08-03 00:00:00+00'::timestamp with time zone
      AND summary_app.datex <  '2011-08-06 00:00:00+00'::timestamp with time zone
    GROUP BY mlapp_id, application_id, action, root_cause_id;

    HashAggregate  (cost=8223.97..8226.97 rows=200 width=37) (actual time=4505.607..4506.806 rows=3126 loops=1)
      ->  Append  (cost=0.00..8213.42 rows=703 width=37) (actual time=1071.043..4046.780 rows=283968 loops=1)
            ->  Seq Scan on summary_daily_data summary_app  (cost=0.00..23.83 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1)
                  Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone) AND (datasource_id = 10))
            ->  Bitmap Heap Scan on summ_daily_15191 summary_app  (cost=1854.89..2764.60 rows=234 width=37) (actual time=1071.041..1343.235 rows=94656 loops=1)
                  Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))
                  ->  BitmapAnd  (cost=1854.89..1854.89 rows=234 width=0) (actual time=1054.310..1054.310 rows=0 loops=1)
                        ->  Bitmap Index Scan on ind_fw_15191  (cost=0.00..868.69 rows=46855 width=0) (actual time=17.896..17.896 rows=94656 loops=1)
                              Index Cond: (datasource_id = 10)
                        ->  Bitmap Index Scan on ind_datex_15191  (cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834 rows=9370944 loops=1)
                              Index Cond: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))

The same query from the smaller table:

    HashAggregate  (cost=252859.36..253209.94 rows=23372 width=34) (actual time=371.164..372.153 rows=3126 loops=1)
      ->  Append  (cost=0.00..249353.62 rows=233716 width=34) (actual time=11.028..115.915 rows=225072 loops=1)
            ->  Seq Scan on summary_app  (cost=0.00..28.03 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1)
                  Filter: ((datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone) AND (datasource_id = 10))
            ->  Bitmap Heap Scan on summ_app_15191 summary_app  (cost=2299.40..82014.85 rows=72293 width=34) (actual time=11.027..31.341 rows=75024 loops=1)
                  Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))
                  ->  Bitmap Index Scan on summ_app_fw_datex_15191  (cost=0.00..2281.32 rows=72293 width=0) (actual time=10.910..10.910 rows=75024 loops=1)
                        Index Cond: ((datasource_id = 10) AND (datex >= '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06 00:00:00+00'::timestamp with time zone))

Why is the difference so large? How can I tune this query?

Thank you.

Helen
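One way to answer the disk-or-memory question directly, assuming PostgreSQL 9.0 or later (EXPLAIN accepts the BUFFERS option only from 9.0 on), is to ask for buffer counters; nonzero "temp read/written" figures in the output mean a node wrote temporary files to disk:

    -- temp buffer counters reveal spills past work_mem
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT action, sum(counter), mlapp_id, application_id,
           sum(bytes), root_cause_id
    FROM summary_app
    WHERE datasource_id = 10
      AND datex >= '2011-08-03 00:00:00+00'
      AND datex <  '2011-08-06 00:00:00+00'
    GROUP BY mlapp_id, application_id, action, root_cause_id;

    -- alternatively, log every temporary file the server creates
    -- (0 = no size threshold; superuser-only setting)
    SET log_temp_files = 0;

A Sort node that spills also says so in the plan itself, e.g. "Sort Method: external merge  Disk: 10240kB".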
> -----Original Message-----
> From: hyelluas [mailto:helen_yelluas@mcafee.com]
> Sent: Friday, August 12, 2011 5:30 PM
> To: pgsql-performance@postgresql.org
> Subject: How to see memory usage using explain analyze ?
>
> [original message quoted in full above; trimmed]

Helen,

I'm probably a bit late answering your question. But, just in case...

It looks like one table has a "combined" index, summ_app_fw_datex_15191, on both datasource_id and datex, which works better than the two separate indexes, ind_datex_15191 (on datex) and ind_fw_15191 (on datasource_id), that you have on the other table.

Besides, this:

    ->  Bitmap Index Scan on ind_datex_15191  (cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834 rows=9370944 loops=1)

shows that the statistics behind ind_datex_15191 are completely out of whack (expected rows=46855, actual rows=9370944).

HTH,
Igor Neyman
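For illustration, the change Igor describes could look like the following on one partition. This is a sketch, not DDL from the original system: the composite index name is made up, and the equality column (datasource_id) is placed first so the range condition on datex applies within it.

    -- replace the two single-column indexes with one composite index
    -- covering both WHERE-clause columns
    DROP INDEX ind_fw_15191;
    DROP INDEX ind_datex_15191;
    CREATE INDEX ind_fw_datex_15191
        ON summ_daily_15191 (datasource_id, datex);

    -- refresh the planner's row estimates for the partition
    ANALYZE summ_daily_15191;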
Igor,

Thank you. My tests showed better performance against the larger summary tables when I split the index for datasource_id & datex; I used to have a composite index.

Regarding those index statistics: should I ANALYZE the tables? I thought autovacuum takes care of it.

helen
> -----Original Message-----
> From: hyelluas [mailto:helen_yelluas@mcafee.com]
> Sent: Monday, August 15, 2011 2:33 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: How to see memory usage using explain analyze ?
>
> [previous message quoted in full above; trimmed]

But, having different sets of indexes, you can't compare execution plans.

In regards to statistics, you could try to ANALYZE the table manually, maybe after increasing "default_statistics_target". From the docs:

    default_statistics_target (integer)

    Sets the default statistics target for table columns that have not
    had a column-specific target set via ALTER TABLE SET STATISTICS.
    Larger values increase the time needed to do ANALYZE, but might
    improve the quality of the planner's estimates. The default is 10.
    For more information on the use of statistics by the PostgreSQL
    query planner, refer to Section 14.2.

HTH,
Igor
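A sketch of both routes, aimed at the partition whose estimates were off; the target value of 100 is illustrative:

    -- raise the target for one column; takes effect at the next ANALYZE
    ALTER TABLE summ_daily_15191 ALTER COLUMN datex SET STATISTICS 100;
    ANALYZE summ_daily_15191;

    -- or raise the session-wide default before analyzing
    SET default_statistics_target = 100;
    ANALYZE summ_daily_15191;

Note that with inheritance-based partitioning each child partition carries its own statistics, so running ANALYZE on the parent alone does not refresh the children's entries.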
Igor,

Thank you for the hint. I read about the planner and added "vacuum analyze" to my procedures.

There is no join in my query, but the GROUP BY is taking all the time and I don't know how to tune it. The query is executed by a procedure, and the execution-time requirement is < 4 sec; it takes 8-11 sec against 3 partitions of 9 million rows each, and goes up to 22 sec for 5 partitions.

I've been testing PostgreSQL performance for the last 2 months, comparing it with MySQL, and PostgreSQL performance with 5+ million rows in a table with 14 columns is worse. Is 14 columns a big table for Postgres, or is 5 million rows a big table?

The whole picture is that there are 2 databases, OLTP and "OLAP", that used to be on different machines and in different databases. The new project requires putting them on one machine in one database. I preferred Postgres (the poorly designed OLTP would suffer even more on MySQL), and now I'm trying to tune the OLAP db.

Thank you.

Helen
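Since the session-level work_mem increase already cut the 7-day query from 60 sec to 8.5 sec, one follow-up worth sketching is scoping the larger setting to just the reporting code path rather than the whole server. Both forms below are assumptions about how the procedure might be wired up; build_app_report is a hypothetical function name:

    -- attach the setting to the function that runs the report
    ALTER FUNCTION build_app_report(integer) SET work_mem = '64MB';

    -- or confine it to a single transaction
    BEGIN;
    SET LOCAL work_mem = '64MB';   -- reverts at COMMIT or ROLLBACK
    -- ... run the GROUP BY query here ...
    COMMIT;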