Thread: vacuum / analyze parent tables on partitioned tables.
Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bertwkr,
Bert wrote: > I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very > well. Autovacuum doesn't do the job for you? That would save you from worries like the ones you have. > Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table > as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their > vacuum/analyze time is much shorter). ANALYZE on the parent table gathers statistics about the child tables as well, so it scans them too. You still need to ANALYZE and VACUUM the child tables though. Yours, Laurenz Albe
> autovacuum is still enabled, but we want to avoid autovacuum to hit right in the day / when the etl is
> working. So that's why we want to shedule it by hand, before autovacuum starts.
> Is it a good idea to exclude those tables then? And run only vacuum / analyze on the child tables, an
> wait for autovacuum to analyze the parent tables if it ever would become necesarry..
Maybe somebody on the list would know if autovacuum/analyze on the parent table
will scan the children - I believe it does not.
You could also test it yourself.
Yours,
Laurenz Albe
On Wed, Jan 23, 2013 at 1:38 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Bert wrote:Autovacuum doesn't do the job for you?
> I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very
> well.
That would save you from worries like the ones you have.ANALYZE on the parent table gathers statistics about the child
> Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table
> as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their
> vacuum/analyze time is much shorter).
tables as well, so it scans them too.
You still need to ANALYZE and VACUUM the child tables though.
Yours,
Laurenz Albe
--
Bert Desmet
0477/305361
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result. 于 2013/1/23 19:43, Bert 写道: > Hello, > > I first wrote, by mistake, to the sql mailing list. But here is my > e-mail: > > I wrote a script to make sure all tables are vacuumed and analyzed > every evening. This works very well. > I save in a table the start and end time of a vacuum/analyze. This way > I can measure what tables take a long time to vaccum/analyze, and what > tables are slow. (and much more). > > But I have noticed that the parent table of a partitioned table also > takes a long time. Here is a snap shot of the following table > > table_name ; avg runt time ; max run time ; min > run time > "f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03" > "f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16" > > f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of > indexes). f_tranaction is totally empy, but also contains all indexes. > Which means 0B table zise, and 140kB index size. > > Does anyone has an idea why in this case the vacuum/analyze takes > almost as long on the parent table as on the biggest child table? (the > other child tables are smaller than f_tranaction_1, and their > vacuum/analyze time is much shorter). > > wkr, > Bert
Yes, at the moment that is still the case.
We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too. We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.
cheers,
Bert
On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.
于 2013/1/23 19:43, Bert 写道:--Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).
But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bert
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
--
Hi Bert,
Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..
To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.
Thanks & Regards,
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar
Cell: +91 932 568 2271
On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:
but thank you for giving me some clarification.Yes, at the moment that is still the case.We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.
We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.
cheers,
Bert--On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.
于 2013/1/23 19:43, Bert 写道:--Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).
But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bert
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Hello,
I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.
Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)
cheers,
Bert
I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.
Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)
cheers,
Bert
On Wed, Jan 23, 2013 at 4:45 PM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:
Hi Bert,Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.Thanks & Regards,Prashanth RanjalkarDatabase Consultant & ArchitectSkype:prashanth.ranjalkarCell: +91 932 568 2271On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:but thank you for giving me some clarification.Yes, at the moment that is still the case.We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.
We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.
cheers,
Bert--On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.
于 2013/1/23 19:43, Bert 写道:--Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).
But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bert
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
--
Bert Desmet
0477/305361
Hi Bert,
Yes, ETL process needs to load the data every day into datawarehouse database from OLTP database and there would be no DML operations executing on DWH DB which is mainly used for analyzing the data and getting the reports for making business decisions. After every data load process, statistices should be updated in order to gain the performance therefore it's best practice to add analyze command for the respective tables in ETL script itself. If auto vacuum is enabled on tables which participate in ETL process, autovacuum kicks off frequently when its threshold values met and keep running affecting data load process.
If we disable autovacuum on table level (NOT at cluster level) auto vacuum never kicks off during ETL process and ETL job get finished quickly with analyze operation completed as well.
This is a daily process therefore you no need to worry about the autovacuum kick off and analyze operations which get executed manually from ETL job.
Thanks & Regards,
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar
Cell: +91 932 568 2271
On Thu, Jan 24, 2013 at 11:56 AM, Bert <biertie@gmail.com> wrote:
Hello,
I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.
Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)
cheers,
Bert--On Wed, Jan 23, 2013 at 4:45 PM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:Hi Bert,Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.Thanks & Regards,Prashanth RanjalkarDatabase Consultant & ArchitectSkype:prashanth.ranjalkarCell: +91 932 568 2271On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:but thank you for giving me some clarification.Yes, at the moment that is still the case.We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.
We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.
cheers,
Bert--On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.
于 2013/1/23 19:43, Bert 写道:--Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).
But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bert
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Bert Desmet
0477/305361
On Thu, Jan 24, 2013 at 7:43 AM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:
be no DML operations executing on DWH DB which is mainly used for analyzing the data and getting the reports for making business decisions. After every data load process, statistices should be updated in order to gain the performance therefore it's best practice to add analyze command for the respective tables in ETL script itself. If auto vacuum is enabled on tables which participate in ETL process, autovacuum kicks off frequently when its threshold values met and keep running affecting data load process.If we disable autovacuum on table level (NOT at cluster level) auto vacuum never kicks off during ETL process and ETL job get finished quickly with analyze operation completed as well.
Hello,
Yes I know, but for this particular case we won't use a daily recalcuation, or a daily incremental load.
We want to do an incremental load up to 6times / day. 9or more, if possible, the ultimate goal would be to have all pre calculated data almost live in the dwh from the oltp systems)
Yes I know, but for this particular case we won't use a daily recalcuation, or a daily incremental load.
We want to do an incremental load up to 6times / day. 9or more, if possible, the ultimate goal would be to have all pre calculated data almost live in the dwh from the oltp systems)
So it needs some more tweaking and tuning than the average dwh. Because we are handling both very complex queries and star schemes, and still a high insert / update load. pretty much all the time.
But that's where the challenge is, I guess.
cheers,
cheers,
Bert
--
Bert Desmet
0477/305361