Thread: vacuum / analyze parent tables on partitioned tables.

vacuum / analyze parent tables on partitioned tables.

From
Bert
Date:
Hello,

I first wrote, by mistake, to the sql mailing list. But here is my e-mail:

I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).

But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table

table_name          ;     avg runt time   ;   max run time  ;    min run time
"f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10"      ;   "00:03:03"
"f_transaction"      ;   "00:02:19.8"     ;   "00:02:25"       ;  "00:02:16"

f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.

Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).

wkr,
Bert

Re: vacuum / analyze parent tables on partitioned tables.

From
Albe Laurenz
Date:
Bert wrote:
> I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very
> well.

Autovacuum doesn't do the job for you?

That would save you from worries like the ones you have.

> Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table
> as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their
> vacuum/analyze time is much shorter).

ANALYZE on the parent table gathers statistics about the child
tables as well, so it scans them too.

You still need to ANALYZE and VACUUM the child tables though.

Yours,
Laurenz Albe

Re: vacuum / analyze parent tables on partitioned tables.

From
Bert
Date:



> autovacuum is still enabled, but we want to avoid autovacuum to hit right in the day / when the etl is
> working. So that's why we want to shedule it by hand, before autovacuum starts.

I see.

> Is it a good idea to exclude those tables then? And run only vacuum / analyze on the child tables, an
> wait for autovacuum to analyze the parent tables if it ever would become necesarry..

Sure, you could do that.
Maybe somebody on the list would know if autovacuum/analyze on the parent table
will scan the children - I believe it does not.
You could also test it yourself.

Yours,
Laurenz Albe


On Wed, Jan 23, 2013 at 1:38 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Bert wrote:
> I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very
> well.

Autovacuum doesn't do the job for you?

That would save you from worries like the ones you have.

> Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table
> as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their
> vacuum/analyze time is much shorter).

ANALYZE on the parent table gathers statistics about the child
tables as well, so it scans them too.

You still need to ANALYZE and VACUUM the child tables though.

Yours,
Laurenz Albe



--
Bert Desmet
0477/305361

Re: vacuum / analyze parent tables on partitioned tables.

From
Rural Hunter
Date:
I'm doing the same thing. In my case, the vacuum part on parent is very
quick while analyzing takes a bit longer since it runs rough analyzes
all children tables. You can see the behavior by "analyze verbose".
Maybe the bigger part of your vacuum/analyze is on analyze so that you
are seeing this result.

于 2013/1/23 19:43, Bert 写道:
> Hello,
>
> I first wrote, by mistake, to the sql mailing list. But here is my
> e-mail:
>
> I wrote a script to make sure all tables are vacuumed and analyzed
> every evening. This works very well.
> I save in a table the start and end time of a vacuum/analyze. This way
> I can measure what tables take a long time to vaccum/analyze, and what
> tables are slow. (and much more).
>
> But I have noticed that the parent table of a partitioned table also
> takes a long time. Here is a snap shot of the following table
>
> table_name          ;     avg runt time   ;   max run time  ;   min
> run time
> "f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10" ;   "00:03:03"
> "f_transaction"      ;   "00:02:19.8"     ;   "00:02:25" ;  "00:02:16"
>
> f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of
> indexes). f_tranaction is totally empy, but also contains all indexes.
> Which means 0B table zise, and 140kB index size.
>
> Does anyone has an idea why in this case the vacuum/analyze takes
> almost as long on the parent table as on the biggest child table? (the
> other child tables are smaller than f_tranaction_1, and their
> vacuum/analyze time is much shorter).
>
> wkr,
> Bert



Re: vacuum / analyze parent tables on partitioned tables.

From
Bert
Date:
Yes, at the moment that is still the case.

We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.

We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.

I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.

but thank you for giving me some clarification.

cheers,
Bert


On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.

于 2013/1/23 19:43, Bert 写道:

Hello,

I first wrote, by mistake, to the sql mailing list. But here is my e-mail:

I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).

But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table

table_name          ;     avg runt time   ;   max run time  ;   min run time
"f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10" ;   "00:03:03"
"f_transaction"      ;   "00:02:19.8"     ;   "00:02:25" ;  "00:02:16"

f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.

Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).

wkr,
Bert



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



--

Re: vacuum / analyze parent tables on partitioned tables.

From
Prashanth Ranjalkar
Date:
Hi Bert,
 
Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..
 
To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.
 
 
  
Thanks & Regards,
 
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar
Cell: +91 932 568 2271


On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:
Yes, at the moment that is still the case.

We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.

We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.

I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.

but thank you for giving me some clarification.

cheers,
Bert


On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.

于 2013/1/23 19:43, Bert 写道:

Hello,

I first wrote, by mistake, to the sql mailing list. But here is my e-mail:

I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).

But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table

table_name          ;     avg runt time   ;   max run time  ;   min run time
"f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10" ;   "00:03:03"
"f_transaction"      ;   "00:02:19.8"     ;   "00:02:25" ;  "00:02:16"

f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.

Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).

wkr,
Bert



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



--


Re: vacuum / analyze parent tables on partitioned tables.

From
Bert
Date:
Hello,

I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.

Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)

cheers,
Bert


On Wed, Jan 23, 2013 at 4:45 PM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:
Hi Bert,
 
Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..
 
To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.
 
 
  
Thanks & Regards,
 
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar


On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:
Yes, at the moment that is still the case.

We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.

We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.

I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.

but thank you for giving me some clarification.

cheers,
Bert


On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.

于 2013/1/23 19:43, Bert 写道:

Hello,

I first wrote, by mistake, to the sql mailing list. But here is my e-mail:

I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).

But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table

table_name          ;     avg runt time   ;   max run time  ;   min run time
"f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10" ;   "00:03:03"
"f_transaction"      ;   "00:02:19.8"     ;   "00:02:25" ;  "00:02:16"

f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.

Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).

wkr,
Bert



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



--





--
Bert Desmet
0477/305361

Re: vacuum / analyze parent tables on partitioned tables.

From
Prashanth Ranjalkar
Date:
Hi Bert,
 
Yes, ETL process needs to load the data every day into datawarehouse database from OLTP database and there would be no DML operations executing on DWH DB which is mainly used for analyzing the data and getting the reports for making business decisions. After every data load process, statistices should be updated in order to gain the performance therefore it's best practice to add analyze command for the respective tables in ETL script itself. If auto vacuum is enabled on tables which participate in  ETL process, autovacuum kicks off frequently when its threshold values met and keep running affecting data load process. 
 
If we disable autovacuum on table level (NOT at cluster level) auto vacuum never kicks off during ETL process and ETL job get finished quickly with analyze operation completed as well.
 
This is a daily process therefore you no need to worry about the autovacuum kick off and analyze operations which get executed manually from ETL job. 

 
Thanks & Regards,
 
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar
Cell: +91 932 568 2271


On Thu, Jan 24, 2013 at 11:56 AM, Bert <biertie@gmail.com> wrote:
Hello,

I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.

Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)

cheers,
Bert


On Wed, Jan 23, 2013 at 4:45 PM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:
Hi Bert,
 
Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..
 
To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.
 
 
  
Thanks & Regards,
 
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar


On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@gmail.com> wrote:
Yes, at the moment that is still the case.

We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.

We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.

I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.

but thank you for giving me some clarification.

cheers,
Bert


On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.

于 2013/1/23 19:43, Bert 写道:

Hello,

I first wrote, by mistake, to the sql mailing list. But here is my e-mail:

I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).

But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table

table_name          ;     avg runt time   ;   max run time  ;   min run time
"f_transaction_1"  ;    "00:03:07.8"     ;   "00:03:10" ;   "00:03:03"
"f_transaction"      ;   "00:02:19.8"     ;   "00:02:25" ;  "00:02:16"

f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.

Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).

wkr,
Bert



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



--





--
Bert Desmet
0477/305361

Re: vacuum / analyze parent tables on partitioned tables.

From
Bert
Date:

On Thu, Jan 24, 2013 at 7:43 AM, Prashanth Ranjalkar <prashant.ranjalkar@gmail.com> wrote:
be no DML operations executing on DWH DB which is mainly used for analyzing the data and getting the reports for making business decisions. After every data load process, statistices should be updated in order to gain the performance therefore it's best practice to add analyze command for the respective tables in ETL script itself. If auto vacuum is enabled on tables which participate in  ETL process, autovacuum kicks off frequently when its threshold values met and keep running affecting data load process. 
 
If we disable autovacuum on table level (NOT at cluster level) auto vacuum never kicks off during ETL process and ETL job get finished quickly with analyze operation completed as well.

Hello,

Yes I know, but for this particular case we won't use a daily recalcuation, or a daily incremental load.
We want to do an incremental load up to 6times / day. 9or more, if possible, the ultimate goal would be to have all pre calculated data almost live in the dwh from the oltp systems)
So it needs some more tweaking and tuning than the average dwh. Because we are handling both very complex queries and star schemes, and still a high insert / update load. pretty much all the time.

But that's where the challenge is, I guess.

cheers,
Bert


--
Bert Desmet
0477/305361