Thread: Performance Improvement Strategy
Hello Everyone,
I had posted a query in the "GENERAL" category; I am not sure if that was the correct category to post in.
Please help me understand how to calculate the free space in tables and indexes even after vacuuming and analyzing have been performed.
What I understand is that, even if we perform VACUUM ANALYZE regularly, the free space generated is not filled up.
I see a lot of free space or free pages in tables and indexes. But I need to give an exact calculation of how much space will be reclaimed after VACUUM FULL and RE-INDEXING.
Is there any standard procedure or process to calculate the same?
Please help!
Thanks
Venkat
On 2011-09-20 18:22, Venkat Balaji wrote:
> Is there any standard procedure or process to calculate the same ?

Hello! I hope this link will be useful for you:
http://wiki.postgresql.org/wiki/Show_database_bloat

Regards
Venkat,

> I see lot of free spaces or free pages in Tables and Indexes. But, I need to
> give an exact calculation on how much space will be reclaimed after VACUUM
> FULL and RE-INDEXING.

At present, there is no way to calculate this precisely. You can only estimate, and estimates have significant error factors. The query which Marcin linked for you, for example, can be as much as 500% off (although usually only 50% off).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 21/09/11 06:09, Josh Berkus wrote:
> At present, there is no way to calculate this precisely. You can only
> estimate, and estimates have significant error factors. The query which
> Marcin linked for you, for example, can be as much as 500% off (although
> usually only 50% off).

If you have autovacuum on (which should be typical these days), then using the freespacemap contrib module should give very accurate results:

SELECT oid::regclass,
       pg_relation_size(oid)/(1024*1024) AS mb,
       sum(free)/(1024*1024) AS free_mb
FROM (SELECT oid, (pg_freespace(oid)).avail AS free
      FROM pg_class) AS a
GROUP BY a.oid
ORDER BY free_mb DESC;

regards
Mark
On 21/09/11 10:05, Mark Kirkwood wrote:
> ...then using the freespacemap contrib module should give very
> accurate results:

Sorry, should have said - for 8.4 and later!
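For anyone wanting to try this, here is a sketch of how the module is typically enabled (the database name and script path are placeholders, not from this thread):

```sql
-- Enabling pg_freespacemap (a sketch). On 9.1 and later it is packaged
-- as an extension:
CREATE EXTENSION pg_freespacemap;
-- On 8.4 and 9.0, contrib modules are installed by running the SQL script
-- shipped with the server, e.g. from a shell (path is an assumption):
--   psql -d yourdb -f "$(pg_config --sharedir)/contrib/pg_freespacemap.sql"
```

After that, Mark's query above can be run as a superuser against any database where the module was installed.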
Thanks everyone for your inputs!
Mark,
We are using 9.0, so I should be able to make use of this "freespacemap" contrib module; I will get back to you with the results.
I was using the query below (which I got by googling).
But I was not sure if it is picking up the correct information. I want to avoid the cost of a mis-prediction after the whole production system has been scheduled for maintenance downtime.
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC
Thanks
Venkat
On Wed, Sep 21, 2011 at 3:40 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 21/09/11 10:05, Mark Kirkwood wrote:
...then using the freespacemap contrib module should give very accurate results:
Sorry, should have said - for 8.4 and later!
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Can you please help me understand what the "blkno" column refers to?
Thanks
Venkat
On 09/20/2011 11:22 AM, Venkat Balaji wrote:
> Please help me understand how to calculate free space in Tables and
> Indexes even after vacuuming and analyzing is performed.

Besides the query Mark gave you using freespacemap, there's also the pgstattuple contrib module. You'd use it like this:

SELECT pg_size_pretty(free_space) AS mb_free
FROM pgstattuple('some_table');

Query must be run as a super-user, and I wouldn't recommend running it on huge tables, since it scans the actual data files to get its information. There's a lot of other useful information in that function, such as the number of dead rows.

> What i understand is that, even if we perform VACUUM ANALYZE
> regularly, the free space generated is not filled up.

VACUUM does not actually generate free space. It locates and marks reusable tuples. Any future updates or inserts on that table will be put in those newly reclaimed spots, instead of being bolted onto the end of the table.

> I see lot of free spaces or free pages in Tables and Indexes. But, I
> need to give an exact calculation on how much space will be reclaimed
> after VACUUM FULL and RE-INDEXING.

Why? If your database is so desperate for space, VACUUM and REINDEX won't really help you. A properly maintained database will still have a certain amount of "bloat" equal to the number of rows that change between maintenance intervals. One way or another, that space is going to be used by *something*.

It sounds more like you need to tweak your autovacuum settings to be more aggressive if you're seeing significant enough turnover that your tables are bloating significantly. One of our tables, for instance, gets vacuumed more than once per hour because it experiences 1,000% turnover daily.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________
See http://www.peak6.com/email-disclaimer/ for terms and conditions related to this email
Shaun Thomas <sthomas@peak6.com> wrote:
> Venkat Balaji wrote:
>> I see lot of free spaces or free pages in Tables and Indexes.
>> But, I need to give an exact calculation on how much space will
>> be reclaimed after VACUUM FULL and RE-INDEXING.
>
> Why?

I've been wondering that, too. And talking about the space being "reclaimed" seems to be at odds with your subject line. The space is given up by the database engine to the file system free space, where reuse by the database will be much more expensive. For good performance you want some free space in the tables and indexes, where it can be allocated to new tuples without going out through OS calls to the file system.

Clearly, if free space gets higher than necessary to support creation of new tuples, it can start to harm performance, and you may need to take aggressive action (such as CLUSTER) to reclaim it; but any time you find it necessary to do *that* you should be investigating what went wrong to put you in such a spot. Either your autovacuum is (as Shaun suggested) not aggressive enough, or you have some long running transaction (possibly "idle in transaction") which is preventing vacuums from doing their work effectively. Investigating that is going to help more than calculating just how much space the database is going to give up to file system free space.

-Kevin
Thank you very much for your detailed explanation!
I will be reviewing our existing auto-vacuuming strategy to see if it is optimal. But we do have VACUUM VERBOSE ANALYZE running at the cluster level every day, and auto-vacuum is aggressive for highly active tables.
Today we vacuumed a 10 GB table, and the table size decreased to 5 GB.
I understand that it would be very expensive for the table to reclaim the space back from the filesystem. We decided to do the maintenance after a thorough analysis; our databases have not been subjected to any kind of maintenance activity (with downtime) for 2 years.
As the DBA, I suggested performing VACUUM FULL and RE-INDEXING + ANALYZE to ensure that IO and indexing performance would be good and that the PG optimizer would pick the optimal plan. As said earlier, our databases have never been part of any re-organization in 2 years and are highly transactional. I believe that performing VACUUM FULL and RE-INDEXING would tightly pack the rows in every page and ensure good IO.
I might have not put across the explanation in an understandable manner.
Please help me with the following -
1. When will pg_stat_user_tables be updated, and what will the information show?
2. Will the information about dead rows and live rows vanish after VACUUM, ANALYZE, or VACUUM FULL?
I am preparing a monitoring system which will help us track the rate of bloat and data generation on a daily basis.
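As a starting sketch for such a monitoring system, a daily check against pg_stat_user_tables touches on both questions above: as far as I know, the view is maintained by the statistics collector shortly after activity, and a VACUUM resets the dead-row counter (ANALYZE refreshes the live/dead estimates). The LIMIT is arbitrary:

```sql
-- A sketch of a daily bloat/turnover check using pg_stat_user_tables.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Snapshotting this output once a day would give the rate of dead-row accumulation per table.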
Sorry for the long email !
Looking forward to your help!
Thanks
Venkat
On 09/21/2011 12:13 PM, Venkat Balaji wrote:
> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE
> to ensure that IO performance and Indexing performance would be good

Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM FULL. You probably don't want to do that. A multi-gigabyte table can easily be unavailable for several hours if you execute VACUUM FULL against it. CLUSTER is almost always faster.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Thanks, Greg!
If I got it correctly, CLUSTER does the same thing VACUUM FULL does (except that it is faster).
Is CLUSTER recommended only because it is faster? As per the link, the table would be unavailable (for a shorter period compared to VACUUM FULL) when CLUSTER is executed as well. I hope I got that right!
Thanks
Venkat
Venkat Balaji <venkat.balaji@verse.in> wrote:
> If i got it correct, CLUSTER would do the same what VACUUM FULL
> does (except being fast)

CLUSTER copies the table (in the sequence of the specified index) to a new set of files, builds fresh indexes, and then replaces the original set of files with the new ones. So you do need room on disk for a second copy of the table, but it tends to be much faster than VACUUM FULL in PostgreSQL versions before 9.0. (Starting in 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans the table data rather than using an index.) REINDEX is not needed when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was generally a good idea.

When choosing an index for CLUSTER, pick one on which you often search for a *range* of rows, if possible. Like a name column if you do a lot of name searches.

-Kevin
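For reference, the workflow Kevin describes might look like this (table and index names below are placeholders, not from this thread):

```sql
-- A minimal sketch of the CLUSTER workflow (8.4+ syntax).
-- CLUSTER takes an ACCESS EXCLUSIVE lock for its duration, so it still
-- needs a maintenance window, just usually a shorter one than VACUUM FULL.
CLUSTER my_table USING my_range_index;
-- Refresh planner statistics afterwards, since the table was rewritten:
ANALYZE my_table;
```

Once a table has been clustered with USING, a bare `CLUSTER my_table;` re-clusters it on the same index.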
We performed VACUUM FULL on our production system and performance has improved a lot!
I started using pgstattuple and pg_freespacemap for tracking free space in the tables and indexes, and they are helping us a lot.
Thanks for all your inputs and help !
Regards,
VB
Forgot to mention -
Kevin,
CLUSTER seems to be a very interesting concept to me.
I am planning to test CLUSTER on our production tables according to the index usage on each table.
I will let you know once I get the results.
Regards,
VB
Hello,
Thanks for your suggestions !
We CLUSTERed a table using its most-used index. The application is performing better now.
Thanks
VB
Venkat Balaji <venkat.balaji@verse.in> wrote:
> We CLUSTERED a table using mostly used Index. Application is
> performing better now.

CLUSTER can help in at least four ways:

(1) It eliminates bloat in the table heap.
(2) It eliminates bloat in the indexes.
(3) It can correct fragmentation in the underlying disk files.
(4) It can put tuples which are accessed by the same query into adjacent locations on disk, reducing physical disk access.

An aggressive autovacuum configuration can generally prevent the first two from coming back to haunt you, and the third may not be a big problem (depending on your OS and file system), but that last one is a benefit which will degrade over time in most use cases -- the order in the heap is set by the cluster, but not maintained after that. If this ordering is a significant part of the performance improvement you're seeing, you may want to schedule some regular CLUSTER run. It's hard to say what frequency would make sense, but if performance gradually deteriorates and a CLUSTER fixes it, you'll get a sense of how often it pays to do it.

-Kevin
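One way to watch benefit (4) degrade, as a rough sketch, is to track pg_stats.correlation for the leading column of the clustered index: it sits near 1.0 right after a CLUSTER and drifts back toward 0 as new and updated rows land out of order (the table and column names below are placeholders):

```sql
-- Physical-vs-logical ordering of the clustered column; values near 1.0
-- mean the heap still matches the index order (ANALYZE must be current).
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'my_table'        -- placeholder
  AND attname   = 'my_cluster_col'; -- leading column of the CLUSTER index
```

Re-clustering when correlation falls below some threshold could be one input to the scheduling decision Kevin mentions.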
Thanks a lot, Kevin!
This email has deepened my understanding of the clustering concept.
Keeping this in mind, I have recommended a new disk layout at the OS level for our production servers so that IO will be balanced across the disks as well.
Currently, we do not have mount points divided according to the type of IO.
I will share my recommended plan in a different email thread.
Thanks again for this detailed explanation.
Regards,
VB
Hello,
I was attempting to calculate the actual space occupied by a table.
Below is what I did -
I summed up the avg_width of each column of the table from pg_stats, which gives me the average size of a row (277 bytes).
select sum(avg_width) as average_row_size from pg_stats where tablename='tablename'
average_row_size
---------------------------
277
(1 row)
Calculated the actual space occupied by the rows in the table as below -
Took average_row_size * number_of_rows from pg_class:
select 277*reltuples/1024 as occupied_space from pg_class where relname='tablename'; == 552 KB
occupied_space
-------------------------
552.6474609375
Calculated the actual Table size (600 kb)
select pg_size_pretty(pg_relation_size('tablename'));
pg_size_pretty
----------------
600 KB
(1 row)
Calculated the free space within the table (by scanning the pages, as suggested by Shaun Thomas) -- 14 KB:
SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');
mb_free
---------
14 KB
(1 row)
600 KB is the size of the table (taken through pg_size_pretty).
14 KB is the free space (taken through the contrib module).
600+14 = 586 KB -- is the occupied space by the normal calculation through the contrib module. This is based on the number of pages allocated to the table.
552 KB is the actual size occupied by the rows (taken by calculating the average row size). This is based on the number of rows within the pages.
586-552 = 34 KB -- is still free somewhere within the occupied pages (calculated through pg_stats and pg_class).
34 KB is still free within the pages (each 8 KB), which is basically counted as occupied space.
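The arithmetic can be condensed into one query (a sketch: reltuples is not shown in the thread, so 2043 is back-solved from 552.6474609375 KB at 277 bytes per row; the "586" figure takes the subtraction reading of the line above):

```sql
-- Cross-checking the figures (reltuples = 2043 is a back-solved assumption).
SELECT 277 * 2043 / 1024.0 AS occupied_kb,    -- 552.6474609375
       600 - 14            AS used_kb,        -- table size minus pgstattuple free space
       600 - 14 - 552      AS unaccounted_kb; -- 34 KB
```

The unaccounted 34 KB is plausibly tuple headers, item pointers, page headers, alignment padding, and estimation error in avg_width and reltuples, rather than reusable free space.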
This is similar to a concept which I successfully applied in another RDBMS technology to calculate space usage metrics on production.
This was all calculated after the Vacuum and Analyze jobs were executed.
Please comment!
Sorry if this is too confusing and too long.
Thanks
VB
On Wed, Oct 5, 2011 at 2:38 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> 600+14 = 586 KB -- is the occupied space by normal calculation through
> contrib modules. This is based on number of pages allocated to the table.
It's a typo: 600 - 14 = 586 KB.
> 586-552 = 34 KB -- is still free some where with in the occupied pages
> (calculated through pg_stats and pg_class)
One more point to add to this good discussion: each row header occupies 24 bytes, plus a 4-byte item pointer on the page to the tuple.
Venkat Balaji <venkat.balaji 'at' verse.in> writes:
> Hello,
>
> I was attempting to calculate the actual occupied space by a Table.

SELECT relname, reltuples,
       pg_size_pretty(relpages*8*1024) as size
FROM pg_class, pg_namespace
WHERE pg_namespace.oid = pg_class.relnamespace
  AND relkind = 'r'
  AND nspname = 'public'
ORDER BY relpages DESC;

relkind = 'i' for indexes.

--
Guillaume Cottenceau
Raghavendra <raghavendra.rao@enterprisedb.com> wrote:
> Venkat Balaji <venkat.balaji@verse.in> wrote:
>> [attempt to calculate file space from row layout and number of
>> rows]
>
> One more point to add to this good discussion, each row header
> will occupy 24 bytes + 4 bytes pointer on page to tuple.

Not to mention:
http://www.postgresql.org/docs/9.1/interactive/storage-toast.html

-Kevin
On Wed, Sep 21, 2011 at 11:57 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM FULL.
> You probably don't want to do that. A multi-gigabyte table can easily be
> unavailable for several hours if you execute VACUUM FULL against it.
> CLUSTER is almost always faster.

It used to be that cluster on a very randomly ordered table was much slower than doing something like:

select * into newtable from oldtable order by col1, col2;

Is that still the case in 9.0/9.1?
Scott Marlowe <scott.marlowe@gmail.com> writes:
> It used to be that cluster on a very randomly ordered table was much
> slower than doing something like select * into newtable from oldtable
> order by col1, col2; Is that still the case in 9.0/9.1?

Fixed in 9.1, per release notes:

* Allow CLUSTER to sort the table rather than scanning the index when it seems likely to be cheaper (Leonardo Francalanci)

regards, tom lane
Thanks, Kevin.
I am on 9.1 and have tested the same scenario, looking at how exactly the storage metrics are calculated. Please comment.
Table Structure:
postgres=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
name | text |
No. of rows:
postgres=# select relname,reltuples from pg_class where relname='test';
relname | reltuples
---------+-----------
test | 1001
(1 row)
Average Row size:
postgres=# select sum(avg_width) as average_row_size from pg_stats where tablename='test';
average_row_size
------------------
17
(1 row)
Occupied Space:
postgres=# select 17*reltuples/1024 as "No.of.Row_size * No.of.Rows = Occupied_Space" from pg_class where relname='test';
No.of.Row_size * No.of.Rows = Occupied_Space
----------------------------------------------
16.6181640625
Actual Table Size:
postgres=# select pg_size_pretty(pg_relation_size('test'));
pg_size_pretty
----------------
48 kB
(1 row)
or
postgres=# SELECT relname, reltuples, pg_size_pretty(relpages*8*1024) as size FROM pg_class, pg_namespace WHERE pg_namespace.oid = pg_class.relnamespace AND relkind = 'r' AND nspname = 'public' ORDER BY relpages DESC;
relname | reltuples | size
---------+-----------+-------
test | 1001 | 48 kB
(1 row)
It's different here:
postgres=# \dt+ test
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+-------+-------------
public | test | table | postgres | 88 kB |
(1 row)
postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
88 kB
(1 row)
Free Space:
postgres=# SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('test');
mb_free
-----------
936 bytes
(1 row)
or
postgres=# select * from pgstattuple('test');
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
49152 | 1001 | 41041 | 83.5 | 0 | 0 | 0 | 936 | 1.9
(1 row)
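The tuple_len above lines up exactly with the earlier avg_width figure once the 24-byte heap tuple header is added (a sketch of the arithmetic; it assumes no null bitmap or padding beyond the header):

```sql
-- 24-byte header + 17 bytes of data per row, times 1001 rows:
SELECT (24 + 17) * 1001 AS expected_tuple_len,  -- 41041, matches pgstattuple
       6 * 8192         AS expected_table_len;  -- 49152: six 8 KB pages
```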
OS Level Storage:
bash-4.1$ ll -h 16447*
-rw------- 1 postgres postgres 48K Oct 2 17:40 16447
-rw------- 1 postgres postgres 24K Oct 2 17:40 16447_fsm
-rw------- 1 postgres postgres 8.0K Oct 2 17:40 16447_vm
What has occupied the extra 8 KB?
postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
88 kB
(1 row)
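A hedged guess about that extra 8 KB: pg_total_relation_size also counts the table's TOAST relation (the table has a text column, so one exists) and its index, and an empty btree index occupies one 8 KB metapage. One way to check, as a sketch:

```sql
-- Breaking pg_total_relation_size('test') down by fork plus TOAST.
-- 48 (main) + 24 (fsm) + 8 (vm) = 80 kB; the TOAST side may hold the rest.
SELECT pg_relation_size('test')               AS main_fork,
       pg_relation_size('test', 'fsm')        AS fsm_fork,
       pg_relation_size('test', 'vm')         AS vm_fork,
       pg_total_relation_size(reltoastrelid::regclass) AS toast_total
FROM pg_class
WHERE relname = 'test' AND reltoastrelid <> 0;
```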
Thanks in advance.
---
Regards,
Raghavendra
EnterpriseDB Corporation
On Wed, Oct 5, 2011 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fixed in 9.1, per release notes:
>
> * Allow CLUSTER to sort the table rather than scanning the index when it
> seems likely to be cheaper (Leonardo Francalanci)

Looks like I owe Leonardo Francalanci a pizza.
>> * Allow CLUSTER to sort the table rather than scanning the index
>> when it seems likely to be cheaper (Leonardo Francalanci)
>
> Looks like I owe Leonardo Francalanci a pizza.

Well, the patch started from a work by Gregory Stark, and Tom fixed a nasty bug; but I'll take a slice ;)

Leonardo