Thread: Performance Improvement Strategy

Performance Improvement Strategy

From
Venkat Balaji
Date:
Hello Everyone,

I had posted a query in the "GENERAL" category; I am not sure if that was the correct category to post in.

Please help me understand how to calculate the free space in tables and indexes even after vacuuming and analyzing have been performed.

What I understand is that even if we perform VACUUM ANALYZE regularly, the free space generated is not filled up.

I see a lot of free space (free pages) in tables and indexes. But I need to give an exact calculation of how much space will be reclaimed after VACUUM FULL and REINDEX.

Is there any standard procedure or process to calculate this?

Please help!

Thanks
Venkat

Re: Performance Improvement Strategy

From
Marcin Mirosław
Date:
On 2011-09-20 18:22, Venkat Balaji wrote:
> Hello Everyone,
>
> I had  posted  a query in "GENERAL" category, not sure if that was the
> correct category to post.
>
> Please help me understand how to calculate free space in Tables and
> Indexes even after vacuuming and analyzing is performed.
>
> What i understand is that, even if we perform VACUUM ANALYZE regularly,
> the free space generated is not filled up.
>
> I see lot of free spaces or free pages in Tables and Indexes. But, I
> need to give an exact calculation on how much space will be reclaimed
> after VACUUM FULL and RE-INDEXING.
>
> Is there any standard procedure or process to calculate the same ?

Hello!
I hope this link will be useful for you:
http://wiki.postgresql.org/wiki/Show_database_bloat
Regards

Re: Performance Improvement Strategy

From
Josh Berkus
Date:
Venkat,

> I see lot of free spaces or free pages in Tables and Indexes. But, I need to
> give an exact calculation on how much space will be reclaimed after VACUUM
> FULL and RE-INDEXING.

At present, there is no way to calculate this precisely.  You can only
estimate, and estimates have significant error factors.  The query which
Marcin linked for you, for example, can be as much as 500% off (although
usually only 50% off).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Performance Improvement Strategy

From
Mark Kirkwood
Date:
On 21/09/11 06:09, Josh Berkus wrote:
> Venkat,
>
>> I see lot of free spaces or free pages in Tables and Indexes. But, I need to
>> give an exact calculation on how much space will be reclaimed after VACUUM
>> FULL and RE-INDEXING.
> At present, there is no way to calculate this precisely.  You can only
> estimate, and estimates have significant error factors.  The query which
> Marcin linked for you, for example, can be as much as 500% off (although
> usually only 50% off).
>

If you have autovacuum on (which should be typical these days), then
using the freespacemap contrib module should give very accurate results:

SELECT oid::regclass,
       pg_relation_size(oid)/(1024*1024) AS mb,
       sum(free)/(1024*1024) AS free_mb
FROM (SELECT oid, (pg_freespace(oid)).avail AS free
      FROM pg_class) AS a
GROUP BY a.oid
ORDER BY free_mb DESC;
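One setup note for readers following along: pg_freespacemap ships as a contrib module, so it has to be installed before `pg_freespace()` is callable. A sketch (here `<sharedir>` is a placeholder for the installation's share directory, obtainable from `pg_config --sharedir`):

```sql
-- PostgreSQL 9.1 and later:
CREATE EXTENSION pg_freespacemap;

-- On 8.4/9.0, run the SQL script shipped with contrib instead, e.g.:
--   psql -d mydb -f <sharedir>/contrib/pg_freespacemap.sql
```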


regards

Mark

Re: Performance Improvement Strategy

From
Mark Kirkwood
Date:
On 21/09/11 10:05, Mark Kirkwood wrote:
>
> ...then using the freespacemap contrib module should give very
> accurate results:
>

Sorry, should have said - for 8.4 and later!

Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Thanks everyone for your inputs!

Mark,

We are using 9.0, so I should be able to make use of the "freespacemap" contrib module; I will get back to you with the results.

I was using the query below (which I found by googling).

But I was not sure whether it picks up the correct information. I want to avoid the cost of a mis-prediction after the whole production system has been scheduled for maintenance downtime.

SELECT 
  current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/ 
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, 
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, 
  iname, /*ituples::bigint, ipages::bigint, iotta,*/ 
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, 
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes 
FROM ( 
  SELECT 
    schemaname, tablename, cc.reltuples, cc.relpages, bs, 
    CEIL((cc.reltuples*((datahdr+ma- 
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, 
    COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, 
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols 
  FROM ( 
    SELECT 
      ma,bs,schemaname,tablename, 
      (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, 
      (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 
    FROM ( 
      SELECT 
        schemaname, tablename, hdr, ma, bs, 
        SUM((1-null_frac)*avg_width) AS datawidth, 
        MAX(null_frac) AS maxfracsum, 
        hdr+( 
          SELECT 1+count(*)/8 
          FROM pg_stats s2 
          WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename 
        ) AS nullhdr 
      FROM pg_stats s, ( 
        SELECT 
          (SELECT current_setting('block_size')::numeric) AS bs, 
          CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, 
          CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma 
        FROM (SELECT version() AS v) AS foo 
      ) AS constants 
      GROUP BY 1,2,3,4,5 
    ) AS foo 
  ) AS rs 
  JOIN pg_class cc ON cc.relname = rs.tablename 
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema' 
  LEFT JOIN pg_index i ON indrelid = cc.oid 
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid 
) AS sml 
ORDER BY wastedbytes DESC;

Thanks
Venkat

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Performance Improvement Strategy

From
Venkat Balaji
Date:

Can you please help me understand what the "blkno" column refers to?

Thanks
Venkat

Re: Performance Improvement Strategy

From
Shaun Thomas
Date:
On 09/20/2011 11:22 AM, Venkat Balaji wrote:

> Please help me understand how to calculate free space in Tables and
> Indexes even after vacuuming and analyzing is performed.

Besides the query Mark gave you using freespacemap, there's also the
pgstattuple contrib module. You'd use it like this:

SELECT pg_size_pretty(free_space) AS mb_free
   FROM pgstattuple('some_table');

The query must be run as a superuser, and I wouldn't recommend running it
on huge tables, since it scans the actual data files to get its
information. There's a lot of other useful information in that function,
such as the number of dead rows.
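For reference, a sketch of the wider pgstattuple output mentioned above (hypothetical table name; the column names are from the contrib module's documented result type):

```sql
SELECT pg_size_pretty(table_len)      AS table_size,
       tuple_count,
       pg_size_pretty(dead_tuple_len) AS dead_space,
       pg_size_pretty(free_space)     AS free_space,
       free_percent
  FROM pgstattuple('some_table');
```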

> What i understand is that, even if we perform VACUUM ANALYZE
> regularly, the free space generated is not filled up.

VACUUM does not actually generate free space. It locates and marks
reusable tuples. Any future updates or inserts on that table will be put
in those newly reclaimed spots, instead of being bolted onto the end of
the table.

> I see lot of free spaces or free pages in Tables and Indexes. But, I
> need to give an exact calculation on how much space will be reclaimed
> after VACUUM FULL and RE-INDEXING.

Why? If your database is so desperate for space, VACUUM and REINDEX
won't really help you. A properly maintained database will still have a
certain amount of "bloat" equal to the number of rows that change
between maintenance intervals. One way or another, that space is going
to be used by *something*.

It sounds more like you need to tweak your autovacuum settings to be
more aggressive if you're seeing significant enough turnover that your
tables are bloating significantly. One of our tables, for instance, gets
vacuumed more than once per hour because it experiences 1,000% turnover
daily.
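A sketch of what "more aggressive" can look like for a single hot table (hypothetical table name and values; these per-table storage parameters exist in 8.4 and later):

```sql
-- Vacuum when ~5% of the table has changed (the default scale factor
-- is 0.2, i.e. 20%), and analyze more often as well.
ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_vacuum_threshold     = 200,
    autovacuum_analyze_scale_factor = 0.05
);
```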

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to this email

Re: Performance Improvement Strategy

From
"Kevin Grittner"
Date:
Shaun Thomas <sthomas@peak6.com> wrote:
> Venkat Balaji wrote:

>> I see lot of free spaces or free pages in Tables and Indexes.
>> But, I need to give an exact calculation on how much space will
>> be reclaimed after VACUUM FULL and RE-INDEXING.
>
> Why?

I've been wondering that, too.  And talking about the space being
"reclaimed" seems to be at odds with your subject line.  The space
is given up by the database engine to the file system free space,
where reuse by the database will be much more expensive.  For good
performance you want some free space in the tables and indexes,
where it can be allocated to new tuples without going out through OS
calls to the file system.

Clearly, if free space gets higher than necessary to support
creation of new tuples, it can start to harm performance, and you
may need to take aggressive action (such as CLUSTER) to reclaim it;
but any time you find it necessary to do *that* you should be
investigating what went wrong to put you in such a spot.  Either
your autovacuum is (as Shaun suggested) not aggressive enough, or
you have some long running transaction (possibly "idle in
transaction") which is preventing vacuums from doing their work
effectively.  Investigating that is going to help more than
calculating just how much space the database is going to give up to
file system free space.

-Kevin

Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Thank you very much for your detailed explanation!

I will be reviewing our existing auto-vacuuming strategy to see whether it is optimal. We do have VACUUM VERBOSE ANALYZE running at the cluster level every day, and autovacuum is aggressive for highly active tables.

Today we vacuumed a 10 GB table and its size decreased to 5 GB.

I understand that it would be very expensive for the table to reclaim the space back from the filesystem. We decided to do the maintenance after a thorough analysis; our databases have not been subjected to any kind of maintenance activity (with downtime) in two years.

As the DBA, I suggested performing VACUUM FULL and REINDEX + ANALYZE to ensure that I/O and index performance would be good and that the PG optimizer would pick the optimal plan. As said earlier, our databases have never been reorganized in two years and are highly transactional. I believe that VACUUM FULL and REINDEX would tightly pack the rows in every page and ensure good I/O.

I might not have put the explanation across in an understandable manner.

Please help me with the following:

1. When will pg_stat_user_tables be updated, and what does its information show?
2. Will the information about dead rows and live rows vanish after VACUUM, ANALYZE, or VACUUM FULL?

I am preparing a monitoring system that will help us track the rate of bloat and data generation on a daily basis.

Sorry for the long email!

Looking forward to your help!

Thanks
Venkat





Re: Performance Improvement Strategy

From
Greg Smith
Date:
On 09/21/2011 12:13 PM, Venkat Balaji wrote:
> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE
> to ensure that IO performance and Indexing performance would be good


Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM
FULL.  You probably don't want to do that.  A multi-gigabyte table can
easily be unavailable for several hours if you execute VACUUM FULL
against it.  CLUSTER is almost always faster.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Thanks, Greg!

If I understood correctly, CLUSTER does the same thing VACUUM FULL does (except that it is faster).

Is CLUSTER recommended only because it is faster? As per the link, the table would also be unavailable (for a shorter period compared to VACUUM FULL) while CLUSTER is executed. I hope I got that right!

Thanks
Venkat


Re: Performance Improvement Strategy

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> If i got it correct, CLUSTER would do the same what VACUUM FULL
> does (except being fast)

CLUSTER copies the table (in the sequence of the specified index) to
a new set of files, builds fresh indexes, and then replaces the
original set of files with the new ones.  So you do need room on
disk for a second copy of the table, but it tends to be much faster
than VACUUM FULL in PostgreSQL versions before 9.0.  (Starting in
9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
the table data rather than using an index.)  REINDEX is not needed
when using CLUSTER or 9.x VACUUM FULL.  Older versions of VACUUM
FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
generally a good idea.

When choosing an index for CLUSTER, pick one on which you often
search for a *range* of rows, if possible.  Like a name column if
you do a lot of name searches.
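A minimal sketch of the commands involved (hypothetical table and index names):

```sql
-- Rewrites the table in index order; takes an ACCESS EXCLUSIVE lock
-- for the duration, so schedule it accordingly.
CLUSTER accounts USING accounts_name_idx;

-- Subsequent runs reuse the index remembered from the first CLUSTER.
CLUSTER accounts;

-- Refresh planner statistics after the rewrite.
ANALYZE accounts;
```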

-Kevin

Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
We performed VACUUM FULL on our production system and performance has improved a lot!

I started using pgstattuple and pg_freespacemap to track free space in tables and indexes, and they are helping us a lot.

Thanks for all your inputs and help !

Regards,
VB


Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Forgot to mention -

Kevin,

CLUSTER seems to be a very interesting concept to me.

I am thinking of testing CLUSTER on our production tables according to the index usage on each table.

I will let you know once I get the results.

Regards,
VB


Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Hello,

Thanks for your suggestions!

We CLUSTERed a table using its most-used index. The application is performing better now.

Thanks
VB


Re: Performance Improvement Strategy

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> We CLUSTERED a table using mostly used Index. Application is
> performing better now.

CLUSTER can help in at least four ways:

(1)  It eliminates bloat in the table heap.

(2)  It eliminates bloat in the indexes.

(3)  It can correct fragmentation in the underlying disk files.

(4)  It can put tuples which are accessed by the same query into
adjacent locations on disk, reducing physical disk access.

An aggressive autovacuum configuration can generally prevent the
first two from coming back to haunt you, and the third may not be a
big problem (depending on your OS and file system), but that last
one is a benefit which will degrade over time in most use cases --
the order in the heap is set by the cluster, but not maintained
after that.  If this ordering is a significant part of the
performance improvement you're seeing, you may want to schedule
regular CLUSTER runs.  It's hard to say what frequency would make
sense, but if performance gradually deteriorates and a CLUSTER fixes
it, you'll get a sense of how often it pays to do it.

-Kevin

Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Thanks a lot, Kevin!

This email has deepened my understanding of the clustering concept.

Keeping this in mind, I have recommended a new disk layout at the OS level for our production servers so that I/O will be balanced across the disks as well.

Currently, we do not have mount points divided according to the type of I/O.

I will share my recommended plan in a different email thread.

Thanks again for this detailed explanation.

Regards,
VB


Re: Performance Improvement Strategy

From
Venkat Balaji
Date:
Hello,

I was attempting to calculate the actual space occupied by a table.

Below is what I did.

I summed up the avg_width of each column of the table from pg_stats, which gives the average size of a row (277 bytes):

select sum(avg_width) as average_row_size from pg_stats where tablename='tablename';

 average_row_size
---------------------------
       277

(1 row)

Calculated the actual space occupied by rows in the table as below.

Took average_row_size * number_of_rows from pg_class:

select 277*reltuples/1024 as occupied_space from pg_class where relname='tablename';  -- 552 KB

 occupied_space
-------------------------
 552.6474609375

Calculated the actual table size (600 KB):

select pg_size_pretty(pg_relation_size('tablename'));

pg_size_pretty
----------------
 600 KB

(1 row)

Calculated the free space within the table (by scanning the pages, as suggested by Shaun Thomas) -- 14 KB:

SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');

 mb_free
---------
 14 KB

(1 row)

600 KB is the size of the table (taken through pg_size_pretty).
14 KB is the free space (taken through the contrib module).
600+14 = 586 KB is the occupied space by normal calculation through the contrib modules; this is based on the number of pages allocated to the table.
552 KB is the actual size occupied by the rows (calculated from the average row size); this is based on the number of rows within the pages.
586-552 = 34 KB is still free somewhere within the occupied pages (calculated through pg_stats and pg_class).
That 34 KB is free within the (8 KB) pages but is counted as occupied space.

This is a similar concept to one I successfully applied in another RDBMS to calculate space-usage metrics in production.
All of this is calculated after the VACUUM and ANALYZE jobs have been executed.
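For what it's worth, the arithmetic above can be sketched in a few lines. The occupied figure uses 600 - 14 (the "600+14" line above is a sign typo, which is confirmed later in the thread), and reltuples = 2043 is a back-derived assumption from the 552.647 KB figure, not a number stated in the thread:

```python
# Space accounting for the example table; all sizes in KB.
avg_row_bytes = 277    # sum(avg_width) from pg_stats
reltuples     = 2043   # assumed: 552.647 KB * 1024 / 277 bytes
table_kb      = 600    # pg_relation_size / 1024
free_kb       = 14     # pgstattuple free_space / 1024

row_data_kb = avg_row_bytes * reltuples / 1024  # ~552.6 KB of row data
occupied_kb = table_kb - free_kb                # 586 KB (minus, not plus)
slack_kb    = occupied_kb - row_data_kb         # ~33 KB of headers/padding

print(round(row_data_kb, 1), occupied_kb, round(slack_kb, 1))
```

The leftover ~33 KB is consistent with per-tuple overhead (row headers and item pointers) that avg_width does not account for.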

Please comment!

Sorry if this is too confusing or too long.

Thanks
VB


Re: Performance Improvement Strategy

From
Raghavendra
Date:

On Wed, Oct 5, 2011 at 2:38 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
Hello,

I was attempting to calculate the actual occupied space by a Table.

Below is what i did -

I summed up the avg_width of each column of a table from pg_stats, which gives me the average size of a row (277 bytes).

select sum(avg_width) as average_row_size from pg_stats where tablename='tablename'

 average_row_size
---------------------------
       277

(1 row)

Calculated the actual occupied space by rows in the table as below -

Took the average_row_size * number_of_rows from pg_class

select 277*reltuples/1024 as occupied_space from pg_class where relname='tablename'; == 552 KB

 occupied_space
-------------------------
 552.6474609375

Calculated the actual Table size (600 kb)

select pg_size_pretty(pg_relation_size('tablename'));                                                                                     

pg_size_pretty
----------------
 600 KB

(1 row)

Calculated the free space within the table (by scanning the pages, as suggested by Shaun Thomas) -- 14 KB

SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');

 mb_free
---------
 14 KB

(1 row)

600 KB is the size of the table (taken through pg_size_pretty)
14 KB is the free space (taken through contrib modules)
600+14 = 586 KB -- is the occupied space by normal calculation through contrib modules. This is based on number of pages allocated to the table.

It's a typo: 600 - 14 = 586 KB 

552 KB is the actual space occupied by the rows (calculated from the average row size). This is based on the number of rows within the pages.
586 - 552 = 34 KB is still free somewhere within the occupied pages (calculated through pg_stats and pg_class).
That 34 KB is free space inside the 8 KB pages themselves, which are nevertheless counted as occupied space.


One more point to add to this good discussion: each row header occupies 24 bytes, plus a 4-byte line pointer on the page for each tuple.
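Building on that point, here is a back-of-the-envelope sketch (Python, purely illustrative) of what a 277-byte average row actually costs on disk, which helps explain part of the gap above:

```python
# Per-tuple overhead in PostgreSQL heap tables, as noted in the thread:
#   24 bytes of tuple header, plus a 4-byte line pointer in the page header.
HEADER = 24
LINE_POINTER = 4

def on_disk_row_cost(avg_width):
    """Approximate bytes one row consumes, ignoring alignment padding and TOAST."""
    return avg_width + HEADER + LINE_POINTER

# The 277-byte average row from earlier in the thread really costs ~305 bytes,
# so the 552 KB "occupied space" estimate understates the true footprint.
print(on_disk_row_cost(277))  # 305
```

Alignment padding and TOAST (see Kevin's link below) push the real figure higher still, which is why summing avg_width never matches the file size exactly.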

---
Regards,
Raghavendra
EnterpriseDB Corporation

Re: : Performance Improvement Strategy

From
Guillaume Cottenceau
Date:
Venkat Balaji <venkat.balaji 'at' verse.in> writes:

> Hello,
>
> I was attempting to calculate the actual occupied space by a Table.

SELECT relname, reltuples, pg_size_pretty(relpages*8*1024) AS size
FROM pg_class, pg_namespace
WHERE pg_namespace.oid = pg_class.relnamespace
  AND relkind = 'r' AND nspname = 'public'
ORDER BY relpages DESC;

relkind = 'i' for indexes.

--
Guillaume Cottenceau

Re: : Performance Improvement Strategy

From
"Kevin Grittner"
Date:
Raghavendra <raghavendra.rao@enterprisedb.com> wrote:
> Venkat Balaji <venkat.balaji@verse.in>wrote:

>> [attempt to calculate file space from row layout and number of
>> rows]

> One more point to add to this good discussion, each row header
> will occupy 24 bytes + 4 bytes pointer on page to tuple.

Not to mention:

http://www.postgresql.org/docs/9.1/interactive/storage-toast.html

-Kevin

Re: : Performance Improvement Strategy

From
Scott Marlowe
Date:
On Wed, Sep 21, 2011 at 11:57 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 09/21/2011 12:13 PM, Venkat Balaji wrote:
>>
>> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
>> ensure that IO performance and Indexing performance would be good
>
>
> Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM FULL.
>  You probably don't want to do that.  A multi-gigabyte table can easily be
> unavailable for several hours if you execute VACUUM FULL against it.
>  CLUSTER is almost always faster.

It used to be that cluster on a very randomly ordered table was much
slower than doing something like select * into newtable from oldtable
order by col1, col2;  Is that still the case in 9.0/9.1?

Re: : Performance Improvement Strategy

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> It used to be that cluster on a very randomly ordered table was much
> slower than doing something like select * into newtable from oldtable
> order by col1, col2;  Is that still the case in 9.0/9.1?

Fixed in 9.1, per release notes:

    * Allow CLUSTER to sort the table rather than scanning the index when it seems likely to be cheaper (Leonardo Francalanci)

            regards, tom lane

Re: : Performance Improvement Strategy

From
Raghavendra
Date:
Thanks Kevin. 

I'm on 9.1 and tested the same scenario to see exactly how the storage metrics are calculated. Please comment.

Table Structure:
postgres=# \d test
     Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 name   | text    |

No. of rows:

postgres=# select relname,reltuples from pg_class where relname='test';
 relname | reltuples
---------+-----------
 test    |      1001
(1 row)

Average Row size:

postgres=# select sum(avg_width) as average_row_size from pg_stats where tablename='test';
 average_row_size
------------------
               17
(1 row)

Occupied Space:

postgres=# select 17*reltuples/1024 as "No.of.Row_size * No.of.Rows = Occupied_Space" from pg_class where relname='test';
 No.of.Row_size * No.of.Rows = Occupied_Space
----------------------------------------------
                                16.6181640625

Actual Table Size:

postgres=# select pg_size_pretty(pg_relation_size('test'));
 pg_size_pretty
----------------
 48 kB
(1 row)

or 

postgres=# SELECT relname, reltuples, pg_size_pretty(relpages*8*1024) as size FROM pg_class, pg_namespace WHERE pg_namespace.oid = pg_class.relnamespace AND relkind = 'r' AND nspname = 'public' ORDER BY relpages DESC;
 relname | reltuples | size
---------+-----------+-------
 test    |      1001 | 48 kB
(1 row)

It's different here:

postgres=# \dt+ test
                   List of relations
 Schema | Name | Type  |  Owner   | Size  | Description
--------+------+-------+----------+-------+-------------
 public | test | table | postgres | 88 kB |
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('test'));
 pg_size_pretty
----------------
 88 kB
(1 row)

Free Space:

postgres=# SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('test');
  mb_free
-----------
 936 bytes
(1 row)

or

postgres=# select * from pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
     49152 |        1001 |     41041 |          83.5 |                0 |              0 |                  0 |        936 |          1.9
(1 row)
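The pgstattuple numbers line up with the 24-byte header overhead mentioned earlier in the thread; a quick arithmetic check (Python, illustrative):

```python
# tuple_len from pgstattuple should equal rows * (header + avg data width):
# 1001 tuples, each with a 24-byte tuple header and 17 bytes of user data.
rows, header, avg_width = 1001, 24, 17
tuple_len = rows * (header + avg_width)
print(tuple_len)  # 41041, matching pgstattuple's tuple_len column above
```

(The 4-byte line pointers live in the page header area, so they show up in free_space/table_len accounting rather than in tuple_len.)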

OS Level Storage:

bash-4.1$ ll -h 16447*
-rw------- 1 postgres postgres  48K Oct  2 17:40 16447
-rw------- 1 postgres postgres  24K Oct  2 17:40 16447_fsm
-rw------- 1 postgres postgres 8.0K Oct  2 17:40 16447_vm

What is occupying the extra 8 KB?

postgres=# select pg_size_pretty(pg_total_relation_size('test'));
 pg_size_pretty
----------------
 88 kB
(1 row)
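One plausible accounting for the 88 kB, assuming pg_total_relation_size also counts an 8 kB index on the (empty) TOAST table created for the text column; this is a guess based on the forks listed above, not verified against the catalogs:

```python
# Illustrative breakdown of pg_total_relation_size('test') = 88 kB.
main_fork = 48   # kB, the 16447 heap file
fsm       = 24   # kB, 16447_fsm (free space map)
vm        = 8    # kB, 16447_vm (visibility map)
toast_idx = 8    # kB, assumed index on the empty TOAST table
print(main_fork + fsm + vm + toast_idx)  # 88
```

Checking pg_class for a reltoastrelid on 'test', and sizing that relation and its index, would confirm or refute this.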

Thanks in advance.

---
Regards,
Raghavendra
EnterpriseDB Corporation

Re: : Performance Improvement Strategy

From
Scott Marlowe
Date:
On Wed, Oct 5, 2011 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> It used to be that cluster on a very randomly ordered table was much
>> slower than doing something like select * into newtable from oldtable
>> order by col1, col2;  Is that still the case in 9.0/9.1?
>
> Fixed in 9.1, per release notes:
>
>        * Allow CLUSTER to sort the table rather than scanning the index when it seems likely to be cheaper (Leonardo Francalanci)

Looks like I owe Leonardo Francalanci a pizza.

Re: : Performance Improvement Strategy

From
Leonardo Francalanci
Date:
>> * Allow CLUSTER to sort the table rather than scanning the index
>> when it seems likely to be cheaper (Leonardo Francalanci)
>
> Looks like I owe Leonardo Francalanci a pizza.



Well, the patch started from a work by Gregory Stark, and Tom fixed
a nasty bug; but I'll take a slice ;)


Leonardo