Thread: More Praise for 7.4RC2
More praise for 7.4RC2:
I've installed 7.4RC2 and restored a fairly complex 20GB database (from 7.3.4) with ~75M rows in 30 tables and 4 schemas, numerous triggers and constraints, procs in plpgsql and plperl. To say that it all works great hugely underestimates the improvements. Thank you! Great work, guys!
Here's one comparison of a query that choked 7.3.4 and flies in 7.4RC2:
And with the same data and indices:
The cost estimates give the right qualitative feel. 7.3.4 took >30 minutes for this query whereas 7.4RC2 responds essentially instantaneously.
Again, thanks pgsql-hackers. This is great!
-Reece
P.S. I don't use plpython, but I did try to install the language with Python 1.5 (as I said, I don't use python). This does work with 7.3.4 on the same machine/environment. I got:
createlang: language installation failed: ERROR: could not load library "/apps/compbio/i686-linux-2.4/opt/postgresql-7.4RC2/lib/plpython.so": /apps/compbio/i686-linux-2.4/opt/postgresql-7.4RC2/lib/plpython.so: undefined symbol: PyDict_Copy
I strongly suspect my user error. However, it's probably worth getting some python user to ensure all's well there.
I've installed 7.4RC2 and restored a fairly complex 20GB database (from 7.3.4) with ~75M rows in 30 tables and 4 schemas, numerous triggers and constraints, procs in plpgsql and plperl. To say that it all works great hugely underestimates the improvements. Thank you! Great work, guys!
Here's one comparison of a query that choked 7.3.4 and flies in 7.4RC2:
7.3.4=> explain select count(distinct pseq_id) from paprospect2 where pmodel_id in (select pmodel_id from pmsm_prospect2 where pmodelset_id=2) and run_id=1 and svm>11::real; QUERY PLAN ----------------------------------------------------------------------------------Aggregate (cost=78639954.60..78639954.60 rows=1 width=4) -> Seq Scan on paprospect2 (cost=0.00..78639951.41 rows=1274 width=4) Filter: ((run_id = 1) AND (svm > 11::real) AND (subplan)) SubPlan -> Materialize (cost=3.02..3.02 rows=3 width=4) -> Seq Scan on pmsm_prospect2 (cost=0.00..3.02 rows=3 width=4) Filter: (pmodelset_id = 2)
And with the same data and indices:
7.4RC2=# explain select count(distinct pseq_id) from paprospect2 where pmodel_id in (select pmodel_id from pmsm_prospect2 where pmodelset_id=2) and run_id=1 and svm>11::real; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=871.09..871.09 rows=1 width=4) -> Nested Loop (cost=3.03..870.55 rows=215 width=4) -> HashAggregate (cost=3.03..3.03 rows=1 width=4) -> Seq Scan on pmsm_prospect2 (cost=0.00..3.02 rows=1 width=4) Filter: (pmodelset_id = 2) -> Index Scan using paprospect2_search1 on paprospect2 (cost=0.00..864.84 rows=215 width=8) Index Cond: ((paprospect2.pmodel_id = "outer".pmodel_id) AND (paprospect2.run_id = 1) AND (paprospect2.svm > 11::
The cost estimates give the right qualitative feel. 7.3.4 took >30 minutes for this query whereas 7.4RC2 responds essentially instantaneously.
Again, thanks pgsql-hackers. This is great!
-Reece
P.S. I don't use plpython, but I did try to install the language with Python 1.5 (as I said, I don't use python). This does work with 7.3.4 on the same machine/environment. I got:
createlang: language installation failed: ERROR: could not load library "/apps/compbio/i686-linux-2.4/opt/postgresql-7.4RC2/lib/plpython.so": /apps/compbio/i686-linux-2.4/opt/postgresql-7.4RC2/lib/plpython.so: undefined symbol: PyDict_Copy
I strongly suspect my user error. However, it's probably worth getting some python user to ensure all's well there.
-- Reece Hart, Ph.D. rkh@gene.com, http://www.gene.com/ Genentech, Inc. 650/225-6133 (voice), -5389 (fax) Bioinformatics and Protein Engineering 1 DNA Way, MS-93 http://www.in-machina.com/~reece/ South San Francisco, CA 94080-4990 reece@in-machina.com, GPG: 0x25EC91A0 |
reece@in-machina.com (Reece Hart) wrote in message news:<1068597494.28850.266.camel@tallac>... [...snip...] > The cost estimates give the right qualitative feel. 7.3.4 took >30 > minutes for this query whereas 7.4RC2 responds essentially > instantaneously. > [...snip...] I also posted about the performance increase of 7.4, but I think that much of the difference you're seeing (because it's such a large difference) is probably due to the cleanliness of a newly restored database from backup. Perhaps a vacuum analyze could have improved that 30+ minute response time on 7.3.4. I too posted about the performance improvement, seeing about 200% at first after restoring from backup. After some transactions, however, I'd say its more like 100% - 125% improvement. I'm still particularly impressed with the improvement in the select count(*) query performance, which is about 100% improved.
On Wed, 2003-11-12 at 09:04, jake johnson wrote:
I agree that this seems likely, except that the 7.3.4 database is vacuumed nightly, and analyzed periodically. And about a week ago I reclustered on the index intended to most facilitate this select. Furthermore, merely hardcoding the subselect result achieves a tremendous improvement (which was the workaround I used). So, I'm pretty sure that it's not a vacuum, index use, or cleanliness issue.
I also meant to add in my original post that the system is a dual 2.4G xeon with 4GB of RAM.
-Reece
I also posted about the performance increase of 7.4, but I think that much of the difference you're seeing (because it's such a large difference) is probably due to the cleanliness of a newly restored database from backup.
I agree that this seems likely, except that the 7.3.4 database is vacuumed nightly, and analyzed periodically. And about a week ago I reclustered on the index intended to most facilitate this select. Furthermore, merely hardcoding the subselect result achieves a tremendous improvement (which was the workaround I used). So, I'm pretty sure that it's not a vacuum, index use, or cleanliness issue.
I also meant to add in my original post that the system is a dual 2.4G xeon with 4GB of RAM.
-Reece
-- Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0 0xD178AAF9 |
Actually, in your case it's probably the new optimisation regarding the use of IN (subquery). They're now optimised to the same lavel as EXISTS IIRC. On Wed, Nov 12, 2003 at 05:46:23PM -0800, Reece Hart wrote: > On Wed, 2003-11-12 at 09:04, jake johnson wrote: > > > I also posted about the performance increase of 7.4, but I think that > > much of the difference you're seeing (because it's such a large > > difference) is probably due to the cleanliness of a newly restored > > database from backup. > > > I agree that this seems likely, except that the 7.3.4 database is > vacuumed nightly, and analyzed periodically. And about a week ago I > reclustered on the index intended to most facilitate this select. > Furthermore, merely hardcoding the subselect result achieves a > tremendous improvement (which was the workaround I used). So, I'm pretty > sure that it's not a vacuum, index use, or cleanliness issue. > > I also meant to add in my original post that the system is a dual 2.4G > xeon with 4GB of RAM. > > -Reece > > > > -- > Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0 0xD178AAF9 -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
Attachment
On Wed, 12 Nov 2003, Reece Hart wrote: > On Wed, 2003-11-12 at 09:04, jake johnson wrote: > > > I also posted about the performance increase of 7.4, but I think that > > much of the difference you're seeing (because it's such a large > > difference) is probably due to the cleanliness of a newly restored > > database from backup. > > > I agree that this seems likely, except that the 7.3.4 database is > vacuumed nightly, and analyzed periodically. And about a week ago I > reclustered on the index intended to most facilitate this select. > Furthermore, merely hardcoding the subselect result achieves a > tremendous improvement (which was the workaround I used). So, I'm pretty > sure that it's not a vacuum, index use, or cleanliness issue. Do you vacuum full every so often? If not, and if you've been overflowing your fsm, then your tables will just grow without shrinking. Also, index growth could be a problem. The real test is to dump the database and reload it to give 7.3.4 a fair shake.
On Thu, 2003-11-13 at 10:09, scott.marlowe wrote:
Hmm. I didn't realize that I needed to vacuum full as well -- I thought vacuum was sufficient for performance gains, and that full reclaimed space but didn't result in significant performance gains. I have reindexed infrequently, but since that locks the table I didn't do that (or vacuum full) often. I guess I should try out pg_autovacuum, but I think that full vacuums only to prevent XID wraparound (if age>1.5B transactions), but not for compaction (is this correct?).
7.3.4, long-running db: eons
7.3.4, freshly restored: 72s
7.4RC2, freshly restored: 0.3s
Thanks everyone for feedback and setting me straight. Although the gain isn't as great as I thought, it's still very significant.
-Reece
Do you vacuum full every so often? If not, and if you've been overflowing your fsm, then your tables will just grow without shrinking. Also, index growth could be a problem.
Hmm. I didn't realize that I needed to vacuum full as well -- I thought vacuum was sufficient for performance gains, and that full reclaimed space but didn't result in significant performance gains. I have reindexed infrequently, but since that locks the table I didn't do that (or vacuum full) often. I guess I should try out pg_autovacuum, but I think that full vacuums only to prevent XID wraparound (if age>1.5B transactions), but not for compaction (is this correct?).
It turns out that I have two copies of this database around at the moment running on 7.3.4. One was a fresh restore, and that's what I used to generate the explain. However, the query was run on the older database which was vacuumed and analyzed (but not vacuum full or reindexed), and on that instance the query took a long time. On the fresh install, it takes 72s. In summary:The real test is to dump the database and reload it to give 7.3.4 a fair shake.
7.3.4, long-running db: eons
7.3.4, freshly restored: 72s
7.4RC2, freshly restored: 0.3s
Thanks everyone for feedback and setting me straight. Although the gain isn't as great as I thought, it's still very significant.
-Reece
-- Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0 |
Are there any guidelines on how often one should do a reindex?
----- Original Message -----From: Reece HartTo: scott.marloweCc: pgsql-generalSent: Thursday, November 13, 2003 12:50 PMSubject: Re: [GENERAL] More Praise for 7.4RC2On Thu, 2003-11-13 at 10:09, scott.marlowe wrote:Do you vacuum full every so often? If not, and if you've been overflowing your fsm, then your tables will just grow without shrinking. Also, index growth could be a problem.
Hmm. I didn't realize that I needed to vacuum full as well -- I thought vacuum was sufficient for performance gains, and that full reclaimed space but didn't result in significant performance gains. I have reindexed infrequently, but since that locks the table I didn't do that (or vacuum full) often. I guess I should try out pg_autovacuum, but I think that full vacuums only to prevent XID wraparound (if age>1.5B transactions), but not for compaction (is this correct?).It turns out that I have two copies of this database around at the moment running on 7.3.4. One was a fresh restore, and that's what I used to generate the explain. However, the query was run on the older database which was vacuumed and analyzed (but not vacuum full or reindexed), and on that instance the query took a long time. On the fresh install, it takes 72s. In summary:The real test is to dump the database and reload it to give 7.3.4 a fair shake.
7.3.4, long-running db: eons
7.3.4, freshly restored: 72s
7.4RC2, freshly restored: 0.3s
Thanks everyone for feedback and setting me straight. Although the gain isn't as great as I thought, it's still very significant.
-Reece
-- Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0
On Thu, 13 Nov 2003, Reece Hart wrote: > On Thu, 2003-11-13 at 10:09, scott.marlowe wrote: > > > Do you vacuum full every so often? If not, and if you've been overflowing > > your fsm, then your tables will just grow without shrinking. > > Also, index growth could be a problem. > > > Hmm. I didn't realize that I needed to vacuum full as well -- I thought > vacuum was sufficient for performance gains, and that full reclaimed > space but didn't result in significant performance gains. I have > reindexed infrequently, but since that locks the table I didn't do that > (or vacuum full) often. I guess I should try out pg_autovacuum, but I > think that full vacuums only to prevent XID wraparound (if age>1.5B > transactions), but not for compaction (is this correct?). Assuming your free space map has enough room, and you vacuum (plain vacuum) often enough, you're dandy. But, let's say you run 10,000 transactions on a 1,000 row table, then run a plain vacuum. Even if your fsm can hold all the space that is free in that table's space, you've got a sparsely populated table that will take a while to seq scan through. I.e. if you don't vacuum (regular) often enough, then your tables may be quite large. I've tested out the pg_autovacuum daemon, and it seems to work quite well for me. What you're really shooting for is a "steady state" table size. Let's say you've got a 10,000 row table, and you average 500 changes an hour on it. If you vacuum it every day it will probably be fine, as by the end of 24 hours, the table will take up the space of about 22,000 rows. So, it will have the same basic performance as if it had 22,000 rows in it. If you start vacuuming it every hour after that first few days, then it will always be about 22,000 rows in size, (assuming the rows aren't noticable bigger or smaller from one version to the next.) While plain vacuums can reclaim the space at the very end of a table, and could theoretically make this table shrink over time, it's highly unlikely to ever drop back down to the approximate 10,000 rows in size started with. Vacuum full will drop it back down to somewhere around there. So, if your table is HIGHLY updated, you may need to run a plain vacuum very often, and that's where the autovacuum daemon comes in handy. Just set it to run every 30 minutes or so, and let it go. It should only vacuum the tables that have had lots of change, and leave the others alone. With the vacuum delay patch that's in testing for 7.5, it may well be that running the autovacuum daemon will become acceptable in places where, right now, vacuum, even the regular kind, produce too much system load / slow down in the middle of the day. > > The real test is to dump the database and reload it to give 7.3.4 a fair > > shake. > > It turns out that I have two copies of this database around at the > moment running on 7.3.4. One was a fresh restore, and that's what I used > to generate the explain. However, the query was run on the older > database which was vacuumed and analyzed (but not vacuum full or > reindexed), and on that instance the query took a long time. On the > fresh install, it takes 72s. In summary: > > 7.3.4, long-running db: eons > 7.3.4, freshly restored: 72s > 7.4RC2, freshly restored: 0.3s You can use the oid2name program in the contrib directory to kinda research which files are big under those trees and see if it's a table or index growth problem. something like: #su - postgres $ oid2name All databases: --------------------------------- 16975 = template0 16976 = postgres $ cd $PGDATA/base/16976 du -s *|sort -n 1004 16640 12232 109169550 65644 109169548 will list the largest files in the postgres database directory. oid2name -d postgres |grep 10169548 109169548 = accounts tells me that it's the accounts table that's taking up all my room. and so on.
Oh, another good choice for embedding is sleepycat's berkely db database, or just plain old db style (gdbm lib, or ndbm, or any of a few others) hash databases. Simple, non-relational, and fast. On Thu, 13 Nov 2003, scott.marlowe wrote: > On Thu, 13 Nov 2003, Reece Hart wrote: > > > On Thu, 2003-11-13 at 10:09, scott.marlowe wrote: > > > > > Do you vacuum full every so often? If not, and if you've been overflowing > > > your fsm, then your tables will just grow without shrinking. > > > Also, index growth could be a problem. > > > > > > Hmm. I didn't realize that I needed to vacuum full as well -- I thought > > vacuum was sufficient for performance gains, and that full reclaimed > > space but didn't result in significant performance gains. I have > > reindexed infrequently, but since that locks the table I didn't do that > > (or vacuum full) often. I guess I should try out pg_autovacuum, but I > > think that full vacuums only to prevent XID wraparound (if age>1.5B > > transactions), but not for compaction (is this correct?).
Reece Hart <reece@in-machina.com> writes: > On Thu, 2003-11-13 at 10:09, scott.marlowe wrote: > > > Do you vacuum full every so often? If not, and if you've been overflowing > > your fsm, then your tables will just grow without shrinking. > > Also, index growth could be a problem. > > > Hmm. I didn't realize that I needed to vacuum full as well -- I thought > vacuum was sufficient for performance gains, and that full reclaimed > space but didn't result in significant performance gains. plain Vacuum is sufficient if the amount of free space it finds fits within the free space map. During normal use with frequent vacuums on a system with well-tuned fsm parameters that should be true. However on a big heavily used database where the fsm parameters haven't been raised from the defaults it's possible that it isn't. And on a table where large batch updates or deletes have been run it's possible to require a vacuum full after the batch job creates lots of dead tuples. -- greg
On Friday 14 November 2003 01:33, Rick Gigger wrote: > Are there any guidelines on how often one should do a reindex? Vacuum in 7.4 does take care of index bloat, much better than earlier versions. So if you run autovacuum daemon with 7.4, then you can do away with reindex. Of course testing at your site will provide the best answer. There is no better tests than that..:-) Shridhar
Hello! > You can use the oid2name program in the contrib directory to kinda > research which files are big under those trees and see if > it's a table or > index growth problem. I found it a tedious operation, if you want to keep a check on growth of your databases regularly. So I wrote a litte script which outputs a sorted comma separated list of all objects within a database - so I can do ./showdbsize foodb >foodb.csv and import this thing in Excel for further processing or do whatever I like with it. There may surely be more elegant ways of getting the task done using another language or just plain bash-scripting and you have to have oid2name made and installed, but in terms of language php is what I am most comfortable with - if deemed necessary, anyone might write their own little tool in their preferred ways and languages; but it's short enough, so I'll just post it here if anyone wants to make use of it as it is. Kind Regards, Markus Here goes: #!/usr/bin/php -q <?php # showdbsize for PostgreSQL # # MWollny - 2003 # ###################################### /* Config Begin */ $pg_user = 'postgres'; $pg_bindir = '/opt/pgsql/bin/'; $pg_data ='/var/lib/pgsql/data/base/'; /* Config End */ ###################################### /* DO NOT EDIT BELOW THIS LINE */ $argv=$_SERVER['argv']; $argc=$_SERVER['argc']; if ($argc != 2 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) { ?> This is a commandline PHP script to generate a list of object-ids, names and filesizes of all tables/indexes within a specified POSTGRESQL-database. Usage: <?php echo $argv[0]; ?> <database> <database> is the name of the database you wish to generate the list of. With the --list, -list, -l or --show, -show or -s options, you can get a list of all available databases on this server. With the --help, -help, -h, or -? options, you can get this help. <?php } else { /* Function to make bytesize numbers human-readable */ function fsize($file) { $a = array("B", "KB", "MB", "GB", "TB", "PB"); $pos = 0; $size = filesize($file); while ($size >= 1024) { $size /= 1024; $pos++; } return round($size,2)." ".$a[$pos]; } /* One Ring To Find Them All */ $pg_data=$pg_data.'base/'; $db_exec=$pg_bindir.'oid2name -U '.$pg_user; $alldb=`$db_exec`; $i=1; $lines = explode ("\n", $alldb); foreach($lines as $value) { if (!strpos($value, "=")===false) { $dboid[$i] = trim(substr($value,0,strpos($value, "=")-1)); $dbname[$i] = trim(substr(strstr($value,'='),2)); $i++; }} if (in_array($argv[1], array('--show', '-show', '-s', '-l', '--list', '-list'))) { echo "Databases available on this server:\n"; foreach($dbname as $value) {echo " $value\n";} die(); } /* Is argument the name of an existing database on this server? */ if (!in_array ($argv[1], $dbname)) { die ("Database $argv[1] not found on this server.\n"); } /* Still alive? Okay, give me the OID of that DB! */ $i=array_search($argv[1], $dbname); $use_oid=$dboid[$i]; $use_name=$dbname[$i]; $dbdir=$pg_data.$use_oid.'/'; chdir ($dbdir); /* Let's see the list of files of the DB */ $handle=opendir($dbdir); $i=0; while ($file = readdir ($handle)) { if ($file != "." && $file != "..") { $i++; $oid[$i]=$file; } } closedir($handle); /* Now gather data about actual names and filesizes of these objects */ for ($j = 1; $j <= $i; $j++) { if (is_numeric($oid[$j])) { $oid_size[$j]=filesize($oid[$j]); $oid_hsize[$j]=fsize($oid[$j]); $db_exec=$pg_bindir.'oid2name -U '.$pg_user.' -d '.$use_name.' -o '.$oid[$j]; $raw_name=`$db_exec`; $full_name[$j]=trim(substr(substr(strstr($raw_name,'='), 1), 0, -1)); # echo "$oid[$j]; $full_name[$j]; $oid_size[$j]; $oid_hsize[$j] \n"; }} /* Sort and output the list so that it can be piped to a CSV-file */ asort ($oid_size); reset ($oid_size); echo "OID; Name; Size (Bytes); Size (readable)\n"; foreach($oid_size as $key => $tablesize) { echo "$oid[$key]; $full_name[$key]; $oid_size[$key]; $oid_hsize[$key] \n"; } } ?>
On Thu, 2003-11-13 at 13:10, scott.marlowe wrote:
On Thu, 2003-11-13 at 19:37, Greg Stark wrote:
Thanks for this info. I'm sure this explains at least part of the problem. I can't remember the sequence of events from several months back, but I did once update ~20M rows of this 40M row this table, and I have also deleted certain sets of rows at various times. Suspecting that I had a swiss-cheese table, I reclustered on an index several times (which, I presume, is at least as good as vacuum (non-full) removing internal free space, with the benefit of optimized row ordering). Since I can't remember the order of operations, it's possible that I timed the slow query at nearly the worst state, and it's the kinda think I only wanted to endure once.
Thanks again,
Reece
So, if your table is HIGHLY updated, you may need to run a plain vacuum very often, and that's where the autovacuum daemon comes in handy. Just set it to run every 30 minutes or so, and let it go. It should only vacuum the tables that have had lots of change, and leave the others alone.
On Thu, 2003-11-13 at 19:37, Greg Stark wrote:
However on a big heavily used database where the fsm parameters haven't been raised from the defaults it's possible that it isn't. And on a table where large batch updates or deletes have been run it's possible to require a vacuum full after the batch job creates lots of dead tuples.
Scott & Greg-
Thanks for this info. I'm sure this explains at least part of the problem. I can't remember the sequence of events from several months back, but I did once update ~20M rows of this 40M row this table, and I have also deleted certain sets of rows at various times. Suspecting that I had a swiss-cheese table, I reclustered on an index several times (which, I presume, is at least as good as vacuum (non-full) removing internal free space, with the benefit of optimized row ordering). Since I can't remember the order of operations, it's possible that I timed the slow query at nearly the worst state, and it's the kinda think I only wanted to endure once.
Thanks again,
Reece
-- Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0 |
reece@in-machina.com (Reece Hart) writes: > On Thu, 2003-11-13 at 10:09, scott.marlowe wrote: > > Do you vacuum full every so often? If not, and if you've been overflowing > your fsm, then your tables will just grow without shrinking. > Also, index growth could be a problem. > > Hmm. I didn't realize that I needed to vacuum full as well -- I > thought vacuum was sufficient for performance gains, and that full > reclaimed space but didn't result in significant performance > gains. I have reindexed infrequently, but since that locks the table > I didn't do that (or vacuum full) often. I guess I should try out > pg_autovacuum, but I think that full vacuums only to prevent XID > wraparound (if age>1.5B transactions), but not for compaction (is > this correct?). That's not quite correct. pg_autovacuum NEVER does a VACUUM FULL, and it is not necessary to do so in order to avoid XID wraparound. A "simple VACUUM" suffices for that purpose.. What pg_autovacuum "buys you" is mainly twofold: 1. You don't need to schedule batch jobs to vacuum things; 2. If you have heavily updated tables, it will vacuum them a lot, which should prevent them from "blowing out" the free space map, and allow quicker reuse of dead space. The merits of that aren't infinite, but are not nothing, either. -- output = reverse("ofni.smrytrebil" "@" "enworbbc") <http://dev6.int.libertyrms.com/> Christopher Browne (416) 646 3304 x124 (land)
rick@alpinenetworking.com ("Rick Gigger") writes: > Are there any guidelines on how often one should do a reindex? When you discover that performance is "sucking" because of table growth that would be fixed by a reindex. Unfortunately, there's not quite a "quick prescription" for how to discover that :-(. -- (format nil "~S@~S" "cbbrowne" "libertyrms.info") <http://dev6.int.libertyrms.com/> Christopher Browne (416) 646 3304 x124 (land)
>>>>> "RG" == Rick Gigger <rick@alpinenetworking.com> writes: RG> Are there any guidelines on how often one should do a reindex? regularly monitor your index sizes with a query such as this: SELECT relname,relpages FROM pg_class WHERE relname LIKE 'user_list%' ORDER BY relname; On my most busy tables, index bloat in PG 7.2 is horrendous. I have one index that more than triples in size after about 10 days. After reindex it gets to a respectable 22k relpages. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
What exactly does that number (relpages) mean? It seems that my large tables and indexes have large values for relpages and and small tables and indexes have small values. rg ----- Original Message ----- From: "Vivek Khera" <khera@kcilink.com> Newsgroups: ml.postgres.general To: <pgsql-general@postgresql.org> Sent: Friday, November 14, 2003 2:21 PM Subject: Re: [GENERAL] More Praise for 7.4RC2 > >>>>> "RG" == Rick Gigger <rick@alpinenetworking.com> writes: > > > RG> Are there any guidelines on how often one should do a reindex? > > regularly monitor your index sizes with a query such as this: > > SELECT relname,relpages FROM pg_class WHERE relname LIKE 'user_list%' ORDER BY relname; > > On my most busy tables, index bloat in PG 7.2 is horrendous. I have > one index that more than triples in size after about 10 days. After > reindex it gets to a respectable 22k relpages. > > -- > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Vivek Khera, Ph.D. Khera Communications, Inc. > Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
It should be the number of pages in the relation. A page is usually 8K unless it's changed. Not sure how often it's updated though. On Fri, Nov 14, 2003 at 03:02:49PM -0700, Rick Gigger wrote: > What exactly does that number (relpages) mean? It seems that my large > tables and indexes have large values for relpages and and small tables and > indexes have small values. > > rg > > ----- Original Message ----- > From: "Vivek Khera" <khera@kcilink.com> > Newsgroups: ml.postgres.general > To: <pgsql-general@postgresql.org> > Sent: Friday, November 14, 2003 2:21 PM > Subject: Re: [GENERAL] More Praise for 7.4RC2 > > > > >>>>> "RG" == Rick Gigger <rick@alpinenetworking.com> writes: > > > > > > RG> Are there any guidelines on how often one should do a reindex? > > > > regularly monitor your index sizes with a query such as this: > > > > SELECT relname,relpages FROM pg_class WHERE relname LIKE 'user_list%' > ORDER BY relname; > > > > On my most busy tables, index bloat in PG 7.2 is horrendous. I have > > one index that more than triples in size after about 10 days. After > > reindex it gets to a respectable 22k relpages. > > > > -- > > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > > Vivek Khera, Ph.D. Khera Communications, Inc. > > Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 > > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
Attachment
Reece Hart <reece@in-machina.com> writes: > Suspecting that I had a swiss-cheese table, I reclustered on an index > several times (which, I presume, is at least as good as vacuum (non-full) > removing internal free space, with the benefit of optimized row ordering). My understanding is that CLUSTER is as good as a VACUUM FULL actually. -- greg
On Sat, Nov 15, 2003 at 11:26:44AM -0500, Greg Stark wrote: > > Reece Hart <reece@in-machina.com> writes: > > > Suspecting that I had a swiss-cheese table, I reclustered on an index > > several times (which, I presume, is at least as good as vacuum (non-full) > > removing internal free space, with the benefit of optimized row ordering). > > My understanding is that CLUSTER is as good as a VACUUM FULL actually. Well, it's different in that it does the equivalent of a REINDEX on all indexes. It's also slower and needs lots of additional space: double the space of your table and all it's indexes. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)