Thread: Index bloat problem?
All,

Running PostgreSQL 7.4.2, Solaris. Client is reporting that the size of an index is greater than the number of rows in the table (1.9 million vs. 1.5 million). Index was automatically created from a 'bigserial unique' column. Database contains several tables with exactly the same columns (including the 'bigserial unique' column). This is the only table where this index is out of line with the actual # of rows. Queries on this table take 40 seconds to retrieve 2000 rows as opposed to 1-2 seconds on the other tables.

We have been running 'VACUUM ANALYZE' very regularly. In fact, our vacuum schedule has probably been overkill. We have been running on a per-table basis after every update (many per day, only inserts occurring) and after every purge (one per day, deleting a day's worth of data).

It is theoretically possible that at some time a process was run that deleted all rows in the table followed by a VACUUM FULL. In this case we would have dropped/recreated our own indexes on the table but not the index automatically created for the bigserial column. If that happened, could that cause these symptoms?

What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation?

thanks,

Bill
Bill,

> What about if an out-of-the-ordinary number of rows
> were deleted (say 75% of rows in the table, as opposed
> to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> things get out of whack because of that situation?

Yes. You'd want to run REINDEX after an event like that. As you should do now.

--
Josh Berkus
Aglio Database Solutions
San Francisco
--- Josh Berkus <josh@agliodbs.com> wrote:
> Bill,
>
> > What about if an out-of-the-ordinary number of rows
> > were deleted (say 75% of rows in the table, as opposed
> > to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> > things get out of whack because of that situation?
>
> Yes. You'd want to run REINDEX after an event like
> that. As you should do now.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco

Thank you. Though I must say, that is very discouraging. REINDEX is a costly operation, timewise and due to the fact that it locks out other processes from proceeding. Updates are constantly coming in and queries are occurring continuously. A REINDEX could potentially bring the whole thing to a halt.

Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it.

Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time? Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Is there any way around it?

thanks,

Bill
Bill,

> Honestly, this seems like an inordinate amount of
> babysitting for a production application. I'm not
> sure if the client will be willing to accept it.

Well, then, tell them not to delete 75% of the rows in a table at once. I imagine that operation brought processing to a halt, too. If the client isn't willing to accept the consequences of their own bad data management, I'm not really sure what you expect us to do about it.

> Admittedly my knowledge of the inner workings of an
> RDBMS is limited, but could somebody explain to me why
> this would be so? If you delete a bunch of rows why
> doesn't the index get updated at the same time?

It does get updated. What doesn't happen is the space getting reclaimed. In a *normal* data situation, the dead nodes are recycled for new rows. But doing a massive delete operation upsets that, and generally needs to be followed by a REINDEX.

> Is this a common issue among all RDBMSs or is it
> something that is PostgreSQL specific?

Speaking from experience, this sort of thing affects MSSQL as well, although the maintenance routines are different.

--
Josh Berkus
Aglio Database Solutions
San Francisco
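For reference, the commands under discussion look like this; the object names are purely illustrative placeholders, so substitute the real table and the index that was created for the bigserial column. As noted above, on 7.4 REINDEX blocks other access to the table while it runs, so it has to fit in a maintenance window:

    -- rebuild every index on one table
    REINDEX TABLE mytable;

    -- or rebuild only the bloated index
    REINDEX INDEX mytable_id_key;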
Is:

REINDEX DATABASE blah

supposed to rebuild all indices in the database, or must you specify each table individually? (I'm asking because I just tried it and it only did system tables)

Alex Turner
netEconomist

On 4/21/05, Josh Berkus <josh@agliodbs.com> wrote:
> Bill,
>
> > What about if an out-of-the-ordinary number of rows
> > were deleted (say 75% of rows in the table, as opposed
> > to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> > things get out of whack because of that situation?
>
> Yes. You'd want to run REINDEX after an event like that. As you should do now.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
Alex,

> REINDEX DATABASE blah
>
> supposed to rebuild all indices in the database, or must you specify
> each table individually? (I'm asking because I just tried it and it
> only did system tables)

"DATABASE
    Recreate all system indexes of a specified database. Indexes on user tables are not processed. Also, indexes on shared system catalogs are skipped except in stand-alone mode (see below)."

http://www.postgresql.org/docs/8.0/static/sql-reindex.html

--
Josh Berkus
Aglio Database Solutions
San Francisco
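So on 7.4/8.0, user tables have to be reindexed individually. One workaround (just a sketch, not the only way to do it; the schema name and the psql output redirection here are assumptions) is to generate the statements from the catalog and feed them back to psql:

    -- generate one REINDEX statement per user table in the public schema
    SELECT 'REINDEX TABLE ' || relname || ';'
      FROM pg_class
     WHERE relkind = 'r'
       AND relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public');

    -- in psql: \t \o reindex_all.sql   (run the query)   \o   \i reindex_all.sql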
josh@agliodbs.com (Josh Berkus) writes:
> Bill,
>
>> What about if an out-of-the-ordinary number of rows
>> were deleted (say 75% of rows in the table, as opposed
>> to normal 5%) followed by a 'VACUUM ANALYZE'? Could
>> things get out of whack because of that situation?
>
> Yes. You'd want to run REINDEX after an event like that. As you should do now.

Based on Tom's recent comments, I'd be inclined to handle this via doing a CLUSTER, which has the "triple heroism effect" of:

a) Reorganizing the entire table to conform with the relevant index order,
b) Having the effect of VACUUM FULL, and
c) Having the effect of REINDEX

all in one command.

It has all of the "oops, that blocked me for 20 minutes" effect of REINDEX and VACUUM FULL, but at least it doesn't have the effect twice...

--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78. "I will not tell my Legions of Terror
"And he must be taken alive!" The command will be: ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>
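For reference, the 7.4 syntax for that approach (names again are illustrative; CLUSTER takes an exclusive lock for the duration and needs enough free disk for a rewritten copy of the table, hence the "blocked for 20 minutes" caveat above):

    -- rewrite the table in index order; all of the table's indexes are rebuilt as a side effect
    CLUSTER mytable_id_key ON mytable;

    -- CLUSTER does not update planner statistics, so re-analyze afterwards
    ANALYZE mytable;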
>> Is this a common issue among all RDBMSs or is it
>> something that is PostgreSQL specific?
>
> Speaking from experience, this sort of thing affects MSSQL as well, although
> the maintenance routines are different.

Yes, this is true with MSSQL too, however SQL Server implements a defrag index that doesn't lock up the table..

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_dbcc_30o9.asp

"DBCC INDEXDEFRAG can defragment clustered and nonclustered indexes on tables and views. DBCC INDEXDEFRAG defragments the leaf level of an index so that the physical order of the pages matches the left-to-right logical order of the leaf nodes, thus improving index-scanning performance. ....Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.*"

-michael
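For comparison, the SQL Server 2000 invocation described on that page takes the database, table, and index; roughly along these lines (names here are illustrative):

    DBCC INDEXDEFRAG (mydb, mytable, mytable_id_key);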
Michael, > ....Every five minutes, DBCC INDEXDEFRAG will report to the user an > estimated percentage completed. DBCC INDEXDEFRAG can be terminated at > any point in the process, and *any completed work is retained.*" Keen. Sounds like something for our TODO list. -- Josh Berkus Aglio Database Solutions San Francisco
--- a3a18850@telus.net wrote:
> I gather you mean, out-of-the-ordinary for most
> apps, but not for this client?

Actually, no. The normal activity is to delete 3-5% of the rows per day, followed by a VACUUM ANALYZE. Then over the course of the day (in multiple transactions) about the same amount are INSERTed (each transaction followed by a VACUUM ANALYZE on just the updated table). So 75% deletion is just out of the ordinary for this app.

However, on occasion, deleting 75% of rows is a legitimate action for the client to take. It would be nice if they didn't have to remember to do things like REINDEX or CLUSTER or whatever on just those occasions.

> In case nobody else has asked: is your max_fsm_pages
> big enough to handle all the deleted pages, across ALL
> tables hit by the purge? If not, you're haemorrhaging
> pages, and VACUUM is probably warning you about exactly that.

This parameter is most likely set incorrectly. So that could be causing problems. Could that be a culprit for the index bloat, though?

> If that's not a problem, you might want to consider
> partitioning the data. Take a look at inherited tables.
> For me, they're a good approximation of clustered indexes
> (sigh, miss'em) and equivalent to table spaces.
>
> My app is in a similar boat to yours: up to 1/3 of a
> 10M-row table goes away every day. For each of the child
> tables that is a candidate to be dropped, there is a big
> prologue txn, which moves (INSERT then DELETE) the good
> rows into a child table that is NOT to be dropped. Then
> BANG pull the plug on the tables you don't want. MUCH
> faster than DELETE: the dropped tables' files' disk space
> goes away in one shot, too.
>
> Just my 2c.

Thanks.

Bill
Same thing happens in Oracle:

ALTER INDEX <blah> REBUILD

to force a rebuild. It will mark the free blocks as 'free' below the PCTFREE value for the tablespace.

Basically: if you build an index with 9999 entries, and each entry is 1/4 of a block, the database will write 2500 blocks to the disk. If you delete a random 75% of the index values, you will now have 2500 blocks that have 75% free space. The database will reuse that free space in those blocks as you insert new values, but until then, you still have 2500 blocks worth of data on disk that is only 25% full. Rebuilding the index forces the system to physically re-allocate all that data space, and now the remaining ~2500 entries use just 625 blocks.

I'm not sure that 'blocks' is the correct term in postgres, it's segments in Oracle, but the concept remains the same.

Alex Turner
netEconomist

On 4/21/05, Bill Chandler <billybobc1210@yahoo.com> wrote:
>
> --- Josh Berkus <josh@agliodbs.com> wrote:
> > Bill,
> >
> > > What about if an out-of-the-ordinary number of rows
> > > were deleted (say 75% of rows in the table, as opposed
> > > to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> > > things get out of whack because of that situation?
> >
> > Yes. You'd want to run REINDEX after an event like
> > that. As you should do now.
> >
> > --
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
>
> Thank you. Though I must say, that is very discouraging. REINDEX is a
> costly operation, timewise and due to the fact that it locks out other
> processes from proceeding. Updates are constantly coming in and queries
> are occurring continuously. A REINDEX could potentially bring the whole
> thing to a halt.
>
> Honestly, this seems like an inordinate amount of babysitting for a
> production application. I'm not sure if the client will be willing to
> accept it.
>
> Admittedly my knowledge of the inner workings of an RDBMS is limited,
> but could somebody explain to me why this would be so? If you delete a
> bunch of rows why doesn't the index get updated at the same time? Is
> this a common issue among all RDBMSs or is it something that is
> PostgreSQL specific? Is there any way around it?
>
> thanks,
>
> Bill
On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote:
> Michael,
>
> > ....Every five minutes, DBCC INDEXDEFRAG will report to the user an
> > estimated percentage completed. DBCC INDEXDEFRAG can be terminated at
> > any point in the process, and *any completed work is retained.*"
>
> Keen. Sounds like something for our TODO list.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco

See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php for my thoughts on a non-blocking alternative to REINDEX. I got no replies to that message. :-(

I've almost got a working solution integrated in the backend that does correct WAL logging and everything. (Writing the code to write and replay WAL logs for complicated operations can be very annoying!)

For now I've gone with a syntax of:

    REINDEX INDEX btree_index_name INCREMENTAL;

(For now it's not a proper index AM (access method); instead the generic index code knows this is only supported for btrees and directly calls the btree_compress function.)

It's not actually a REINDEX per se in that it doesn't rebuild the whole index. It holds brief exclusive locks on the index while it shuffles items around to pack the leaf pages fuller. There were issues with the code I attached to the above message that have been resolved with the new code. With respect to the numbers provided in that e-mail the new code also recycles more pages than before.

Once I've finished it up I'll prepare and post a patch.

--
Dave Chapeskie
OpenPGP Key ID: 0x3D2B6B34
Dave, > See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php > for my thoughts on a non-blocking alternative to REINDEX. I got no > replies to that message. :-( Well, sometimes you have to be pushy. Say, "Hey, comments please?" The hackers list is about 75 posts a day, it's easy for people to lose track of stuff they meant to comment on. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Quoting Bill Chandler <billybobc1210@yahoo.com>:

> ... The normal activity is to delete 3-5% of the rows per day,
> followed by a VACUUM ANALYZE. ...
> However, on occasion, deleting 75% of rows is a
> legitimate action for the client to take.
>
> > In case nobody else has asked: is your max_fsm_pages
> > big enough to handle all the deleted pages,
> > across ALL tables hit by the purge?
>
> This parameter is most likely set incorrectly. So
> that could be causing problems. Could that be a
> culprit for the index bloat, though?

Look at the last few lines of vacuum verbose output. It will say something like:

    free space map: 55 relations, 88416 pages stored; 89184 total pages needed
    Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory.

"1000000" here is [max_fsm_pages] from my postgresql.conf. If the "total pages needed" is bigger than the pages fsm is allocated for, then you are bleeding.

--
"Dreams come true, not free." -- S.Sondheim, ITW
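The settings in question live in postgresql.conf. A minimal sketch with illustrative values (not a recommendation; since the FSM sits in shared memory, I believe changing these on 7.4 requires a postmaster restart):

    # postgresql.conf
    max_fsm_relations = 1000        # should cover every table and index being vacuumed
    max_fsm_pages     = 1000000     # should exceed the "total pages needed" from VACUUM VERBOSE

After the change, re-run VACUUM VERBOSE and confirm that "total pages needed" stays below the allocated figure.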
Mischa,

Thanks. Yes, I understand that not having a large enough max_fsm_pages is a problem and I think that it is most likely the case for the client. What I wasn't sure of was if the index bloat we're seeing is the result of the "bleeding" you're talking about or something else.

If I deleted 75% of the rows but had a max_fsm_pages setting that still exceeded the pages required (as indicated in VACUUM output), would that solve my indexing problem or would I still need to REINDEX after such a purge?

regards,

Bill

--- Mischa Sandberg <mischa.sandberg@telus.net> wrote:
> Quoting Bill Chandler <billybobc1210@yahoo.com>:
>
> > ... The normal activity is to delete 3-5% of the rows per day,
> > followed by a VACUUM ANALYZE. ...
> > However, on occasion, deleting 75% of rows is a
> > legitimate action for the client to take.
> >
> > > In case nobody else has asked: is your max_fsm_pages
> > > big enough to handle all the deleted pages,
> > > across ALL tables hit by the purge?
> >
> > This parameter is most likely set incorrectly. So
> > that could be causing problems. Could that be a
> > culprit for the index bloat, though?
>
> Look at the last few lines of vacuum verbose output.
> It will say something like:
>
>     free space map: 55 relations, 88416 pages stored; 89184 total pages needed
>     Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory.
>
> "1000000" here is [max_fsm_pages] from my postgresql.conf.
> If the "total pages needed" is bigger than the pages
> fsm is allocated for, then you are bleeding.
> --
> "Dreams come true, not free." -- S.Sondheim, ITW
Bill,

> If I deleted 75% of the rows but had a max_fsm_pages
> setting that still exceeded the pages required (as
> indicated in VACUUM output), would that solve my
> indexing problem or would I still need to REINDEX
> after such a purge?

Depends on the performance you're expecting. The FSM relates to the re-use of nodes, not to giving back free space. So after you've deleted 75% of rows, the index wouldn't shrink. It just wouldn't grow when you start adding rows.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Bill Chandler wrote:

> Mischa,
>
> Thanks. Yes, I understand that not having a large
> enough max_fsm_pages is a problem and I think that it
> is most likely the case for the client. What I wasn't
> sure of was if the index bloat we're seeing is the
> result of the "bleeding" you're talking about or
> something else.
>
> If I deleted 75% of the rows but had a max_fsm_pages
> setting that still exceeded the pages required (as
> indicated in VACUUM output), would that solve my
> indexing problem or would I still need to REINDEX
> after such a purge?
>
> regards,
>
> Bill

I don't believe VACUUM re-packs indexes. It just removes empty index pages. So if you have 1000 index pages all with 1 entry in them, vacuum cannot reclaim any pages. REINDEX re-packs the pages to 90% full.

fsm just needs to hold enough pages that all requests have free space that can be used before your next vacuum. It is just a map letting postgres know where space is available for a new fill.

John
=:->
Bill Chandler <billybobc1210@yahoo.com> writes: > Client is reporting that the size of an index is > greater than the number of rows in the table (1.9 > million vs. 1.5 million). This thread seems to have wandered away without asking the critical question "what did you mean by that?" It's not possible for an index to have more rows than there are in the table unless something is seriously broken. And there aren't any SQL operations that let you inspect an index directly anyway. So: what is the actual observation that led you to the above conclusion? Facts, please, not inferences. regards, tom lane
On 22 Apr 2005, at 06:57, Tom Lane wrote:

> Bill Chandler <billybobc1210@yahoo.com> writes:
>> Client is reporting that the size of an index is
>> greater than the number of rows in the table (1.9
>> million vs. 1.5 million).
>
> This thread seems to have wandered away without asking the critical
> question "what did you mean by that?"
>
> It's not possible for an index to have more rows than there are in
> the table unless something is seriously broken. And there aren't
> any SQL operations that let you inspect an index directly anyway.
> So: what is the actual observation that led you to the above
> conclusion? Facts, please, not inferences.

I work for the client in question. Glad you picked up on that point. I covered the detail in my post "How can an index be larger than a table" on 21 Apr. 2005. I guess I was too detailed, and too much info put people off.

http://archives.postgresql.org/pgsql-performance/2005-04/msg00553.php

quoting from there...

| SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'dave_data%';
|
| relname                                  relkind  reltuples  relpages
| =======================================  =======  =========  ========
| dave_data_update_events                  r        1593600.0  40209
| dave_data_update_events_event_id_key     i        1912320.0  29271
| dave_data_update_events_event_id_seq     S        1.0        1
| dave_data_update_events_lds_idx          i        1593600.0  6139
| dave_data_update_events_obj_id_idx       i        1593600.0  6139
| iso_pjm_data_update_events_obj_id_idx    i        1593600.0  6139

Note that there are only 1593600 rows in the table, so why the 1912320 figure?

Of course I checked that the row count was correct...

| EXPLAIN ANALYZE
| select count(*) from iso_pjm_data_update_events
|
| QUERY PLAN
| Aggregate (cost=60129.00..60129.00 rows=1 width=0) (actual time=35933.292..35933.293 rows=1 loops=1)
|   -> Seq Scan on iso_pjm_data_update_events (cost=0.00..56145.00 rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600 loops=1)
| Total runtime: 35933.489 ms

and...

| select count(*) from iso_pjm_data_update_events
|
| count
| 1593600

so it's not that there are any undeleted rows lying around
David Roussel <pgsql-performance@diroussel.xsmail.com> writes: > |dave_data_update_events r 1593600.0 40209 > |dave_data_update_events_event_id_key i 1912320.0 29271 Hmm ... what PG version is this, and what does VACUUM VERBOSE on that table show? regards, tom lane
On Fri, 22 Apr 2005 10:06:33 -0400, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> David Roussel <pgsql-performance@diroussel.xsmail.com> writes:
> > |dave_data_update_events r 1593600.0 40209
> > |dave_data_update_events_event_id_key i 1912320.0 29271
>
> Hmm ... what PG version is this, and what does VACUUM VERBOSE on
> that table show?

PG 7.4

The disparity seems to have sorted itself out now, which hampers further investigation. I guess the regular inserts of new data, and the nightly deletion and index recreation did it. However, we did suffer reduced performance and the strange cardinality for several days before it went away. For what it's worth...

ndb=# vacuum verbose iso_pjm_data_update_events;
INFO: vacuuming "public.iso_pjm_data_update_events"
INFO: index "iso_pjm_data_update_events_event_id_key" now contains 1912320 row versions in 29271 pages
DETAIL: 21969 index pages have been deleted, 20000 are currently reusable.
CPU 6.17s/0.88u sec elapsed 32.55 sec.
INFO: index "iso_pjm_data_update_events_lds_idx" now contains 1912320 row versions in 7366 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 3.52s/0.57u sec elapsed 14.35 sec.
INFO: index "iso_pjm_data_update_events_obj_id_idx" now contains 1912320 row versions in 7366 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 3.57s/0.58u sec elapsed 12.87 sec.
INFO: "iso_pjm_data_update_events": found 0 removable, 1912320 nonremovable row versions in 40209 pages
DETAIL: 159384 dead row versions cannot be removed yet.
There were 745191 unused item pointers.
0 pages are entirely empty.
CPU 18.26s/3.62u sec elapsed 74.35 sec.
VACUUM

After each insert it does this...

VACUUM ANALYZE iso_pjm_DATA_UPDATE_EVENTS
VACUUM ANALYZE iso_pjm_CONTROL

Each night it does this...

BEGIN
DROP INDEX iso_pjm_control_obj_id_idx
DROP INDEX iso_pjm_control_real_name_idx
DROP INDEX iso_pjm_data_update_events_lds_idx
DROP INDEX iso_pjm_data_update_events_obj_id_idx
CREATE UNIQUE INDEX iso_pjm_control_obj_id_idx ON iso_pjm_control(obj_id)
CLUSTER iso_pjm_control_obj_id_idx ON iso_pjm_control
CREATE UNIQUE INDEX iso_pjm_control_real_name_idx ON iso_pjm_control(real_name)
CREATE INDEX iso_pjm_data_update_events_lds_idx ON iso_pjm_data_update_events(lds)
CREATE INDEX iso_pjm_data_update_events_obj_id_idx ON iso_pjm_data_update_events(obj_id)
COMMIT

Note there is no reference to iso_pjm_data_update_events_event_id_key, which is the index that went wacky on us. Does that seem weird to you?

Thanks

David
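One thing stands out from that script: iso_pjm_data_update_events_event_id_key, the index with the inflated reltuples figure, is the only index on that table the nightly DROP/CREATE cycle never rebuilds. Assuming it really is the index automatically created for the 'bigserial unique' column mentioned at the top of the thread (in which case DROP INDEX would likely refuse to remove it anyway), the in-place rebuild would be an extra step along these lines:

    REINDEX INDEX iso_pjm_data_update_events_event_id_key;

As discussed earlier in the thread, REINDEX blocks other access to the table while it runs, so it would need to share the same maintenance window as the CLUSTER and CREATE INDEX steps.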
"David Roussel" <pgsql-performance@diroussel.xsmail.com> writes: > Note there is no reference to iso_pjm_data_update_events_event_id_key > which is the index that went wacky on us. Does that seem weird to you? What that says is that that index doesn't belong to that table. You sure it wasn't a chance coincidence of names that made you think it did? regards, tom lane
You would be interested in http://archives.postgresql.org/pgsql-hackers/2005-04/msg00565.php

On Thu, Apr 21, 2005 at 03:33:05PM -0400, Dave Chapeskie wrote:
> See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php
> for my thoughts on a non-blocking alternative to REINDEX. I got no
> replies to that message. :-(
>
> I've almost got a working solution integrated in the backend that does
> correct WAL logging and everything. (Writing the code to write and
> replay WAL logs for complicated operations can be very annoying!)
>
> For now I've gone with a syntax of:
>
>     REINDEX INDEX btree_index_name INCREMENTAL;
>
> [...]
>
> Once I've finished it up I'll prepare and post a patch.
>
> --
> Dave Chapeskie
> OpenPGP Key ID: 0x3D2B6B34

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Quoting Bill Chandler <billybobc1210@yahoo.com>:

> Running PostgreSQL 7.4.2, Solaris.
> Client is reporting that the size of an index is
> greater than the number of rows in the table (1.9
> million vs. 1.5 million). Index was automatically
> created from a 'bigserial unique' column.
>
> We have been running 'VACUUM ANALYZE' very regularly.
> In fact, our vacuum schedule has probably been
> overkill. We have been running on a per-table basis
> after every update (many per day, only inserts
> occurring) and after every purge (one per day,
> deleting a day's worth of data).
>
> What about if an out-of-the-ordinary number of rows
> were deleted (say 75% of rows in the table, as opposed
> to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> things get out of whack because of that situation?

I gather you mean, out-of-the-ordinary for most apps, but not for this client?

In case nobody else has asked: is your max_fsm_pages big enough to handle all the deleted pages, across ALL tables hit by the purge? If not, you're haemorrhaging pages, and VACUUM is probably warning you about exactly that.

If that's not a problem, you might want to consider partitioning the data. Take a look at inherited tables. For me, they're a good approximation of clustered indexes (sigh, miss'em) and equivalent to table spaces.

My app is in a similar boat to yours: up to 1/3 of a 10M-row table goes away every day. For each of the child tables that is a candidate to be dropped, there is a big prologue txn, which moves (INSERT then DELETE) the good rows into a child table that is NOT to be dropped. Then BANG pull the plug on the tables you don't want. MUCH faster than DELETE: the dropped tables' files' disk space goes away in one shot, too.

Just my 2c.
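A minimal sketch of that inherited-tables layout (the column names and the per-day naming scheme are illustrative assumptions; 7.4 has no built-in partitioning, so deciding which child a row goes into, and which children get dropped, is up to the application):

    CREATE TABLE events (
        event_id  bigserial UNIQUE,
        obj_id    integer,
        lds       timestamp
    );

    -- one child table per day, inheriting the parent's columns
    CREATE TABLE events_20050420 () INHERITS (events);
    CREATE TABLE events_20050421 () INHERITS (events);

    -- inserts go straight into the day's child; queries on the parent see all children
    INSERT INTO events_20050421 (obj_id, lds) VALUES (42, now());
    SELECT count(*) FROM events;

    -- the "prologue" transaction: copy the rows worth keeping into a surviving child,
    -- and delete them from the doomed child so they aren't seen twice via the parent
    BEGIN;
    INSERT INTO events_20050421 SELECT * FROM events_20050420 WHERE obj_id IN (1, 2, 3);
    DELETE FROM events_20050420 WHERE obj_id IN (1, 2, 3);
    COMMIT;

    -- purging the old day is then a single fast DROP instead of a large DELETE
    DROP TABLE events_20050420;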