Thread: Vacuum Verbose output
Hi All,
We vacuum busy tables at different frequencies depending on how busy the table is. Today I noticed that one of the tasks that was supposed to delete rows from a busy table had been running for a long time. Below is the output of VACUUM VERBOSE after the task was stopped. I'm trying to understand whether the table/index has become bloated or not.
INFO: vacuuming "public.eventlog"
INFO: index "ix_eventlog_eventtime" now contains 11465395 row versions in 77113 pages
DETAIL: 8737797 index row versions were removed.
Does it mean the ix_eventlog_eventtime index had 11.4 million row versions, of which 8.7 million were removed because of deletes?
33189 index pages have been deleted, 0 are currently reusable.
Does it mean that out of 77113 index pages, 33189 were deleted?
CPU 5.24s/13.76u sec elapsed 67.05 sec.
INFO: "eventlog": removed 8737797 row versions in 361856 pages
DETAIL: CPU 14.53s/22.00u sec elapsed 122.13 sec.
INFO: index "ix_eventlog_eventtime" now contains 8023399 row versions in 77116 pages
DETAIL: 3442698 index row versions were removed.
46439 index pages have been deleted, 0 are currently reusable.
CPU 3.18s/5.35u sec elapsed 22.85 sec.
INFO: "eventlog": removed 3442698 row versions in 143144 pages
DETAIL: CPU 5.64s/8.74u sec elapsed 69.15 sec.
INFO: "eventlog": found 12180495 removable, 8022620 nonremovable row versions in 837938 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 154716 unused item pointers.
0 pages are entirely empty.
CPU 44.37s/54.07u sec elapsed 357.04 sec.
Why are the index and table INFO lines reported twice, with different row-version numbers, in a single vacuum run?
Thanks,
Subbiah
Postgres v7.4.7
"Subbiah, Stalin" <SSubbiah@netopia.com> writes:
> INFO: vacuuming "public.eventlog"
> INFO: index "ix_eventlog_eventtime" now contains 11465395 row versions
> in 77113 pages
> DETAIL: 8737797 index row versions were removed.
> Does it mean ix_eventlog_ index had 11.4mil row versions out of which
> 8.7mil were removed cause of deletes?

No, it says "now contains", ie those are the post-removal statistics.

> 33189 index pages have been deleted, 0 are currently reusable.
> Does it mean out of 77113 index pages, 33189 were deleted?

Yeah --- so almost half the space in the index is free space (more than
half according to the later report).  Possibly you should be vacuuming
this table a bit more often...

> Why does index and table INFO reported twice with different row version
> numbers on a vacuum run.

Because it had to do two passes of row removal, due to not having enough
vacuum_mem to remember all the dead rows at once.

			regards, tom lane
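For concreteness, the arithmetic behind that reading can be checked directly against the VACUUM VERBOSE numbers above. This is a rough sketch: the 6-bytes-per-dead-tuple figure is an assumption based on the size of an item pointer (the unit vacuum_mem is spent on in this era of PostgreSQL), not something the log output states.

```python
# Figures taken verbatim from the VACUUM VERBOSE output above.
deleted_pages_pass1, total_pages_pass1 = 33189, 77113
deleted_pages_pass2, total_pages_pass2 = 46439, 77116

# Fraction of index pages that are empty (deleted) after each index scan:
print(f"pass 1: {deleted_pages_pass1 / total_pages_pass1:.0%} of index pages free")  # ~43%
print(f"pass 2: {deleted_pages_pass2 / total_pages_pass2:.0%} of index pages free")  # ~60%

# Why two passes?  VACUUM remembers dead-tuple pointers in vacuum_mem
# (assumed 6 bytes each, the size of an ItemPointerData).  When that
# array fills, it must scan the indexes, empty the array, and continue.
TID_BYTES = 6
pass1_removed = 8_737_797          # rows removed in the first index scan
total_dead = 12_180_495            # "found 12180495 removable"

print(f"pass 1 filled roughly {pass1_removed * TID_BYTES / 1024**2:.0f} MB")       # ~50 MB
print(f"one pass would need roughly {total_dead * TID_BYTES / 1024:.0f} kB of vacuum_mem")
```

If those assumptions hold, the first pass stopped after filling roughly a 50 MB dead-tuple array, and something above ~72000 kB of vacuum_mem would have let this vacuum finish all removals in a single index pass.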
On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> My understanding that the current release of Postgres does not allow
> the DBA to pre-allocated space for the data files or tablespace in
> advance (as Oracle or SQL Server does). Am I correct on that ? If
> that's still the case, is this item on the TODO list or the white
> board? Pre-allocating space will prevent extending the datafile during
> loading massive data (batch processing) and would improve the overall
> batch write performance.

I can't see pre-allocation making much if any difference in any modern
OS, except it might lower fragmentation of the file a bit.

Writing data is writing data. It takes time, whether you're writing it
into a file that already exists with zeros in it or a new blank file you
just created.

Have you got any file system benchmarks that back up this assertion? I
would love to see something that shows one way or the other if that
really makes any difference.
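One quick way to probe the claim is a micro-benchmark along these lines: write the same amount of data into a file that was pre-sized up front versus one grown by appending, and compare wall-clock times. This is only a sketch: `truncate()` typically creates a sparse file rather than allocating real blocks (on Linux, `os.posix_fallocate` would be the stronger test), and a meaningful benchmark would also need a cold cache and direct I/O.

```python
import os
import tempfile
import time

CHUNK = b"x" * (1024 * 1024)   # 1 MB per write
N_CHUNKS = 32                  # 32 MB total per file

def write_chunks(path, preallocate):
    """Write N_CHUNKS megabytes to `path`; optionally pre-size the file first."""
    with open(path, "wb") as f:
        if preallocate:
            # Pre-size the file.  NOTE: on most filesystems this makes a
            # sparse file, which only approximates true pre-allocation.
            f.truncate(len(CHUNK) * N_CHUNKS)
        start = time.perf_counter()
        for _ in range(N_CHUNKS):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())   # time real I/O, not just the page cache
        return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    grown = write_chunks(os.path.join(d, "grown.dat"), preallocate=False)
    presized = write_chunks(os.path.join(d, "presized.dat"), preallocate=True)
    print(f"grown:    {grown:.3f}s")
    print(f"presized: {presized:.3f}s")
```

On a single quiet disk the two times are usually close, which is consistent with the "writing data is writing data" argument; the interesting case would be several of these running concurrently into the same filesystem, where fragmentation could start to matter.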
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
>> Pre-allocating space will prevent extending the datafile during
>> loading massive data (batch processing) and would improve the overall
>> batch write performance.

> Have you got any file system benchmarks that back up this assertion? I
> would love to see something that shows one way or the other if that
> really makes any difference.

Barring some pretty solid evidence, you're unlikely to attract any
enthusiasm among pghackers for this sort of thing. We are generally
disinclined to reinvent functionality that properly belongs to the
kernel or filesystem layer. "Oracle does it" cuts no ice in this
connection, because Oracle is designed around a twenty-year-old
assumption that the database is smarter than the kernel, and the world
has changed a lot since then.

In short: show us some numbers that prove this is worth our attention.

			regards, tom lane
On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion? I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing. We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer. "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW I think this is somewhat debunkable too, but it
gives one something to think about.)

"PostgreSQL, unlike other commercial databases, does not allow database
files to be pre-grown to certain sizes. So if you are loading multiple
tables via different connections, there are two things that hurt
scalability: one is the semaphore locking it needs to perform I/O to the
database files, and the second is file fragmentation, since it creates
all tables in the same file system and grows them as needed. So if both
tables are loaded, then both files are growing at the "same" time, which
typically is serialized as blocks are allocated to each file one at a
time, which means they will be dispersed and not contiguous. How does
this hurt? Well, if you do total row scans and compare the time, you can
easily see huge degradations. (I have seen about 50% degradations.)

This means you have to load 1 table at a time. However, if there were a
way to increase the space for the tables (pre-grow them), then it would
be a bit easier to load multiple tables simultaneously. (Of course the
semaphore problem is still there, and that needs to be more granular
also.) Duh.. I forgot the workaround here.. TABLESPACES are finally
available in PostgreSQL 8. But the semaphore problems still exist, and
pre-growing files will still help a lot, since "growing" the files will
be in your "1" process connection timeline."

taken from an interesting post at
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
There has been a discussion about allowing pre-allocation of space, and
I want to highlight the issues from a business perspective, rather than
those affecting DBAs and OS admins.

On Thu, 03 Nov 2005 06:00, Robert Treat wrote:
> On Monday 31 October 2005 22:59, Tom Lane wrote:
> > Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> > >> Pre-allocating space will prevent extending the datafile during
> > >> loading massive data (batch processing) and would improve the overall
> > >> batch write performance.

My real concern is not performance, but the impact on business
continuity in the following scenario (although I can imagine others of
similar import). (Apologies if there is a well-known way of addressing
this issue.)

A) I have a tablespace holding data for important business processes.
   (Let's term this "24x7 data".)
B) I allow a user to have their own data, perhaps some non-critical
   summaries and/or scratchpad/development work. (Let's term this
   "luser data".)
C) I want to avoid the possibility of uncontrolled growth of luser data
   blowing disk, leading to stoppage of 24x7 data.

As I see it, unless I am able to partition them on different disks,
which might be difficult if I've created one logical disk across a
cluster, it is possible that uncontrolled growth of luser data (they
might have an endless loop in buggy code, or just attempt to pig out on
data space) might chew up all available storage, leading to 24x7
processes crashing when they need to insert more data. This could happen
even if the luser and 24x7 stuff live in different databases.

One theoretical way of avoiding this is to allow a "maximum size"
argument to the CREATE TABLESPACE statement, with a preference for
preallocation. This is probably easier to implement than a quota system
on either an object-owner (e.g. CREATE USER) or object (e.g. CREATE
TABLE/INDEX) basis. I can't see such parameters for v8.0 statements of
these types.

While it might be fine to monitor disk usage, even if you read reports
every day and can get the $ for a new disk quickly, a lazy programmer or
piggy user can blow away your disk in the blink of an eye.

(Aside: I once needed to get error reports from telecoms bearers into
postgresql's uncle (ingres5) and could get about 10,000 errors a second
coming through msgs and sems. Even without an endless-loop bug, and with
creation of an expandable circular buffer for non-volatile caching, I
STILL had to worry about crashing the telecoms monitoring processes if I
pigged out on disk.)

Note that I'm not suggesting the cloning of any particular Oracle
feature, but I do believe it is important to have a well-known way of
addressing such scenarios.

-- 
David T. Bath
dave.bath@unix.net
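On the monitoring point: even without a hard quota, a watchdog on the tablespace's filesystem is cheap to script and narrows the window between "piggy user" and "full disk". A minimal sketch follows; the paths and thresholds are invented for illustration, and it only alerts, it cannot stop a runaway writer.

```python
import shutil

def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def check_tablespace(path, min_free=0.10):
    """True if the filesystem under `path` still has at least `min_free` headroom."""
    return free_fraction(path) >= min_free

# Example: poll the (hypothetical) volume backing the luser tablespace.
# In practice this would run from cron and page someone instead of printing.
if not check_tablespace("/", min_free=0.05):
    print("WARNING: tablespace volume nearly full")
```

A cron job running something like this every minute catches the slow-leak case; the endless-loop case can still outrun any poller, which is why the separate-filesystem answer below is the robust one.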
"Bath, David" <dave.bath@unix.net> writes:
> C) I want to avoid the possibility of uncontrolled growth of luser data
> blowing disk leading to stoppage of 24x7 data.

You put the luser data and the critical data into separate tablespaces
that are in separate partitions (filesystems). End of problem ...

(And no, I don't believe in having Postgres reinvent filesystem-level
functionality. If you didn't set up appropriate hard partitions,
consider a loopback filesystem for your tablespace.)

			regards, tom lane
On Wed, Nov 02, 2005 at 10:48:00PM -0500, Tom Lane wrote:
> "Bath, David" <dave.bath@unix.net> writes:
> > C) I want to avoid the possibility of uncontrolled growth of luser data
> > blowing disk leading to stoppage of 24x7 data.
>
> You put the luser data and the critical data into separate tablespaces
> that are in separate partitions (filesystems). End of problem ...
>
> (And no, I don't believe in having Postgres reinvent filesystem-level
> functionality. If you didn't set up appropriate hard partitions,
> consider a loopback filesystem for your tablespace.)

Does every OS we support have a loopback filesystem? Can they all impose
space limits?

It doesn't seem unreasonable to support a limit on tablespace (or table)
size, and it doesn't seem like it would take that much code to add
support for it. Of course the usual disclaimer about 'submit a patch
then' applies, but it sounds like such a patch would get rejected
out-of-hand.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461