Thread: Vacuum Verbose output

Vacuum Verbose output

From
"Subbiah, Stalin"
Date:

Hi All,

We vacuum busy tables in different frequencies depending on how busy the table is. Today I noticed that one of the task that was suppose to delete rows from a busy table was running for a long time. And below is the output of vacuum verbose after the task was stopped from running. Trying to understand if the table/index has been bloated or not.

INFO:  vacuuming "public.eventlog"
INFO:  index "ix_eventlog_eventtime" now contains 11465395 row versions in 77113 pages
DETAIL:  8737797 index row versions were removed.

Does it mean ix_eventlog_ index had 11.4mil row versions out of which 8.7mil were removed cause of deletes?

33189 index pages have been deleted, 0 are currently reusable.

Does it mean out of 77113 index pages, 33189 were deleted?

CPU 5.24s/13.76u sec elapsed 67.05 sec.

INFO:  "eventlog": removed 8737797 row versions in 361856 pages
DETAIL:  CPU 14.53s/22.00u sec elapsed 122.13 sec.
INFO:  index "ix_eventlog_eventtime" now contains 8023399 row versions in 77116 pages
DETAIL:  3442698 index row versions were removed.
46439 index pages have been deleted, 0 are currently reusable.
CPU 3.18s/5.35u sec elapsed 22.85 sec.
INFO:  "eventlog": removed 3442698 row versions in 143144 pages
DETAIL:  CPU 5.64s/8.74u sec elapsed 69.15 sec.
INFO:  "eventlog": found 12180495 removable, 8022620 nonremovable row versions in 837938 pages
DETAIL:  0 dead row versions cannot be removed yet.

There were 154716 unused item pointers.
0 pages are entirely empty.
CPU 44.37s/54.07u sec elapsed 357.04 sec.

Why does index and table INFO reported twice with different row version numbers on a vacuum run.

Thanks,
Subbiah
Postgres v7.4.7


Re: Vacuum Verbose output

From
Tom Lane
Date:
"Subbiah, Stalin" <SSubbiah@netopia.com> writes:
> INFO:  vacuuming "public.eventlog"
> INFO:  index "ix_eventlog_eventtime" now contains 11465395 row versions
> in 77113 pages
> DETAIL:  8737797 index row versions were removed.

> Does it mean ix_eventlog_ index had 11.4mil row versions out of which
> 8.7mil were removed cause of deletes?

No, it says "now contains", ie those are the post-removal statistics.

> 33189 index pages have been deleted, 0 are currently reusable.

> Does it mean out of 77113 index pages, 33189 were deleted?

Yeah --- so almost half the space in the index is free space (more than
half according to the later report).  Possibly you should be vacuuming
this table a bit more often...

> Why does index and table INFO reported twice with different row version
> numbers on a vacuum run.

Because it had to do two passes of row removal, due to not having enough
vacuum_mem to remember all the dead rows at once.

            regards, tom lane

Re: Vacuum Verbose output

From
"Tomeh, Husam"
Date:
My understanding that the current release of Postgres does not allow the DBA to pre-allocated space for the data files or tablespace in advance (as Oracle or SQL Server does). Am I correct on that ? If that's still the case, is this item on the TODO list or the white board? Pre-allocating space will prevent extending the datafile during loading massive data (batch processing) and would improve the overall batch write performance.
 
Thanks in advance.

--
 Husam

---------------------------(end of broadcast)---------------------------

TIP 2: Don't 'kill -9' the postmaster


**********************************************************************

This message contains confidential information intended only for the use of the addressee(s)

named above and may contain information that is legally privileged.  If you are not the

addressee, or the person responsible for delivering it to the addressee, you are hereby
notified that reading, disseminating, distributing or copying this message is strictly prohibited. 

If you have received this message by mistake, please immediately notify us by replying to the

message and delete the original message immediately thereafter.

 

Thank you.                                                                                                       FADLD Tag
**********************************************************************

Re: Vacuum Verbose output

From
Scott Marlowe
Date:
On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> My understanding that the current release of Postgres does not allow
> the DBA to pre-allocated space for the data files or tablespace in
> advance (as Oracle or SQL Server does). Am I correct on that ? If
> that's still the case, is this item on the TODO list or the white
> board? Pre-allocating space will prevent extending the datafile during
> loading massive data (batch processing) and would improve the overall
> batch write performance.

I can't see pre-allocation making much if any difference in any modern
OS, except it might lower fragmentation of the file a bit.  Writing data
is writing data.  It takes time, whether you're writing it into a file
that already exists with zeros in it or a new blank file you just
created.

Have you got any file system benchmarks that back up this assertion?  I
would love to see something that shows one way or the other if that
really makes any difference.

Re: Vacuum Verbose output

From
Tom Lane
Date:
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
>> Pre-allocating space will prevent extending the datafile during
>> loading massive data (batch processing) and would improve the overall
>> batch write performance.

> Have you got any file system benchmarks that back up this assertion?  I
> would love to see something that shows one way or the other if that
> really makes any difference.

Barring some pretty solid evidence, you're unlikely to attract any
enthusiasm among pghackers for this sort of thing.  We are generally
disinclined to reinvent functionality that properly belongs to the
kernel or filesystem layer.  "Oracle does it" cuts no ice in this
connection, because Oracle is designed around a twenty-year-old
assumption that the database is smarter than the kernel, and the world
has changed a lot since then.

In short: show us some numbers that prove this is worth our attention.

            regards, tom lane

Re: Vacuum Verbose output

From
Robert Treat
Date:
On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion?  I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing.  We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer.  "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.
>

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW I think this is somewhat debunkable too, but it gives
one something to think about)

"PostgreSQL unlike other commercial databases does not allow database files to
pregrow to certain sizes. So if you are loading multiple tables via different
connections there are two things that hurts scalability: One is the semaphore
locking which it needs to perform IO to the database files and second is file
fragmentation since it creates all tables in the same file system and grows
them as needed. So if both the tables are loaded then both files are growing
at "same" time which typically is seralized as blocks are allocated to each
of the file one at a time which means they will be dispersed and not
contiguous. How this hurts? Well if you do total row scans and compare the
time you can easily huge degradations. (I have seen about 50% degradations).
This means you have to load 1 table at a time. However if there was a way to
increase the space for the tables (pre-grown them) then it will be a bit
easier to load multiple tables simultaneously. (Of course the semaphore
problem is still there and that needs to be more granular also). Duh.. I
forgot the workaround here.. TABLESPACES are finally available in PostgreSQL
8. But semaphore problems are still existing and pre-growing files will still
help a lot since "growing" the files will be in your "1" process connection
timeline. "

taken from an interesting post at
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Pre-allocation of space: a business rationale

From
"Bath, David"
Date:
There has been a discussion about allowing pre-allocation of space,
and want to highlight the issues from a business perspective, rather
than those affectings DBAs and OS admins.

On Thu, 03 Nov 2005 06:00, Robert Treat wrote:
> On Monday 31 October 2005 22:59, Tom Lane wrote:
> > Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> > >> Pre-allocating space will prevent extending the datafile during
> > >> loading massive data (batch processing) and would improve the overall
> > >> batch write performance.

My real concern is not performance, but the impact on business continuity
in the following scenario (although I can imagine others of similar import).
(Apologies if there is a well-known way of addressing this issue).

A) I have a tablespace holding data for important business processes.
   (Lets term this "24x7data")
B) I allow a user to have their own data, perhaps some non-critical summaries
   and/or scratchpad/development work.  (Lets term this "luser data")
C) I want to avoid the possibility of uncontrolled growth of luser data
   blowing disk leading to stoppage of 24x7 data.

As I see it, unless I am able to partition them on different disks, which
might be difficult if I've created one logical disk across a cluster, it
is possible that uncontrolled growth of luser data (they might have an
endless loop in buggy code, or just attempt to pig out on data space)
might chew up all available storage, leading to 24x7 processes crashing
when they need to insert more data.

This could happen even if luser and 24x7 stuff live in different databases.

One theoretical way of avoiding this is to allow a "maximum size" argument
to the CREATE TABLESPACE statement, with a preference for preallocation.
This is probably easier to implement than a quota system on either an
object-owner (e.g. CREATE USER) or object  (e.g. CREATE TABLE/INDEX)
basis.

I can't see such parameters for v8.0 statements of these types.

While it might be fine to monitor disk usage, even if you read reports
every day and can get the $ for a new disk quickly, a lazy programmer
or piggy user can blow away your disk in a blink of an eye.

(Aside: I once needed to get error reports from telecoms bearers into
postgresql's uncle (ingres5) and could get about 10,000 errors a
second coming through msgs and sems.  Even without an endless loop bug,
and with creation of an expandable circular buffer for non-volatile
caching, I STILL had to worry about crashing the telecoms monitoring
processes if I pigged out on disk.)

Note that I'm not suggesting the cloning of any particular Oracle feature,
I do believe it is important to have a well-known way of addressing
such scenarios.

--
David T. Bath
dave.bath@unix.net


Re: Pre-allocation of space: a business rationale

From
Tom Lane
Date:
"Bath, David" <dave.bath@unix.net> writes:
> C) I want to avoid the possibility of uncontrolled growth of luser data
>    blowing disk leading to stoppage of 24x7 data.

You put the luser data and the critical data into separate tablespaces
that are in separate partitions (filesystems).  End of problem ...

(And no, I don't believe in having Postgres reinvent filesystem-level
functionality.  If you didn't set up appropriate hard partitions,
consider a loopback filesystem for your tablespace.)

            regards, tom lane

Re: Pre-allocation of space: a business rationale

From
"Jim C. Nasby"
Date:
On Wed, Nov 02, 2005 at 10:48:00PM -0500, Tom Lane wrote:
> "Bath, David" <dave.bath@unix.net> writes:
> > C) I want to avoid the possibility of uncontrolled growth of luser data
> >    blowing disk leading to stoppage of 24x7 data.
>
> You put the luser data and the critical data into separate tablespaces
> that are in separate partitions (filesystems).  End of problem ...
>
> (And no, I don't believe in having Postgres reinvent filesystem-level
> functionality.  If you didn't set up appropriate hard partitions,
> consider a loopback filesystem for your tablespace.)

Does every OS we support have a loopback filesystem? Can they all impose
space limits?

It doesn't seem unreasonable to support a limit on tablespace (or table)
size. It also doesn't seem like it would take that much code to add
support for it. Of course usual disclaimer about 'submit a patch then'
applies, but it sounds like such a patch would get rejected out-of-hand.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461