Re: Vacuum Verbose output - Mailing list pgsql-admin

From Robert Treat
Subject Re: Vacuum Verbose output
Date
Msg-id 200511021400.05346.xzilla@users.sourceforge.net
In response to Re: Vacuum Verbose output  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Pre-allocation of space: a business rationale
List pgsql-admin
On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion?  I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing.  We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer.  "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.
>

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW I think this is somewhat debunkable too, but it gives
one something to think about.)

"PostgreSQL unlike other commercial databases does not allow database files to
pregrow to certain sizes. So if you are loading multiple tables via different
connections there are two things that hurt scalability: One is the semaphore
locking which it needs to perform IO to the database files and second is file
fragmentation since it creates all tables in the same file system and grows
them as needed. So if both the tables are loaded then both files are growing
at "same" time which typically is serialized as blocks are allocated to each
of the files one at a time which means they will be dispersed and not
contiguous. How does this hurt? Well if you do total row scans and compare the
time you can easily see huge degradations. (I have seen about 50% degradations).
This means you have to load 1 table at a time. However if there was a way to
increase the space for the tables (pre-grow them) then it will be a bit
easier to load multiple tables simultaneously. (Of course the semaphore
problem is still there and that needs to be more granular also). Duh.. I
forgot the workaround here.. TABLESPACES are finally available in PostgreSQL
8. But semaphore problems are still existing and pre-growing files will still
help a lot since "growing" the files will be in your "1" process connection
timeline."

taken from an interesting post at
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be
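
To make the fragmentation claim in the quoted post concrete, here is a toy
model as a minimal Python sketch. Everything in it is an assumption for
illustration: the function names are hypothetical, and the allocator simply
hands out the next free block to whichever file asks. Real filesystems use
extents, reservation and delayed allocation, so they may avoid much of this,
which is part of why I think the argument is somewhat debunkable.

```python
def count_extents(blocks):
    """Count contiguous runs in an ascending list of block numbers."""
    extents = 0
    prev = None
    for b in blocks:
        if prev is None or b != prev + 1:
            extents += 1
        prev = b
    return extents


def grow_interleaved(n_files, blocks_per_file):
    """Toy allocator: files grow one block at a time, taking turns."""
    layout = {f: [] for f in range(n_files)}
    next_free = 0
    for _ in range(blocks_per_file):
        for f in range(n_files):
            layout[f].append(next_free)  # next free block goes to this file
            next_free += 1
    return layout


def grow_preallocated(n_files, blocks_per_file):
    """Toy allocator: each file reserves its full size up front."""
    layout = {}
    next_free = 0
    for f in range(n_files):
        layout[f] = list(range(next_free, next_free + blocks_per_file))
        next_free += blocks_per_file
    return layout


if __name__ == "__main__":
    for name, grow in (("interleaved", grow_interleaved),
                       ("preallocated", grow_preallocated)):
        layout = grow(2, 4)
        print(name, {f: count_extents(b) for f, b in layout.items()})
```

With two files of four blocks each, the interleaved case leaves every file in
four single-block extents, while the pre-allocated case leaves each file in
one. Whether a real kernel behaves anything like the interleaved case is
exactly the thing that would need benchmarks.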

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
