Re: Vacuum Verbose output - Mailing list pgsql-admin
From | Robert Treat |
---|---|
Subject | Re: Vacuum Verbose output |
Date | |
Msg-id | 200511021400.05346.xzilla@users.sourceforge.net |
In response to | Re: Vacuum Verbose output (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Pre-allocation of space: a business rationale |
List | pgsql-admin |
On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion? I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing. We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer. "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.
>

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW I think this is somewhat debunkable too, but it gives
one something to think about)

"PostgreSQL unlike other commercial databases does not allow database files
to pregrow to certain sizes. So if you are loading multiple tables via
different connections there are two things that hurts scalability: One is the
semaphore locking which it needs to perform IO to the database files and
second is file fragmentation since it creates all tables in the same file
system and grows them as needed. So if both the tables are loaded then both
files are growing at "same" time which typically is seralized as blocks are
allocated to each of the file one at a time which means they will be
dispersed and not contiguous. How this hurts? Well if you do total row scans
and compare the time you can easily huge degradations. (I have seen about 50%
degradations).

This means you have to load 1 table at a time. However if there was a way to
increase the space for the tables (pre-grown them) then it will be a bit
easier to load multiple tables simultaneously. (Of course the semaphore
problem is still there and that needs to be more granular also).

Duh.. I forgot the workaround here.. TABLESPACES are finally available in
PostgreSQL 8. But semaphore problems are still existing and pre-growing files
will still help a lot since "growing" the files will be in your "1" process
connection timeline. "

taken from an interesting post at
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
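
For anyone who wants to produce the numbers asked for upthread, here is a rough sketch of a microbenchmark, not anything taken from PostgreSQL itself. It times writing a file that grows block by block against one whose space is reserved first with posix_fallocate. The function names load() and compare(), the 8 kB block size, and the default block count are assumptions made for illustration. It only measures a single writer, so it says nothing about the fragmentation claimed for concurrent loads or about later scan times.

```python
import os
import tempfile
import time

BLOCK = 8192  # assumed block size for this sketch (PostgreSQL's default page size)


def load(path, blocks, preallocate):
    """Write `blocks` blocks to `path`; optionally reserve the space first.

    Returns the elapsed wall-clock seconds, including the final fsync.
    """
    payload = b"\0" * BLOCK
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        if preallocate and hasattr(os, "posix_fallocate"):
            try:
                # Ask the filesystem to reserve the whole range up front.
                os.posix_fallocate(fd, 0, blocks * BLOCK)
            except OSError:
                # Some filesystems refuse; fall back to plain growth.
                pass
        for _ in range(blocks):
            os.write(fd, payload)
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(blocks=2048):
    """Time both strategies in a scratch directory and return the results."""
    results = {}
    with tempfile.TemporaryDirectory() as scratch:
        for label, pre in (("grow", False), ("prealloc", True)):
            path = os.path.join(scratch, label)
            results[label] = load(path, blocks, pre)
            # Both strategies must end with a file of the same size.
            assert os.path.getsize(path) == blocks * BLOCK
    return results


if __name__ == "__main__":
    for label, secs in compare().items():
        print(f"{label}: {secs:.4f}s")
```

To get closer to the scenario in the quoted post, one would run several load() calls at the same time against the same filesystem and then time a sequential read of each file; results will vary a great deal by filesystem and kernel.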