Thread: Performance issues with Compaq server
I tried to load data using INSERT statements into our new database server (Compaq DL580, DL590, DL380 using SCSI driver) but it was very, very slow. The database runs on Red Hat Linux 7.2 and PostgreSQL version 7.1.3.
Does anybody have any idea what causes this problem, or things that I need to look at or check (in the config file)?
I appreciate any help.
Regards,
Samuel
One thing you can try is the "-F" switch, which will turn off fsync.

$ man postmaster

There are other possibilities, however.

On Tue, May 07, 2002 at 06:40:32PM -0400, Samuel J. Sutjiono wrote:
> I tried to load data using INSERT statements into our new database server
> (Compaq DL580, DL590, DL380 using SCSI driver) but it was very, very slow.
> The database runs on Red Hat Linux 7.2 and PostgreSQL version 7.1.3.
>
> Does anybody have any idea what causes this problem, or things that I need
> to look at or check (in the config file)?
>
> I appreciate any help.
>
> Regards,
> Samuel
On Tue, May 07, 2002 at 06:40:32PM -0400, Samuel J. Sutjiono wrote:
> I tried to load data using INSERT statements into our new database server
> (Compaq DL580, DL590, DL380 using SCSI driver) but it was very, very slow.
> The database runs on Red Hat Linux 7.2 and PostgreSQL version 7.1.3.
>
> Does anybody have any idea what causes this problem, or things that I need
> to look at or check (in the config file)?

Put them in a transaction (BEGIN/COMMIT). Without that, each INSERT becomes its own transaction, which is rather expensive. PostgreSQL 7.2 improves this a bit, but the transaction will help anyway.

HTH,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
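[Editor's note: a minimal SQL sketch of the batching Martijn describes; the table and data are hypothetical, substitute your own schema.]

-- Hypothetical example table:
CREATE TABLE items (id integer PRIMARY KEY, name varchar(40));

-- One transaction for the whole batch instead of one per INSERT:
BEGIN;
INSERT INTO items VALUES (1, 'first');
INSERT INTO items VALUES (2, 'second');
INSERT INTO items VALUES (3, 'third');
-- ... more rows ...
COMMIT;

The speedup comes from the fact that the work forced to disk at COMMIT now happens once per batch rather than once per row.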
On Wed, 8 May 2002, Martijn van Oosterhout wrote:
> On Tue, May 07, 2002 at 06:40:32PM -0400, Samuel J. Sutjiono wrote:
> > I tried to load data using INSERT statements into our new database server
> > (Compaq DL580, DL590, DL380 using SCSI driver) but it was very, very slow.
> > The database runs on Red Hat Linux 7.2 and PostgreSQL version 7.1.3.
> >
> > Does anybody have any idea what causes this problem, or things that I need
> > to look at or check (in the config file)?
>
> Put them in a transaction (BEGIN/COMMIT). Without that, each INSERT becomes
> its own transaction, which is rather expensive. PostgreSQL 7.2 improves this
> a bit, but the transaction will help anyway.

ACK. On a given piece of hardware I get about 150 inserts per second. Using a BEGIN/END transaction for a group of 100 inserts speeds it up to about 450 inserts per second.

But beware: if one insert fails (duplicate key, faulty data), then you have to re-insert the remaining rows as single transactions, or else all rows of the previous transaction are discarded.

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
Holger Marzen <holger@marzen.de> writes:
> ACK. On a given piece of hardware I get about 150 inserts per second.
> Using a BEGIN/END transaction for a group of 100 inserts speeds it up to
> about 450 inserts per second.

COPY is even faster, as there is less query parsing to be done, plus you get a transaction per COPY statement even without BEGIN/END.

> But beware: if one insert fails (duplicate key, faulty data), then you
> have to re-insert the remaining rows as single transactions, or else all
> rows of the previous transaction are discarded.

Hmm, don't you have to ROLLBACK and redo the whole transaction without the offending row(s), since you can't commit while in ABORT state? Or am I misunderstanding?

-Doug
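[Editor's note: a sketch of the COPY approach Doug mentions, reusing the hypothetical items table; the file path and data are examples only.]

-- Bulk-load from a tab-delimited file readable by the server backend:
COPY items FROM '/tmp/items.dat';

-- Or stream rows through the client connection; columns are separated by
-- tab characters and the data is terminated by a line containing only "\.":
COPY items FROM stdin;
4	fourth
5	fifth
\.

Either form runs as a single transaction per COPY statement, which is part of why it beats row-at-a-time INSERTs.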
On Wed, May 08, 2002 at 11:02:35AM -0400, Doug McNaught wrote:
> Holger Marzen <holger@marzen.de> writes:
>
> > ACK. On a given piece of hardware I get about 150 inserts per second.
> > Using a BEGIN/END transaction for a group of 100 inserts speeds it up to
> > about 450 inserts per second.
>
> COPY is even faster, as there is less query parsing to be done, plus
> you get a transaction per COPY statement even without BEGIN/END.

Does anyone have any performance figures to hand on COPY vs BEGIN/INSERT/END, on indexed or unindexed tables?

Cheers,
Steve
On 8 May 2002, Doug McNaught wrote:
> Holger Marzen <holger@marzen.de> writes:
>
> > ACK. On a given piece of hardware I get about 150 inserts per second.
> > Using a BEGIN/END transaction for a group of 100 inserts speeds it up to
> > about 450 inserts per second.
>
> COPY is even faster, as there is less query parsing to be done, plus
> you get a transaction per COPY statement even without BEGIN/END.

Yes, but I wanted to change something in some rows, so I used Perl and INSERT.

> > But beware: if one insert fails (duplicate key, faulty data), then you
> > have to re-insert the remaining rows as single transactions, or else all
> > rows of the previous transaction are discarded.
>
> Hmm, don't you have to ROLLBACK and redo the whole transaction without
> the offending row(s), since you can't commit while in ABORT state? Or
> am I misunderstanding?

Postgres complains and doesn't accept the following inserts after a failed one until the end of the transaction. I haven't had the time yet to figure out whether it rolls back the preceding inserts.

Is there a rule in the SQL standard that describes what should happen if some statements in a transaction fail and the program issues a COMMIT?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
Holger Marzen <holger@marzen.de> writes:
> On 8 May 2002, Doug McNaught wrote:
>
> > Hmm, don't you have to ROLLBACK and redo the whole transaction without
> > the offending row(s), since you can't commit while in ABORT state? Or
> > am I misunderstanding?
>
> Postgres complains and doesn't accept the following inserts after a
> failed one until the end of the transaction. I haven't had the time yet
> to figure out whether it rolls back the preceding inserts.

I'm pretty sure that the transaction goes into ABORT state and all you can do is roll back and start over, minus the offending row(s).

> Is there a rule in the SQL standard that describes what should happen if
> some statements in a transaction fail and the program issues a COMMIT?

I think PG's is standard behavior; that's kind of the whole point of having transactions.

-Doug
On Wed, 8 May 2002, Holger Marzen wrote:
> > COPY is even faster, as there is less query parsing to be done, plus
> > you get a transaction per COPY statement even without BEGIN/END.
>
> Yes, but I wanted to change something in some rows, so I used Perl and
> INSERT.

Why not use Perl and COPY?

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
On 9 May 2002, Doug McNaught wrote:
> Holger Marzen <holger@marzen.de> writes:
>
> > Is there a rule in the SQL standard that describes what should happen if
> > some statements in a transaction fail and the program issues a COMMIT?
>
> I think PG's is standard behavior; that's kind of the whole point of
> having transactions.

Which one, though:

- rolling back the whole transaction if you do a COMMIT, or
- keeping the changes up to the first failing statement and ignoring
  everything that follows if you do a COMMIT?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
On Thu, 9 May 2002, Curt Sampson wrote:
> On Wed, 8 May 2002, Holger Marzen wrote:
>
> > > COPY is even faster, as there is less query parsing to be done, plus
> > > you get a transaction per COPY statement even without BEGIN/END.
> >
> > Yes, but I wanted to change something in some rows, so I used Perl and
> > INSERT.
>
> Why not use Perl and COPY?

A good idea. I'll try it if the customer complains about the speed.

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
On Thu, 2002-05-09 at 12:23, Holger Marzen wrote:
> On 9 May 2002, Doug McNaught wrote:
>
> > I think PG's is standard behavior; that's kind of the whole point of
> > having transactions.
>
> - rolling back the whole transaction if you do a COMMIT, or
> - keeping the changes up to the first failing statement and ignoring
>   everything that follows if you do a COMMIT?

I can't speak to the phrasing of the standard, but PG's behaviour is: everything in the transaction is ignored. All the inserts made before the error, since the BEGIN, are rolled back. All subsequent inserts are ignored (generating an error that the transaction is in ABORT state). The subsequent COMMIT will end the transaction (and thus clear the abort state) but *will not commit anything*.

This means that scripts will work as intended, namely that all or none of the commands within a transaction block will succeed[*]. So if you split your insert into blocks, and one block fails, you will only need to correct that block and try to load it again.

Regards

John

[*] This is handy for doing things like reindexing or drop/recreate table in a script. The DROP TABLE stands a good chance of succeeding, whereas the prospect of a syntax error in the CREATE TABLE makes it more likely to fail. A subsequent commit of the DROP TABLE alone would not produce the end result you intended...

--
John Gray
Azuli IT
www.azuli.co.uk
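[Editor's note: a small SQL sketch of the behaviour John describes, reusing the hypothetical items table from earlier; after the failed insert, later statements are rejected and the closing COMMIT ends the transaction without committing anything.]

BEGIN;
INSERT INTO items VALUES (10, 'ok');       -- accepted
INSERT INTO items VALUES (10, 'dup');      -- fails with a duplicate-key error
INSERT INTO items VALUES (11, 'later');    -- rejected: transaction is in ABORT state
COMMIT;                                    -- ends the transaction, commits nothing

SELECT * FROM items WHERE id IN (10, 11);  -- returns no rows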
The Compaq servers you indicated use the Smart Array controller, which is very, very slow at storing data to the hard disks. Are you using RAID?

I suggest you buy a controller with a larger write cache (the Smart Array controller does not have a write cache)... You should see a large performance improvement...

Also, do not forget to set fsync to false and to use the COPY command whenever possible...

Bye!
--
Doct. Eng. Denis Gasparin: denis@edistar.com
---------------------------
Programmer & System Administrator - Edistar srl

On Wed, 2002-05-08 at 00:40, Samuel J. Sutjiono wrote:
> I tried to load data using INSERT statements into our new database server
> (Compaq DL580, DL590, DL380 using SCSI driver) but it was very, very slow.
> The database runs on Red Hat Linux 7.2 and PostgreSQL version 7.1.3.
>
> Does anybody have any idea what causes this problem, or things that I need
> to look at or check (in the config file)?
>
> I appreciate any help.
>
> Regards,
> Samuel
On 9 May 2002, Denis Gasparin wrote:
> The Compaq servers you indicated use the Smart Array controller, which is
> very, very slow at storing data to the hard disks. Are you using RAID?
>
> I suggest you buy a controller with a larger write cache (the Smart Array
> controller does not have a write cache)... You should see a large
> performance improvement...

And take your transaction log off of RAID entirely and put it on a mirrored pair of disks instead.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
Good day,

Does anyone have any experience in increasing the amount of shared memory buffers available to Mac OS X/Darwin? When starting with more than 128 backend connections on a test G3 I encounter:

-----------------------------------------------------------------------------------
IpcMemoryCreate: shmget(key=5432001, size=33357824, 03600) failed: Invalid argument

This error usually means that PostgreSQL's request for a shared memory
segment exceeded your kernel's SHMMAX parameter. You can either
reduce the request size or reconfigure the kernel with larger SHMMAX.
-----------------------------------------------------------------------------------

However, I can find no helpful information looking around for how Darwin handles this configuration. Hopefully it does not require a recompile of the kernel. ;)

Any knowledge on this topic would be greatly appreciated.

Best Regards,
Jw.
--
jlx@commandprompt.com by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/
On 5/20/02 9:25 PM, "Command Prompt, Inc." <pgsql-general@commandprompt.com> wrote:
> Good day,
>
> Does anyone have any experience in increasing the amount of shared memory
> buffers available to Mac OS X/Darwin? When starting with more than 128
> backend connections on a test G3 I encounter:
>
> IpcMemoryCreate: shmget(key=5432001, size=33357824, 03600) failed: Invalid
> argument
>
> This error usually means that PostgreSQL's request for a shared memory
> segment exceeded your kernel's SHMMAX parameter. You can either
> reduce the request size or reconfigure the kernel with larger SHMMAX.
>
> However, I can find no helpful information looking around for how Darwin
> handles this configuration. Hopefully it does not require a recompile of
> the kernel. ;)

Actually --- :-> I'm no expert on this topic, but Darwin is BSD and I do recall a post on this very topic, to the effect that the poster believed some flavors of BSD, possibly MOSX included, do require rebuilding the kernel to change this value.

As a MOSX user I'd certainly love to be found wrong.

-- sgl

=======================================================
Steve Lane
Vice President
Chris Moyer Consulting, Inc.
833 West Chicago Ave Suite 203
Voice: (312) 433-2421    Email: slane@fmpro.com
Fax: (312) 850-3930      Web: http://www.fmpro.com
=======================================================
"Command Prompt, Inc." <pgsql-general@commandprompt.com> writes: > Does anyone have any experience in increasing the amount of shared memory > buffers available to Mac OS X/Darwin? A little bit of googling turns up Apple bug reports stating that OSX has a hardwired limit on shared memory --- so no, short of modifying and recompiling the kernel, you're outta luck. regards, tom lane
On Mon, 20 May 2002, Steve Lane wrote:
> Actually --- :-> I'm no expert on this topic, but Darwin is BSD and I do
> recall a post on this very topic, to the effect that the poster believed
> some flavors of BSD, possibly MOSX included, do require rebuilding the
> kernel to change this value.
>
> As a MOSX user I'd certainly love to be found wrong.

Well, after doing some digging in the OS X Darwin developer archives, it definitely appears that at this stage in Darwin's design it DOES require a rebuild of the kernel -- thanks to those of you who pointed me in that direction. ;)

Fortunately it's a fairly painless process, I've found; you just need to sign up with the Apple Open Source site, and you have instant CVS access. I was able to pull down the source for the kernel and increased SHMMAX by a factor of 16. I upgraded to the new kernel and am now able to get safely to 512 connections, even up around 900, but bumping it up to 1024, I run into the following error:

PGSTATBUFF: recvfrom(2): Resource temporarily unavailable
DEBUG: statistics collector process (pid 1988) exited with exit code 1

...which then repeats itself infinitely until the calling process is stopped. ;)

I've upped the limit on the number of open files with ulimit, but I was wondering if anyone knows what specifically invokes this "resource temporarily unavailable" message from PGSTATBUFF? I don't see it in the source casually grepping around, so is it, I take it, the result of the recvfrom() function?

[...actually, a minute later of looking at the source...]

Okay, so it appears to be when it's checking sockets for data in src/backend/postmaster/pgstat.c, line 1579... I just tried re-building the kernel with a higher value for kern.ipc.maxsockets but that didn't have any effect; I still get that error.

Anyone have any ideas on what specifically would cause this part of pgstat to fail?

Best regards,
Jw.
--
jlx@commandprompt.com by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/
Command Prompt, Inc. sez:
[...]
} [...actually, a minute later of looking at the source...]
}
} Okay, so it appears to be when it's checking sockets for data in
} src/backend/postmaster/pgstat.c, line 1579... I just tried re-building the
} kernel with a higher value for kern.ipc.maxsockets but that didn't have
} any effect; I still get that error.
}
} Anyone have any ideas on what specifically would cause this part of pgstat
} to fail?

Recompiling the kernel was necessary to change the shared memory limit, but kern.ipc.maxsockets can be changed on the fly with sysctl. It might be getting changed in one of the scripts (perhaps SystemTuning) in /System/Library/StartupItems (there are also sometimes scripts in /Library/StartupItems; both directories are used in the startup process). When your system is up and running, try sysctl kern.ipc.maxsockets and see if the value is what you expected. If not, dig around in the StartupItems directories and see if something is changing it.

} Best regards,
} Jw.

--Greg
"Command Prompt, Inc." <pgsql-general@commandprompt.com> writes: > I was able to pull down the source for the Kernel, and increased the > SHMMAX by a factor of 16. Upgraded to the new Kernel and I am now able to > get safely 512 connections, even up around 900, but bumping it up to 1024, > I run into the following error: > PGSTATBUFF: recvfrom(2): Resource temporarily unavailable > DEBUG: statistics collector process (pid 1988) exited with exit code 1 > ...which then repeats itself infinitely until the calling process is > stopped. ;) [ scratches head... ] "Resource temporarily unavailable" is EAGAIN according to /usr/include/sys/errno.h on my OSX machine. But the man page for recvfrom doesn't mention any plausible reasons for EAGAIN to be signaled. select() just told us there was data available on the socket, so WTF? Could this be a kernel bug? You could try modifying pgstat.c to continue its loop rather than exiting after it gets a recvfrom error. But if the error condition recurs that'll just put pgstat.c into an infinite loop, so I'm not sure this is any solution --- just a way of gathering more data. regards, tom lane
On Wed, 22 May 2002, Gregory Seidman wrote:
> Recompiling the kernel was necessary to change the shared memory limit, but
> kern.ipc.maxsockets can be changed on the fly with sysctl.

Hm. I tried that, with "sysctl -w kern.ipc.maxsockets=2048", and got:

sysctl: oid 'kern.ipc.maxsockets' is read only

> It might be getting changed in one of the scripts (perhaps SystemTuning) in
> /System/Library/StartupItems (there are also sometimes scripts in
> /Library/StartupItems; both directories are used in the startup process).
> When your system is up and running, try sysctl kern.ipc.maxsockets and see
> if the value is what you expected. If not, dig around in the StartupItems
> directories and see if something is changing it.

After the recompile it is actually the value I expected. It appears that the problem is something less clear. :/

Jw.
--
jlx@commandprompt.com by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/
On Wed, 22 May 2002, Tom Lane wrote:
> "Command Prompt, Inc." <pgsql-general@commandprompt.com> writes:
> > I was able to pull down the source for the kernel and increased SHMMAX by
> > a factor of 16. I upgraded to the new kernel and am now able to get
> > safely to 512 connections, even up around 900, but bumping it up to 1024,
> > I run into the following error:
> >
> > PGSTATBUFF: recvfrom(2): Resource temporarily unavailable
> > DEBUG: statistics collector process (pid 1988) exited with exit code 1
> >
> > ...which then repeats itself infinitely until the calling process is
> > stopped. ;)
>
> [ scratches head... ] "Resource temporarily unavailable" is EAGAIN
> according to /usr/include/sys/errno.h on my OSX machine. But the man
> page for recvfrom doesn't mention any plausible reasons for EAGAIN to
> be signaled. select() just told us there was data available on the
> socket, so WTF? Could this be a kernel bug?
>
> You could try modifying pgstat.c to continue its loop rather than
> exiting after it gets a recvfrom error. But if the error condition
> recurs that'll just put pgstat.c into an infinite loop, so I'm not
> sure this is any solution --- just a way of gathering more data.

My 'standard' way of doing this is to loop while the error is EAGAIN, but with a small retry counter to limit the number of attempts made. Hopefully the second time through, the call isn't interrupted again. Therefore I'm usually quite happy saying that after something like 5 calls to a routine there really is an error. Isn't this how people normally handle this?

--
Nigel J. Andrews
Director
---
Logictree Systems Limited
Computer Consultants
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > My 'standard' way of doing this is to loop while the error is EAGAIN > but with a small retry counter to limit the number of attempts > made. Hopefully the second time through the call isn't interrupted > again. I think you're thinking of EINTR, not EAGAIN. regards, tom lane
Hello,

I have a table with about 15-20 VARCHAR columns. I use VARCHAR on the assumption that it is more space efficient and not too much slower than CHAR, especially when you expect many of those VARCHARs to be null or empty strings.

Is there a performance difference between VARCHAR elements of value NULL and zero-length strings? If so, can I assume that NULL is the better performer, and thus re-code my INSERTs to replace empty strings with NULLs (or vice versa)?

Now that I've had to re-write all my data to substring it to fit in the VARCHARs, one more "thing" won't matter much.

Many Thanks,

Doug
Doug Fields <dfields-pg-general@pexicom.com> writes:
> Is there a performance difference between VARCHAR elements of value NULL
> and zero-length strings?

These are not semantically equivalent (if you think they are, you've been using Oracle too long). You will almost certainly regret it if you try recoding your app to make them equivalent. An example:

regression=# select 'foo'::varchar || ''::varchar;
 ?column?
----------
 foo
(1 row)

regression=# select 'foo'::varchar || null::varchar;
 ?column?
----------

(1 row)

But to answer your question, NULLs might save a couple bytes if there are multiple NULLs per row. I think that replacing a single empty varchar with a NULL would net out to no change (you lose the 4-byte varchar length word, but have to store a null-values bitmap instead), depending on alignment issues and how many columns there are in the table.

regards, tom lane
At 02:54 PM 5/24/2002, you wrote:
> Doug Fields <dfields-pg-general@pexicom.com> writes:
> > Is there a performance difference between VARCHAR elements of value NULL
> > and zero-length strings?
>
> These are not semantically equivalent (if you think they are, you've
> been using Oracle too long). You will almost certainly regret it
> if you try recoding your app to make them equivalent. An example:

Sorry, I haven't used Oracle in quite some time, so I don't have any bad Oracle-isms. :)

Thanks, Tom. Yes, I am aware of this already, and my application largely doesn't care (i.e. in the places where it does, it asks = '' as well as OR IS NULL, but in most places it doesn't bother because it doesn't matter). I'm merely trying to wring the last ounce of performance from the database, especially given that most of these columns are rarely searched and certainly not indexed.

> But to answer your question, NULLs might save a couple bytes if there
> are multiple NULLs per row. I think that replacing a single empty
> varchar with a NULL would net out to no change (you lose the 4-byte
> varchar length word, but have to store a null-values bitmap instead),
> depending on alignment issues and how many columns there are in the
> table.

So, in essence, having a zero-length VARCHAR requires no additional page loads (from the VARCHAR heap, if such a thing is used) and does not increase the record size over having NULLs? My intent is to make a "SELECT *" from this table, with all its VARCHAR columns, as fast as possible and require as few disk hits as possible.

Thanks,

Doug
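[Editor's note: for the places where the application does have to treat NULL and the empty string alike, a sketch of the two query forms Doug mentions; the table and column names here are hypothetical.]

-- Explicit form: match rows whose note is either empty or NULL.
SELECT * FROM customer_notes WHERE note = '' OR note IS NULL;

-- Equivalent form using COALESCE to fold NULL into '' before comparing.
SELECT * FROM customer_notes WHERE COALESCE(note, '') = '';

Both forms return the same rows.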