Thread: RE: RE: Postgresql vs SQLserver for this application ?
Unfortunately.
But we are in the process of choosing PostgreSQL with pgcluster. I'm currently running some tests (performance, stability...).
The money saved on license fees can go toward the hardware instead ;-)
I still welcome any advice or comments, and I'll let you know how the project goes.
Benjamin.
"Mohan, Ross" <RMohan@arbinet.com> 05/04/2005 20:48 | Pour : <bsimon@loxane.com> cc : Objet : RE: [PERFORM] Postgresql vs SQLserver for this application ? |
You never got answers on this? Apologies, I don't have one, but I'd be curious to hear about any you did get....
thx
Ross
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of bsimon@loxane.com
Sent: Monday, April 04, 2005 4:02 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Postgresql vs SQLserver for this application ?
Hi all.
We are designing a fairly large application that requires a high-performance database backend.
The rates we need to sustain are at least 5000 inserts per second and 15 selects per second on a single connection. There should only be 3 or 4 simultaneous connections.
I think our main concern is dealing with the constant flow of data coming from the inserts, which must be available for selection as quickly as possible (a kind of real-time access).
As a consequence, the database should rapidly grow to more than one hundred gigabytes. We still have to determine how and when we should back up old data to prevent a performance drop. We intend to develop some kind of real-time partitioning on our main table to keep up with the flow.
At first, we were planning to use SQL Server, as it has features that in my opinion could help us a lot:
- replication
- clustering
Recently we started to study PostgreSQL as a solution for our project:
- it also has replication
- the PostGIS module can handle geographic data types (which would facilitate our development)
- we have strong knowledge of PostgreSQL administration (we already use it in production)
- it is free (!), and we could put the money saved toward hardware.

Is SQL Server clustering a real asset? How reliable are PostgreSQL replication tools? Can I trust PostgreSQL's performance for this kind of workload?

My question is a bit fuzzy, but any advice is most welcome... hardware, tuning, or design tips as well :))
Thanks a lot.
Benjamin.
I think everyone was scared off by the 5000 inserts per second number. I've never seen even Oracle do this on a top end Dell system with copious SCSI attached storage.

Alex Turner
netEconomist

On Apr 6, 2005 3:17 AM, bsimon@loxane.com <bsimon@loxane.com> wrote:
> But we are in the process of choosing PostgreSQL with pgcluster. I'm
> currently running some tests (performance, stability...).
I wish I had a Dell system and run case to show you, Alex, but I don't... however... using Oracle's "direct path" feature, it's pretty straightforward.

We've done 110,000 rows per second into index-less tables on a big system (IBM Power5 chips, Hitachi SAN). (Yes, I am sure: over 100K a second. Sustained for almost 9 minutes.)

Yes, this is an exception, but Oracle's direct-path/insert-append/bulk-load feature enabled us to migrate a 4 TB database... really quickly.

Now... if you ask me "can this work without Power5 and Hitachi SAN?" my answer is: you give me a top-end Dell and SCSI III on 15K disks and I'll likely easily match it, yea.

I'd love to see PG get into this range... I am a big fan of PG (just a rank newbie), but I gotta think the underlying code to do this has to be not-too-complex.....

Best,

Ross
Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you, Alex, but I don't...
> however... using Oracle's "direct path" feature, it's pretty straightforward.
>
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). (Yes, I am sure: over 100K a second.
> Sustained for almost 9 minutes.)
>
> Yes, this is an exception, but Oracle's direct-path/insert-append/bulk-load
> feature enabled us to migrate a 4 TB database... really quickly.

How close to this is PG's COPY? I get surprisingly good results using COPY with JDBC on smallish systems (now if only that patch would make it into the mainstream PG JDBC support!). I think COPY has a bit more overhead than a dedicated bulk-load feature would, but I suspect it's not that much more.

> Now... if you ask me "can this work without Power5 and Hitachi SAN?"
> my answer is: you give me a top-end Dell and SCSI III on 15K disks
> and I'll likely easily match it, yea.
>
> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.....

It may not be that far off if you can use COPY instead of INSERT. But comparing a bulk load to INSERT is a bit apples<->oranges.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
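To make the COPY-versus-INSERT distinction concrete, here is a rough Perl DBI sketch of the two loading paths. The flows table, its columns, and the connection parameters are made up for illustration, and the pg_putline/pg_endcopy calls assume a DBD::Pg version that exposes the COPY interface (see the DBD::Pg note further down the thread).

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connection parameters are placeholders -- adjust for your environment.
    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', 'postgres', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my @rows = map { [ $_, "payload $_" ] } 1 .. 10_000;

    # Path 1: row-by-row INSERT with a prepared statement.  Parsing and
    # planning happen once, but each row still pays per-statement executor
    # and client/server round-trip costs.
    my $sth = $dbh->prepare('INSERT INTO flows (id, payload) VALUES (?, ?)');
    $sth->execute(@$_) for @rows;
    $dbh->commit;

    # Path 2: COPY FROM STDIN -- one statement, then a stream of
    # tab-delimited, newline-terminated rows.
    $dbh->do('COPY flows (id, payload) FROM STDIN');
    $dbh->pg_putline(join("\t", @$_) . "\n") for @rows;
    $dbh->pg_endcopy;
    $dbh->commit;

    $dbh->disconnect;

For large row counts the COPY path is usually dramatically faster, which is the gap being discussed here.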
On Wed, 2005-04-06 at 16:12 +0000, Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you, Alex, but I don't...
> however... using Oracle's "direct path" feature, it's pretty straightforward.
>
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). (Yes, I am sure: over 100K a second.
> Sustained for almost 9 minutes.)

Just for kicks I did a local test on a desktop machine (single CPU, single IDE drive) using COPY from STDIN for a set of integers, sent in via a single transaction, with no indexes.

1,572,864 tuples were loaded in 13715.613 ms, which is approx 115k rows per second.

Okay, there were no checkpoints and I didn't cross an index boundary, but I also haven't tuned the config file beyond bumping up the buffers.

Let's try again with more data this time: 31 million tuples were loaded in approx 279 seconds, or approx 112k rows per second.

> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.....

I'd say we're there.
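For anyone wanting to reproduce that kind of measurement, a minimal Perl DBI / DBD::Pg sketch follows. The table name, connection settings, and row count are illustrative, and actual throughput will depend heavily on hardware and configuration.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Placeholder connection settings.
    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', 'postgres', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my $rows = 1_572_864;                        # same row count as the test above

    $dbh->do('CREATE TABLE copy_test (i integer)');   # no indexes

    my $t0 = [gettimeofday];
    $dbh->do('COPY copy_test FROM STDIN');
    for my $i (1 .. $rows) {
        $dbh->pg_putline("$i\n");                # one newline-terminated row per call
    }
    $dbh->pg_endcopy;
    $dbh->commit;                                # the whole load is one transaction
    my $elapsed = tv_interval($t0);

    printf "%d rows in %.3f s (%.0f rows/sec)\n", $rows, $elapsed, $rows / $elapsed;
    $dbh->disconnect;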
> How close to this is PG's COPY? I get surprisingly good results using
> COPY with JDBC on smallish systems (now if only that patch would make it
> into the mainstream PG JDBC support!). I think COPY has a bit more
> overhead than a dedicated bulk-load feature would, but I suspect it's
> not that much more.

Steve, I do not know. But I am reading the docs now, and should figure it out. Ask me later if you remember.

Oracle's "direct path" is a way of just slamming blocks filled with rows into the table, above the high water mark. It sidesteps freelist management and all manner of intra-block issues. There is a "payback", but the benefits far, far outweigh the costs.

> It may not be that far off if you can use COPY instead of INSERT. But
> comparing a bulk load to INSERT is a bit apples<->oranges.

Oh! I see! I had no idea I was doing that! Thanks for pointing it out clearly to me. Yes, I would say a fully transactional INSERT rate of 5K rows/sec into an indexed table is near-mythology without significant caveats (parallelization, deferred buffering, etc.).
This thread seems to be focusing on COPY efficiency, so I'd like to ask something I got no answer to a few months ago.

Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface, I accidentally strung together several \n-terminated input lines, and sent them to the server with a single "putline". To my (happy) surprise, I ended up with exactly that number of rows in the target table.

Is this a bug? Is this fundamental to the protocol? Since it hasn't been documented (but then, "endcopy" isn't documented), I've been shy of investing in perf testing such mass copy calls. But, if it DOES work, it should be reducing the number of network roundtrips.

So. Is it a feechur? Worth stress-testing? Could be VERY cool.

--
"Dreams come true, not free."
Mischa <mischa.Sandberg@telus.net> writes:
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.
> Is this a bug?

No, it's the way it's supposed to work. "putline" really just sends a stream of data ... there's no semantic significance to the number of putline calls you use to send the stream, only to the contents of the stream. (By the same token, it's unlikely that deliberately aggregating such calls would be much of a win.)

regards, tom lane
> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network
> roundtrips.

I think it's documented in the libpq docs...

Chris
In article <1112813199.42542e8f17b4d@webmail.telus.net>, Mischa <mischa.Sandberg@telus.net> writes:

> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.

Using COPY from DBD::Pg _is_ documented - provided you use DBD::Pg version 1.41, released just today.
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
From: "Greg Sabino Mullane"
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
...
> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

As explained elsewhere, not really a feature, more of a side effect. Keep in mind, however, that any network round-trip time saved has to be balanced against some additional overhead of constructing the combined strings in Perl before sending them over. Most times COPY is used to parse a newline-separated file anyway. If you have a slow network connection to the database, it *might* be a win, but my limited testing shows that it is not an advantage for a "normal" connection: I added 1 million rows via COPY the normal way (1 million pg_putline calls), via pg_putline of 1,000 rows at a time, and via 10,000 rows at a time. They all ran in 22 seconds, with no statistical difference between them. (This was the "real" time; the system time was actually much lower for the combined calls.)

It can't hurt to test things out on your particular system and see if it makes a real difference: it certainly does no harm as long as you make sure the strings you send always *end* in a newline.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200504072201
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Quoting Greg Sabino Mullane <greg@turnstep.com>:

> It can't hurt to test things out on your particular system and see
> if it makes a real difference: it certainly does no harm as long as
> you make sure the strings you send always *end* in a newline.

Many thanks for digging into it.

For the app I'm working with, the time delay between rows being posted is /just/ enough to exceed the TCP Nagle delay, so every row goes across in its own packet :-( Reducing the number of network roundtrips by a factor of 40 is enough to cut elapsed time in half. The cost of join("",@FortyRows), which produces a 1-4K string, is negligible in this case.

--
"Dreams come true, not free" -- S. Sondheim, ITW
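A rough sketch of that batching pattern, again assuming the DBD::Pg COPY interface (pg_putline/pg_endcopy). The flows table, its columns, the simulated data source, and the 40-row batch size are all illustrative.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Placeholder connection settings -- adjust for your environment.
    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', 'postgres', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my $batch_size = 40;   # rows per pg_putline call, i.e. roughly per network packet
    my @pending;

    $dbh->do('COPY flows (id, payload) FROM STDIN');

    # Rows arriving one at a time (here simulated by a simple loop).
    for my $i (1 .. 100_000) {
        push @pending, join("\t", $i, "payload $i") . "\n";
        if (@pending >= $batch_size) {
            # join('', @pending) builds a single 1-4K string covering ~40 rows,
            # so the whole batch goes out in one putline call / round trip.
            $dbh->pg_putline(join('', @pending));
            @pending = ();
        }
    }
    $dbh->pg_putline(join('', @pending)) if @pending;   # flush any remainder
    $dbh->pg_endcopy;
    $dbh->commit;
    $dbh->disconnect;

Because every pushed row already ends in "\n", each flushed batch also satisfies the caveat above that the string handed to putline must end in a newline.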