Thread: large dataset with write vs read clients
currently PG 8.1.3. See attached for my postgresql.conf. Server is FreeBSD 6.2 with a fast 3TB storage array and only 2GB of RAM. We're running RTG, which is like MRTG, Cricket, etc.: it queries network devices via SNMP and throws the stats into the DB for making pretty bandwidth graphs. We've got hundreds of devices, with 10K+ ports and probably 100K's of stats being queried every 5 minutes. In order to do all that work, the back-end SNMP querier is multi-threaded and opens a PG connection per thread. We're running 30 threads. This is basically all INSERTs, but it only ends up being about 32,000 per 5 minutes. The graphing front-end CGI is all SELECTs. There are 12k tables today, and new tables are created each month. The number of rows per table is 100-700K, with most in the 600-700K range. 190GB of data so far. Good news is that queries have no joins and are limited to only a few tables at a time. Basically, each connection is taking about 100MB resident. As we need to increase the number of threads to be able to query all the devices in the 5 minute window, we're running out of memory. There aren't that many CGI connections at any one time; query performance isn't great, but honestly it's surprisingly good all things considered. Honestly, I'm not really looking to improve PG's performance, although I wouldn't complain. I just want to better manage memory/hardware. I assume I can't start up two instances of PG pointing at the same files, one read-only and one read-write with different memory profiles, so I assume my only real option is to throw more RAM at it. I don't have $$$ for another array/server for a master/slave right now. Or perhaps tweaking my .conf file? Are newer PG versions more memory efficient? Thanks, Aaron -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
On 10/7/10 11:47 AM, Aaron Turner wrote: > <snip> > > Basically, each connection is taking about 100MB resident. As we need > to increase the number of threads to be able to query all the devices > in the 5 minute window, we're running out of memory. I think the first thing to do is look into using a connection pooler like pgpool to reduce your connection memory overhead. -Dan
* Dan Harris (fbsd@drivefaster.net) wrote: > On 10/7/10 11:47 AM, Aaron Turner wrote: >> Basically, each connection is taking about 100MB resident. As we need >> to increase the number of threads to be able to query all the devices >> in the 5 minute window, we're running out of memory. > I think the first thing to do is look into using a connection pooler > like pgpool to reduce your connection memory overhead. Yeah.. Having the number of database connections be close to the number of processors is usually recommended. Stephen
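For what it's worth, a minimal pgpool-II setup for this case might look like the sketch below: the 30 collector threads connect to pgpool's port instead of PostgreSQL's, and pgpool funnels them through a small fixed set of real backend connections, so per-backend memory cost stops scaling with thread count. The parameter names are pgpool-II's; the values are illustrative assumptions, not tested settings.

    # pgpool.conf (sketch; adapt values to your install)
    listen_addresses = '*'
    port = 9999                  # collector threads connect here, not 5432
    backend_hostname0 = 'localhost'
    backend_port0 = 5432         # the real PostgreSQL server
    num_init_children = 8        # pooled server processes, roughly one per CPU
    max_pool = 1                 # one cached backend connection per child
    connection_cache = on        # reuse backend connections across clients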
* Aaron Turner (synfinatic@gmail.com) wrote: > The graphing front-end CGI is all SELECTs. There are 12k tables today, > and new tables are created each month. That's a heck of a lot of tables.. Probably more than you really need. Not sure if reducing that number would help query times, though. > The number of rows per table > is 100-700K, with most in the 600-700K range. 190GB of data so far. > Good news is that queries have no joins and are limited to only a few > tables at a time. Have you got indexes and whatnot on these tables? > Basically, each connection is taking about 100MB resident. As we need > to increase the number of threads to be able to query all the devices > in the 5 minute window, we're running out of memory. There aren't > that many CGI connections at any one time; query performance isn't > great, but honestly it's surprisingly good all things considered. I'm kind of surprised at each connection taking 100MB, especially ones which are just doing simple inserts. Thanks, Stephen
* Aaron Turner (synfinatic@gmail.com) wrote: > Basically, each connection is taking about 100MB resident Errr.. Given that your shared buffers are around 100M, I think you're confusing what you see in top with reality. The shared buffers are visible in every process, but it's all the same actual memory, not 100M per process. Thanks, Stephen
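A quick way to see this from psql, assuming nothing about the attached .conf: ask the server what shared_buffers is set to, and subtract roughly that amount from each backend's resident size in top to estimate the true private footprint.

    -- On 8.1 this prints the setting as a count of 8KB buffers
    -- (e.g. 12800 = 100MB); 8.2 and later print it with units.
    SHOW shared_buffers;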
On Thu, Oct 7, 2010 at 12:02 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Aaron Turner (synfinatic@gmail.com) wrote: >> Basically, each connection is taking about 100MB resident > > Errr.. Given that your shared buffers are around 100M, I think you're > confusing what you see in top with reality. The shared buffers are > visible in every process, but it's all the same actual memory, not 100M > per process. Ah, I had missed that. Thanks for the tip. Sounds like I should still investigate pgpool, though. If nothing else it should improve insert performance, right? As for the tables, no indexes. We're using a constraint on one of the columns (date) w/ table inheritance to limit which tables are scanned, since SELECTs are always for a specific date range. By always querying the parent table, we're effectively getting a cheap semi-granular index without any insert overhead. Unfortunately, without forking the RTG code significantly, redesigning the schema really isn't viable. -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
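For readers who haven't seen the trick Aaron describes, here is a minimal sketch of date-constrained inheritance; the table and column names are invented for illustration, not RTG's actual schema.

    -- Parent table; queries always go through this one.
    CREATE TABLE ifstats (
        rtr_id  integer,
        dtime   timestamp,
        counter bigint
    );

    -- One child table per month, constrained to that month.
    CREATE TABLE ifstats_2010_10 (
        CHECK (dtime >= '2010-10-01' AND dtime < '2010-11-01')
    ) INHERITS (ifstats);

    -- With constraint exclusion enabled, a date-qualified query against
    -- the parent skips every child whose CHECK constraint can't match:
    -- the "cheap semi-granular index" with zero insert overhead.
    SET constraint_exclusion = on;

    SELECT rtr_id, counter
      FROM ifstats
     WHERE dtime >= '2010-10-07' AND dtime < '2010-10-08';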
Aaron Turner wrote: > Are newer PG versions more memory efficient? > Moving from PostgreSQL 8.1 to 8.3 or later should make everything you do happen 2X to 3X faster, before even taking into account that you can tune the later versions better too. See http://suckit.blog.hu/2009/09/29/postgresql_history for a simple comparison of how much performance jumped on both reads and writes in the later versions compared to what you're running. Memory consumption will on average decrease too, simply via the fact that queries start and finish more quickly. Given an even workload, there will be fewer of them running at a time on a newer version to keep up. Given the size of your database, I'd advise you consider a migration to a new version ASAP. 8.4 is a nice stable release at this point; that's the one to consider moving to. The biggest single problem people upgrading from 8.1 to 8.3 or later see is related to changes in how data is cast between text and integer types; 1 doesn't equal '1' anymore is the quick explanation of that. See http://wiki.postgresql.org/wiki/Version_History for links to some notes on that, as well as other good resources related to upgrading. This may require small application changes to deal with. Even without considering the performance increases, PostgreSQL 8.1 is due to be dropped from active support potentially as early as next month: http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy Also: PostgreSQL 8.1.3 has several known bugs that can lead to various sorts of nasty data corruption. You should at least consider an immediate upgrade to the latest release of that version, 8.1.22. Small version number increases in PostgreSQL only consist of serious bug fixes, not feature changes. See http://www.postgresql.org/support/versioning for notes about the project's standard for changes here, and how it feels about the risks of running versions with known bugs in them vs. upgrading. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us Author, "PostgreSQL 9.0 High Performance" Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
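To make the casting change concrete, here is the kind of statement that breaks, using a hypothetical ports table with an integer port_id column: anything that leaned on the old automatic integer-to-text cast runs on 8.1 but raises an error on 8.3 and later, and the fix is an explicit cast.

    -- Worked on 8.1 via implicit casts; fails on 8.3+ with errors such as
    -- "function substr(integer, integer, integer) does not exist":
    SELECT substr(port_id, 1, 2) FROM ports;
    SELECT * FROM ports WHERE port_id LIKE '1%';

    -- 8.3-safe equivalents with explicit casts:
    SELECT substr(port_id::text, 1, 2) FROM ports;
    SELECT * FROM ports WHERE port_id::text LIKE '1%';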
On Thu, Oct 7, 2010 at 2:47 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Aaron Turner wrote: >> >> Are newer PG versions more memory efficient? >> > > Moving from PostgreSQL 8.1 to 8.3 or later should make everything you do > happen 2X to 3X faster, before even taking into account that you can tune > the later versions better too. See > http://suckit.blog.hu/2009/09/29/postgresql_history for a simple comparison > of how much performance jumped on both reads and writes in the later > versions compared to what you're running. Memory consumption will on average > decrease too, simply via the fact that queries start and finish more > quickly. Given an even workload, there will be fewer of them running at a > time on a newer version to keep up. > > Given the size of your database, I'd advise you consider a migration to a > new version ASAP. 8.4 is a nice stable release at this point; that's the > one to consider moving to. The biggest single problem people upgrading from > 8.1 to 8.3 or later see is related to changes in how data is cast between > text and integer types; 1 doesn't equal '1' anymore is the quick explanation > of that. See http://wiki.postgresql.org/wiki/Version_History for links to > some notes on that, as well as other good resources related to upgrading. > This may require small application changes to deal with. > > Even without considering the performance increases, PostgreSQL 8.1 is due to be > dropped from active support potentially as early as next month: > http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy > > Also: PostgreSQL 8.1.3 has several known bugs that can lead to various > sorts of nasty data corruption. You should at least consider an immediate > upgrade to the latest release of that version, 8.1.22. Small version number > increases in PostgreSQL only consist of serious bug fixes, not feature > changes. See http://www.postgresql.org/support/versioning for notes about > the project's standard for changes here, and how it feels about the risks of > running versions with known bugs in them vs. upgrading. > > -- > Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD > PostgreSQL Training, Services and Support www.2ndQuadrant.us > Author, "PostgreSQL 9.0 High Performance" Pre-ordering at: > https://www.packtpub.com/postgresql-9-0-high-performance/book > > Thanks for the info, Greg. Sounds like I've got an upgrade in the near future! :) Again, thanks to everyone who's responded; it's been really informative and helpful. The PG community has always proven to be awesome! -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
* Greg Smith: > Given the size of your database, I'd advise you consider a migration > to a new version ASAP. 8.4 is a nice stable release at this point; > that's the one to consider moving to. It also offers asynchronous commits, which might be a good tradeoff here (especially if the data gathered is not used for billing purposes 8-).
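Concretely, on versions that have the setting (8.3 and later), synchronous_commit can be relaxed per session or even per transaction, so the SNMP inserts can commit asynchronously while anything that must be durable opts back in; the table name reuses the illustrative schema sketched earlier.

    -- Collector session: COMMIT returns before the WAL flush; an OS
    -- crash can lose at most roughly the last 3 x wal_writer_delay
    -- worth of commits, but the database stays consistent.
    SET synchronous_commit = off;
    INSERT INTO ifstats_2010_10 VALUES (42, now(), 123456789);

    -- A transaction that must be durable can opt back in:
    BEGIN;
    SET LOCAL synchronous_commit = on;
    -- ... billing-grade work here ...
    COMMIT;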
I have a logical problem with asynchronous commit. The "commit" command should instruct the database to make the outcome of the transaction permanent. The application should wait to see whether the commit was successful or not. Asynchronous behavior in the commit statement breaks the ACID rules and should not be used in an RDBMS. If you don't need ACID, you may not need an RDBMS at all. You might try MongoDB. MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs Florian Weimer wrote: > * Greg Smith: >> Given the size of your database, I'd advise you consider a migration >> to a new version ASAP. 8.4 is a nice stable release at this point; >> that's the one to consider moving to. > It also offers asynchronous commits, which might be a good tradeoff > here (especially if the data gathered is not used for billing purposes > 8-). -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On 10/10/2010 5:35 AM, Mladen Gogala wrote: > I have a logical problem with asynchronous commit. The "commit" command > should instruct the database to make the outcome of the transaction > permanent. The application should wait to see whether the commit was > successful or not. Asynchronous behavior in the commit statement breaks > the ACID rules and should not be used in an RDBMS. If you don't > need ACID, you may not need an RDBMS at all. You might try MongoDB. > MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs That argument makes little sense to me. Because you can afford a clearly defined and bounded loosening of the durability guarantee provided by the database, such that you know and accept the possible loss of x seconds of work if your OS crashes or your UPS fails, this means you don't really need durability guarantees at all - let alone all that atomic commit silliness, transaction isolation, or the guarantee of a consistent on-disk state? Some of the other flavours of non-SQL databases, both those that've been around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID properties anyway. If you don't need/want an SQL interface to your database, you don't have to throw out all that other database-y goodness if you haven't been drinking too much of the NoSQL kool-aid. There *are* situations in which it's necessary to switch to relying on distributed, eventually-consistent databases with non-traditional approaches to data management. It's awfully nice not to have to, though; going that way can force you to do a lot more wheel reinvention when it comes to querying, analysing and reporting on your data. FWIW, a common approach in this sort of situation has historically been - accepting that RDBMSs aren't great at continuous fast loading of individual records - to log the records in batches to a flat file, Berkeley DB, etc as a staging point. You periodically rotate that file out and bulk-load its contents into the RDBMS for analysis and reporting. This doesn't have to be every hour - every minute is usually pretty reasonable, and still gives your database a much easier time without forcing you to modify your app to batch inserts into transactions or anything like that. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
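A sketch of that staging pattern in PostgreSQL terms, reusing the illustrative table from earlier and an assumed spool path: the collector appends tab-separated rows to a flat file, the file is rotated every minute or so, and a periodic job bulk-loads it in one transaction.

    -- Run after rotating the spool file: one transaction and one COPY
    -- instead of thousands of individual INSERTs.
    BEGIN;
    COPY ifstats_2010_10 (rtr_id, dtime, counter)
        FROM '/var/spool/rtg/ifstats.copy';   -- server-side path; needs superuser
    COMMIT;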
On 10/10/2010 2:43 AM, Craig Ringer wrote: > Some of the other flavours of non-SQL databases, both those that've been > around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those > that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID > properties anyway. If you don't need/want an SQL interface to your > database, you don't have to throw out all that other database-y goodness > if you haven't been drinking too much of the NoSQL kool-aid. This is a terrible misunderstanding. You haven't taken a look at that YouTube clip I sent you, have you? I am an Oracle DBA, first and foremost, disturbing the peace since 1989. I haven't been drinking the NoSQL kool-aid at all. I was simply being facetious. ACID rules are business rules and I am bitterly opposed to relaxing them. BTW, my favorite drink is Sam Adams Ale. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On 10/10/2010 2:55 PM, Mladen Gogala wrote: > On 10/10/2010 2:43 AM, Craig Ringer wrote: >> Some of the other flavours of non-SQL databases, both those that've been >> around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those >> that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID >> properties anyway. If you don't need/want an SQL interface to your >> database, you don't have to throw out all that other database-y goodness >> if you haven't been drinking too much of the NoSQL kool-aid. > This is a terrible misunderstanding. You haven't taken a look at that > YouTube clip I sent you, have you? I'm not so good with video when I'm seeking information, not entertainment. I really dislike having to sit and watch someone sloooowly get around to the point; give me something to skim-read and I'll do that. The trend toward video news etc drives me nuts - IMO it just detracts from the guts of the story/argument/explanation in most cases. One of the wonderful things about the written word is that everybody can benefit from it at their own natural pace. Video, like university lectures, takes that away and forces the pacing to the needs of the slowest. My dislike of video-as-information is a quirk that's clearly not shared by too many, given how trendy video is becoming on the 'net. OTOH, it's probably not a grossly unreasonable choice when dealing with lots of mailing list posts/requests. Imagine if the Pg list accepted video link questions - ugh. Hey, maybe I should try posting YouTube video answers to a few questions for kicks, see how people react ;-) > I am an Oracle DBA, first and > foremost, disturbing the peace since 1989. I haven't been drinking the > NoSQL kool-aid at all. > I was simply being facetious. ACID rules are business rules and I am > bitterly opposed to relaxing them. BTW, my favorite drink is Sam Adams Ale. Aah, thanks. I completely missed it - which is a little scary, in that IMO that message could've been believably written in deadly earnest by a NoSQL over-enthusiast. Good work ... I think. Eek. Sam Adams ale, I'm afraid, does not travel well to Australia. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
* Mladen Gogala: > I have a logical problem with asynchronous commit. The "commit" > command should instruct the database to make the outcome of the > transaction permanent. The application should wait to see whether the > commit was successful or not. Asynchronous behavior in the commit > statement breaks the ACID rules and should not be used in an RDBMS. That's a bit over the top. It may make sense to use PostgreSQL even if the file system doesn't guarantee ACID by keeping multiple checksummed copies of the database files. Asynchronous commits offer yet another trade-off here. Some people use RDBMSs mostly for the *M* part, to get a consistent administration experience across multiple applications. And even with asynchronous commits, PostgreSQL will maintain a consistent state of the database.
On 10/10/2010 7:45 AM, Florian Weimer wrote: > Some people use RDBMSs mostly for the *M* part, to get a consistent > administration experience across multiple applications. And even with > asynchronous commits, PostgreSQL will maintain a consistent state of > the database. Both Postgres and Oracle have that option, and both databases will maintain a consistent state, but both will allow the loss of data in case of a system crash. Strictly speaking, that does break the "D" in ACID. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
craig@postnewspapers.com.au (Craig Ringer) writes: > Hey, maybe I should try posting YouTube video answers to a few > questions for kicks, see how people react ;-) And make sure it uses the same voice as is used in the "MongoDB is web scale" video, to ensure that people interpret it correctly :-). -- output = ("cbbrowne" "@" "gmail.com") http://linuxdatabases.info/info/nonrdbms.html The *Worst* Things to Say to a Police Officer: Hey, is that a 9 mm? That's nothing compared to this .44 magnum.
mladen.gogala@vmsinfo.com (Mladen Gogala) writes: > I have a logical problem with asynchronous commit. The "commit" > command should instruct the database to make the outcome of the > transaction permanent. The application should wait to see whether the > commit was successful or not. Asynchronous behavior in the commit > statement breaks the ACID rules and should not be used in an RDBMS. > If you don't need ACID, you may not need an RDBMS at all. You might > try MongoDB. MongoDB is web scale: > http://www.youtube.com/watch?v=b2F-DItXtZs The "client" always has the option of connecting to a set of databases, and stowing parts of the data hither and thither. That often leads to the relaxation called "BASE." (And IBM has been selling that relaxation as MQSeries since the early '90s!) There often *ARE* cases where it is acceptable for some of the data to not be as durable, because that data is readily reconstructed. This is particularly common for calculated/cached/aggregated data. Many things can get relaxed for a "data warehouse" data store, where the database is not authoritative, but rather aggregates data drawn from other authoritative sources. In such applications, none of the A, C, I, or D is pointedly crucial in the DW data store. - We don't put the original foreign key constraints into the DW database; they don't need to be enforced a second time. Ditto for constraints of all sorts. - Batching of the loading of updates is likely to break several of the letters. And I find it *quite* acceptable to lose "D" if the data may be safely reloaded into the DW database. I don't think this is cavalier, nor do I think it points to "MongoDB is web scale." -- "cbbrowne","@","gmail.com" Rules of the Evil Overlord #181. "I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire."