Thread: speeding up inserts
First of all, we are still running SAP DB at the moment but are in the process of moving to PostgreSQL, so it seemed a good idea to post this type of question here.

In our environment we have transaction processing; each transaction accounts for 10-30 inserts and 3-4 selects. We also have users who use a management interface to run all sorts of queries on the data once it gets into the db. Most of the user queries are selects, with a few updates and fewer inserts. The basic problem is that the transaction times are very critical: one second is a big deal. The data inserted into the db from a transaction does not have to arrive instantly; it can be delayed (up to a point, anyway).

Since there is only so much you can do to speed up inserts, I have been testing out a different system for getting the data from the application into the database. What our application does now is build the queries as it runs, then instead of inserting them into the database it writes them all out to a single file at the end of the transaction. This is a huge performance boost. We then use a separate daemon to run the disk queue once every second and do all the inserts. If for some reason the main application can't write to disk, it reverts to inserting them directly.

Is this a crazy way to handle this? No matter what I have tried, opening and writing a single line to a file on disk is way faster than any database I have used. I even tried using BerkeleyDB as the queue instead of the disk, but that wasn't a whole lot faster than using the cached database handles (our application runs under mod_perl).

Chris
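For concreteness, the write-to-disk-queue approach described above might look roughly like this in a mod_perl handler. The queue file path, the one-statement-per-line format, and the $dbh handle are assumptions for illustration, not the actual application code:

    # Sketch of the disk-queue idea: append the transaction's statements to a
    # queue file; fall back to direct inserts if the file can't be written.
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    sub queue_or_insert {
        my ($dbh, @statements) = @_;

        # Append the whole transaction to the queue file in one shot; a
        # daemon sweeps this file once a second and runs the statements.
        if (open my $fh, '>>', '/var/spool/txnqueue/queue.sql') {
            flock($fh, LOCK_EX);            # mod_perl children share the file
            print {$fh} "$_\n" for @statements;
            close $fh;
            return 1;
        }

        # Fallback: insert directly (assumes AutoCommit is on).
        $dbh->begin_work;
        $dbh->do($_) for @statements;
        $dbh->commit;
        return 1;
    }

The daemon side would simply lock the file once a second, run whatever statements it finds inside a single database transaction, and truncate the file.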
On Dec 31, 2003, at 14:16, Chris Ochs wrote:

> What our application does now is build the queries as it runs, then
> instead of inserting them into the database it writes them all out to a
> single file at the end of the transaction. This is a huge performance
> boost. We then use a separate daemon to run the disk queue once every
> second and do all the inserts. If for some reason the main application
> can't write to disk, it reverts to inserting them directly.
>
> Is this a crazy way to handle this? No matter what I have tried, opening
> and writing a single line to a file on disk is way faster than any
> database I have used. I even tried using BerkeleyDB as the queue instead
> of the disk, but that wasn't a whole lot faster than using the cached
> database handles (our application runs under mod_perl).

In my application, I've built a "TransactionPipeline" class that queues up transactions for asynchronous storage. It made an incredible difference in transaction processing speed in places where the transaction isn't critical (the atomicity is important, but the main place this stuff is used is in transactions that record the state of a device, for example).

Conceptually, it's somewhat similar to yours. The thread that runs the queue is triggered by the addition of a new transaction using a normal notification mechanism, so it's basically idle unless stuff's going on. If there's a lot of stuff going on, the queue just builds up until the incoming rate drops below the rate at which we can actually process it.

It's been the only thing that's kept our application running (against MS SQL Server, until we can throw that away in favor of Postgres).

--
Dustin Sallings <dustin@spy.net>
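Dustin's TransactionPipeline class isn't shown in the thread, but the general shape he describes (a worker that sleeps until a transaction is enqueued and simply falls behind gracefully under load) could be sketched in Perl along these lines. The DSN, SQL, and queue plumbing are invented for illustration, not his actual code:

    # Notification-driven pipeline sketch: producers enqueue a list of
    # statements; a worker thread commits each queued transaction atomically.
    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;
    use DBI;

    my $queue = Thread::Queue->new;

    # Worker: dequeue() blocks, so the thread is idle until something is
    # enqueued; under load the queue just grows until we catch up.
    my $worker = threads->create(sub {
        my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', '',
                               { AutoCommit => 0, RaiseError => 1 });
        while (defined(my $stmts = $queue->dequeue)) {
            eval {
                $dbh->do($_) for @$stmts;
                $dbh->commit;            # one commit per queued transaction
            };
            $dbh->rollback if $@;        # keep each transaction atomic
        }
        $dbh->disconnect;
    });

    # Producer: hand off a non-critical transaction and return immediately.
    $queue->enqueue(shared_clone([
        q{INSERT INTO device_state (device_id, state) VALUES (1, 'ok')},
    ]));

    $queue->enqueue(undef);              # tell the worker to shut down
    $worker->join;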
"Chris Ochs" <chris@paymentonline.com> writes: > Is this a crazy way to handle this? Depends. Do you care if you lose that data (if the system crashes before your daemon can insert it into the database)? I think the majority of the win you are seeing comes from the fact that the data doesn't actually have to get to disk --- your "write to file" never gets further than kernel disk buffers in RAM. I would think that you could get essentially the same win by aggregating your database transactions into bigger ones. From a reliability point of view you're doing that anyway --- whatever work the daemon processes at a time is the real transaction size. regards, tom lane
> "Chris Ochs" <chris@paymentonline.com> writes:
> > Is this a crazy way to handle this?
>
> Depends. Do you care if you lose that data (if the system crashes
> before your daemon can insert it into the database)? I think the
> majority of the win you are seeing comes from the fact that the data
> doesn't actually have to get to disk --- your "write to file" never
> gets further than kernel disk buffers in RAM.
>
> I would think that you could get essentially the same win by aggregating
> your database transactions into bigger ones. From a reliability point
> of view you're doing that anyway --- whatever work the daemon processes
> at a time is the real transaction size.
>
> regards, tom lane

The transactions are as big as they can be; all the data is committed at once. I'm guessing that for any database to be as fast as I want it, it just needs bigger/better hardware, which isn't an option at the moment.

I was also thinking about data loss with the disk queue. Right now it's such a small risk, but as we do more transactions it gets bigger. So right now, yes, it's an acceptable risk given the chance of it happening and what a worst-case scenario would look like, but at some point it wouldn't be.

Chris
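If that risk ever stops being acceptable, one possible middle ground (giving back some of the speed win Tom describes) would be to fsync the queue file after each append so that queued statements survive a crash. A minimal sketch, with an assumed file name and placeholder statement:

    # Append to the queue file and force it to disk before returning.
    use strict;
    use warnings;
    use IO::Handle;

    open my $fh, '>>', '/var/spool/txnqueue/queue.sql' or die "queue: $!";
    print {$fh} "INSERT INTO payments (payment_id, amount) VALUES (42, 19.95)\n";
    $fh->flush;                          # push Perl's buffer to the kernel
    $fh->sync;                           # fsync: push kernel buffers to disk
    close $fh or die "close: $!";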