Thread: speeding up inserts

speeding up inserts

From: "Chris Ochs"
First of all, we are still running SAP DB at the moment, but we are in the
process of moving to PostgreSQL, so it seemed like a good idea to post this
type of question here.

In our environment we do transaction processing; each transaction accounts
for 10-30 inserts and 3-4 selects.  We also have users who use a management
interface for doing all sorts of queries on the data once it gets into the
db.  Most of the user queries are selects, with a few updates and even
fewer inserts.

The basic problem is that the transaction times are very critical; one
second is a big deal.  The data inserted into the db from a transaction
does not have to get there instantly; it can be delayed (to a point,
anyway).  Given that there is only so much you can do to speed up inserts,
I have been testing a different system for getting the data from the
application into the database.

Now what our application does is create the queries as it runs, then
instead of inserting them into the database it writes them all out to a
single file at the end of the transaction.  This is a huge performance
boost.  We then use a separate daemon to run the disk queue once every
second and do all the inserts.  If for some reason the main application
can't write to disk, it will revert to inserting them directly.

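The basic shape of it, heavily simplified (the file name and the function
names here are made up for illustration, not our actual code):

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Append the queries collected during one transaction to the spool
    # file.  The daemon picks this file up once a second and replays it.
    sub queue_queries {
        my ($queries) = @_;                  # arrayref of SQL statements
        open(my $fh, '>>', '/var/spool/app/insert.queue') or return 0;
        flock($fh, LOCK_EX) or do { close $fh; return 0 };
        print {$fh} "$_\n" for @$queries;    # one statement per line
        close $fh or return 0;               # note: no fsync, so the data
        return 1;                            # is only as safe as the kernel buffers
    }

    # If the spool write fails for any reason, fall back to inserting
    # directly through the cached DBI handle.
    sub store_transaction {
        my ($dbh, $queries) = @_;
        return if queue_queries($queries);
        $dbh->begin_work;                    # assumes AutoCommit is on for this handle
        $dbh->do($_) for @$queries;
        $dbh->commit;
    }

The daemon side just reads that file back once a second and does the
inserts.
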
Is this a crazy way to handle this?  No matter what I have tried, opening
and writing a single line to a file on disk is way faster than any database
I have used.  I even tried using BerkeleyDB as the queue instead of the
flat file, but that wasn't a whole lot faster than using the cached
database handles (our application runs under mod_perl).

Chris


Re: speeding up inserts

From: Dustin Sallings
On Dec 31, 2003, at 14:16, Chris Ochs wrote:

> Now what our application does is create the queries as it runs, then
> instead of inserting them into the database it writes them all out to
> a single file at the end of the transaction.  This is a huge
> performance boost.  We then use a separate daemon to run the disk
> queue once every second and do all the inserts.  If for some reason
> the main application can't write to disk, it will revert to inserting
> them directly.
>
> Is this a crazy way to handle this?  No matter what I have tried,
> opening and writing a single line to a file on disk is way faster
> than any database I have used.  I even tried using BerkeleyDB as the
> queue instead of the flat file, but that wasn't a whole lot faster
> than using the cached database handles (our application runs under
> mod_perl).

    In my application, I've built a ``TransactionPipeline'' class that
queues up transactions for asynchronous storage.  It made an incredible
difference in transaction processing speed in places where the
transaction isn't critical (the atomicity is important, but the main
place this is used is for transactions that record the state of a
device, for example).

    Conceptually, it's somewhat similar to yours.  The thread that runs
the queue is triggered by the addition of a new transaction to execute,
using a normal notification mechanism, so it's basically idle unless
stuff's going on.  If there's a lot going on, the queue just builds up
until the incoming rate drops below the rate at which we can actually
process it.

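    Roughly, the shape of it (sketched here in Perl with made-up names,
not our actual code):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    # Pending transactions.  dequeue() blocks, so the worker thread is
    # idle until something shows up, and simply works through the backlog
    # when items arrive faster than the database can absorb them.
    my $pipeline = Thread::Queue->new;

    my $worker = threads->create(sub {
        while (defined(my $txn = $pipeline->dequeue)) {
            store_txn($txn);
        }
    });

    # Producers just enqueue and move on.
    sub record_state {
        my ($state) = @_;
        $pipeline->enqueue($state);
    }

    # Stand-in for the real storage step (a DBI handle and a commit).
    sub store_txn {
        my ($txn) = @_;
        warn "stored: $txn\n";
    }

    record_state('device 42 is up');    # example use
    $pipeline->enqueue(undef);          # shutdown: wake the worker, end its loop
    $worker->join;

The blocking dequeue is what keeps the worker idle when nothing is
coming in.
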
    It's been the only thing that's kept our application running (against
MS SQL Server until we can throw that away in favor of postgres).

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: speeding up inserts

From: Tom Lane
"Chris Ochs" <chris@paymentonline.com> writes:
> Is this a crazy way to handle this?

Depends.  Do you care if you lose that data (if the system crashes
before your daemon can insert it into the database)?  I think the
majority of the win you are seeing comes from the fact that the data
doesn't actually have to get to disk --- your "write to file" never
gets further than kernel disk buffers in RAM.

I would think that you could get essentially the same win by aggregating
your database transactions into bigger ones.  From a reliability point
of view you're doing that anyway --- whatever work the daemon processes
at a time is the real transaction size.

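Something along these lines, with Perl/DBI (the connection parameters,
table, and columns are invented, purely to illustrate the batching):

    use strict;
    use warnings;
    use DBI;

    # One big transaction instead of one per insert: the commit, and
    # therefore the wait for the data to actually reach disk, is paid
    # once per batch rather than once per row.
    my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare(
        'INSERT INTO txn_log (txn_id, item, amount) VALUES (?, ?, ?)');

    # @rows stands in for whatever one daemon pass pulled off the queue.
    my @rows = ( [ 1001, 'widget', 19.95 ],
                 [ 1001, 'gadget',  4.50 ] );
    $sth->execute(@$_) for @rows;

    $dbh->commit;       # the only point where the data must hit disk
    $dbh->disconnect;

Whatever the daemon processes in one pass becomes one commit.
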
            regards, tom lane

Re: speeding up inserts

From: "Chris Ochs"
> "Chris Ochs" <chris@paymentonline.com> writes:
> > Is this a crazy way to handle this?
>
> Depends.  Do you care if you lose that data (if the system crashes
> before your daemon can insert it into the database)?  I think the
> majority of the win you are seeing comes from the fact that the data
> doesn't actually have to get to disk --- your "write to file" never
> gets further than kernel disk buffers in RAM.
>
> I would think that you could get essentially the same win by aggregating
> your database transactions into bigger ones.  From a reliability point
> of view you're doing that anyway --- whatever work the daemon processes
> at a time is the real transaction size.
>
> regards, tom lane
>

The transactions are as big as they can be; all the data is committed at
once.  I'm guessing that for any database to be as fast as I want, it just
needs bigger/better hardware, which isn't an option at the moment.

I was also thinking about data loss with the disk queue.  Right now it's
such a small risk, but as we do more transactions it gets bigger.  So right
now, yes, it's an acceptable risk given the chance of it happening and what
a worst-case scenario would look like, but at some point it wouldn't be.

Chris