
From Christopher Petrilli
Subject Re: Sustained inserts per sec ... ?
Date
Msg-id 59d991c405040513055be3a9b2@mail.gmail.com
In response to Re: Sustained inserts per sec ... ?  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Sustained inserts per sec ... ?
List pgsql-performance
On Apr 5, 2005 3:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Now some amount of slowdown is to be expected as the indexes get larger,
> > since it ought to take roughly O(log N) time to insert a new entry in an
> > index of size N.  The weird thing about your curves is the very sudden
> > jump in the insert times.
>
> Well, ISTM that the curve is far from unique. Mark's OSDL tests show
> them too. What was weird, for me, was that it "resets" when you move to
> a new table. The index theory does accurately explain that.
>
> Perhaps the jump is not so sudden? Do I see a first small step up at
> about 4.5M rows, then another much bigger one at 7.5M (which looks like
> the only one at first glance)?
>
> > What I think might be happening is that the "working set" of pages
> > touched during index inserts is gradually growing, and at some point it
> > exceeds shared_buffers, and at that point performance goes in the toilet
> > because we are suddenly doing lots of reads to pull in index pages that
> > fell out of the shared buffer area.
>
> So this does seem to be the best explanation and it seems a good one.
>
> It's also an excellent advert for table and index partitioning, and some
> damning evidence against global indexes on partitioned tables (though
> they may still be better than the alternative...)
>
> > The indicated fix of course is to increase shared_buffers.
>
> Splitting your tables at 4M, not 10M would work even better.

Unfortunately, given that we are potentially talking about billions of
rows, I'm concerned about query time with that many tables. I assume
this will kick in the genetic optimizer?
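
To make that concrete, here is a rough sketch of the kind of per-range
split Simon is describing, done with table inheritance; the "events"
table, its columns, and the 4M boundary are made-up placeholders, not
anything from the actual schema:

  -- Parent table; loaders and readers can refer to this name.
  CREATE TABLE events (
      id       bigint,
      logged   timestamptz,
      payload  text
  );

  -- One child per ~4M-row range, each with its own (small) index.
  CREATE TABLE events_0 (
      CHECK (id >= 0 AND id < 4000000)
  ) INHERITS (events);
  CREATE INDEX events_0_id_idx ON events_0 (id);

  CREATE TABLE events_1 (
      CHECK (id >= 4000000 AND id < 8000000)
  ) INHERITS (events);
  CREATE INDEX events_1_id_idx ON events_1 (id);

  -- The loader inserts directly into whichever child is "current",
  -- so only that child's index has to stay cached.
  INSERT INTO events_0 (id, logged, payload)
      VALUES (12345, now(), '...');

  -- Reads against the parent see every child.
  SELECT count(*) FROM events WHERE id BETWEEN 3900000 AND 4100000;

The worry on the query side is that, unless the planner can use the
CHECK constraints to skip children (or the application picks the right
child itself), a SELECT against the parent has to touch every one of
those tables.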


> Anyway, where most of this started was with Christopher's comments:
>
> On Fri, 2005-04-01 at 14:38 -0500, Christopher Petrilli wrote:
> > This was an application originally written for MySQL/MyISAM, and it's
> > looking like PostgreSQL can't hold up for it, simply because it's "too
> > much database" if that makes sense.  The same box, running the MySQL
> > implementation (which uses no transactions) runs around 800-1000
> > rows/second sustained.
>
> B-trees aren't unique to PostgreSQL; the explanation developed here
> would work equally well for any database system that used tree-based
> indexes. Do we still think that MySQL can do this when PostgreSQL
> cannot? How?

There are customers in production using MySQL with 10M rows/table, and
I have seen no evidence of this behavior there. I do not have a test
jig for MySQL, but I can create one, which is what I will do. Note that
they are using MyISAM tables, so there is no ACID behavior. Also, I
have seen troubling corruption issues that I've never been able to
concretely identify.

Above all, I've been impressed that PostgreSQL, even when it hits this
wall, never corrupts anything.
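
For what it's worth, a rough way to see whether an index has outgrown
the cache on a given box is to compare its page count against
shared_buffers; the index name below is just the placeholder from the
sketch above, and in the versions being discussed shared_buffers is a
count of buffers (8kB each by default):

  -- relpages is only refreshed by VACUUM/ANALYZE, so update it first.
  ANALYZE events_0;

  -- Index size, in 8kB pages.
  SELECT relname, relpages
  FROM pg_class
  WHERE relname = 'events_0_id_idx';

  -- Total number of 8kB buffers shared by everything.
  SHOW shared_buffers;

Once the set of index pages an insert routinely touches no longer fits
in that figure, each insert starts incurring reads, which would line up
with the knee in the curves.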

> Do we have performance test results showing the same application load
> without the degradation? We don't need to look at the source code to
> measure MySQL performance...

I will see what I can do in the next few days to create a similar
little test for MySQL.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
