Re: Sustained inserts per sec ... ? - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Sustained inserts per sec ... ?
Date
Msg-id 1112730509.16721.918.camel@localhost.localdomain
Whole thread Raw
In response to Re: Sustained inserts per sec ... ?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Sustained inserts per sec ... ?  (Christopher Petrilli <petrilli@gmail.com>)
Re: Sustained inserts per sec ... ?  (Christopher Petrilli <petrilli@gmail.com>)
List pgsql-performance
On Mon, 2005-04-04 at 22:36 -0400, Tom Lane wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > On Apr 4, 2005 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> do a test run with *no* indexes on the table, just to see if it behaves
> >> any differently?  Basically I was wondering if index overhead might be
> >> part of the problem.
>
> > http://www.amber.org/~petrilli/diagrams/pgsql_copy500_pkonly.png
>
> > I appologize, I forgot to kill the PK, but as you can see, the curve
> > flattened out a lot.  It still begins to increase in what seems like
> > the same place.  You can find the results themselves at:
>
> Yeah, this confirms the thought that the indexes are the source of
> the issue.  (Which is what I'd expect, because a bare INSERT ought to be
> an approximately constant-time operation.  But it's good to verify.)

Yup, indexes are the best explanation so far - block extension needs
some work, but I doubted that it was the source of this effect.

> Now some amount of slowdown is to be expected as the indexes get larger,
> since it ought to take roughly O(log N) time to insert a new entry in an
> index of size N.  The weird thing about your curves is the very sudden
> jump in the insert times.

Well, ISTM that the curve is far from unique. Mark's OSDL tests show
them too. What was wierd, for me, was that it "resets" when you move to
a new table. The index theory does accurately explain that.

Perhaps the jump is not so sudden? Do I see a first small step up at
about 4.5M rows, then another much bigger one at 7.5M (which looks like
the only one at first glance)?

> What I think might be happening is that the "working set" of pages
> touched during index inserts is gradually growing, and at some point it
> exceeds shared_buffers, and at that point performance goes in the toilet
> because we are suddenly doing lots of reads to pull in index pages that
> fell out of the shared buffer area.

So this does seem to be the best explanation and it seems a good one.

It's also an excellent advert for table and index partitioning, and some
damning evidence against global indexes on partitioned tables (though
they may still be better than the alternative...)

> The indicated fix of course is to increase shared_buffers.

Splitting your tables at 4M, not 10M would work even better.

..

Anyway, where most of this started was with Christopher's comments:

On Fri, 2005-04-01 at 14:38 -0500, Christopher Petrilli wrote:
> This was an application originally written for MySQL/MYISAM, and it's
> looking like PostgreSQL can't hold up for it, simply because it's "too
> much database" if that makes sense.  The same box, running the MySQL
> implementation (which uses no transactions) runs around 800-1000
> rows/second systained.

B-trees aren't unique to PostgreSQL; the explanation developed here
would work equally well for any database system that used tree-based
indexes. Do we still think that MySQL can do this when PostgreSQL
cannot? How?

Do we have performance test results showing the same application load
without the degradation? We don't need to look at the source code to
measure MySQL performance...

Best Regards, Simon Riggs



pgsql-performance by date:

Previous
From: Christopher Petrilli
Date:
Subject: Re: Sustained inserts per sec ... ?
Next
From: Christopher Petrilli
Date:
Subject: Re: Sustained inserts per sec ... ?