Re: 9.4 regression - Mailing list pgsql-hackers

From Andres Freund
Subject Re: 9.4 regression
Msg-id 20130809062002.GN14729@alap2.anarazel.de
In response to Re: 9.4 regression  (Jon Nelson <jnelson+pgsql@jamponi.net>)
List pgsql-hackers
On 2013-08-08 22:58:42 -0500, Jon Nelson wrote:
> On Thu, Aug 8, 2013 at 9:27 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-08-08 16:12:06 -0500, Jon Nelson wrote:
> ...
> 
> >> At this point I'm convinced that the issue is a pathological case in
> >> ext4. The performance impact disappears as soon as the unwritten
> >> extent(s) are written to with real data. Thus, even though allocating
> >> files with posix_fallocate is - frequently - orders of magnitude
> >> quicker than doing it with write(2), the subsequent re-write can be
> >> more expensive.  At least, that's what I'm gathering from the various
> >> threads.
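To make the comparison concrete, here is a minimal sketch of the two allocation strategies being contrasted, assuming Linux and Python's `os.posix_fallocate`. The function names and the 16 MB size are illustrative assumptions, not PostgreSQL's code. The fallocate variant only reserves blocks (on ext4 they become unwritten extents), while the zero-fill variant writes every byte up front with write(2).

```python
import os
import tempfile
import time

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB, the default WAL segment size
CHUNK = 8192                     # zero-fill in 8 kB chunks


def prealloc_fallocate(path: str, size: int = SEGMENT_SIZE) -> None:
    """Reserve blocks without writing data; on ext4 they become unwritten extents."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        os.fsync(fd)  # fsync so both variants end in the same durable state
    finally:
        os.close(fd)


def prealloc_zero_fill(path: str, size: int = SEGMENT_SIZE) -> None:
    """Reserve blocks by actually writing zeroes with write(2)."""
    zeros = bytes(CHUNK)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = 0
        while written < size:
            # os.write may write less than asked; the loop simply continues
            written += os.write(fd, zeros[: min(CHUNK, size - written)])
        os.fsync(fd)
    finally:
        os.close(fd)


def timed(fn, path: str) -> float:
    """Return the wall-clock seconds taken by fn(path)."""
    start = time.perf_counter()
    fn(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for fn in (prealloc_fallocate, prealloc_zero_fill):
            secs = timed(fn, os.path.join(d, fn.__name__))
            print(f"{fn.__name__}: {secs * 1000:.1f} ms")
```

This only times the allocation step. Per the discussion, the expensive part on ext4 appears later, when the unwritten extents are overwritten with real data.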
> >
> >
> >>  Why this issue didn't crop up in earlier testing and why I
> >> can't seem to make test_fallocate do it (even when I modify
> >> test_fallocate to write to the newly-allocated file in a mostly-random
> >> fashion) has me baffled.
> >
> > It might be kernel version specific and concurrency seems to play a
> > role. If you reproduce the problem, could you run a "perf record -ga" to
> > collect a systemwide profile?
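If concurrency matters, a single-writer test program may simply not generate enough concurrent fdatasync() traffic. Below is a hypothetical multi-writer harness, assuming that threads issuing pwrite() followed by fdatasync() on one shared, preallocated file roughly approximate many backends flushing WAL. This is a toy model, not how PostgreSQL is structured, and `concurrent_overwrite` and the `TEST_DIR` environment variable are illustrative names.

```python
import os
import tempfile
import threading
import time

PAGE = 8192  # PostgreSQL's default block size


def concurrent_overwrite(path: str, n_writers: int, pages_per_writer: int,
                         use_fallocate: bool = True) -> float:
    """Preallocate a file, then let n_writers threads overwrite disjoint page
    ranges, calling fdatasync() after every page. Returns pages synced per second."""
    size = n_writers * pages_per_writer * PAGE
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        if use_fallocate:
            os.posix_fallocate(fd, 0, size)  # unwritten extents on ext4
        else:
            zeros = bytes(PAGE)
            for off in range(0, size, PAGE):
                os.pwrite(fd, zeros, off)  # fully written blocks
        os.fsync(fd)

        def writer(idx: int) -> None:
            # Each thread owns its own range, so no two threads touch the same page.
            payload = bytes([idx % 256]) * PAGE
            base = idx * pages_per_writer * PAGE
            for i in range(pages_per_writer):
                os.pwrite(fd, payload, base + i * PAGE)
                os.fdatasync(fd)

        threads = [threading.Thread(target=writer, args=(i,)) for i in range(n_writers)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_writers * pages_per_writer / elapsed


if __name__ == "__main__":
    # TEST_DIR (hypothetical): point it at the ext4 mount under test.
    with tempfile.TemporaryDirectory(dir=os.environ.get("TEST_DIR")) as d:
        for n in (1, 8):
            for fa in (True, False):
                rate = concurrent_overwrite(os.path.join(d, "seg"), n, 32, fa)
                print(f"writers={n} fallocate={fa}: {rate:.0f} syncs/s")
```

Comparing the fallocate and zero-fill rates as the writer count grows is the interesting part; with a single writer the two would be expected to look similar.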
> 
> Finally, an excuse to learn how to use 'perf'! I'll try to provide
> that info when I am able.

Running perf record as above during the first minute of the benchmark,
and then running perf report > somefile (redirecting the output gets you
the non-interactive version), should get you started.
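A sketch of that two-step workflow, assuming the benchmark is already running in another terminal. The helper names and default file names are illustrative. `-g` records call graphs, `-a` profiles all CPUs, `-- sleep 60` bounds the recording to a minute, and `perf report --stdio` gives the same non-interactive text output as redirecting. perf usually needs root or a permissive `kernel.perf_event_paranoid` setting.

```python
import shlex
import subprocess
from typing import List


def perf_commands(seconds: int = 60, data_file: str = "perf.data") -> List[List[str]]:
    """Build argv lists for a system-wide call-graph profile and its text report."""
    record = ["perf", "record", "-g", "-a", "-o", data_file, "--", "sleep", str(seconds)]
    report = ["perf", "report", "--stdio", "-i", data_file]
    return [record, report]


def run_profile(seconds: int = 60, data_file: str = "perf.data",
                report_file: str = "somefile") -> None:
    """Record for `seconds`, then write the text report to report_file."""
    record, report = perf_commands(seconds, data_file)
    subprocess.run(record, check=True)
    with open(report_file, "w") as out:
        subprocess.run(report, stdout=out, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_profile() to actually execute them.
    for cmd in perf_commands():
        print(shlex.join(cmd))
```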

> > There are some more things to test:
> > - is the slowdown dependent on the scale? I.e., is it visible with -j 1 -c
> >   1?
> 
> scale=1 (-j 1 -c 1):
> with fallocate: 685 tps
> without: 727 tps
> 
> scale=20:
> with fallocate: 129 tps
> without: 402 tps
> 
> scale=40:
> with fallocate: 163 tps
> without: 511 tps
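Expressed as a ratio of the quoted figures, the fallocate penalty is small with a single client and grows sharply at the higher settings. The labels are taken verbatim from the message; `relative_throughput` is just an illustrative helper.

```python
# Throughput in tps as reported above: (with fallocate, without fallocate).
RESULTS = {
    "scale=1 (-j 1 -c 1)": (685, 727),
    "scale=20": (129, 402),
    "scale=40": (163, 511),
}


def relative_throughput(with_fallocate: float, without: float) -> float:
    """Fraction of the non-fallocate throughput retained when fallocate is used."""
    return with_fallocate / without


if __name__ == "__main__":
    for label, (fa, no_fa) in RESULTS.items():
        print(f"{label}: {relative_throughput(fa, no_fa):.2f}")
```

This prints 0.94, 0.32 and 0.32: roughly a 6% loss with one client versus about two thirds of the throughput lost at 20 and 40.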

Ok, so there's a clear correlation with the number of writers.

> > - Does it also occur in synchronous_commit=off configurations? Those
> >   don't fdatasync() from so many backends; that might play a role.
> 
> With synchronous_commit=off, the performance is vastly improved.
> Interestingly, the fallocate case is (immaterially) faster than the
> non-fallocate case: 3766 tps vs. 3700 tps.

That's interesting because in the synchronous_commit=off case most of
the writing and syncing should be done by the wal writer. So that's
another hint that some scalability issue is at play, presumably in the
kernel.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


