Re: Perf Benchmarking and regression. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Perf Benchmarking and regression.
Msg-id 20160512155845.lcbdg563ikj4p624@alap3.anarazel.de
In response to Re: Perf Benchmarking and regression.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Perf Benchmarking and regression.
List pgsql-hackers
On 2016-05-12 10:49:06 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client counts:
> >
> > 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
> >
> > 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
> >
> > 3) That line removed with #if 0, default *_flush_after :  TPS = 9856.809278
> >
> > 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023
> 
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:

Yes, I'm concerned too.

The workload in this thread is a bit of an "artificial" workload (all
data is constantly updated, doesn't fit into shared_buffers, fits into
the OS page cache), and it only measures throughput, not latency.  But I
agree that that's way too large a regression to accept, and that there's
a significant number of machines with way undersized shared_buffers
values.
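
For reference, the size of the regression can be worked out from the TPS
figures quoted above.  A minimal sketch, assuming the *_flush_after=0 run
is taken as the baseline in each pair; the function name is only for
illustration:

```python
def regression_pct(baseline_tps, regressed_tps):
    """Percentage of throughput lost relative to the baseline run."""
    return (baseline_tps - regressed_tps) / baseline_tps * 100.0

# TPS figures from the quoted results at 128 clients.
# Assumption: the *_flush_after=0 run is the baseline in each pair.
unpatched = regression_pct(18613.343529, 10925.882396)
line_removed = regression_pct(18158.648023, 9856.809278)

print(f"unpatched master: {unpatched:.1f}% lower TPS with default *_flush_after")
print(f"line removed:     {line_removed:.1f}% lower TPS with default *_flush_after")
```

That comes out at roughly 41% for unpatched master and roughly 46% with
the line removed.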


> http://www.postgresql.org/message-id/CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com
> 
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users.  A 45% regression on pgbench is horrible.

I asked you over there whether you could benchmark with just different
values for backend_flush_after... I chose the current value because it
gives the best latency / most consistent throughput numbers, but 128kB
isn't a large window.  I suspect we might need to disable backend-guided
flushing if that's not sufficient :(


> > Here, That line points to "AddWaitEventToSet(FeBeWaitSet,
> > WL_POSTMASTER_DEATH, -1, NULL, NULL); in pq_init()."
> 
> Given the above results, it's not clear whether that is making things
> better or worse.

Yea, me neither. I think it's doubtful that you'd see a performance
difference due to the original commit
ac1d7945f866b1928c2554c0f80fd52d7f977772, independent of the
WaitEventSet stuff, at these throughput rates.

Greetings,

Andres Freund


