On 2016-05-12 10:49:06 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client counts:
> >
> > 1) Unpatched master, default *_flush_after : TPS = 10925.882396
> >
> > 2) Unpatched master, *_flush_after=0 : TPS = 18613.343529
> >
> > 3) That line removed with #if 0, default *_flush_after : TPS = 9856.809278
> >
> > 4) That line removed with #if 0, *_flush_after=0 : TPS = 18158.648023
>
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:
Yes, I'm concerned too.
The workload in this thread is a bit "artificial" (all data is
constantly updated, doesn't fit into shared_buffers, but does fit into
the OS page cache), and it only measures throughput, not latency. But I
agree that that's way too large a regression to accept, and that there
are a significant number of machines with way-undersized shared_buffers
settings.
> http://www.postgresql.org/message-id/CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com
>
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users. A 45% regression on pgbench is horrible.
I asked you over there whether you could benchmark with just different
values for backend_flush_after... I chose the current value because it
gives the best latency / most consistent throughput numbers, but 128kB
isn't a large window. I suspect we might need to disable backend-guided
flushing if that's not sufficient :(
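
To be concrete, the comparison I'm asking for is roughly the following
(a sketch only; the specific values are examples, not recommendations):

-- Vary backend_flush_after between otherwise identical pgbench runs and
-- compare TPS and latency; repeat for a handful of values.
ALTER SYSTEM SET backend_flush_after = 0;        -- backend-guided flushing off
SELECT pg_reload_conf();
-- run e.g. "pgbench -c 128 -j 128 -T 300" and record the results

ALTER SYSTEM SET backend_flush_after = '256kB';  -- wider window than the current 128kB
SELECT pg_reload_conf();
-- rerun the exact same pgbench invocation and compare

That should tell us whether a larger window is sufficient, or whether
backend-guided flushing really has to be disabled.
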
> > Here, "that line" points to "AddWaitEventToSet(FeBeWaitSet,
> > WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().
>
> Given the above results, it's not clear whether that is making things
> better or worse.
Yeah, me neither. I think it's doubtful that you'd see a performance
difference due to the original commit
ac1d7945f866b1928c2554c0f80fd52d7f977772, independent of the
WaitEventSet stuff, at these throughput rates.
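
For reference, the code around that line boils down to roughly the
following pattern (a compressed sketch of the 9.6 WaitEventSet API, not
the actual pq_init() code; the demo_* names are made up):

#include "postgres.h"

#include "miscadmin.h"          /* MyLatch */
#include "storage/ipc.h"        /* proc_exit() */
#include "storage/latch.h"      /* WaitEventSet API */
#include "utils/memutils.h"     /* TopMemoryContext */

static WaitEventSet *DemoWaitSet;

/*
 * Build a long-lived event set once, instead of re-creating the
 * underlying poll/epoll state on every wait.
 */
static void
demo_init_wait_set(pgsocket sock)
{
	DemoWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
	AddWaitEventToSet(DemoWaitSet, WL_SOCKET_READABLE, sock, NULL, NULL);
	AddWaitEventToSet(DemoWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
	AddWaitEventToSet(DemoWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
}

/*
 * Block (-1 = no timeout) until the socket is readable, the latch is
 * set, or the postmaster dies.
 */
static void
demo_wait_for_input(void)
{
	WaitEvent	event;

	WaitEventSetWait(DemoWaitSet, -1, &event, 1);

	if (event.events & WL_POSTMASTER_DEATH)
		proc_exit(1);
	if (event.events & WL_LATCH_SET)
		ResetLatch(MyLatch);
}
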
Greetings,
Andres Freund