Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date
Msg-id alpine.DEB.2.02.1306191531570.25404@localhost6.localdomain6
In response to Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
> I'm still getting the same sort of pauses waiting for input with your v11.

Alas.

> This is a pretty frustrating problem; I've spent about two days so far trying 
> to narrow down how it happens.  I've attached the test program I'm using.  It 
> seems related to my picking a throttled rate that's close to (but below) the 
> maximum possible on your system.  I'm using 10,000 on a system that can do 
> about 16,000 TPS when running pgbench in debug mode.
>
> This problem is 100% reproducible here; happens every time.  This is a laptop 
> running Mac OS X.  It's possible the problem is specific to that platform. 
> I'm doing all my tests with the database itself setup for development, with 
> debug and assertions on.  The lag spikes seem smaller without assertions on, 
> but they are still there.
>
> Here's a sample:
>
> transaction type: SELECT only

Which test script is this? I am running pgbench itself for my tests.

> scaling factor: 10
> query mode: simple
> number of clients: 25
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 301921
> average transaction lag: 1.133 ms (max 137.683 ms)
> tps = 10011.527543 (including connections establishing)
> tps = 10027.834189 (excluding connections establishing)
>
> And those slow ones are all at the end of the latency log file, as shown in 
> column 3 here:
>
> 22 11953 3369 0 1371578126 954881
> 23 11926 3370 0 1371578126 954918
> 3 12238 30310 0 1371578126 984634
> 7 12205 30350 0 1371578126 984742
> 8 12207 30359 0 1371578126 984792
> 11 12176 30325 0 1371578126 984837
> 13 12074 30292 0 1371578126 984882
> 0 12288 175452 0 1371578127 126340
> 9 12194 171948 0 1371578127 126421
> 12 12139 171915 0 1371578127 126466
> 24 11876 175657 0 1371578127 126507

Indeed, there are two spikes, but not all clients are affected.
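
To make the observation above checkable, here is a minimal sketch that
lists which clients in the quoted log excerpt exceed a given latency. It
assumes the usual six-column pgbench per-transaction log layout (client id,
transaction number, time in microseconds, file number, epoch seconds,
microseconds part), and the names parse_line and spiking_clients are
hypothetical helpers, not part of pgbench.

```python
from typing import Iterable, List, NamedTuple


class LogLine(NamedTuple):
    client_id: int
    transaction_no: int
    latency_us: int   # column 3, assumed to be in microseconds
    file_no: int
    epoch_s: int
    epoch_us: int


def parse_line(line: str) -> LogLine:
    """Parse one whitespace-separated, six-field log line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return LogLine(*(int(f) for f in fields))


def spiking_clients(lines: Iterable[str], threshold_us: int) -> List[int]:
    """Return the sorted ids of clients with a transaction above threshold_us."""
    records = (parse_line(l) for l in lines if l.strip())
    return sorted({r.client_id for r in records if r.latency_us > threshold_us})


# The excerpt quoted above, copied verbatim.
SAMPLE = """\
22 11953 3369 0 1371578126 954881
23 11926 3370 0 1371578126 954918
3 12238 30310 0 1371578126 984634
7 12205 30350 0 1371578126 984742
8 12207 30359 0 1371578126 984792
11 12176 30325 0 1371578126 984837
13 12074 30292 0 1371578126 984882
0 12288 175452 0 1371578127 126340
9 12194 171948 0 1371578127 126421
12 12139 171915 0 1371578127 126466
24 11876 175657 0 1371578127 126507
"""

if __name__ == "__main__":
    print("above 10 ms: ", spiking_clients(SAMPLE.splitlines(), 10_000))
    print("above 100 ms:", spiking_clients(SAMPLE.splitlines(), 100_000))
```

On this excerpt, five clients sit near 30 ms and four near 172-175 ms, out
of the 25 clients in the run.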

As I have not been able to reproduce this, debugging is hard. I'll give it
a try tomorrow.

> When no one is sleeping, the timeout becomes infinite, so only returning data 
> will break it.  This is intended behavior though.

This does not seem consistent. Under --throttle there should basically
always be someone asleep, unless the server cannot cope with the load and
*all* transactions are late. A state with no timeout looks pretty
unrealistic, because it would mean that there is no throttling.
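
To illustrate the point, here is a minimal sketch of how a wait timeout
could be derived from the clients' sleep deadlines. It is not the pgbench
code: select_timeout_us and the representation of a sleeping client as an
optional wake-up time are assumptions made for the example.

```python
from typing import Iterable, Optional


def select_timeout_us(now_us: int,
                      wake_times_us: Iterable[Optional[int]]) -> Optional[int]:
    """Return how long to wait for input, in microseconds.

    wake_times_us holds, per client, the time at which its sleep ends, or
    None if that client is not sleeping. The result is None ("no timeout")
    only when no client is sleeping at all.
    """
    pending = [t for t in wake_times_us if t is not None]
    if not pending:
        return None                        # block until data arrives
    # Wake up for the earliest deadline; a late client means no waiting.
    return max(0, min(pending) - now_us)


if __name__ == "__main__":
    print(select_timeout_us(1000, [None, 1500, 1200]))
    print(select_timeout_us(1000, [None, None]))
```

With throttling on, at least one entry is normally not None, so the
infinite case should only show up when every client is behind schedule.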

> I don't think the st->listen related code has anything to do with this 
> either.  That flag is only used to track when clients have completed sending 
> their first query over to the server.  Once reaching that point once, 
> afterward they should be "listening" for results each time they exit the 
> doCustom() code.

This assumption seems false if there can be a "sleep" at the beginning of
the sequence. That is what throttle does, but any custom script can do it
too. In that case the client is expected to wait before sending any
command, so no select can be underway.

So listen should be set to 1 when a select has been sent, and set back to 0
when all the result data have been received.

"doCustom" makes implicit assumptions about what is going on, whereas it 
should focus on looking at the incoming state, performing operations, and 
leaving with a state which corresponds to the actual status, without 
assumptions about what is going to happen next.
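
The rule I have in mind can be sketched as a small state machine. This is
an illustration only, not the doCustom() code: Client, step and the
result_ready argument are hypothetical names, and a real client would send
and read through libpq instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Client:
    sleep_until_us: Optional[int] = None  # set while a sleep is pending
    listen: bool = False                  # True only while a result is awaited


def step(c: Client, now_us: int, result_ready: bool) -> Client:
    """Advance one client and leave it in a state that reflects reality."""
    if c.sleep_until_us is not None:
        if now_us < c.sleep_until_us:
            return c                      # still asleep, nothing was sent
        c.sleep_until_us = None           # the sleep is over
    if c.listen:
        if result_ready:
            c.listen = False              # all result data received
        return c
    c.listen = True                       # a query has just been sent
    return c


if __name__ == "__main__":
    c = Client(sleep_until_us=500)        # throttle sleep before the first query
    for now, ready in [(100, False), (600, False), (700, False), (800, True)]:
        step(c, now, ready)
        print(now, c.listen)
```

A client that starts with a sleep keeps listen at 0 until its query is
really sent, so the caller never waits for a result that cannot come.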

> st->listen goes to 1 very soon after startup and then it stays there, 
> and that logic seems fine too.

-- 
Fabien.


