Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) |
Date | |
Msg-id | alpine.DEB.2.02.1306191531570.25404@localhost6.localdomain6 |
In response to | Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) (Greg Smith <greg@2ndQuadrant.com>) |
List | pgsql-hackers |
> I'm still getting the same sort of pauses waiting for input with your v11.

Alas.

> This is a pretty frustrating problem; I've spent about two days so far trying
> to narrow down how it happens. I've attached the test program I'm using. It
> seems related to my picking a throttled rate that's close to (but below) the
> maximum possible on your system. I'm using 10,000 on a system that can do
> about 16,000 TPS when running pgbench in debug mode.
>
> This problem is 100% reproducible here; happens every time. This is a laptop
> running Mac OS X. It's possible the problem is specific to that platform.
> I'm doing all my tests with the database itself setup for development, with
> debug and assertions on. The lag spikes seem smaller without assertions on,
> but they are still there.
>
> Here's a sample:
>
> transaction type: SELECT only

What is this test script? I'm using pgbench for my tests.

> scaling factor: 10
> query mode: simple
> number of clients: 25
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 301921
> average transaction lag: 1.133 ms (max 137.683 ms)
> tps = 10011.527543 (including connections establishing)
> tps = 10027.834189 (excluding connections establishing)
>
> And those slow ones are all at the end of the latency log file, as shown in
> column 3 here:
>
> 22 11953 3369 0 1371578126 954881
> 23 11926 3370 0 1371578126 954918
> 3 12238 30310 0 1371578126 984634
> 7 12205 30350 0 1371578126 984742
> 8 12207 30359 0 1371578126 984792
> 11 12176 30325 0 1371578126 984837
> 13 12074 30292 0 1371578126 984882
> 0 12288 175452 0 1371578127 126340
> 9 12194 171948 0 1371578127 126421
> 12 12139 171915 0 1371578127 126466
> 24 11876 175657 0 1371578127 126507

Indeed, there are two spikes, but not all clients are affected.

As I have not been able to reproduce this, debugging is hard. I'll give it a
try tomorrow.

> When no one is sleeping, the timeout becomes infinite, so only returning data
> will break it. This is intended behavior though.

This is not consistent. Under --throttle there should basically always be
someone asleep, unless the server cannot cope with the load and *all*
transactions are late. A state with no timeout looks pretty unrealistic,
because it would mean that there is no throttling.

> I don't think the st->listen related code has anything to do with this
> either. That flag is only used to track when clients have completed sending
> their first query over to the server. Once reaching that point once,
> afterward they should be "listening" for results each time they exit the
> doCustom() code.

This assumption seems false if there can be a "sleep" at the beginning of the
sequence, which is what throttle does, and which any custom script can do as
well. In that case the client is expected to wait before sending any command,
so there can be no select underway.

So listen should be set to 1 when a select has been sent, and set back to 0
when all the result data have been received.

"doCustom" makes implicit assumptions about what is going on, whereas it
should focus on looking at the incoming state, performing operations, and
leaving with a state which corresponds to the actual status, without
assumptions about what is going to happen next.

> st->listen goes to 1 very soon after startup and then it stays there,
> and that logic seems fine too.

--
Fabien.
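The two rules argued over in this message can be sketched in C. This is a minimal illustration only, not pgbench's actual code: the `Client` struct, `NO_TIMEOUT`, and every function name below are hypothetical, and the real per-client state and `doCustom()` logic are more involved. The sketch assumes the wait timeout is the shortest remaining sleep among sleeping clients (infinite only when nobody sleeps), and that the listen flag is raised only while a result is actually pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-client state; NOT pgbench's real client structure. */
typedef struct {
    bool    sleeping;   /* waiting for a scheduled (throttled) start time */
    int64_t until_us;   /* wake-up time in microseconds, valid if sleeping */
    bool    listening;  /* a query was sent and its result is still pending */
} Client;

/* Sentinel meaning "block until data arrives" (no sleeper to wake up). */
#define NO_TIMEOUT ((int64_t) -1)

/*
 * Compute how long the main loop may block: the smallest remaining sleep
 * among sleeping clients, 0 if a sleeper is already due, or NO_TIMEOUT
 * when no client is sleeping at all.
 */
int64_t next_timeout_us(const Client *clients, int n, int64_t now_us)
{
    int64_t min_us = NO_TIMEOUT;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!clients[i].sleeping)
            continue;
        int64_t remaining = clients[i].until_us - now_us;
        if (remaining < 0)
            remaining = 0;          /* late client: wake up immediately */
        if (min_us == NO_TIMEOUT || remaining < min_us)
            min_us = remaining;
    }
    return min_us;
}

/* A throttled client sleeps before sending anything: nothing to listen for. */
void on_throttle_sleep(Client *c, int64_t until_us)
{
    c->sleeping = true;
    c->until_us = until_us;
    c->listening = false;
}

/* The flag goes up only once a query is really on the wire. */
void on_query_sent(Client *c)
{
    c->sleeping = false;
    c->listening = true;
}

/* The flag goes back down when all result data have been received. */
void on_result_complete(Client *c)
{
    c->listening = false;
}
```

With this shape, an infinite timeout can only occur when every client is either late or waiting on the server, which is the situation described above as unrealistic under --throttle.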