On 19/08/17 13:49, Mark Kirkwood wrote:
>
>
> On 19/08/17 02:21, Jeremy Finzel wrote:
>> On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe
>> <scott.marlowe@gmail.com <mailto:scott.marlowe@gmail.com>> wrote:
>>
>> So do iostat or iotop show you if / where your disks are working
>> hardest? Or is this CPU overhead that's killing performance?
>>
>>
>> Sorry for the delayed reply. I took a look in more detail at the
>> query plans from our problem query during this incident. There are
>> actually 6 plans, because there were 6 unique queries. I traced one
>> query through our logs, and found something really interesting. That
>> is that all of the first 5 queries are creating temp tables, and all
>> of them took upwards of 500ms each to run. The final query, however,
>> is a simple select from the last temp table, and that query took
>> 0.035ms! This really confirms that somehow, the issue had to do with
>> /writing/ to the SAN, I think. Of course this doesn't answer a whole
>> lot, because we had no other apparent issues with write performance
>> at all.
>>
>> I also provide some graphs below.
>>
>>
> Hi, graphs for latency (or await etc) might be worth looking at too -
> sometimes the troughs between the IO spikes are actually when the
> disks have been overwhelmed with queued up pending IOs...
>
>
Sorry - I see you *did* actually have iowait in there under your CPU
graph... which doesn't appear to show much waiting. However, it might
still be well worth getting graphs of per-device waits and
utilizations.
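For a quick look without graphing tooling, something like this sketch
works (assuming a Linux box; the column positions are from the kernel's
/proc/diskstats format, and iostat here is the sysstat one):

```shell
# "iostat -dx 5 2" (from the sysstat package) would show these stats
# directly: the await columns are average I/O wait in ms, and %util is
# the fraction of time the device was busy with requests.
#
# Without sysstat, /proc/diskstats has the raw counters: column 3 is
# the device name and column 13 the cumulative ms spent doing I/O.
# The delta of column 13 between two samples, divided by the interval
# length in ms, approximates %util for that interval.
awk '{printf "%-10s io_ms=%s\n", $3, $13}' /proc/diskstats
```

Sampling that twice a few seconds apart (rather than reading it once)
is what turns the cumulative counter into a utilization figure.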
regards
Mark