Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Msg-id alpine.DEB.2.10.1509050740290.429@sto
In response to Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: checkpointer continuous flushing
List pgsql-hackers
Hello Amit,

>> Woops.... 14 GB/s and 1.2 GB/s?! Is this a *hard* disk??
>
> Yes, there is no SSD in system. I have confirmed the same.  There are RAID
> spinning drives.

Ok...

I guess that there is some kind of cache to explain these great tps 
figures, probably on the RAID controller. What does "lspci" say? Does 
"hdparm" suggest that the write cache is enabled? It would be fine if the 
I/O system has a BBU, but that could also hide some of the patch 
benefits...
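For the hdparm question, here is a minimal Python sketch that reads the drive's write-cache flag. It assumes the usual `hdparm -W <device>` report format (a line such as "write-caching =  1 (on)"). The helper names and the device `/dev/sda` are illustrative, and querying the drive normally requires root. Behind a hardware RAID controller, hdparm may only see the logical volume, so the controller's own cache setting may have to be checked with the vendor's tool instead.

```python
import re
import subprocess
from typing import Optional

def parse_write_caching(hdparm_output: str) -> Optional[bool]:
    """Return True if `hdparm -W` reports write caching on, False if off,
    or None if no 'write-caching' line is found (report format is assumed)."""
    match = re.search(r"write-caching\s*=\s*(\d+)", hdparm_output)
    if match is None:
        return None
    return match.group(1) != "0"

def query_write_caching(device: str = "/dev/sda") -> Optional[bool]:
    """Run `hdparm -W <device>` (usually needs root) and parse its report.
    The default device name is only an example."""
    result = subprocess.run(["hdparm", "-W", device],
                            capture_output=True, text=True, check=False)
    return parse_write_caching(result.stdout)
```

If caching is reported as on, `hdparm -W 0 <device>` disables it for a test run and `hdparm -W 1 <device>` re-enables it afterwards.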

A tentative explanation for the similar figures with and without sorting 
could be that, depending on the controller cache size (maybe 1 GB or more) 
and firmware, the I/O system reorders disk writes so that they are 
basically sequential, and therefore the fact that pg sorts them beforehand 
has little or no impact. This may also be helped by the fact that buffers 
are not really in random order to begin with, as the warmup phase does an 
initial "select stuff from table".

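The reordering hypothesis can be illustrated with a toy model. It assumes a hypothetical write-back cache that buffers `cache_blocks` writes and flushes each batch sorted by block number (real controller firmware is more complex), and it uses total head travel as a crude stand-in for seek cost. With a cache at least as large as the set of dirty buffers, pre-sorted and unsorted input end up in exactly the same disk order; with a small cache, pre-sorting still matters.

```python
import random

def through_cache(writes, cache_blocks):
    """Toy write-back cache: buffer up to cache_blocks writes, flush each batch sorted."""
    out, pending = [], []
    for block in writes:
        pending.append(block)
        if len(pending) == cache_blocks:
            out.extend(sorted(pending))
            pending.clear()
    out.extend(sorted(pending))  # flush whatever is left over
    return out

def head_travel(blocks):
    """Sum of distances between consecutive writes: a crude proxy for seek time."""
    return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))

rng = random.Random(42)
dirty = rng.sample(range(1_000_000), 10_000)  # hypothetical random dirty blocks
presorted = sorted(dirty)                      # order a sorting checkpoint would issue

for cache in (16, 10_000):
    unsorted_cost = head_travel(through_cache(dirty, cache))
    sorted_cost = head_travel(through_cache(presorted, cache))
    print(f"cache={cache:>6}: unsorted={unsorted_cost:,} presorted={sorted_cost:,}")
```

In the large-cache case both runs report the same head travel, which is the pattern one would expect if the controller cache is hiding the benefit of sorting.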
There could be other factors, such as file system details, "WAFL" 
hacks... the tricks are endless :-)

Checking for the right explanation would involve removing the 
unconditional select warmup and using only a long, random warmup, 
probably trying a database much larger than the cache, and/or disabling 
the write cache, reading the hardware documentation in detail... But this 
is also a lot of bother and time.

Maybe the simplest approach would be to disable the write cache for the 
test. Is that possible?

>> Woops, 1.6 GB/s write... same questions, "rotating plates"??
>
> One thing to notice is that if I don't remove the output file 
> (output.img) the speed is much slower, see the below output. I think 
> this means in our case we will get ~320 MB/s

I would say that the OS was doing something here, and 320 MB/s looks more 
like the actual sequential write performance of an HDD RAID system.
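To see how much of a dd-style figure comes from the OS cache, a minimal Python sketch can time the same sequential write with and without an fsync() before the clock stops; GNU dd's `conv=fsync` option plays the same role. `write_throughput` is a hypothetical helper, and the target file must sit on the array under test. A write-back controller cache can still acknowledge an fsync'ed write early, so this only removes the OS-level effect.

```python
import os
import time

def write_throughput(path, total_bytes, block_size=1 << 20, sync=True):
    """Sequentially write total_bytes of zeros to path and return MB/s.

    With sync=True, os.fsync() runs before the timer stops, so data that is
    merely sitting in the OS page cache is not counted as written."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:  # "wb" truncates any existing file
        written = 0
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()  # push Python's userspace buffer to the kernel
        if sync:
            os.fsync(f.fileno())  # wait until the kernel reports the data as stored
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6
```

Calling it twice on the same path, once with `sync=False` and once with `sync=True`, and using a size larger than RAM, should show how far the unsynced figure is inflated by caching.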

>> If these are SSD, or if there is some SSD cache on top of the HDD, I would
>> not expect the patch to do much, because the SSD random I/O writes are
>> pretty comparable to sequential I/O writes.
>>
>> I would be curious whether flushing helps, though.
>
> Yes, me too. I think we should try to reach on consensus for exact 
> scenarios and configuration where this patch('es) can give benefit or we 
> want to verify if there is any regression as I have access to this m/c 
> for a very-very limited time.  This m/c might get formatted soon for 
> some other purpose.

Yep, it would be great if you have time for a flush test before it 
disappears... I think it is advisable to disable the write cache as it may 
also hide the impact of flushing.

>> So whether the database fits in 8 GB shared buffer during the 2 hours of
>> the pgbench run is an open question.
>
> With this kind of configuration, I have noticed that more than 80%
> of updates are HOT updates, not much bloat, so I think it won't
> cross 8GB limit, but still I can keep it to 32GB if you have any doubts.

The problem with performance tests is that you want to test one thing, but 
many factors intervene and you may end up testing something else, such as 
lock contention or the process scheduler or whatever, rather than what you 
were trying to demonstrate. So I would suggest erring on the safe side and 
using the larger value.

-- 
Fabien.


