On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
>> With a large amount of
>> shared buffer eviction you actually increase the risk of torn page
>> reads. Instead of a logic relying on partition mapping locks, which
>> could be unwise on performance grounds, did you consider different
>> approaches? For example a kind of pre-emptive lock on the page in
>> storage to prevent any shared buffer operation to happen while the
>> block is read from storage, that would act like a barrier.
>
> Even with a workload having a large shared_buffers eviction pattern, I don't
> think that there's a high probability of hitting a torn page. Unless I'm
> mistaken it can only happen if all those steps happen concurrently to doing the
> block read just after releasing the LWLock:
>
> - postgres read the same block in shared_buffers (including all the locking)
> - dirties it
> - writes part of the page
>
> It's certainly possible, but it seems so unlikely that the optimistic lock-less
> approach seems like a very good tradeoff.
Having false reports in this area could be very confusing for the
user. That's for example possible now with checksum verification and
base backups.
>> I guess that this leads to the fact that this function may be better as
>> a contrib module, with the addition of some better-suited APIs in core
>> (see paragraph above).
>
> Below?
Above. This thought more precisely:
>> For example a kind of pre-emptive lock on the page in
>> storage to prevent any shared buffer operation to happen while the
>> block is read from storage, that would act like a barrier.
> For the record when I first tested that feature I did try to check dirty
> blocks, and it seemed that dirty blocks of shared relation were sometimes
> wrongly reported as corrupted. I didn't try to investigate more though.
Hmm. It would be good to look at that, correct verification of shared
relations matter.
>> + * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
>> + * disk either before the end of the next checkpoint or during recovery in
>> + * case of unsafe shutdown
>> Not sure that the indentation is going to react well on that part of
>> the patch, perhaps it would be better to add some "/*-------" at the
>> beginning and end of the comment block to tell pgindent to ignore this
>> part?
>
> Ok. Although I think only the beginning comment is needed?
From src/tools/pgindent/README:
"pgindent will reflow any comment block that's not at the left margin.
If this messes up manual formatting that ought to be preserved,
protect the comment block with some dashes:"
/*----------
* Text here will not be touched by pgindent.
*----------
*/
>> Based on the feedback gathered on this thread, I guess that you should
>> have a SRF returning the list of broken blocks, as well as NOTICE
>> messages.
>
> The current patch has an SRF and a WARNING message, do you want an additional
> NOTICE message or downgrade the existing one?
Right, not sure which one is better, for zero_damaged_pages a WARNING
is used.
--
Michael