Hi, Siva!
On Tue, Sep 4, 2018 at 11:01 PM R, Siva <sivasubr@amazon.com> wrote:
> We recently encountered an issue where the opaque data flags on a gin data leaf page were corrupted
> while replaying a gininsert WAL record. Upon further examination of the redo code, we found a bug in
> the ginRedoRecompress code, which extracts the WAL information and updates the page.
>
> Specifically, when a new segment is inserted in the middle of a page, a memmove operation is
> performed [1] at the current point in the page to make room for the new segment. If this segment
> insertion is followed by delete segment actions that are yet to be processed and the total data size
> is very close to GinDataPageMaxDataSize, then we may move the data portion beyond the boundary,
> causing the opaque data to be corrupted.
>
> One way of solving this problem is to perform the replay work on a scratch space and perform a sanity
> check on the total size of the data portion before copying it back to the actual page. While it
> involves additional memory allocation and memcpy operations, it is safer and similar to the 'do' code
> path, where we make sure to copy all segments past the first modified segment before placing them
> back on the page [2].
>
> I have attached a patch for that approach here. Please let us know if you have any comments or feedback.

Do you have a test scenario that reproduces this issue? We need
it to ensure that the fix is correct.
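
To make the failure mode concrete for discussion, here is a minimal toy model of the
scratch-space idea described above. It is only a sketch under stated assumptions, not the
attached patch and not the real ginRedoRecompress code: ToyPage, ToyAction, DATA_MAX and
toy_replay are hypothetical names, DATA_MAX stands in for GinDataPageMaxDataSize, and a
segment is modeled as a plain byte range. The point it illustrates is that an insert
followed by a delete can fit in the final result while the intermediate in-place memmove
would not, and that building the result in a bounded scratch buffer avoids touching the
bytes after the data area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DATA_MAX 16             /* hypothetical stand-in for GinDataPageMaxDataSize */

typedef struct
{
    unsigned char data[DATA_MAX];   /* data portion of the toy page */
    unsigned char opaque[4];        /* stand-in for the trailing opaque area */
    size_t        used;             /* bytes of data[] currently in use */
} ToyPage;

/* One replay action: at 'offset' in the OLD data, drop del_len bytes and
 * put ins_len new bytes in their place (either length may be zero). */
typedef struct
{
    size_t               offset;
    size_t               del_len;
    const unsigned char *ins;
    size_t               ins_len;
} ToyAction;

/* Append n bytes to the scratch buffer, refusing to exceed DATA_MAX. */
static bool
toy_append(unsigned char *dst, size_t *len, const unsigned char *src, size_t n)
{
    if (n > DATA_MAX - *len)
        return false;
    memcpy(dst + *len, src, n);
    *len += n;
    return true;
}

/* Replay actions (sorted by offset) into a scratch buffer and copy the result
 * back only if every step stayed within DATA_MAX. On failure the page is
 * left untouched. */
bool
toy_replay(ToyPage *page, const ToyAction *acts, size_t nacts)
{
    unsigned char scratch[DATA_MAX];
    size_t        out = 0;      /* bytes written to scratch */
    size_t        in = 0;       /* bytes consumed from the old data */

    for (size_t i = 0; i < nacts; i++)
    {
        const ToyAction *a = &acts[i];

        /* Sanity-check the action against the old data. */
        if (a->offset < in || a->offset > page->used ||
            a->del_len > page->used - a->offset)
            return false;

        /* Copy the unchanged bytes before this action, then the new bytes. */
        if (!toy_append(scratch, &out, page->data + in, a->offset - in))
            return false;
        if (a->ins_len > 0 && !toy_append(scratch, &out, a->ins, a->ins_len))
            return false;

        /* Skip over the deleted bytes in the old data. */
        in = a->offset + a->del_len;
    }

    /* Copy whatever is left after the last action. */
    if (!toy_append(scratch, &out, page->data + in, page->used - in))
        return false;

    assert(out <= DATA_MAX);
    memcpy(page->data, scratch, out);
    page->used = out;
    return true;
}
```

With 14 of 16 bytes in use, a 4-byte insert followed by a 4-byte delete ends at 14 bytes
again, but an in-place memmove for the insert alone would need 18 bytes. In this toy
model the scratch buffer never grows past the final size, so the opaque bytes stay intact.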
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company