Thread: questions about the logical decoding implementation

questions about the logical decoding implementation

From: Jeff Davis
Date: 15 August 2018, 22:04:12

1. Why do the files holding the spilled transaction data in reorderbuffer.c have a ".snap" suffix?

2. Those files can get quite large. Would it be reasonable to store them in another directory (e.g. pg_replslot_tmp) so that they can be placed on another mount point? It would also simplify the cleanup code.

3. Why are the files in pg_logical/snapshots (which also have a ".snap" extension) stored on disk at all? If I remove them and restart, they get recreated during decoding. The code adds a fair amount of complexity so I assume there's an important reason.

Regards,
Jeff Davis

Re: questions about the logical decoding implementation

From: Andres Freund
Date: 16 August 2018, 00:35:09

Hi,

On 2018-08-15 12:04:12 -0700, Jeff Davis wrote:
> 1. Why do the files holding the spilled transaction data in reorderbuffer.c
> have a ".snap" suffix?

I don't remember the genesis of that, sorry. I guess .spill or such
would have been better.  Perhaps it was because I initially intended for
them to be persisted at certain points, to allow cheaper restarting of
decoding?

> 2. Those files can get quite large. Would it be reasonable to store them in
> another directory (e.g. pg_replslot_tmp) so that they can be placed on
> another mount point? It would also simplify the cleanup code.

Yes, that sounds reasonable.  It'd also be reasonable to work on making
them smaller; IIRC there should be quite a bit of savings potential.  Not
sure I buy the benefits of the cleanup simplification at this point.

> 3. Why are the files in pg_logical/snapshots (which also have a ".snap"
> extension) stored on disk at all? If I remove them and restart, they get
> recreated during decoding. The code adds a fair amount of complexity so I
> assume there's an important reason.

Yes, they're necessary. Without them we won't necessarily have a correct
snapshot of running transactions. If those other transactions didn't do
DDL, that doesn't immediately have apparent effects, but if they did you'd
get corrupted catalog snapshots...  When initially starting decoding in
a new slot we wait for concurrent transactions to end until we have
enough information - but we can't do that once a slot exists without
losing the ability to decode some transactions.  The on-disk files
basically just serialize the in-memory state of snapbuild.c so that a
restart starting at that location has enough information.  We then only
allow the restart_lsn to be increased to a point where we have such a
snapshot serialized.

Greetings,

Andres Freund
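
For readers following along, the restart rule Andres describes in his answer to question 3 can be illustrated with a toy model. This is only a sketch under stated assumptions: ToySlot, serialize_snapshot and advance_restart_lsn are hypothetical names invented for illustration, LSNs are plain integers, and none of this mirrors PostgreSQL's actual data structures or code paths. It shows only the invariant from the message: restart_lsn moves forward only to a position where serialized snapshot builder state exists.

```python
from bisect import bisect_right


class ToySlot:
    """Toy model of a logical slot (hypothetical; not PostgreSQL code)."""

    def __init__(self, restart_lsn):
        self.restart_lsn = restart_lsn
        # Sorted LSNs at which snapshot builder state was written to disk.
        self.serialized_lsns = []

    def serialize_snapshot(self, lsn):
        """Record that snapshot builder state was serialized at this LSN."""
        if lsn not in self.serialized_lsns:
            self.serialized_lsns.append(lsn)
            self.serialized_lsns.sort()

    def advance_restart_lsn(self, wanted_lsn):
        """Advance restart_lsn to the newest serialized snapshot at or
        before wanted_lsn. Never moves backwards, and never lands on an
        LSN without a serialized snapshot."""
        idx = bisect_right(self.serialized_lsns, wanted_lsn)
        if idx == 0:
            # No serialized snapshot early enough: restart_lsn stays put.
            return self.restart_lsn
        candidate = self.serialized_lsns[idx - 1]
        if candidate > self.restart_lsn:
            self.restart_lsn = candidate
        return self.restart_lsn


if __name__ == "__main__":
    slot = ToySlot(restart_lsn=100)
    slot.serialize_snapshot(150)
    slot.serialize_snapshot(300)
    print(slot.advance_restart_lsn(250))  # 150: newest snapshot <= 250
    print(slot.advance_restart_lsn(400))  # 300
```

In this model, a request to advance to 250 stops at 150 because decoding restarted from any later point would have no serialized state to load, which is the situation the on-disk files are meant to avoid.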