Move global variables of pgoutput to plugin private scope. - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject Move global variables of pgoutput to plugin private scope.
Date
Msg-id OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: Move global variables of pgoutput to plugin private scope.
Re: Move global variables of pgoutput to plugin private scope.
Re: Move global variables of pgoutput to plugin private scope.
List pgsql-hackers
Hi,

Per complain in another thread[1], I started to look into the global
variables in pgoutput.

Currently we have serval global variables in pgoutput, but each of them is
inherently local to an individual pgoutput instance. This could cause issues if
we switch to different output plugin instance in one session and could miss to
reset their value in case of errors. The analysis for each variable is as
follows:

- static HTAB *RelationSyncCache = NULL;

pgoutput creates this hash table under cacheMemoryContext to remember the
relation schemas that have been sent, but it's local to an individual pgoutput
instance, and because it's under global memory context, the hashtable is not
properly cleared in error paths which means it has a risk of being accessed in
a different output plugin instance. This was also mentioned in another thread[2].

So I think we'd better allocate this under output plugin private context.

But note that, instead of completely moving the hash table into the output
plugin private data, we need to to keep the static pointer variable for the map to
be accessed by the syscache callbacks. This is because syscache callbacks won't
be un-registered even after shutting down the output plugin, so we need a
static pointer to cache the map pointer so that callbacks can check it.

- static bool publish_no_origin;

This flag is also local to pgoutput instance, and we didn't reset the flag in
output shutdown callback, so if we consume changes from different slots, then
the second call would reuse the flag value that is set in the first call which
is unexpected. To completely avoid this issue, we think we'd better move this
flag to output plugin private data structure.

Example:
 SELECT data FROM pg_logical_slot_peek_binary_changes('isolation_slot_1', NULL, NULL, 'proto_version', '1',
'publication_names','pub', 'origin', 'none'); --- Set origin in this call. 
 SELECT data FROM pg_logical_slot_peek_binary_changes('isolation_slot_2', NULL, NULL, 'proto_version', '1',
'publication_names','pub');  -- Didn't set origin, but will reuse the origin flag in the first call. 

- static bool in_streaming;

While on it, I feel we can also move this flag to private data, although I didn't
see problems for this one.

- static bool publications_valid;

I thought we need to move this to private data as well, but we need to access this in a
syscache callback, which means we need to keep the static variable.

Attach the patches to change in_streaming, publish_no_origin and RelationSyncCache.
Suggestions and comments are welcome.

[1] https://www.postgresql.org/message-id/20230821182732.t3qc75i5s5xvovls%40awork3.anarazel.de
[2] https://www.postgresql.org/message-id/CAA4eK1LJ%3DCSsxETs5ydqP58OiWPiwodx%3DJqw89LQ7fMrRWqK9w%40mail.gmail.com

Best Regards,
Hou Zhijie


Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Bug fix for psql's meta-command \ev
Next
From: Andrey Lepikhov
Date:
Subject: Re: POC: GROUP BY optimization