Re: Conflict detection and logging in logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict detection and logging in logical replication
Date
Msg-id CAA4eK1LMvgageidw+w01e=2E+Ki-fWHmFuj-gfd0CtCjycvj+Q@mail.gmail.com
In response to RE: Conflict detection and logging in logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Wed, Aug 21, 2024 at 8:35 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, August 21, 2024 9:33 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> > On 8/6/24 4:15 AM, Zhijie Hou (Fujitsu) wrote:
> >
> > > Thanks for the idea! I thought about a few styles based on the suggested
> > > format. What do you think about the following?
> >
> > Thanks for proposing formats. Before commenting on the specifics, I do want to
> > ensure that we're thinking about the following for the log formats:
> >
> > 1. For the PostgreSQL logs, we'll want to ensure we do it in a way that's as
> > convenient as possible for people to parse the context from scripts.
>
> Yeah. And I personally think the current log format is OK for parsing purposes.
>
> >
> > 2. Semi-related, I still think the simplest way to surface this info to a user is
> > through a "pg_stat_..." view or similar catalog mechanism (I'm less opinionated
> > on the how, other than that we should make it available via SQL).
>
> We have a patch (v19-0002) in this thread to collect conflict stats and display
> them in the view, and the patch is under review.
>

IIUC, Jonathan is asking to store the conflict information (the same
details we display in the logs). We can do that separately, as it is useful.

> Storing it in a catalog needs more analysis as we may need to add additional
> logic to clean up old conflict data in that catalog table. I think we can
> consider it as a future improvement.
>

Agreed. The cleanup part needs more consideration.

> >
> > 3. We should ensure we're able to convey to the user these details about the
> > conflict:
> >
> > * What time it occurred on the local server (which we'd have in the logs)
> > * What kind of conflict it is
> > * What table the conflict occurred on
> > * What action caused the conflict
> > * How the conflict was resolved (ability to include source/origin info)
>
> I think all of the above are already covered in the current conflict log, except
> that we do not yet support resolving the conflict, so we don't log the resolution.
>
> >
> >
> > I think outputting the remote/local tuple value may be a parameter we need to
> > think about (with the desired outcome of trying to avoid another parameter). I
> > have a concern about unintentionally leaking data (and I understand that
> > someone with access to the logs does have a broad ability to view data); I'm
> > less concerned about the size of the logs, as conflicts in a well-designed
> > system should be rare (though a conflict storm could fill up the logs, likely there
> > are other issues to contend with at that point).
>
> We could use an option to control this, but the tuple value is already output in some
> existing cases (e.g. partition check, table constraints check, view with check
> constraints, unique violation), and it would test the current user's
> privileges to decide whether to output the tuple or not. So, I think it's OK
> to display the tuple for conflicts.
>

The current information is displayed keeping in mind that users should
be able to use it to manually resolve conflicts if required. If we
think that displaying details such as tuple data leaks information
(from a security angle or otherwise), then we can reconsider. However,
as we already display tuple information in other places, as pointed out
by Hou-San, we thought it was also okay to display it in this case.

--
With Regards,
Amit Kapila.
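
A note on the script-parsing point (item 1 in the quoted text): below is a
minimal Python sketch of how a monitoring script might pull the relation and
the conflict type out of a conflict log line. The line shape it matches
(conflict detected on relation "...": conflict=...) and the sample line are
assumptions made for illustration and are not stated in this message; the names
CONFLICT_RE, Conflict and parse_conflict are hypothetical. The pattern would
need adjusting to the actual server output and log_line_prefix in use.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: the conflict message contains a fragment shaped like
#   conflict detected on relation "<schema.table>": conflict=<type>
# Anything before it (timestamp, pid, severity) is ignored, because that
# part depends on the server's log_line_prefix setting.
CONFLICT_RE = re.compile(
    r'conflict detected on relation "(?P<relation>[^"]+)": '
    r'conflict=(?P<kind>\w+)'
)


@dataclass
class Conflict:
    """Two of the details discussed in the thread: which table, what kind."""
    relation: str
    kind: str


def parse_conflict(line: str) -> Optional[Conflict]:
    """Return a Conflict if the line looks like a conflict report, else None."""
    m = CONFLICT_RE.search(line)
    if m is None:
        return None
    return Conflict(relation=m.group("relation"), kind=m.group("kind"))


# Hypothetical sample line, invented for this sketch.
sample = (
    '2024-08-21 10:00:00.000 UTC [1234] ERROR:  '
    'conflict detected on relation "public.t1": conflict=insert_exists'
)

if __name__ == "__main__":
    print(parse_conflict(sample))
```

Searching for the fragment rather than anchoring on the start of the line keeps
the sketch independent of the log prefix; the tuple values discussed above are
deliberately not parsed here.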


