Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date
Msg-id CALj2ACW394mcG+F7SKMt9zvBM2se=k5qtu-oeXC1gKSs97dyZA@mail.gmail.com
Whole thread Raw
In response to Re: pg_walinspect - a new extension to get raw WAL data and WAL stats  (Jeremy Schneider <schneider@ardentperf.com>)
Responses Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
List pgsql-hackers
On Wed, Oct 6, 2021 at 10:26 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:
>
> On 10/5/21 17:43, Bruce Momjian wrote:
> > On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> >> Specifically exposing pg_waldump functionality in SQL could speed up
> >> finding bugs in the PG logical replication code. We found and fixed a
> >> few over this past year, but there are more lurking out there.
> >
> > Uh, why is pg_waldump more important than other command line tool
> > information?
>
> Going down the list of all other utilities in src/bin:
>
> * pg_amcheck, pg_config, pg_controldata: already available in SQL
> * psql, pgbench, pg_dump: already available as client-side apps
> * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
> pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
> possible use case outside server OS access, most of these are too low
> level and don't even make sense in SQL
> * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
> don't feel any deep interest myself
>
> Speaking selfishly, there are a few reasons I would be specifically
> interested in pg_waldump (the only remaining one on the list).

Thanks Jeremy for the analysis.

> First, to better support existing features around logical replication
> and decoding.
>
> In particular, it seems inconsistent to me that all the replication
> management SQL functions take LSNs as arguments - and yet there's no
> SQL-based way to find the LSNs that you are supposed to pass into these
> functions.
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION
>
> Over the past few years I've been pulled in to help several large PG
> users who ran into these bugs, and it's very painful - because the only
> real remediation is to drop and recreate the replication slot, which
> means either re-copying all the data to the downstream system or
> figuring out a way to resync it. Some PG users have 3rd party tools like
> HVR which can do row-by-row resync (IIUC), but no matter how you slice
> it, we're talking about a lot of pain for people replicating large data
> sets between multiple systems. In most cases, the only/best option even
> with very large tables is to just make a fresh copy of the data - which
> can translate to a business outage of hours or even days.
>
> My favorite example is the SQL function pg_replication_slot_advance() -
> this could really help PG users find less painful solutions to broken
> decoding, however it's not really possible to /know/ an LSN to advance
> to without inspecting WAL. ISTM there's a strong use case here for a SQL
> interface on WAL inspection.
>
> Second: debugging and troubleshooting logical replication and decoding bugs.
>
> I helped track down a few logical replication bugs and get fixed into
> code at postgresql.org this past year. (But I give credit to others who
> are much better at C than I am, and who did a lot more work than I did
> on these bugs!)
>
> Logical decoding bugs are some of the hardest to fix - because all you
> have is a WAL stream, but you don't know the SQL or workload patterns or
> PG code paths which created that WAL stream.
>
> Dumping the WAL - knowing which objects and which types of operations
> are involved and stats like number of changes or number of
> subtransactions - this helps identify which transaction and what SQL in
> the application triggered the failure, which can help find short-term
> workarounds. Businesses need those short-term workarounds so they can
> keep going while we work on finding and fixing bugs, which can take some
> time. This is another place where I think a SQL interface to WAL would
> be helpful to PG users. Especially the ability to filter and trace a
> single transaction through a large number of WAL files in the directory.
>
> Third: statistics on WAL - especially full page writes. Giving users the
> full power of SQL allows much more sophisticated analysis of the WAL
> records. Personally, I'd probably find myself importing all the WAL
> stats into the DB anyway because SQL is my preferred data manipulation
> language.

Just to add to the above points, with the new extension pg_walinspect
we will have following advantages:
1) Usability - SQL callable functions will be easier to use for the
users/admins/developers.
2) Access Control - we can provide better access control for the WAL data/stats.
3) Emitting the actual WAL data(as bytea structure) and stats via SQL
callable functions will help to analyze and answer questions like how
much WAL data is being generated in the system, what kind of WAL data
it is, how many FPWs are happening and so on. Jermey has already given
more realistic use cases.
4) I came across this -  there's a similar capability in SQL server -
https://www.mssqltips.com/sqlservertip/3076/how-to-read-the-sql-server-database-transaction-log/

> >> Having the extension in core is actually the only way to avoid
> >> duplicated effort, by having shared code which the pg_dump binary and
> >> the extension both wrap or call.
> >
> > Uh, aren't you duplicating code by having pg_waldump as a command-line
> > tool and an extension?
>
> Well this whole conversation is just theoretical anyway until the code
> is shared.  :)  But if Bharath is writing functions to decode WAL, then
> wouldn't we just have pg_waldump use these same functions in order to
> avoid duplicating code?
>
> Bharath - was some code already posted and I just missed it?
>
> Looking at the proposed API from the initial email, I like that there's
> both stats functionality and WAL record inspection functionality
> (similar to pg_waldump). I like that there's the ability to pull the
> records as raw bytea data, however I think we're also going to want an
> ability in v1 of the patch to decode it (similar to pageinspect
> heap_page_item_attrs, etc).

I'm yet to start working on the patch. I will be doing it soon.

> Another feature that might be interesting down the road would be the
> ability to provide filtering of WAL records for security purposes. For
> example, allowing a user to only dump raw WAL records for one particular
> database, or maybe excluding WAL records that change system catalogs or
> the like. But I probably wouldn't start here, personally.

+1.

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: strange case of "if ((a & b))"
Next
From: Amit Kapila
Date:
Subject: Re: pgsql: Document XLOG_INCLUDE_XID a little better