[HACKERS] Separation walsender & normal backends - Mailing list pgsql-hackers

From Andres Freund
Subject [HACKERS] Separation walsender & normal backends
Date
Msg-id 20170425061941.yw2kb3gumlld7vm6@alap3.anarazel.de
Whole thread Raw
Responses Re: [HACKERS] Separation walsender & normal backends  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

I've for a while suspected that the separation & duplication of
infrastructure between walsenders and normal backends isn't nice.  But
looking at the code changes in v10 drove that home quite a bit.

The major changes in this area are
1) d1ecd539477fe640455dc890216a7c1561e047b4 - Add a SHOW command to the replication command language.
2) 7c4f52409a8c7d85ed169bbbc1f6092274d03920 - Logical replication support for initial data copy

The first adds SHOW support for the replication protocol. With such
oddities that "SHOW hba_file;" works, but "show hba_file;"
doesn't. Unless your're connected to a database, when feature 2) kicks
in and falls back to the normal SQL parser (as it'd for most other SQL
commands).

With 2) we can execute normal SQL over a walsender connection. That's
useful for logical rep because it needs to able to copy data (via
e.g. COPY), read catalogs (SELECT) and stream data (via
START_REPLICATION).

The problem is that we now end up with fairly similar infrastructure,
that's duplicated in walsender and normal backends. We've already seen a
couple bugs stem from this:
- parallel queries hang when executed over walsender connection
- query cancel doesn't work over walsender connection
- parser/scanner hack deciding between replication and normal grammer
  isn't correct
- SnapBuildClearExportedSnapshot isn't called reliably anymore
- config files aren't reloaded while idle in a walsender connection

while all of those are fixable, and several of them already have
proposed fixes, I think this is some indication that the current split
isn't well thought out.  I think in hindsight, the whole idea of
introducing a separate protocol/language for the replication protocol
was a rather bad one.

I think we should give up on the current course, and decide to unify as
much as possible.  Petr already proposed some fixes for the above
(unifying signal and config file handling), but I don't think that
solves the more fundamental issue of different grammars.

For me it's fairly obvious that we should try to merge the two grammars,
and live with the slight uglyness that'll caused by doing so.  The
primary problem here is that both languages "look" different, primary
that the replication protocol uses enforced-all-caps-with-underscores as
style.

Attached is a very rough POC, that unifies the two grammars.
Unsurprisingly that requires the addition of a bunch of additional
keywords, but at least they're all unreserved.  Both grammars seem to
mostly merge well, except for repl_scanner.l's RECPTR which I wasn't
able to add to scan.l within a couple minutes (therefore single quotes
are now necessary).  This really is WIP, and not meant as a patch to be
reviewed in detail, but just as something to be discussed.

I'm probably going to be tarred and feathered for this position, but I
think we need to fix this before v10 is coming out.  We're taking on
quite some architectural burden here, and I'm pretty sure we'd otherwise
have to fix it in v11, so we'll be stuck with some odd-duck v10, that
behaves different from all other versions.

If so, we'd have to:
- gate catalog access more careful in !am_db_walsender connections
- find a better way to deal with repl_scan.l's RECPTR
- remove replnodes.h (or move at least parts of it) into parsenodes.h
- do a lot of other minor cleanup

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Patch - Tcl 8.6 version support for PostgreSQL
Next
From: Simon Riggs
Date:
Subject: Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly