On Thu, Mar 14, 2024 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > This fact makes me think that the slotsync worker might be able to
> > accept the primary_conninfo value even if there is no dbname in the
> > value. That is, if there is no dbname in the primary_conninfo, it uses
> > the username in accordance with the specs of the connection string.
> > Currently, the slotsync worker connects to the local database first
> > and then establishes the connection to the primary server. But if we
> > can reverse the two steps, it can get the dbname that has actually
> > been used to establish the remote connection and use it for the local
> > connection too. That way, the primary_conninfo generated by
> > pg_basebackup could work even without the patch. For example, if the
> > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > would connect to the postgres database. Given the 'postgres' database
> > is created by default and 'postgres' OS user is used in common, I
> > guess it could cover many cases in practice actually.
> >
>
> I think this is worth investigating but I suspect that in most cases
> users will end up using a replication connection without specifying
> the user name and we may not be able to give a meaningful error
> message when slotsync worker won't be able to connect. The same will
> be true even when the dbname same as the username would be used.
>
I attempted the change as suggested by Sawada-San. Attached is the PoC
patch. For it to work, I had to expose a new get API in libpq-fe
which gets the dbname from the streaming connection. Please have a look.
Without this PoC patch, the errors in the slot sync worker are:
-----------------
a) If dbname is missing:
[1230932] LOG: slot sync worker started
[1230932] ERROR: slot synchronization requires dbname to be specified
in primary_conninfo
b) If the specified db does not exist:
[1230913] LOG: slot sync worker started
[1230913] FATAL: database "postgres1" does not exist
-----------------
Now with this patch:
-----------------
a) If a database with the same name as the user does not exist:
[1232473] LOG: slot sync worker started
[1232473] ERROR: could not connect to the primary server: connection
to server at "127.0.0.1", port 5433 failed: FATAL: database
"bckp_user" does not exist
b) If the user itself is removed from primary_conninfo, libpq defaults
to the operating system user running the process and gives an error if
a db of the same name does not exist:
ERROR: could not connect to the primary server: connection to server
at "127.0.0.1", port 5433 failed: FATAL: database "shveta" does not
exist
-----------------
The errors in the second case look slightly confusing to me.
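
For reference, the fallback behind both errors can be modelled roughly
as below. This is only a simplified sketch, not libpq's actual code:
the helper names parse_conninfo and effective_dbname are hypothetical,
the parser handles only plain space-separated key=value pairs (no
quoting, no URIs), and environment variables such as PGUSER and
PGDATABASE as well as service files are ignored.

```python
import getpass


def parse_conninfo(conninfo):
    """Parse a plain 'key=value key=value' string (no quoting support)."""
    params = {}
    for token in conninfo.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def effective_dbname(conninfo, os_user=None):
    """Return the database name a connection would end up using.

    Assumed fallback order: explicit dbname, then the user name,
    where the user name itself falls back to the OS user.
    """
    params = parse_conninfo(conninfo)
    user = params.get("user") or os_user or getpass.getuser()
    return params.get("dbname") or user


# Case a): no dbname, so the user name is used as the database name.
print(effective_dbname("host=127.0.0.1 port=5433 user=bckp_user", "shveta"))
# Case b): no dbname and no user, so the OS user name is used.
print(effective_dbname("host=127.0.0.1 port=5433", "shveta"))
```

In both cases the database name in the error is one the user never
wrote in primary_conninfo, which is why the message can be confusing.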
thanks
Shveta