Re: WIP: Failover Slots - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: WIP: Failover Slots
Date
Msg-id CALLjQTSCHvcsF6y7=ZhmdMjJUMGLqt1-6Pz2rtb7PfFLxFfBOw@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Failover Slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: WIP: Failover Slots
List pgsql-hackers
Hi,

here is my code level review:

0001:
This one looks ok except for broken indentation in the new notes in
xlogreader.c and .h. It's maybe slightly overdocumented but given the
complicated way the timeline reading works it's probably warranted.

0002:
+                /*
+                 * No way to wait for the process since it's not a child
+                 * of ours and there's no latch to set, so poll.
+                 *
+                 * We're checking this without any locks held, but
+                 * we'll recheck when we attempt to drop the slot.
+                 */
+                while (slot->in_use && slot->active_pid == active_pid
+                        && max_sleep_micros > 0)
+                {
+                    usleep(micros_per_sleep);
+                    max_sleep_micros -= micros_per_sleep;
+                }

Not sure I buy this, what about postmaster crashes and fast shutdown
requests etc. Also you do usleep for 10s which is quite long. I'd
prefer the classic wait for latch with timeout and pg crash check
here. And even if we go with usleep, then it should be 1s not 10s and
pg_usleep instead of usleep.

0003:
There is a lot of documentation improvements here that are not related
to failover slots or timeline following, it might be good idea to
split those into separate patch as they are separately useful IMHO.

Other than that it looks good to me.

About other things discussed in this thread. Yes it makes sense in
certain situations to handle this outside of WAL and that does require
notions of nodes, etc. That being said, the timeline following is
needed even if this is handled outside of WAL. And once timeline
following is in, the slots can be handled by the replication solution
itself which is good. But I think the failover slots are still a good
thing to have - it provides HA solution for anything that uses slots,
and that includes physical replication as well. If the specific
logical replication solution wants to handle it for some reason itself
outside of WAL, it can create non-failover slot so in my opinion we
ideally need both types of slots (and that's exactly what this patch
gives us).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?
Next
From: Martin Liška
Date:
Subject: [PATCH] Code refactoring related to -fsanitize=use-after-scope