Re: [HACKERS] proposal: extend shm_mq to support more use cases - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] proposal: extend shm_mq to support more use cases
Date
Msg-id CAMsr+YGPhTgXUW72+=Sm_s9etQAmcYRmpLCoFcTpgkJpwJBG5A@mail.gmail.com
Whole thread Raw
In response to [HACKERS] proposal: extend shm_mq to support more use cases  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
List pgsql-hackers
On 1 November 2017 at 21:24, Ildus Kurbangaliev
<i.kurbangaliev@postgrespro.ru> wrote:
> Hello! Apparently the current version of shm_mq supports only one sender
> and one receiver. I think it could be very useful to add possibility to
> change senders and receivers. It could be achieved by adding methods
> that remove sender or receiver for mq.

That'd put the complexity penalty on existing shm_mq users. You'd have
to find a way to make it work cheaply for existing 1:1 lightweight
uses.

I'm using shm_mq pretty extensively now, and mostly in one-to-many
two-way patterns between a persistent endpoint and a pool of transient
peers. It's not simple to get it right, there are a lot of traps
introduced by the queue's design for one-shot contexts where
everything is created together and destroyed together.

I think it'd make a lot of sense to make this easier, but I'd be
inclined to do it by layering on top of shm_mq rather than extending
it.

One thing that would make such patterns much easier would be a way to
safely reset a queue for re-use in a race-free way that didn't need
the use of any external synchronisation primitives. So once the
initial queues are set up, a client can attach, exchange messages on a
queue pair, and detach. And the persistent endpoint process can reset
the queues for re-use. Attempting to attach to a busy queue, or one
recently detached from, should fail gracefully.

Right now it's pretty much necessary to have an per-queue spinlock or
similar that you take before you overwrite a queue and re-create it.
And you need is-in-use flags for both persistent side and transient
side peers. That way the persistent side knows for sure there's no
transient side still attached to the queue when it overwrites it, and
so the transient side knows not to attempt to attach to a queue that's
already been used and detached by someone else because that Assert()s.
And you need the persistent side in-use flag so the transient side
knows the queue exists and is ready to be attached to, and it doesn't
try to attach to a queue that's in the process of being overwritten.

The issues I've had are:

* When one side detaches it can't tell if the other side is still
attached, detached, or has never attached. So you need extra book
keeping to find out "once I've detached, is it possible the other peer
could've attached and not yet detached" and ensure that no peer could
be attached when you zero and overwrite a queue.

* Attempting to set the receive/send proc and attach to an
already-attached queue Assert()s, there's no "try and fail
gracefully". So you need extra book keeping to record "this queue slot
has been used up, detached from, and needs to be reset". You can't
just try a queue until you find a free one, or retry your queue slot
until it's reset, or whatever.

* Receive/send proc is tracked by PGPROC not pid, and those slots are
re-used much faster and more predictably. So failures where you
attempt to set yourself as sender/receiver for a queue that's already
had a sender/receiver set can produce confusing errors.


I'd like to:

* Zero the PGPROC* for the sender when it detaches, and the receiver
when it detaches

* When the queue is marked as peer-detached but the local PGPROC is
still attached, have a function that takes the queue spinlock and
resets the queue to not-detached state with the local proc still
attached, ready for re-use by attaching a new peer.

* (At least optionally) return failure code when attempting to set
sender or receiver on a detached queue, not assert. So you can wait
and try again later when the queue is reset and ready for re-use, or
scan to find another free queue slot to attach to.

If there's a need to keep the sender and receiver PGPROC after detach
we could instead make the detached bool into a flag of
RECEIVER_DETACHED|SENDER_DETACHED. However, this adds
read-modify-write cycles to what's otherwise a simple set, so the
spinlock must be taken on detach an atomic must be used. So I'd rather
just blindly clear the PGPROC pointer for whichever side is detaching.

These changes would make using shm_mq persistently MUCH easier,
without imposing significant cost on existing users. And it'd make it
way simpler to build a layer on top for a 1:m 2-way comms system like
Ildus is talking about.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Ildus Kurbangaliev
Date:
Subject: [HACKERS] proposal: extend shm_mq to support more use cases
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] strange relcache.c debug message