Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Msg-id: CACACo5SR1OJz3F-fJJQq1_DcqK+xBDHnbaZ+D5QVrcHScBQr_A@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals (Pavel Stehule <pavel.stehule@gmail.com>)
List: pgsql-hackers
On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism?  Would it not be OK to just let the LWLock do the sequencing of concurrent requests?  Given that we're only going to use one message queue per cluster, there's not much concurrency to gain by introducing slots, I believe.

I'm afraid of problems in production.  When a queue is tied to a particular process, any problems should clear up once that process ends.  With one message queue per cluster, some pathological problem can force a cluster restart - and you cannot restart a production cluster for a week, sometimes weeks.  The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer.  Did you intend to have a message queue per slot?

The message queue cannot be reused, so I expect one slot per caller to be used for passing parameters - the message queue would be created/released on demand by the caller.

I don't believe the message queue really cannot be reused.  What would stop us from calling shm_mq_create() on the queue struct again?
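
Roughly like this, I would think (untested sketch; cs points at the shared struct shown below, and queue_size stands for whatever part of the allocation is reserved for the queue):

/* Re-initialize the queue in place for the next conversation. */
shm_mq *mq = shm_mq_create(&cs->buffer, queue_size);

shm_mq_set_receiver(mq, MyProc);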

To give you an idea, in my current prototype I have only the following struct:

typedef struct {
    LWLock     *lock;
    /* CmdStatusInfoSlot slots[CMDINFO_SLOTS]; */
    pid_t       target_pid;
    pid_t       sender_pid;
    int         request_type;
    int         result_code;
    shm_mq      buffer;
} CmdStatusInfo;

An instance of this is allocated in shared memory once, with a BUFFER_SIZE of 8 kB.
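
The one-time allocation happens in the shmem startup hook, roughly like this (simplified sketch; the hook name is made up and error handling is omitted):

static CmdStatusInfo *cmd_status_info;

static void
cmdstatus_shmem_startup(void)
{
    bool        found;

    /* One 8 kB chunk for the whole cluster: fields plus queue space. */
    cmd_status_info = ShmemInitStruct("CmdStatusInfo", BUFFER_SIZE, &found);
    if (!found)
    {
        cmd_status_info->lock = LWLockAssign();
        cmd_status_info->target_pid = 0;    /* channel starts out free */
    }
}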

In pg_cmdstatus() I acquire the LWLock to check if target_pid is 0; if so, nobody else is using this communication channel at the moment, and I set the pids and request_type and initialize the mq buffer.  Otherwise I just sleep and retry acquiring the lock (a timeout should probably be added here).
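
Schematically (simplified sketch; the exact queue-size arithmetic is elided):

for (;;)
{
    LWLockAcquire(cmd_status_info->lock, LW_EXCLUSIVE);

    if (cmd_status_info->target_pid == 0)
    {
        /* Channel is free: claim it and set up the queue. */
        cmd_status_info->target_pid = target_pid;
        cmd_status_info->sender_pid = MyProcPid;
        cmd_status_info->request_type = request_type;
        mq = shm_mq_create(&cmd_status_info->buffer, queue_size);
        shm_mq_set_receiver(mq, MyProc);
        LWLockRelease(cmd_status_info->lock);
        break;
    }

    /* Channel busy: back off and retry (a total timeout belongs here). */
    LWLockRelease(cmd_status_info->lock);
    pg_usleep(10000L);          /* 10 ms */
}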

What sort of pathological problems are you concerned about?  The communicating backends should just detach from the message queue properly and have some timeout configured to prevent deadlocks.  Other than that, I don't see how having N slots really helps: in case of pathological problems you will just deplete them all sooner or later.
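
For the timeout part, a non-blocking receive combined with WaitLatch() should work, something like this (sketch; mqh is the handle obtained from shm_mq_attach(), and the one-second timeout is arbitrary):

shm_mq_result res;
Size        len;
void       *data;
int         rc;

for (;;)
{
    /* Try to read a message without blocking. */
    res = shm_mq_receive(mqh, &len, &data, true);
    if (res == SHM_MQ_SUCCESS)
        break;
    if (res == SHM_MQ_DETACHED)
        ereport(ERROR,
                (errmsg("lost connection to target backend")));

    /* Nothing yet: sleep on the latch, but not forever. */
    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                   1000L);
    if (rc & WL_POSTMASTER_DEATH)
        proc_exit(1);
    if (rc & WL_TIMEOUT)
        ereport(ERROR,
                (errmsg("timed out waiting for command status")));
    ResetLatch(MyLatch);
}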

I'm afraid of unexpected problems :) - any part of signal handling or multiprocess communication is fragile.  Slots are simple and can be attached to any process without the need to alloc/free memory.

Yes, but do slots solve the actual problem?  If there is only one message queue, you still have the same problem regardless of the number of slots you decide to have.

--
Alex
