Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CAFj8pRCv4h3XvDGm0FMfDj9DhM=sDAXLj_RNSd9WZJ91=cAiGA@mail.gmail.com
In response to Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers


2015-09-07 11:55 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Fri, Sep 4, 2015 at 6:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Sorry, but I still don't see how the slots help this issue - could you please elaborate?

With a slot (or something similar) there is no globally locked resource. If I have time at the weekend, I'll try to write a prototype.

But you will still lock on the slots list to find an unused one.  How is that substantially different from what I'm doing?

It is not necessary - you can use a technique similar to what PGPROC does. I am sending a "lock free" demo.

I am not afraid of short locks, when their scope and duration are limited. But there have been a lot of bugs, and fixes named along the lines of "make some operation interruptible", and that is the reason why I prefer the typical design for working with shared memory.
 

>> Other smaller issues:
>>
>> * probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false

I'm not sending it like that because of the message size - I just find it more convenient. If you think it can be problematic, it's easy to do this as before, by splitting lines on the receiving side.

Yes, the shm queue sends data immediately - so slicing on the sender side generates more interprocess communication.

Well, we are talking about hundreds to thousands of bytes per plan in total.  And if my reading of the shm_mq implementation is correct, when the message fits into the shared memory buffer, the receiver gets a direct pointer into shared memory, with no extra allocation or copy into process-local memory.  So this can actually be a win.

You have to account for signals and interprocess communication; the cost of memory allocation is not everything.

>> * pg_usleep(1000L); - it is related to single point resource

But not a highly concurrent one.

I believe it is not necessary - the waiting (sleeping) can be pushed deeper, into the read from the queue - the code will be cleaner.

The only way I expect this line to be reached is when a concurrent pg_cmdstatus() call is in progress: the receiving backend has set the target_pid and created the queue, released the lock, and now waits to read something from the shm_mq.  A backend that also wants to use this communication channel can obtain the lwlock, check whether the channel is free, fail, and then check again; but doing that in a tight loop would load the CPU, so there's a small sleep.

The real problem could be if the process that was signaled to connect to the message queue never handles the interrupt, and we keep waiting forever in shm_mq_receive().  We could add a timeout parameter or just let the user cancel the call: send a cancellation request, use pg_cancel_backend() or set statement_timeout before running this.

This is a valid question - to begin with, we can use statement_timeout, and we don't need to design anything special (as long as you don't hold some important lock).

My example (the code is prototype quality) is a little bit longer, but it works without a global lock - the requester doesn't block anyone else.

Pavel
 

--
Alex


Attachment
