Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA) - Mailing list pgsql-hackers
From: Mahi Gurram
Subject: Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)
Msg-id: CAGg=Gue94VZj1Hb37RBB0TDgzSSY-7sq=gSuqHdRdSoxh+3FCQ@mail.gmail.com
In response to: Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA); Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)
List: pgsql-hackers
Hi Thomas,
Thanks for your response and suggestions to change the code.
I have now modified my code per your suggestions: the dsa_area pointer is no longer in shared memory, it is a global variable. I also implemented all of your code suggestions but, unfortunately, no luck; I'm still seeing the same behaviour. Please refer to the attachment for the modified code.
I have a few doubts about your response. Please clarify.
> I didn't try your code but I see a few different problems here. Every
> backend is creating a new dsa area, and then storing the pointer to it
> in shared memory instead of attaching from other backends using the
> handle, and there are synchronisation problems. That isn't going to
> work. Here's what I think you might want to try:
Actually, I'm not creating a dsa_area in every backend. I'm creating it only once (in BufferShmemHook).
* I put prints in my _PG_init and BufferShmemHook functions to confirm this.
As far as I know, _PG_init of a shared library/extension is called only once (during startup) by the postmaster process, and all the Postgres backends are forked child processes of the postmaster.
Since the backends are the postmaster's child processes and are created after the shared memory (dsa_area) has been created and attached, each backend/child process should receive the shared memory segment in its address space, and as a result no shared-memory operations like dsa_attach should be required to access/use the DSA data.
Please correct me if I'm wrong.
> 3. Whether you are the backend that created it or a backend that
> attached to it, I think you'll need to store the dsa_area in a global
> variable for your UDFs to access. Note that the dsa_area object will
> be different in each backend: there is no point in storing that
> address itself in shared memory, as you have it, as you certainly
> can't use it in any other backend. In other words, each backend that
> attached has its own dsa_area object that it can use to access the
> common dynamic shared memory area.
In the case of forked processes, the OS does initially share the pages, because fork implements copy-on-write semantics; provided none of the processes modifies the pages, they all point to the same addresses and the same data.
Based on the above, assuming I have created the dsa_area object in the postmaster process (_PG_init) as a global variable, all the backends/forked processes should be able to access the same dsa_area object and its members.
Hence, theoretically, the code should work without any issues, but I'm not sure why it is not working as expected :(
I tried debugging by putting in prints, and observed the following:
1. The dsa_area_control address differs between the postmaster process and the backends.
2. After a restart, the addresses appear to be the same, and hence it works from then on.
2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:08:50.798 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:08:50.798 IST [9195] LOG: the dsa_area_handle is 1007561696
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:11:01.904 IST [9224] LOG: the address of dsa_area_control is 0x1dac910
2017-06-16 18:11:01.904 IST [9224] LOG: the dsa_area_handle is 0
2017-06-16 18:11:01.907 IST [9195] LOG: server process (PID 9224) was terminated by signal 11: Segmentation fault
2017-06-16 18:11:01.907 IST [9195] DETAIL: Failed process was running: select test_dsa_data_access(1);
2017-06-16 18:11:01.907 IST [9195] LOG: terminating any other active server processes
2017-06-16 18:11:01.907 IST [9227] FATAL: the database system is in recovery mode
2017-06-16 18:11:01.907 IST [9220] WARNING: terminating connection because of crash of another server process
2017-06-16 18:11:01.907 IST [9220] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-16 18:11:01.907 IST [9220] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-06-16 18:11:01.907 IST [9195] LOG: all server processes terminated; reinitialising
2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:11:01.937 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:11:01.937 IST [9195] LOG: the dsa_area_handle is 1833840303
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:12:24.247 IST [9239] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:12:24.247 IST [9239] LOG: the dsa_area_handle is 1833840303
I may be wrong in my understanding, and I might be missing something :(
Please help me sort this out. I really appreciate all your help :)
PS: On Mac it works fine, as expected; I'm facing this issue only on Linux systems. FYI, I'm working on the Postgres 10.1 beta.
Thanks & Best Regards,
- Mahi
On Thu, Jun 15, 2017 at 5:00 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
Hi Mahi,

On Thu, Jun 15, 2017 at 6:32 PM, Mahi Gurram <teckymahi@gmail.com> wrote:
> Followed the same as per your suggestion. Refer the code snippet below:
>
>> void
>> _PG_init(void){
>> RequestAddinShmemSpace(100000000);
>> PreviousShmemHook = shmem_startup_hook;
>> shmem_startup_hook = BufferShmemHook;
>> }
>> void BufferShmemHook(){
>> dsa_area *area;
>> dsa_pointer data_ptr;
>> char *mem;
>> area = dsa_create(my_tranche_id());
>> data_ptr = dsa_allocate(area, 42);
>> mem = (char *) dsa_get_address(area, data_ptr);
>> if (mem != NULL){
>> snprintf(mem, 42, "Hello world");
>> }
>> bool found;
>> shmemData = ShmemInitStruct("Mahi_Shared_Data",
>> sizeof(shared_data),
>> &found);
>> shmemData->shared_area = area;
>> shmemData->shared_area_handle = dsa_get_handle(area);
>> shmemData->shared_data_ptr = data_ptr;
>> shmemData->head=NULL;
>> }
>
>
> Wrote one UDF function, which is called by one of the client connection and
> that tries to use the same dsa. But unfortunately it is behaving strange.
>
> First call to my UDF function is throwing segmentation fault and postgres is
> quitting and auto restarting. If i try calling the same UDF function again
> in new connection(after postgres restart) it is working fine.
>
> Put some prints in postgres source code and found that dsa_allocate() is
> trying to use area->control(dsa_area_control object) which is pointing to
> wrong address but after restarting it is pointing to right address and hence
> it is working fine after restart.
>
> I'm totally confused and stuck at this point. Please help me in solving
> this.
>
> PS: It is working fine in Mac.. in only linux systems i'm facing this
> behaviour.
>
> I have attached the zip of my extension code along with screenshot of the
> pgclient and log file with debug prints for better understanding.
> *logfile is edited for providing some comments for better understanding.
>
> Please help me in solving this.
I didn't try your code but I see a few different problems here. Every
backend is creating a new dsa area, and then storing the pointer to it
in shared memory instead of attaching from other backends using the
handle, and there are synchronisation problems. That isn't going to
work. Here's what I think you might want to try:
1. In BufferShmemHook, acquire and release AddinShmemInitLock while
initialising "Mahi_Shared_Data" (just like pgss_shmem_startup does),
because any number of backends could be starting up at the same time
and would step on each other's toes here.
2. When ShmemInitStruct returns, check the value of 'found'. If it's
false, then this backend is the very first one to attach to this bit
of (traditional) shmem. So it should create the DSA area and store
the handle in the traditional shmem. Because we hold
AddinShmemInitLock we know that no one else can be doing that at the
same time. Before even trying to create the DSA area, you should
probably memset the whole thing to zero so that if you fail later, the
state isn't garbage. If 'found' is true, then we know it's already
all set up (or zeroed out), so instead of creating the DSA area it
should attach to it using the published handle.
3. Whether you are the backend that created it or a backend that
attached to it, I think you'll need to store the dsa_area in a global
variable for your UDFs to access. Note that the dsa_area object will
be different in each backend: there is no point in storing that
address itself in shared memory, as you have it, as you certainly
can't use it in any other backend. In other words, each backend that
attached has its own dsa_area object that it can use to access the
common dynamic shared memory area.
4. After creating, in this case I think you should call
dsa_pin(area), so that it doesn't go away when there are no backends
attached (ie because there are no backends running) (if I understand
correctly that you want this DSA area to last as long as the whole
cluster).
By the way, in _PG_init() where you have
RequestAddinShmemSpace(100000000) I think you want
RequestAddinShmemSpace(sizeof(shared_data)).
The key point is: only one backend should use LWLockNewTrancheId() and
dsa_create(), and then make the handle available to others; all the
other backends should use dsa_attach(). Then they'll all be attached
to the same dynamic shared memory area and can share data.
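Put together, the four steps above might look roughly like this. This is an untested sketch only: the struct layout and the "Mahi_Shared_Data" name follow the snippet earlier in the thread, and everything else is illustrative rather than the actual attachment code.

```c
#include "postgres.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/dsa.h"

typedef struct shared_data
{
    dsa_handle shared_area_handle;  /* publish the handle, not a pointer */
} shared_data;

static shared_data *shmemData = NULL;
static dsa_area *area = NULL;       /* step 3: per-backend global */
static shmem_startup_hook_type PreviousShmemHook = NULL;

static void
BufferShmemHook(void)
{
    bool found;

    if (PreviousShmemHook)
        PreviousShmemHook();

    /* Step 1: serialise concurrent initialisation. */
    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    shmemData = ShmemInitStruct("Mahi_Shared_Data",
                                sizeof(shared_data), &found);
    if (!found)
    {
        /* Step 2: the first process here creates the DSA area and
         * publishes its handle in traditional shmem. */
        memset(shmemData, 0, sizeof(shared_data));
        area = dsa_create(LWLockNewTrancheId());
        /* Step 4: keep the area alive even when no backend is attached. */
        dsa_pin(area);
        shmemData->shared_area_handle = dsa_get_handle(area);
    }
    else
    {
        /* Everyone else attaches via the published handle. */
        area = dsa_attach(shmemData->shared_area_handle);
    }
    LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
    /* Request only what the traditional shmem struct needs. */
    RequestAddinShmemSpace(sizeof(shared_data));
    PreviousShmemHook = shmem_startup_hook;
    shmem_startup_hook = BufferShmemHook;
}
```

Each backend ends up with its own dsa_area object in the `area` global, all referring to the one pinned area created by whichever process got there first.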