Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA) - Mailing list pgsql-hackers
From | Mahendranath Gurram |
---|---|
Subject | Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA) |
Date | |
Msg-id | 15cc4e54751.c34fdaa87199.2745134041722427880@zohocorp.com |
In response to | Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA) (Mahi Gurram <teckymahi@gmail.com>) |
Responses |
Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)
|
List | pgsql-hackers |
Hi Thomas,
Any update on this?
Please let me know how I can proceed further.
Thanks & Best Regards,
-Mahi
---- On Fri, 16 Jun 2017 18:47:37 +0530 Mahi Gurram <teckymahi@gmail.com> wrote ----
Hi Thomas,

Thanks for your response and suggestions to change the code.

I have now modified my code as per your suggestions. The dsa_area pointer is no longer in shared memory; it is a global variable. I also implemented all your other code suggestions but unfortunately, no luck. I am still facing the same behaviour. Refer to the attachment for the modified code.

I have some doubts about your response. Please clarify.

> I didn't try your code but I see a few different problems here. Every
> backend is creating a new dsa area, and then storing the pointer to it
> in shared memory instead of attaching from other backends using the
> handle, and there are synchronisation problems. That isn't going to
> work. Here's what I think you might want to try:

Actually, I'm not creating a dsa_area for every backend. I'm creating it only once (in BufferShmemHook).
* I put prints in my _PG_init and BufferShmemHook functions to confirm this.

As far as I know, the _PG_init of a shared library/extension is called only once (during startup) by the postmaster process, and all the postgres backends are forked child processes of the postmaster.

Since the backends are the postmaster's child processes and are created after the shared memory (dsa_area) has been created and attached, each backend/child process will receive the shared memory segment in its address space, and as a result no shared memory operations like dsa_attach are required to access/use the dsa data.

Please correct me if I'm wrong.

> 3. Whether you are the backend that created it or a backend that
> attached to it, I think you'll need to store the dsa_area in a global
> variable for your UDFs to access. Note that the dsa_area object will
> be different in each backend: there is no point in storing that
> address itself in shared memory, as you have it, as you certainly
> can't use it in any other backend. In other words, each backend that
> attached has its own dsa_area object that it can use to access the
> common dynamic shared memory area.

In the case of forked processes, the OS actually does share the pages initially, because fork implements copy-on-write semantics. This means that, provided none of the processes modifies the pages, they both point to the same address and the same data.

Based on the above theory, assume I have created the dsa_area object in the postmaster process (_PG_init) and it is a global variable. Then all the backends/forked processes can access/share the same dsa_area object and its members.

Hence, theoretically, the code should work without any issues. But I'm not sure why it is not working as expected :(

I tried debugging by putting prints, and observed the following:

1. The dsa_area_control address is different between the postmaster process and the backends.
2. After restarting, they seem to be the same, and hence it works after that.

2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:08:50.798 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:08:50.798 IST [9195] LOG: the dsa_area_handle is 1007561696
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:11:01.904 IST [9224] LOG: the address of dsa_area_control is 0x1dac910
2017-06-16 18:11:01.904 IST [9224] LOG: the dsa_area_handle is 0
2017-06-16 18:11:01.907 IST [9195] LOG: server process (PID 9224) was terminated by signal 11: Segmentation fault
2017-06-16 18:11:01.907 IST [9195] DETAIL: Failed process was running: select test_dsa_data_access(1);
2017-06-16 18:11:01.907 IST [9195] LOG: terminating any other active server processes
2017-06-16 18:11:01.907 IST [9227] FATAL: the database system is in recovery mode
2017-06-16 18:11:01.907 IST [9220] WARNING: terminating connection because of crash of another server process
2017-06-16 18:11:01.907 IST [9220] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-16 18:11:01.907 IST [9220] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-06-16 18:11:01.907 IST [9195] LOG: all server processes terminated; reinitialising
2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:11:01.937 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:11:01.937 IST [9195] LOG: the dsa_area_handle is 1833840303
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:12:24.247 IST [9239] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:12:24.247 IST [9239] LOG: the dsa_area_handle is 1833840303

I may be wrong in my understanding, and I might be missing something :(

Please help me sort it out. I really appreciate all your help :)

PS: On Mac, it is working fine as expected. I'm facing this issue only on Linux systems. I'm working on postgres 10.1 beta, FYI.

Thanks & Best Regards,
- Mahi

On Thu, Jun 15, 2017 at 5:00 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Jun 15, 2017 at 6:32 PM, Mahi Gurram <teckymahi@gmail.com> wrote:
> Followed the same as per your suggestion. Refer the code snippet below:
>
>> void
>> _PG_init(void){
>>     RequestAddinShmemSpace(100000000);
>>     PreviousShmemHook = shmem_startup_hook;
>>     shmem_startup_hook = BufferShmemHook;
>> }
>> void BufferShmemHook(){
>>     dsa_area *area;
>>     dsa_pointer data_ptr;
>>     char *mem;
>>     area = dsa_create(my_tranche_id());
>>     data_ptr = dsa_allocate(area, 42);
>>     mem = (char *) dsa_get_address(area, data_ptr);
>>     if (mem != NULL){
>>         snprintf(mem, 42, "Hello world");
>>     }
>>     bool found;
>>     shmemData = ShmemInitStruct("Mahi_Shared_Data",
>>                                 sizeof(shared_data),
>>                                 &found);
>>     shmemData->shared_area = area;
>>     shmemData->shared_area_handle = dsa_get_handle(area);
>>     shmemData->shared_data_ptr = data_ptr;
>>     shmemData->head=NULL;
>> }
>
> Wrote one UDF function, which is called by one of the client connection and
> that tries to use the same dsa. But unfortunately it is behaving strange.
>
> First call to my UDF function is throwing segmentation fault and postgres is
> quitting and auto restarting. If i try calling the same UDF function again
> in new connection(after postgres restart) it is working fine.
>
> Put some prints in postgres source code and found that dsa_allocate() is
> trying to use area->control(dsa_area_control object) which is pointing to
> wrong address but after restarting it is pointing to right address and hence
> it is working fine after restart.
>
> I'm totally confused and stuck at this point.
> Please help me in solving this.
>
> PS: It is working fine in Mac.. in only linux systems i'm facing this
> behaviour.
>
> I have attached the zip of my extension code along with screenshot of the
> pgclient and log file with debug prints for better understanding.
> *logfile is edited for providing some comments for better understanding.
>
> Please help me in solving this.

Hi Mahi

I didn't try your code but I see a few different problems here. Every
backend is creating a new dsa area, and then storing the pointer to it
in shared memory instead of attaching from other backends using the
handle, and there are synchronisation problems. That isn't going to
work. Here's what I think you might want to try:

1. In BufferShmemHook, acquire and release AddinShmemInitLock while
initialising "Mahi_Shared_Data" (just like pgss_shmem_startup does),
because any number of backends could be starting up at the same time
and would step on each other's toes here.

2. When ShmemInitStruct returns, check the value of 'found'. If it's
false, then this backend is the very first one to attach to this bit
of (traditional) shmem. So it should create the DSA area and store
the handle in the traditional shmem. Because we hold
AddinShmemInitLock we know that no one else can be doing that at the
same time. Before even trying to create the DSA area, you should
probably memset the whole thing to zero so that if you fail later, the
state isn't garbage. If 'found' is true, then we know it's already
all set up (or zeroed out), so instead of creating the DSA area it
should attach to it using the published handle.

3. Whether you are the backend that created it or a backend that
attached to it, I think you'll need to store the dsa_area in a global
variable for your UDFs to access. Note that the dsa_area object will
be different in each backend: there is no point in storing that
address itself in shared memory, as you have it, as you certainly
can't use it in any other backend. In other words, each backend that
attached has its own dsa_area object that it can use to access the
common dynamic shared memory area.

4. After creating, in this case I think you should call
dsa_pin(area), so that it doesn't go away when there are no backends
attached (ie because there are no backends running) (if I understand
correctly that you want this DSA area to last as long as the whole
cluster).

By the way, in _PG_init() where you have
RequestAddinShmemSpace(100000000) I think you want
RequestAddinShmemSpace(sizeof(shared_data)).

The key point is: only one backend should use LWLockNewTrancheId() and
dsa_create(), and then make the handle available to others; all the
other backends should use dsa_attach(). Then they'll all be attached
to the same dynamic shared memory area and can share data.
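
The "publish a handle, attach separately in each process" idea can be tried outside PostgreSQL with a small sketch. This is only an analogy and makes several assumptions: it uses System V shared memory (shmget/shmat) in place of the DSA API, an anonymous shared mapping in place of the traditional shmem struct, and the names struct published, struct demo_result, local_area and run_demo are hypothetical. One process creates the segment and publishes only its integer handle; the other process never sees the creator's pointer and has to attach through the handle to get a pointer of its own.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the struct kept in traditional shared memory. */
struct published {
    int handle;                     /* plays the role of the dsa_handle */
};

/* Hypothetical result record so a caller can inspect what happened. */
struct demo_result {
    int  parent_saw_child_pointer;  /* 1 if the child's pointer leaked to the parent */
    char message[32];               /* data read through the parent's own attach */
};

/* Per-process pointer, playing the role of a global dsa_area pointer. */
static char *local_area = NULL;

/* Returns 0 on success, -1 on any system call failure. */
int run_demo(struct demo_result *res)
{
    assert(res != NULL);
    memset(res, 0, sizeof *res);

    /* Shared struct set up before fork, visible to parent and child. */
    struct published *pub = (struct published *) mmap(
        NULL, sizeof *pub, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (pub == MAP_FAILED)
        return -1;
    pub->handle = -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(pub, sizeof *pub);
        return -1;
    }
    if (pid == 0) {
        /* Creator: make the segment and keep a private pointer to it. */
        int new_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (new_id < 0)
            _exit(1);
        local_area = (char *) shmat(new_id, NULL, 0);
        if (local_area == (char *) -1) {
            shmctl(new_id, IPC_RMID, NULL);
            _exit(1);
        }
        snprintf(local_area, sizeof res->message, "Hello world");
        pub->handle = new_id;       /* publish the handle, not the pointer */
        _exit(0);                   /* the segment outlives this process */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 || pub->handle < 0) {
        munmap(pub, sizeof *pub);
        return -1;
    }

    /* The child's assignment to local_area happened in its own address space. */
    res->parent_saw_child_pointer = (local_area != NULL);

    /* Attacher: obtain a pointer of our own from the published handle. */
    int id = pub->handle;
    munmap(pub, sizeof *pub);
    local_area = (char *) shmat(id, NULL, 0);
    if (local_area == (char *) -1) {
        local_area = NULL;
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    snprintf(res->message, sizeof res->message, "%s", local_area);

    shmdt(local_area);
    local_area = NULL;
    shmctl(id, IPC_RMID, NULL);     /* remove the segment once we are done */
    return 0;
}
```

Calling run_demo() from a small main() and inspecting the result shows the two halves of the point: the parent's copy of local_area is untouched by the child, while the text written by the child is readable once the parent attaches through the handle. The segment surviving the creator's exit is only loosely comparable to dsa_pin(); the real DSA lifetime rules are different.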