Re: Get memory contexts of an arbitrary backend process - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: Get memory contexts of an arbitrary backend process |
Date | |
Msg-id | 9a59401241e9269254e70c8e0ef1cc7a@oss.nttdata.com Whole thread Raw |
In response to | Re: Get memory contexts of an arbitrary backend process (Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com>) |
Responses |
Re: Get memory contexts of an arbitrary backend process
|
List | pgsql-hackers |
On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote: Thanks for reviewing and kind suggestion! >> Attached a rewritten patch. > Thanks for updating patch. > > But when I had applyed the patch to the current HEAD and did make, I > got an error due to duplicate OIDs. > You need to rebase the patch. Assigned another OID. >> Accordingly, I also slightly modified the basic design as below. >> >> --- >> # Communication flow between the dumper and the requester >> - (1) When requesting memory context dumping, the dumper changes >> the struct on the shared memory state from 'ACCEPTABLE' to >> 'REQUESTING'. >> - (2) The dumper sends the signal to the dumper process and wait on >> the latch. >> - (3) When the dumper noticed the signal, it changes the state to >> 'DUMPING'. >> - (4) When the dumper completes dumping, it changes the state to >> 'DONE' and set the latch. >> - (5) The requestor reads the dump file and shows it to the user. >> Finally, the requestor removes the dump file and reset the shared >> memory state to 'ACCEPTABLE'. >> >> # Query cancellation >> - When the requestor cancels dumping, e.g. signaling using ctrl-C, >> the requestor changes the state of the shared memory to 'CANCELING'. >> - The dumper checks the state when it tries to change the state to >> 'DONE' at (4), and if the state is 'CANCELING', it initializes the >> dump file and reset the shared memory state to 'ACCEPTABLE'. >> >> # Cleanup dump file and the shared memory >> - In the normal case, the dumper removes the dump file and resets >> the shared memory entry as described in (5). >> - When something like query cancellation or process termination >> happens on the dumper after (1) and before (3), in other words, >> the state is 'REQUESTING', the requestor does the cleanup. >> - When something happens on the dumper or the requestor after (3) >> and before (4), in other words, the state is 'DUMPING', the dumper >> does the cleanup. Specifically, if the requestor cancels the query, >> it just changes the state to 'CANCELING' and the dumper notices it >> and cleans up things later. >> OTOH, when the dumper fails to dump, it cleans up the dump file and >> reset the shared memory state. >> - When something happens on the requestor after (4), i.e., the state >> is 'DONE', the requestor does the cleanup. >> - In the case of receiving SIGKILL or power failure, all dump files >> are removed in the crash recovery process. >> --- > If the dumper is terminated before it dumps, the requestor will appear > to enter an > infinite loop because the status of mcxtdumpShmem will not change. > The following are the steps to reproduce. > > - session1 > BEGIN; LOCK TABLE t; > - session2 > SELECT * FROM t; -- wait > - session3 > select pg_get_target_backend_memory_contexts(<pid of session2>); > -- wait > - session1 > select pg_terminate_backend(<pid of session2>); -- kill session2 > - session3 waits forever. > > Therefore, you may need to set mcxtdumpShmem->dump_status to > MCXTDUMPSTATUS_CANCELING > or other status before the dumper terminates. In this case, it may be difficult for the dumper to change dump_status because it's waiting for latch and dump_memory_contexts() is not called yet. Instead, it's possible for the requestor to check the existence of the dumper process periodically during waiting. I added this logic to the attached patch. > Also, although I have not been able to reproduce it, I believe that > with the current design, > if the requestor disappears right after the dumper dumps the memory > information, > the dump file will remain. > Since the current design appears to allow only one requestor per > instance, when the requestor > requests a dump, it might be a good idea to delete any remaining dump > files, if any. Although I'm not sure when the dump file remains, deleting any remaining dump files seems good for safety. I also added this idea to the attached patch. > The following are comments on the code. > > + proc = BackendPidGetProc(dst_pid); > + > + if (proc == NULL) > + { > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL server process", > dst_pid))); > + > + return (Datum) 1; > + } > For now, background writer, checkpointer and WAL writer are belong to > the auxiliary process. > Therefore, if we specify the PIDs of these processes for > pg_get_target_backend_memory_contexts(), > "PID xxxx is not a PostgreSQL server process" whould be outoput. > This confuses the user. > How about use AuxiliaryPidGetProc() to determine these processes? Thanks and I modified the patch to output the below message when it's an auxiliary process. | PID %d is not a PostgreSQL backend process but an auxiliary process. > > + ereport(INFO, > + (errmsg("The request has failed and now PID %d is > requsting dumping.", > + mcxtdumpShmem->src_pid))); > + > + LWLockRelease(McxtDumpLock); > You can release LWLock before ereport. Modified to release the lock before ereport. > + Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING); > typo? > It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING". Ops, it's a serious typo. Fixed it. > + ereport(INFO, > + (errmsg("PID %d is looked up by another process", > pid))); > This message is not accurate. > Because, in the current design, only one session can request a dump, > so if there are multiple requests, this message will be output even if > the > request is for a PID that is not looked up. Exactly. Modified the message as below. | Only one session can dump at a time and another session is dumping currently. > + if (fpout == NULL) > + { > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not write memory context file \"%s\": > %m", > + dumpfile))); > It would be better to use "dump file" instead of "memory context file". > > +static void > +McxtReqKill(int code, Datum arg) > The function McxtReqKill looks like it should have a different name. > (Since it doesn't do any kill). > How about McxtReqcCleanup or something similar? Thanks for your idea and I changed the name to McxtReqCleanup. Regards,
Attachment
pgsql-hackers by date: