Thread: v12.0: segfault in reindex CONCURRENTLY
One of our servers crashed last night like this:

< 2019-10-10 22:31:02.186 EDT postgres >STATEMENT:  REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT  >LOG:  server process (PID 29857) was terminated by signal 11: Segmentation fault
< 2019-10-10 22:31:02.399 EDT  >DETAIL:  Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT  >LOG:  terminating any other active server processes

ts=# \d+ child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
Index "child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx"
 Column  |  Type   | Key? | Definition | Storage | Stats target
---------+---------+------+------------+---------+--------------
 site_id | integer | yes  | site_id    | plain   |
btree, for table "child.eric_umts_rnc_utrancell_hsdsch_eul_201910"

That's an index on a table partition, but not itself a child of a relkind=I
index.

Unfortunately, there was no core file, and I'm still trying to reproduce it.

I can't see that the table was INSERTed into during the reindex... But looks
like it was SELECTed from, and the report finished within 1sec of the crash:

(2019-10-10 22:30:50,485 - p1604 t140325365622592 - INFO): PID 1604 finished running report; est=None rows=552; cols=83;[...] duration:12

postgres=# SELECT log_time, pid, session_id, left(message,99), detail FROM postgres_log_2019_10_10_2200 WHERE pid=29857 OR (log_time BETWEEN '2019-10-10 22:31:02.18' AND '2019-10-10 22:31:02.4' AND NOT message~'crash of another') ORDER BY log_time LIMIT 9;

 2019-10-10 22:30:24.441-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/0.0", size 3096576 |
 2019-10-10 22:30:24.442-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/1.0", size 2809856 |
 2019-10-10 22:30:24.907-04 | 29857 | 5d9fe93f.74a1 | process 29857 still waiting for ShareLock on virtual transaction 30/103010 after 333.078 ms | Process holding the lock: 29671. Wait queue: 29857.
 2019-10-10 22:31:02.186-04 | 29857 | 5d9fe93f.74a1 | process 29857 acquired ShareLock on virtual transaction 30/103010 after 37611.995 ms |
 2019-10-10 22:31:02.186-04 | 29671 | 5d9fe92a.73e7 | duration: 50044.778 ms  statement: SELECT fn, sz FROM                             +|
                            |       |               |         (SELECT file_name fn, file_size_bytes sz,                                +|
 2019-10-10 22:31:02.399-04 |  1161 | 5d9cad9e.489  | terminating any other active server processes |
 2019-10-10 22:31:02.399-04 |  1161 | 5d9cad9e.489  | server process (PID 29857) was terminated by signal 11: Segmentation fault | Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx

Justin
On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> That's an index on a table partition, but not itself a child of a relkind=I
> index.

Interesting.  Testing with a partition tree, and indexes on leaves which do
not have dependencies with a parent, I cannot reproduce anything.  Perhaps
you have some concurrent operations going on?

> Unfortunately, there was no core file, and I'm still trying to reproduce it.

Forgot to set ulimit -c?  Having a backtrace would surely help.
--
Michael
Attachment
On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> > Unfortunately, there was no core file, and I'm still trying to reproduce it.
>
> Forgot to set ulimit -c?  Having a backtrace would surely help.

Fortunately (?) another server crashed last night.  (Doesn't appear to be
relevant, but this table has no inheritance/partition-ness.)

Looks like it's a race condition and dereferencing *holder=NULL.  The first
crash was probably the same bug, due to report query running during "reindex
CONCURRENTLY", and probably finished at nearly the same time as another locker.

Relevant code introduced here:

commit ab0dfc961b6a821f23d9c40c723d11380ce195a6
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Tue Apr 2 15:18:08 2019 -0300

    Report progress of CREATE INDEX operations

Needs to be conditionalized (as anticipated by the comment):

+			if (holder)
				pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
											 holder->pid);

Core was generated by `postgres: postgres ts [local] REINDEX '.
Program terminated with signal 11, Segmentation fault.
#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true) at lmgr.c:911
#1  0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at indexcmds.c:3090
#2  0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352
#3  0x00000000007657fe in standard_ProcessUtility (pstmt=pstmt@entry=0x1d05468, queryString=queryString@entry=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at utility.c:787
#4  0x00007f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006
#5  0x0000000000762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175
#6  0x0000000000763267 in PortalRunMulti (portal=portal@entry=0x1d7a4e0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1d05548, altdest=altdest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at pquery.c:1328
#7  0x0000000000763e45 in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x1d05548, altdest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pquery.c:796
#8  0x000000000075ff45 in exec_simple_query (query_string=<optimized out>) at postgres.c:1215
#9  0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236
#10 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
#11 BackendStartup (port=0x1d2b340) at postmaster.c:4122
#12 ServerLoop () at postmaster.c:1704
#13 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cff280) at postmaster.c:1377
#14 0x0000000000484c93 in main (argc=3, argv=0x1cff280) at main.c:228

bt f
#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true) at lmgr.c:911
        holder = 0x0
        lockholders = 0x1d9b778
        holders = <optimized out>
        lc = 0x1d9bf80
        total = <optimized out>
        done = 1
#1  0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at indexcmds.c:3090
        heapRelationIds = 0x1d30360
        indexIds = 0x1d303b0
        newIndexIds = <optimized out>
        relationLocks = <optimized out>
        lockTags = <optimized out>
        lc = 0x0
        lc2 = 0x0
        private_context = <optimized out>
        oldcontext = <optimized out>
        relkind = 105 'i'
        relationName = 0x0
        relationNamespace = 0x0
        ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = {tv_sec = 281483566645394, tv_usec = 75668733820930}, ru_stime = {tv_sec = 0, tv_usec = 30592272}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 105, __ru_idrss_word = 105}, {ru_isrss = -926385342574214656, __ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, __ru_minflt_word = 8924839}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 17618, __ru_nswap_word = 17618}, {ru_inblock = 139781327898864, __ru_inblock_word = 139781327898864}, {ru_oublock = 30430312, __ru_oublock_word = 30430312}, {ru_msgsnd = 139781327898864, __ru_msgsnd_word = 139781327898864}, {ru_msgrcv = 139781327898864, __ru_msgrcv_word = 139781327898864}, {ru_nsignals = 139781327898864, __ru_nsignals_word = 139781327898864}, {ru_nvcsw = 139781327898864, __ru_nvcsw_word = 139781327898864}, {ru_nivcsw = 139781327898864, __ru_nivcsw_word = 139781327898864}}}
        __func__ = "ReindexRelationConcurrently"
#2  0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352
        state = {concurrent = true, locked_table_oid = 17608}
        indOid = 17618
        irel = <optimized out>
        persistence = 112 'p'
...
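[For reference, a rough sketch of the wait loop in question with the proposed guard in place.
This is approximated from the backtrace and the commit cited above rather than copied from
lmgr.c, so treat the exact names and placement as illustrative:]

	/* Sketch: inside WaitForLockersMultiple(), for each blocking virtual xact */
	while (VirtualTransactionIdIsValid(*lockholders))
	{
		/* If requested, publish who we're going to wait for. */
		if (progress)
		{
			PGPROC	   *holder = BackendIdGetProc(lockholders->backendId);

			/*
			 * The blocking backend may have exited between the validity
			 * check above and this lookup, in which case BackendIdGetProc()
			 * returns NULL -- hence the proposed conditional.
			 */
			if (holder)
				pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
											 holder->pid);
		}

		/* Sleep until the blocking transaction is gone. */
		VirtualXactLock(*lockholders, true);
		lockholders++;

		if (progress)
			pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
	}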
On 2019-Oct-13, Justin Pryzby wrote:

> Looks like it's a race condition and dereferencing *holder=NULL.  The first
> crash was probably the same bug, due to report query running during "reindex
> CONCURRENTLY", and probably finished at nearly the same time as another locker.

Ooh, right, makes sense.  There's another spot with the same mistake ...
this patch should fix it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Oct 13, 2019 at 03:10:21PM -0300, Alvaro Herrera wrote:
> On 2019-Oct-13, Justin Pryzby wrote:
>
> > Looks like it's a race condition and dereferencing *holder=NULL.  The first
> > crash was probably the same bug, due to report query running during "reindex
> > CONCURRENTLY", and probably finished at nearly the same time as another locker.
>
> Ooh, right, makes sense.  There's another spot with the same mistake ...
> this patch should fix it.

I would maybe chop off the 2nd sentence, since conditionalizing indicates that
we do actually care.

+	 * If requested, publish who we're going to wait for.  This is not
+	 * 100% accurate if they're already gone, but we don't care.

Justin
On 2019-Oct-13, Justin Pryzby wrote:

> On Sun, Oct 13, 2019 at 03:10:21PM -0300, Alvaro Herrera wrote:
> > On 2019-Oct-13, Justin Pryzby wrote:
> >
> > > Looks like it's a race condition and dereferencing *holder=NULL.  The first
> > > crash was probably the same bug, due to report query running during "reindex
> > > CONCURRENTLY", and probably finished at nearly the same time as another locker.
> >
> > Ooh, right, makes sense.  There's another spot with the same mistake ...
> > this patch should fix it.
>
> I would maybe chop off the 2nd sentence, since conditionalizing indicates that
> we do actually care.
>
> +	 * If requested, publish who we're going to wait for.  This is not
> +	 * 100% accurate if they're already gone, but we don't care.

True.  And we can copy the resulting comment to the other spot.

(FWIW I expect the crash is possible not just in reindex but also in
CREATE INDEX CONCURRENTLY.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Oct 13, 2019 at 04:18:34PM -0300, Alvaro Herrera wrote:
> True.  And we can copy the resulting comment to the other spot.
>
> (FWIW I expect the crash is possible not just in reindex but also in
> CREATE INDEX CONCURRENTLY.)

I need to think about that, but shouldn't we have a way to reproduce that
case rather reliably with an isolation test?  The patch looks good to me;
these are also the two places I spotted yesterday after a quick lookup.  The
only other caller is isTempNamespaceInUse(), which does its thing correctly.
--
Michael
Attachment
On Mon, Oct 14, 2019 at 08:57:16AM +0900, Michael Paquier wrote:
> I need to think about that, but shouldn't we have a way to reproduce
> that case rather reliably with an isolation test?  The patch looks
> good to me; these are also the two places I spotted yesterday after a
> quick lookup.  The only other caller is isTempNamespaceInUse(), which
> does its thing correctly.

Actually, reindex-concurrently.spec stresses that, except that in order to
reproduce the failure we need to close the connection exactly in the waiting
loop before sending the progress report, but after looking at
VirtualTransactionIdIsValid.  Using a debugger and a simple breakpoint I can
easily reproduce the crash, but we'd need more to make that test case
deterministic, like a termination with the correct timing.

So, Alvaro, your patch looks good to me.  Could you apply it?
--
Michael
Attachment
On 2019-Oct-15, Michael Paquier wrote:

> So, Alvaro, your patch looks good to me.  Could you apply it?

Thanks, pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Oct 13, 2019 at 04:18:34PM -0300, Alvaro Herrera wrote:
> (FWIW I expect the crash is possible not just in reindex but also in
> CREATE INDEX CONCURRENTLY.)

FWIW, for sake of list archives, and for anyone running v12 hoping to avoid
crashing, I believe we hit this for DROP INDEX CONCURRENTLY, although I don't
have the backtrace to prove it.

Justin
On Wed, Oct 16, 2019 at 09:53:56AM -0300, Alvaro Herrera wrote:
> Thanks, pushed.

Thanks, Alvaro.
--
Michael
Attachment
On Wed, Oct 16, 2019 at 04:11:46PM -0500, Justin Pryzby wrote:
> On Sun, Oct 13, 2019 at 04:18:34PM -0300, Alvaro Herrera wrote:
>> (FWIW I expect the crash is possible not just in reindex but also in
>> CREATE INDEX CONCURRENTLY.)
>
> FWIW, for sake of list archives, and for anyone running v12 hoping to avoid
> crashing, I believe we hit this for DROP INDEX CONCURRENTLY, although I don't
> have the backtrace to prove it.

You may not have a backtrace, but I think that you are right:
WaitForLockers() gets called in index_drop() with progress reporting
enabled.  index_drop() would also be taken by REINDEX CONCURRENTLY
through performMultipleDeletions(), but we cannot know if it gets used
for REINDEX CONCURRENTLY or for DROP INDEX CONCURRENTLY as it goes
through the central deletion machinery, so we have to mark progress
reporting as true anyway.  Maybe that's worth a comment in index_drop
when calling WaitForLockers() because it is not actually that obvious,
say like that:

@@ -2157,7 +2157,10 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)

         /*
          * Wait till every transaction that saw the old index state has
-         * finished.
+         * finished.  Progress reporting is enabled here for REINDEX
+         * CONCURRENTLY, but not for DROP INDEX CONCURRENTLY.  Track
+         * the progress through WaitForLockers() anyway, the information
+         * will not show up if using DROP INDEX CONCURRENTLY.
          */
         WaitForLockers(heaplocktag, AccessExclusiveLock, true);

Thoughts?
--
Michael
Attachment
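[For context on how the progress flag reaches the code that crashed: WaitForLockers() is
essentially a single-locktag convenience wrapper around WaitForLockersMultiple(), so the
boolean passed by index_drop() flows straight into the reporting path fixed above.  A
sketch of that wrapper, reconstructed from memory of lmgr.c, so the exact body may differ:]

	void
	WaitForLockers(LOCKTAG heaplocktag, LOCKMODE lockmode, bool progress)
	{
		/* Wrap the single tag in a one-element list and delegate. */
		List	   *l = list_make1(&heaplocktag);

		WaitForLockersMultiple(l, lockmode, progress);
		list_free(l);
	}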
On 2019-Oct-17, Michael Paquier wrote:

> You may not have a backtrace, but I think that you are right:
> WaitForLockers() gets called in index_drop() with progress reporting
> enabled.  index_drop() would also be taken by REINDEX CONCURRENTLY
> through performMultipleDeletions(), but we cannot know if it gets used
> for REINDEX CONCURRENTLY or for DROP INDEX CONCURRENTLY as it goes
> through the central deletion machinery, so we have to mark progress
> reporting as true anyway.  Maybe that's worth a comment in index_drop
> when calling WaitForLockers() because it is not actually that obvious,
> say like that:

Hmm, I wonder if it isn't the right solution to set 'progress' to false
in that spot, instead.  index_drop says it must only be called by the
dependency machinery; are we depending on that to pass-through the need
to update progress status?  I'm going over that code now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 17, 2019 at 05:33:22AM -0300, Alvaro Herrera wrote:
> Hmm, I wonder if it isn't the right solution to set 'progress' to false
> in that spot, instead.  index_drop says it must only be called by the
> dependency machinery; are we depending on that to pass-through the need
> to update progress status?  I'm going over that code now.

pgstat_progress_end_command() is done for REINDEX CONCURRENTLY after the
concurrent drop, so it made sense to me to still report any PID REINDEX
CONC is waiting for at this stage.
--
Michael
Attachment
On 2019-Oct-17, Michael Paquier wrote:

> On Thu, Oct 17, 2019 at 05:33:22AM -0300, Alvaro Herrera wrote:
> > Hmm, I wonder if it isn't the right solution to set 'progress' to false
> > in that spot, instead.  index_drop says it must only be called by the
> > dependency machinery; are we depending on that to pass-through the need
> > to update progress status?  I'm going over that code now.
>
> pgstat_progress_end_command() is done for REINDEX CONCURRENTLY after
> the concurrent drop, so it made sense to me to still report any PID
> REINDEX CONC is waiting for at this stage.

Yeah, okay.  So let's talk about your proposed new comment.  First,
there are two spots where WaitForLockers is called in index_drop and
you're proposing to patch the second one.  I think we should patch the
first one and reference that one from the second one.  I propose
something like this (sorry for crude pasting):

	 * Note: the reason we use actual lock acquisition here, rather than
	 * just checking the ProcArray and sleeping, is that deadlock is
	 * possible if one of the transactions in question is blocked trying
	 * to acquire an exclusive lock on our table.  The lock code will
	 * detect deadlock and error out properly.
	 *
	 * Note: we report progress through WaitForLockers() unconditionally
	 * here, even though it will only be used by REINDEX CONCURRENTLY and
	 * not DROP INDEX CONCURRENTLY.
	 */

and then

	/*
	 * Wait till every transaction that saw the old index state has
-	 * finished.
+	 * finished.  See above about progress reporting.
	 */

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 17, 2019 at 06:56:48AM -0300, Alvaro Herrera wrote:
> On 2019-Oct-17, Michael Paquier wrote:
>> pgstat_progress_end_command() is done for REINDEX CONCURRENTLY after
>> the concurrent drop, so it made sense to me to still report any PID
>> REINDEX CONC is waiting for at this stage.
>
> Yeah, okay.  So let's talk about your proposed new comment.  First,
> there are two spots where WaitForLockers is called in index_drop and
> you're proposing to patch the second one.  I think we should patch the
> first one and reference that one from the second one.  I propose
> something like this (sorry for crude pasting):
>
> <comments>

What you are proposing here sounds fine to me.  Perhaps you would prefer
to adjust the code yourself?
--
Michael
Attachment
On 2019-Oct-18, Michael Paquier wrote:

> What you are proposing here sounds fine to me.  Perhaps you would
> prefer to adjust the code yourself?

Sure thing, thanks, done :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 18, 2019 at 07:30:37AM -0300, Alvaro Herrera wrote:
> Sure thing, thanks, done :-)

Thanks, Alvaro.
--
Michael