Thread: Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
From
Justin Pryzby
Date:
This commit seems to trigger elog(), not reproducible in the parent commit. 6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID. postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321 postgres=# \errverbose ERROR: XX000: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321 LOCATION: RelationBuildTupleDesc, relcache.c:658 This is not completely deterministic: postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; CLUSTER postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; CLUSTER postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; CLUSTER postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; CLUSTER postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70391 But I think this will be reproducible in any database with a nontrivial number of attributes.
Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > This commit seems to trigger elog(), not reproducible in the > parent commit. Yeah, I can reproduce that. Will take a look tomorrow. regards, tom lane
Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > This commit seems to trigger elog(), not reproducible in the > parent commit. > 6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID. > postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; > ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321 I've been poking at this all day, and I still have little idea what's going on. I've added a bunch of throwaway instrumentation, and have managed to convince myself that the problem is that parallel heap scan is broken. The scans done to rebuild pg_attribute's indexes seem to sometimes miss heap pages or visit pages twice (in different workers). I have no idea why this is, and even less idea how 6e086fa2e is provoking it. As you say, the behavior isn't entirely reproducible, but I couldn't make it happen at all after reverting 6e086fa2e's changes in transam/parallel.c, so apparently there is some connection. Another possibly useful data point is that for me it reproduces fairly well (more than one time in two) on x86_64 Linux, but I could not make it happen on macOS ARM64. If it's a race condition, which smells plausible, that's perhaps not hugely surprising. regards, tom lane
Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
From
Tom Lane
Date:
I wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: >> This commit seems to trigger elog(), not reproducible in the >> parent commit. >> 6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID. >> postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING pg_attribute_relid_attnum_index; >> ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321 > I've been poking at this all day, and I still have little idea what's > going on. Got it, after a good deal more head-scratching. Here's the relevant parts of ParallelWorkerMain: /* * We've changed which tuples we can see, and must therefore invalidate * system caches. */ InvalidateSystemCaches(); /* * Restore GUC values from launching backend. We can't do this earlier, * because GUC check hooks that do catalog lookups need to see the same * database state as the leader. */ gucspace = shm_toc_lookup(toc, PARALLEL_KEY_GUC, false); RestoreGUCState(gucspace); ... /* Restore relmapper state. */ relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false); RestoreRelationMap(relmapperspace); InvalidateSystemCaches blows away the worker's relcache. Then RestoreGUCState causes some catalog lookups (tracing shows that restoring default_text_search_config is what triggers this on my setup), and in particular pg_attribute's relcache entry will get constructed to support that. Then we wheel in a new set of relation map entries *without doing anything about what that might invalidate*. In the given test case, the globally-visible relmap says that pg_attribute's relfilenode is, say, XXXX. But we are busy rewriting it, so the parent process has an "active" relmap entry that says pg_attribute's relfilenode is YYYY. Given the above, the worker process will have built a pg_attribute relcache entry that contains XXXX, and even though it now knows YYYY is the value it should be using, that information never makes it to the worker's relcache. The upshot of this is that when the parallel heap scan machinery doles out some block numbers for the parent process to read, and some other block numbers for the worker to read, the worker is reading those block numbers from the pre-clustering copy of pg_attribute, which most likely doesn't match the post-clustering image. This accounts for the missing and duplicate tuples I was seeing in the scan output. Of course, the reason 6e086fa2e made this visible is that before that, any catalog reads triggered by RestoreGUCState were done in an earlier transaction, and then we would blow away the ensuing relcache entries in InvalidateSystemCaches. So there was no bug as long as you assume that the "..." code doesn't cause any catalog reads. I'm not too sure of that though --- it's certainly not very comfortable to assume that functions like SetCurrentRoleId and SetTempNamespaceState will never attempt a catalog lookup. The code has another hazard too, which is that this all implies that the GUC-related catalog lookups will be done against the globally-visible relmap state not whatever is active in the parent process. I have not tried to construct a POC showing that that can give incorrect answers (that is, different from what the parent thinks), but it seems plausible that it could. So the fix seems clear to me: RestoreRelationMap needs to happen before anything that could result in catalog lookups. I'm kind of inclined to move up the adjacent restores of non-transactional low-level stuff too, particularly RestoreReindexState which has direct impact on how catalog lookups are done. Independently of that, it's annoying that the parallel heap scan machinery failed to notice that it was handing out block numbers for two different relfilenodes. I'm inclined to see if we can put some Asserts in there that would detect that. This particular bug would have been far easier to diagnose that way, and it hardly seems unlikely that "worker is reading the wrong relation" could happen with other mistakes in future. regards, tom lane
Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > Thanks for debugging that. RestorePendingSyncs() also changes what > RelationInitPhysicalAddr() puts in the relcache entry, so it needs to stay > with RestoreRelationMap(). I'm attaching the fix I have in mind. Ah. No objections here. regards, tom lane