Re: ResourceOwner refactoring - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: ResourceOwner refactoring |
Date | |
Msg-id | 10399503-9a92-4107-8c36-698da29438db@iki.fi Whole thread Raw |
In response to | Re: ResourceOwner refactoring (Alexander Lakhin <exclusion@gmail.com>) |
Responses |
Re: ResourceOwner refactoring
(Andres Freund <andres@anarazel.de>)
Re: ResourceOwner refactoring (Alexander Lakhin <exclusion@gmail.com>) |
List | pgsql-hackers |
Thanks for the testing again! On 10/11/2023 11:00, Alexander Lakhin wrote: > I could see two failure modes: > 2023-11-10 08:42:28.870 UTC [1163274] ERROR: ResourceOwnerEnlarge called after release started > 2023-11-10 08:42:28.870 UTC [1163274] STATEMENT: drop table t; > 2023-11-10 08:42:28.870 UTC [1163274] WARNING: AbortTransaction while in COMMIT state > 2023-11-10 08:42:28.870 UTC [1163274] PANIC: cannot abort transaction 906, it was already committed > > 2023-11-10 08:43:27.897 UTC [1164148] ERROR: ResourceOwnerEnlarge called after release started > 2023-11-10 08:43:27.897 UTC [1164148] STATEMENT: DROP DATABASE db69; > 2023-11-10 08:43:27.897 UTC [1164148] WARNING: AbortTransaction while in COMMIT state > 2023-11-10 08:43:27.897 UTC [1164148] PANIC: cannot abort transaction 1043, it was already committed > > The stack trace for the second ERROR (ResourceOwnerEnlarge called ...) is: > ... > #6 0x0000558af5b2f35c in ResourceOwnerEnlarge (owner=0x558af716f3c8) at resowner.c:455 > #7 0x0000558af5888f18 in dsm_create_descriptor () at dsm.c:1207 > #8 0x0000558af5889205 in dsm_attach (h=3172038420) at dsm.c:697 > #9 0x0000558af5b1ebed in get_segment_by_index (area=0x558af711da18, index=2) at dsa.c:1764 > #10 0x0000558af5b1ea4b in dsa_get_address (area=0x558af711da18, dp=2199023329568) at dsa.c:970 > #11 0x0000558af5669366 in dshash_seq_next (status=0x7ffdd5912fd0) at dshash.c:687 > #12 0x0000558af5901998 in pgstat_drop_database_and_contents (dboid=16444) at pgstat_shmem.c:830 > #13 0x0000558af59016f0 in pgstat_drop_entry (kind=PGSTAT_KIND_DATABASE, dboid=16444, objoid=0) at pgstat_shmem.c:888 > #14 0x0000558af59044eb in AtEOXact_PgStat_DroppedStats (xact_state=0x558af7111ee0, isCommit=true) at pgstat_xact.c:88 > #15 0x0000558af59043c7 in AtEOXact_PgStat (isCommit=true, parallel=false) at pgstat_xact.c:55 > #16 0x0000558af53c782e in CommitTransaction () at xact.c:2371 > #17 0x0000558af53c709e in CommitTransactionCommand () at xact.c:306 > ... The quick, straightforward fix is to move the "CurrentResourceOwner = NULL" line earlier in CommitTransaction, per attached 0003-Clear-CurrentResourceOwner-earlier-in-CommitTransact.patch. You're not allowed to use the resource owner after you start to release it; it was a bit iffy even before the ResourceOwner rewrite but now it's explicitly forbidden. By clearing CurrentResourceOwner as soon as we start releasing it, we can prevent any accidental use. When CurrentResourceOwner == NULL, dsm_attach() returns a handle that's not associated with any ResourceOwner. That's appropriate for the pgstat case. The DSA is "pinned", so the handle is forgotten from the ResourceOwner right after calling dsm_attach() anyway. Looking closer at dsa.c, I think this is a wider problem though. The comments don't make it very clear how it's supposed to interact with ResourceOwners. There's just this brief comment in dsa_pin_mapping(): > * By default, areas are owned by the current resource owner, which means they > * are detached automatically when that scope ends. The dsa_area struct isn't directly owned by any ResourceOwner though. The DSM segments created by dsa_create() or dsa_attach() are. But the functions dsa_allocate() and dsa_get_address() can create or attach more DSM segments to the area, and they will be owned by the by the current resource owner *at the time of the call*. So if you call dsa_get_address() while in a different resource owner, things get very confusing. The attached new test module demonstrates that (0001-Add-test_dsa-module.patch), here's a shortened version: a = dsa_create(tranche_id); /* Switch to new resource owner */ oldowner = CurrentResourceOwner; childowner = ResourceOwnerCreate(oldowner, "temp owner"); CurrentResourceOwner = childowner; /* make a bunch of allocations */ for (int i = 0; i < 10000; i++) p[i] = dsa_allocate(a, 1000); /* Release the child resource owner */ CurrentResourceOwner = oldowner; ResourceOwnerRelease(childowner, RESOURCE_RELEASE_BEFORE_LOCKS, true, false); ResourceOwnerRelease(childowner, RESOURCE_RELEASE_LOCKS, true, false); ResourceOwnerRelease(childowner, RESOURCE_RELEASE_AFTER_LOCKS, true, false); ResourceOwnerDelete(childowner); dsa_detach(a); This first prints warnings on The ResourceOwnerRelease() calls: 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 2395813396 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 3922992700 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 1155452762 2023-11-10 13:57:21.475 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 4045183168 2023-11-10 13:57:21.476 EET [745346] WARNING: resource was not closed: dynamic shared memory segment 1529990480 And a segfault at the dsm_detach() call: 2023-11-10 13:57:21.480 EET [745246] LOG: server process (PID 745346) was terminated by signal 11: Segmentation fault #0 0x000055a5148f64ee in slist_pop_head_node (head=0x55a516d06bf8) at ../../../../src/include/lib/ilist.h:1034 #1 0x000055a5148f5eba in dsm_detach (seg=0x55a516d06bc0) at dsm.c:822 #2 0x000055a514b86db0 in dsa_detach (area=0x55a516d64dc8) at dsa.c:1939 #3 0x00007fd5dcaee5e0 in test_dsa_resowners (fcinfo=0x55a516d56a28) at test_dsa.c:112 I think that is surprising behavior from the DSA facility. When you make allocations with dsa_allocate() or just call dsa_get_address() on an existing dsa_pointer, you wouldn't expect the current resource owner to matter. I think dsa_create/attach() should store the current resource owner in the dsa_area, for use in subsequent operations on the DSA, per attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch). -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
pgsql-hackers by date: