Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | 20220802222334.umf5wsa2r63koppn@awork3.anarazel.de |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
On 2022-08-02 17:04:16 -0500, Justin Pryzby wrote:
> I got this interesting looking thing.
>
> ==11628== Invalid write of size 8
> ==11628==    at 0x1D12B3A: smgrsetowner (smgr.c:213)
> ==11628==    by 0x1C7C224: RelationGetSmgr (rel.h:572)
> ==11628==    by 0x1C7C224: RelationCopyStorageUsingBuffer (bufmgr.c:3725)
> ==11628==    by 0x1C7C7A6: CreateAndCopyRelationData (bufmgr.c:3817)
> ==11628==    by 0x14A4518: CreateDatabaseUsingWalLog (dbcommands.c:221)
> ==11628==    by 0x14AB009: createdb (dbcommands.c:1393)
> ==11628==    by 0x1D2B9AF: standard_ProcessUtility (utility.c:776)
> ==11628==    by 0x1D2C46A: ProcessUtility (utility.c:530)
> ==11628==    by 0x1D265F5: PortalRunUtility (pquery.c:1158)
> ==11628==    by 0x1D27089: PortalRunMulti (pquery.c:1315)
> ==11628==    by 0x1D27A7C: PortalRun (pquery.c:791)
> ==11628==    by 0x1D1E33D: exec_simple_query (postgres.c:1243)
> ==11628==    by 0x1D218BC: PostgresMain (postgres.c:4505)
> ==11628==  Address 0x1025bc18 is 2,712 bytes inside a block of size 8,192 free'd
> ==11628==    at 0x4033A3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==11628==    by 0x217D7C2: AllocSetReset (aset.c:608)
> ==11628==    by 0x219B57A: MemoryContextResetOnly (mcxt.c:181)
> ==11628==    by 0x217DBD5: AllocSetDelete (aset.c:654)
> ==11628==    by 0x219C1EC: MemoryContextDelete (mcxt.c:252)
> ==11628==    by 0x21A109F: PortalDrop (portalmem.c:596)
> ==11628==    by 0x21A269C: AtCleanup_Portals (portalmem.c:907)
> ==11628==    by 0x11FEAB1: CleanupTransaction (xact.c:2890)
> ==11628==    by 0x120A74C: AbortCurrentTransaction (xact.c:3328)
> ==11628==    by 0x1D2158C: PostgresMain (postgres.c:4232)
> ==11628==    by 0x1B15DB5: BackendRun (postmaster.c:4490)
> ==11628==    by 0x1B1D799: BackendStartup (postmaster.c:4218)
> ==11628==  Block was alloc'd at
> ==11628==    at 0x40327F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==11628==    by 0x217F0DC: AllocSetAlloc (aset.c:920)
> ==11628==    by 0x219E4D2: palloc (mcxt.c:1082)
> ==11628==    by 0x14A14BE: ScanSourceDatabasePgClassTuple (dbcommands.c:444)
> ==11628==    by 0x14A1CD8: ScanSourceDatabasePgClassPage (dbcommands.c:384)
> ==11628==    by 0x14A20BF: ScanSourceDatabasePgClass (dbcommands.c:322)
> ==11628==    by 0x14A4348: CreateDatabaseUsingWalLog (dbcommands.c:177)
> ==11628==    by 0x14AB009: createdb (dbcommands.c:1393)
> ==11628==    by 0x1D2B9AF: standard_ProcessUtility (utility.c:776)
> ==11628==    by 0x1D2C46A: ProcessUtility (utility.c:530)
> ==11628==    by 0x1D265F5: PortalRunUtility (pquery.c:1158)
> ==11628==    by 0x1D27089: PortalRunMulti (pquery.c:1315)

Ick. That looks like somehow we end up with smgr entries still pointing to
fake relcache entries, created in a prior attempt at create database.

Looks like you'd need error trapping to call FreeFakeRelcacheEntry() (or just
smgrclearowner()) in case of error. Or perhaps we can instead prevent the fake
relcache entry being set as the owner in the first place?

Why do we even need fake relcache entries here? Looks like all that they're
used for is a bunch of RelationGetSmgr() calls? Can't we instead just pass the
rnode to smgropen()? Given that we're doing that once for every buffer in the
body of RelationCopyStorageUsingBuffer(), doing it in a bunch of other
less-frequent places can't be a problem.

Greetings,

Andres Freund