Thread: tuplesort Generation memory contexts don't play nicely with index builds
Hackers, I noticed while doing some memory context related work that since we now use generation.c memory contexts for tuplesorts (40af10b57) that tuplesort_putindextuplevalues() causes memory "leaks" in the generation context due to index_form_tuple() being called while we're switched into the state->tuplecontext. I use the word "leak" here slightly loosely. It's only a leak due to how generation.c uses no free lists to allow reuse pfree'd memory. It looks like the code has been this way ever since 9f03ca915 (Avoid copying index tuples when building an index.) That commit did add a big warning at the top of index_form_tuple() that the function must be careful to not leak any memory. A quick fix would be just to add a new bool field, e.g, usegencxt to tuplesort_begin_common() and pass that as false in all functions apart from tuplesort_begin_heap(). That way we'll always be using an aset.c context when we call index_form_tuple(). However, part of me thinks that 9f03ca915 is standing in the way of us doing more in the future to optimize how we store tuples during sorts. We might, one day, want to consider using a hand-rolled bump allocator. If we ever do that we'd need to undo the work done by 9f03ca915. Does anyone have any thoughts on this? Here's a reproducer from the regression tests: CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT); -- Use uncompressed data stored in toast. CREATE INDEX no_index_cleanup_idx ON no_index_cleanup(t); ALTER TABLE no_index_cleanup ALTER COLUMN t SET STORAGE EXTERNAL; INSERT INTO no_index_cleanup(i, t) VALUES (generate_series(1,30), repeat('1234567890',269)); -- index cleanup option is ignored if VACUUM FULL VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup; David
Re: tuplesort Generation memory contexts don't play nicely with index builds
From
David Rowley
Date:
On Wed, 29 Jun 2022 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote: > I noticed while doing some memory context related work that since we > now use generation.c memory contexts for tuplesorts (40af10b57) that > tuplesort_putindextuplevalues() causes memory "leaks" in the > generation context due to index_form_tuple() being called while we're > switched into the state->tuplecontext. I've attached a draft patch which changes things so that we don't use generation contexts for sorts being done for index builds. David
Attachment
Re: tuplesort Generation memory contexts don't play nicely with index builds
From
David Rowley
Date:
On Wed, 29 Jun 2022 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote: > I noticed while doing some memory context related work that since we > now use generation.c memory contexts for tuplesorts (40af10b57) that > tuplesort_putindextuplevalues() causes memory "leaks" in the > generation context due to index_form_tuple() being called while we're > switched into the state->tuplecontext. I voiced my dislike for the patch I came up with to fix this issue to Andres. He suggested that I just add a version of index_form_tuple that can be given a MemoryContext pointer to allocate the returned tuple into. I like that idea much better, so I've attached a patch to fix it that way. If there are no objections, I plan to push this in the next 24 hours. David
Attachment
Re: tuplesort Generation memory contexts don't play nicely with index builds
From
David Rowley
Date:
On Wed, 6 Jul 2022 at 13:34, David Rowley <dgrowleyml@gmail.com> wrote: > If there are no objections, I plan to push this in the next 24 hours. Pushed. David
On Tue, Jul 5, 2022 at 9:34 PM David Rowley <dgrowleyml@gmail.com> wrote: > I voiced my dislike for the patch I came up with to fix this issue to > Andres. He suggested that I just add a version of index_form_tuple > that can be given a MemoryContext pointer to allocate the returned > tuple into. > > I like that idea much better, so I've attached a patch to fix it that way. > > If there are no objections, I plan to push this in the next 24 hours. Apologies for not having looked at this thread sooner, but for what it's worth, I think this is a fine solution. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jul 7, 2022 at 3:16 AM David Rowley <dgrowleyml@gmail.com> wrote: > > Pushed. Hmm, the commit appeared on git.postgresql.org, but apparently not in my email nor the list archives. -- John Naylor EDB: http://www.enterprisedb.com
Re: tuplesort Generation memory contexts don't play nicely with index builds
From
David Rowley
Date:
On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote: > > On Thu, Jul 7, 2022 at 3:16 AM David Rowley <dgrowleyml@gmail.com> wrote: > > > > Pushed. > > Hmm, the commit appeared on git.postgresql.org, but apparently not in > my email nor the list archives. Strange. I'd suspect a temporary hiccup in whatever code pushes the commits onto the mailing list, but I see that my fe3caa143 from yesterday was also missed. The only difference in my workflow is that I'm sshing to the machine I push from via another room rather than sitting right in front of it like I normally am. I struggle to imagine why that would cause this to happen. David
David Rowley <dgrowleyml@gmail.com> writes: > On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote: >> Hmm, the commit appeared on git.postgresql.org, but apparently not in >> my email nor the list archives. > Strange. I'd suspect a temporary hiccup in whatever code pushes the > commits onto the mailing list, but I see that my fe3caa143 from > yesterday was also missed. Caught in list moderation maybe? regards, tom lane
On 7/6/22 23:15, Tom Lane wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote: >>> Hmm, the commit appeared on git.postgresql.org, but apparently not in >>> my email nor the list archives. > >> Strange. I'd suspect a temporary hiccup in whatever code pushes the >> commits onto the mailing list, but I see that my fe3caa143 from >> yesterday was also missed. > > Caught in list moderation maybe? Actually, yes they are: 8<----------------------- Date: 2022-07-06 07:41:02 List: pgsql-committers Reason: sender is not a confirmed email address From: drowley(at)postgresql(dot)org Size: 4890 bytes Subject: pgsql: Remove size increase in ExprEvalStep caused by hashed saops Date: 2022-07-06 20:14:25 List: pgsql-committers Reason: sender is not a confirmed email address Spam score: -7.1 From: drowley(at)postgresql(dot)org Size: 4703 bytes Subject: pgsql: Overload index_form_tuple to allow the memory context to be supp 8<----------------------- (I manually did the (at) and (dot) obfuscation) I don't ordinarily moderate the pgsql-committers list, and don't know offhand who does, so am a bit hesitant to approve them myself. But perhaps I should? I guess another good question is why the email address no longer confirmed... -- Joe Conway RDS Open Source Databases Amazon Web Services: https://aws.amazon.com