Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+hUKGJrqHy55ihTXrLvAALOYKinTBM+PJf_RDH_701PThDVKA@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Tue, Jul 16, 2019 at 11:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Mon, Jul 1, 2019 at 1:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> /* If we discarded everything, the slot can be given up. */
> + if (entirely_discarded)
> + free_undo_log_slot(slot);
>
> I have noticed that when the undo log was detached and it was full
> then if we discard complete log we release its slot.  But, what is
> bothering me is should we add that log to the free list?  Or I am
> missing something?

Stepping back a bit:  The free lists are for undo logs that someone
might want to attach to and insert into.  If it's full, we probably
can't insert anything into it again (well, technically someone else
who wants to insert something a bit smaller might be able to, but
that's not an interesting case to worry about).  So it doesn't need to
go back on a free list, but it still needs to exist (= occupy a slot)
as long as there is undiscarded data in it, because that data is
needed and we need to be able to test URPs against its discard
pointer.  But once its data is entirely discarded, it ceases to exist
-- there is no reason to waste a slot on it, and any URP in this undo
log will be considered to be discarded (because we can't find a slot,
and we also cache that fact in recent_discard so lookups are fast and
lock-free), and therefore it'll not be checkpointed or reloaded at
next startup; then we couldn't put it on a free list even if we wanted
to, because there is nothing left of it ("logs" don't really exist in
memory, only "slots", currently holding the meta-data for a log, which
is why I renamed UndoLog to UndoLogSlot to reduce confusion on that
point).  One of the goals here is to make a system that doesn't
require an increasing amount of memory as time goes on -- hence desire
to completely remove state relating to entirely discarded undo logs
(you might point out that the recent_discard cache would get
arbitrarily large after we chew through millions of undo logs, but
there is another defence against that in the form of low_logno which
isn't used in that test yet but could be used to miminise that
effect).  Does this make sense, and do you see a problem?

-- 
Thomas Munro
https://enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs
Next
From: Jeff Davis
Date:
Subject: Allow simplehash to use already-calculated hash values