Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+TgmoZfPzo7-Wf0Gi+ZxjTan2Mvy5ip4sPJK6rueH4axW5o0g@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I am not sure but don't we need to retain the color of z as well?

I believe that would be very wrong. If you recolor an internal node,
you'll break the constant-black-height invariant.

> Apart from this, the duplicate key (ex. for size queues the size of
> two requests can be same) handling might need some work.  Basically,
> either special combine function needs to be written (not sure yet what
> we should do there) or we always need to ensure that the key is unique
> like (size + start_urec_ptr).  If the size is the same, then we can
> decide based on start_urec_ptr.

I think that this problem is somewhat independent of whether we use an
rbtree or a binaryheap or some other data structure.  I would be
inclined to use XID as a tiebreak for the size queue, so that it's
more likely to match the ordering of the XID queue, but if that's
inconvenient, then some other arbitrary value like start_urec_ptr
should be fine.

> I think we can go by changing the implementation to rbtree by doing
> some enhancements instead of the binary heap or alternatively, we can
> use one of the two ideas suggested by you in the email above [1] to
> simplify the code and keep using the binary heap for now.  Especially,
> I like the below one.
> "2. However, I don't think we should have a separate request object
> for each queue anyway. We should insert pointers to the same objects
> in all the relevant queue (either size + XID, or else error). So
> instead of having three sets of objects, one for each queue, we'd just
> have one set of objects and point to them with as many as two
> pointers.
> We'd therefore need LESS memory than we're using today, because we
> wouldn't have separate arrays for XID, size, and error queue
> elements."
>
> I think even if we currently go with a binary heap, it will be
> possible to change it to rbtree later, but I am fine either way.

Well, I don't see much point in revising all of this logic twice. We
should pick the way we want it to work and make it work that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: proposal - patch: psql - sort_by_size
Next
From: Tomas Vondra
Date:
Subject: Re: rebased background worker reimplementation prototype