Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1K_eDXnkYWTeFbXzHMamMLfdxnJBB3wLE5c_6idERXMpg@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Jul 12, 2019 at 7:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Apart from this, the duplicate key (ex. for size queues the size of
> > two requests can be same) handling might need some work.  Basically,
> > either special combine function needs to be written (not sure yet what
> > we should do there) or we always need to ensure that the key is unique
> > like (size + start_urec_ptr).  If the size is the same, then we can
> > decide based on start_urec_ptr.
>
> I think that this problem is somewhat independent of whether we use an
> rbtree or a binaryheap or some other data structure.
>

I think then I am missing something because what I am talking about is
below code in rbt_insert:
rbt_insert()
{
..
cmp = rbt->comparator(data, current, rbt->arg);
if (cmp == 0)
{
/*
* Found node with given key.  Apply combiner.
*/
rbt->combiner(current, data, rbt->arg);
*isNew = false;
return current;
}
..
}

If you see, here it doesn't add the duplicate key in the tree and that
is not the case with binary_heap as far as I can understand.

>  I would be
> inclined to use XID as a tiebreak for the size queue, so that it's
> more likely to match the ordering of the XID queue, but if that's
> inconvenient, then some other arbitrary value like start_urec_ptr
> should be fine.
>

I think it would be better to use start_urec_ptr because XID can be
non-unique in our case.  As I explained in one of the emails above [1]
that we register the requests for logged and unlogged relations
separately, so XID can be non-unique.

> >
> > I think even if we currently go with a binary heap, it will be
> > possible to change it to rbtree later, but I am fine either way.
>
> Well, I don't see much point in revising all of this logic twice. We
> should pick the way we want it to work and make it work that way.
>

Yeah, I agree.  So, I am assuming here that as you have discussed this
idea with Andres offlist, he is on board with changing it as he has
originally suggested using binary_heap.  Andres, do let us know if you
think differently here.  It would be good if anyone else following the
thread can also weigh in.

[1] - https://www.postgresql.org/message-id/CAA4eK1LEKyPZD5Dy4j1u2smUUyMzxgC2YLj8E%2BaJpsvG7sVJYA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Jose Luis Tallon
Date:
Subject: Re: [PATCH] Implement uuid_version()
Next
From: Thomas Munro
Date:
Subject: Re: refactoring - share str2*int64 functions