Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1+1Vf0tS=hT+rxUz02JvLPqbDSqwrCjpW8N-K8tFPdTzA@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Jul 17, 2019 at 3:37 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-07-13 15:55:51 +0530, Amit Kapila wrote:
> > On Fri, Jul 12, 2019 at 7:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > I think even if we currently go with a binary heap, it will be
> > > > possible to change it to rbtree later, but I am fine either way.
> > >
> > > Well, I don't see much point in revising all of this logic twice. We
> > > should pick the way we want it to work and make it work that way.
> > >
> >
> > Yeah, I agree.  So, I am assuming here that as you have discussed this
> > idea with Andres offlist, he is on board with changing it as he has
> > originally suggested using binary_heap.  Andres, do let us know if you
> > think differently here.  It would be good if anyone else following the
> > thread can also weigh in.
>
> Yes, I think using an rbtree makes sense.
>

Okay.

> I'm not yet sure whether we'd want the rbtree nodes being pointed to
> directly by the hashtable, or whether we'd want one indirection.
>
> e.g. either something like:
>
>
> typedef struct UndoWorkerQueue
> {
>     /* priority ordered tree */
>     RBTree *tree;
>     ....
> }
>

I think we also need the size of rbtree (aka how many nodes/undo
requests it has) to know whether we can add more.  This information is
available in binary heap, but here I think we need to track it in
UndoWorkerQueue.  Basically, at each enqueue/dequeue, we need to
increment/decrement the same.

> typedef struct UndoWorkerQueueEntry
> {
>      RBTNode tree_node;
>
>      /*
>       * Reference hashtable via key, not pointers, entries might be
>       * moved.
>       */
>      RollbackHashKey rollback_key
>      ...
> } UndoWorkerQueueEntry;
>

In UndoWorkerQueueEntry, we might also want to include some other info
like dbid, request_size, next_retry_at, err_occurred_at so that while
accessing queue entry in comparator functions or other times, we don't
always need to perform hash table search.  OTOH, we can do hash_search
as well, but may be code-wise it will be better to keep additional
information.

Another thing is we need some freelist/array for
UndoWorkerQueueEntries equivalent to size of three queues?

> typedef struct RollbackHashEntry
> {
>      ...
>      UndoWorkerQueueEntry *queue_memb_size;
>      UndoWorkerQueueEntry *queue_memb_age;
>      UndoWorkerQueueEntry *queue_memb_error;
> }
>
> and call rbt_delete() for any non-NULL queue_memb_* whenever an entry is
> dequeued via one of the queues (after setting the one already dequeued
> from to NULL, of course).  Which requires - as Robert mentioned - that
> rbtree pointers remain stable after insertions.
>

Right.

BTW, do you have any preference for using dynahash or simplehash for
RollbackHashTable?

>
> Alternatively we can have a more complicated arrangement without the
> "stable pointer" requirement (which'd also similarly work for a binary
> heap):
>
>
> I think the first approach is clearly preferrable from a simplicity POV,
> but the second approach would be a bit more generic (applicable to heap
> as well) and wouldn't require adjusting the rbtree code.
>

+1 for the first approach, the second one appears to be quite
complicated as compared to first.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Intermittent pg_ctl failures on Windows
Next
From: Andres Freund
Date:
Subject: Re: partition routing layering in nodeModifyTable.c