Re: Inefficiency in parallel pg_restore with many tables - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Inefficiency in parallel pg_restore with many tables
Date
Msg-id 20230913183450.GA1042742@nathanxps13
Whole thread Raw
In response to Re: Inefficiency in parallel pg_restore with many tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Inefficiency in parallel pg_restore with many tables
List pgsql-hackers
On Sun, Sep 10, 2023 at 12:35:10PM -0400, Tom Lane wrote:
> Hmm ... I do not like v7 very much at all.  It requires rather ugly
> changes to all of the existing callers, and what are we actually
> buying?  If anything, it makes things slower for pass-by-value items
> like integers.  I'd stick with the Datum convention in the backend.
> 
> Instead, I took a closer look through the v6 patch set.
> I think that's in pretty good shape and nearly committable,
> but I have a few thoughts:

Thanks for reviewing.  I'm fine with proceeding with the v6 approach.  Even
though the alternative approach makes the API consistent for the frontend
and backend, I'm also not a huge fan of the pointer gymnastics required in
the comparators.  Granted, we still have to do some intptr_t conversions in
pg_dump_sort.c with the v6 approach, but that seems to be an exception.

> * I'm not sure about defining bh_node_type as a macro:
> 
> +#ifdef FRONTEND
> +#define bh_node_type void *
> +#else
> +#define bh_node_type Datum
> +#endif
> 
> rather than an actual typedef:
> 
> +#ifdef FRONTEND
> +typedef void *bh_node_type;
> +#else
> +typedef Datum bh_node_type;
> +#endif
> 
> My concern here is that bh_node_type is effectively acting as a
> typedef, so that pgindent might misbehave if it's not declared as a
> typedef.  On the other hand, there doesn't seem to be any indentation
> problem in the patchset as it stands, and we don't expect any code
> outside binaryheap.h/.c to refer to bh_node_type, so maybe it's fine.
> (If you do choose to make it a typedef, remember to add it to
> typedefs.list.)

I think a typedef makes more sense here.

> * As a matter of style, I'd recommend adding braces in places
> like this:
> 
>      if (heap->bh_size >= heap->bh_space)
> +    {
> +#ifdef FRONTEND
> +        pg_fatal("out of binary heap slots");
> +#else
>          elog(ERROR, "out of binary heap slots");
> +#endif
> +    }
>      heap->bh_nodes[heap->bh_size] = d;
> 
> It's not wrong as you have it, but I think it's more readable
> and less easy to accidentally break with the extra braces.

Fair point.

> * In 0002, isn't the comment for binaryheap_remove_node wrong?
> 
> + * Removes the nth node from the heap.  The caller must ensure that there are
> + * at least (n - 1) nodes in the heap.  O(log n) worst case.
> 
> Shouldn't that be "(n + 1)"?  Also, I'd specify "n'th (zero based) node"
> for clarity.

Yeah, that's a mistake.

> * I would say that this bit in 0004:
> 
> -        j = removeHeapElement(pendingHeap, heapLength--);
> +        j = (intptr_t) binaryheap_remove_first(pendingHeap);
> 
> needs an explicit cast to int:
> 
> +        j = (int) (intptr_t) binaryheap_remove_first(pendingHeap);
> 
> otherwise some compilers might complain about the result possibly
> not fitting in "j".

Sure.  IMO it's a tad more readable, too.

> Other than those nitpicks, I like v6.  I'll mark this RfC.

Great.  I've posted a v8 with your comments addressed in order to get one
more round of cfbot coverage.  Assuming those tests pass and there is no
additional feedback, I'll plan on committing this in the next few days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pg_resetwal tests, logging, and docs update
Next
From: Andres Freund
Date:
Subject: Re: BufferUsage counters' values have changed