Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date
Msg-id 229c4f7219ed164088dadc935df21e1cf125e191.camel@j-davis.com
Whole thread Raw
In response to Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
List pgsql-hackers
On Wed, 2024-05-15 at 12:56 +0530, Bharath Rupireddy wrote:
> Because of this, the
> buffers get flushed sooner than that of the existing COPY with
> table_multi_insert AM causing regression in pgbench which uses COPY
> extensively.

The flushing behavior is entirely controlled by the table AM. The heap
can use the same flushing logic that it did before, which is to hold
1000 tuples.

I like that it's accounting for memory, too, but it doesn't need to be
overly restrictive. Why not just use work_mem? That should hold 1000
reasonably-sized tuples, plus overhead.

Even better would be if we could take into account partitioning. That
might be out of scope for your current work, but it would be very
useful. We could have a couple new GUCs like modify_table_buffer and
modify_table_buffer_per_partition or something like that.

> 1. Try to get the actual tuple sizes excluding header sizes for each
> column in the new TAM.

I don't see the point in arbitrarily excluding the header.

> v21 also adds code to maintain tuple size for virtual tuple slots.
> This helps make better memory-based flushing decisions in the new
> TAM.

That seems wrong. We shouldn't need to change the TupleTableSlot
structure for this patch.


Comments on v21:

* All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

* The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
ExecInsert(). What's the disadvantage to using a bulk insert state
there?

* I'm a bit confused by TableModifyState->modify_end_callback. The AM
both sets the callback and calls the callback -- why can't the code
just go into the table_modify_end method?

* The code structure in table_modify_begin() (and related) is strange.
Can it be simplified or am I missing something?

* Why are table_modify_state and insert_modify_buffer_flush_context
globals? What if there are multiple modify nodes in a plan?

* Can you explain the design in logical rep?

Regards,
    Jeff Davis




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Log details for stats dropped more than once
Next
From: Tom Lane
Date:
Subject: Re: Why does pgindent's README say to download typedefs.list from the buildfarm?