Re: Custom table AMs need to include heapam.h because of BulkInsertState - Mailing list pgsql-hackers

From David Rowley
Subject Re: Custom table AMs need to include heapam.h because of BulkInsertState
Date
Msg-id CAKJS1f95sB21LBF=1MCsEV+XLtA_JC3mtXx5kgDuHDsOGoWhKg@mail.gmail.com
Whole thread Raw
In response to Re: Custom table AMs need to include heapam.h because of BulkInsertState  (Andres Freund <andres@anarazel.de>)
Responses Re: Custom table AMs need to include heapam.h because ofBulkInsertState
List pgsql-hackers
On Fri, 14 Jun 2019 at 07:53, Andres Freund <andres@anarazel.de> wrote:
>
> On June 12, 2019 6:42:11 PM PDT, David Rowley <david.rowley@2ndquadrant.com> wrote:
> >Do you see any issue with calling table_finish_bulk_insert() when the
> >partition's CopyMultiInsertBuffer is evicted from the
> >CopyMultiInsertInfo rather than at the end of the copy? It can mean
> >that we call the function multiple times per partition. I assume the
> >function is only really intended to flush bulk inserted tuple to the
> >storage, so calling it more than once would just mean an inefficiency
> >rather than a bug.
> >
> >Let me know your thoughts.
>
> I'm out on vacation until Monday (very needed, pretty damn exhausted). So I can't really give you a in depth answer
rightnow. 
>
> Off the cuff, I'd say it's worthwhile to try somewhat hard to avoid superfluous finish calls. They can be quite
expensive(fsync), and we ought to have nearly all the state for doing it only as much as necessary. Possibly we need
onebool per partition to track whether any rows where inserted, but thats peanuts in comparison to all the other state. 

No worries. I'll just park this patch here until you're ready to give it a look.

With the attached version I'm just calling table_finish_bulk_insert()
once per partition at the end of CopyFrom().  We've got an array with
all the ResultRelInfos we touched in the proute, so it's mostly a
matter of looping over that array and calling the function on each
ResultRelInfo's ri_RelationDesc.  However, to make it more complex,
PartitionTupleRouting is private to execPartition.c so we can't do
this directly... After staring at my screen for a while, I decided to
write a function that calls a callback function on each ResultRelInfo
in the PartitionTupleRouting.

The three alternative ways I thought of were:

1) Put PartitionTupleRouting back into execPartition.h and write the
loop over each ResultRelInfo in copy.c.
2) Write a specific function in execPartition.c that calls
table_finish_bulk_insert()
3) Modify ExecCleanupTupleRouting to pass in the ti_options and a bool
to say if it should call table_finish_bulk_insert() or not.

I didn't really like either of those. For #1, I'd rather keep it
private. For #2, it just seems a bit too specific a function to go
into execPartition.c. For #3 I really don't want to slow down
ExecCleanupTupleRouting() any. I designed those to be as fast as
possible since they're called for single-row INSERTs into partitioned
tables. Quite a bit of work went into PG12 to make those fast.

Of course, someone might see one of the alternatives as better than
what the patch does, so comments welcome.

The other thing I noticed is that we call
table_finish_bulk_insert(cstate->rel, ti_options); in copy.c
regardless of if we've done any bulk inserts or not. Perhaps that
should be under an if (insertMethod != CIM_SINGLE)

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: fix psql \conninfo & \connect when using hostaddr
Next
From: Joe Conway
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)