Re: Batch insert in CTAS/MatView code - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Batch insert in CTAS/MatView code
Date
Msg-id 20190930073802.t3idbremgyuklvuf@alap3.anarazel.de
In response to Re: Batch insert in CTAS/MatView code  (Paul Guo <pguo@pivotal.io>)
Responses Re: Batch insert in CTAS/MatView code
List pgsql-hackers
Hi,

On 2019-09-30 12:12:31 +0800, Paul Guo wrote:
> > > > However, I can also see that there is no better alternative.  We need to
> > > > compute the size of accumulated tuples so far, in order to decide whether
> > > > to stop accumulating tuples.  There is no convenient way to obtain the
> > > > length of the tuple, given a slot.  How about making that decision solely
> > > > based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple call
> > > > altogether?
> > >
> > > ... maybe we should add a new operation to slots, that returns the
> > > (approximate?) size of a tuple?
> >
> > Hm, I'm not convinced that it's worth adding that as a dedicated
> > operation. It's not that clear what it'd exactly mean anyway - what
> > would it measure? As referenced in the slot? As if it were stored on
> > disk? etc?
> >
> > I wonder if the right answer wouldn't be to just measure the size of a
> > memory context containing the batch slots, or something like that.
> >
> >
> Probably a better way is to move that logic (append slot to slots, judge
> when to flush, flush, clean up slots) into table_multi_insert()?

That does not strike me as a good idea. The upper layer is going to need
to manage some resources anyway (e.g. it's the only bit that knows how to
manage the lifetime of the incoming data), and by pushing that logic into
each AM we're going to duplicate the necessary code too.
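
To illustrate the layering argued for above, here is a minimal sketch in
plain C. It is not PostgreSQL code: BatchBuffer, batch_add(), batch_flush(),
the callback type and the two limits are all hypothetical names and values
chosen for the example. The caller owns the buffer and the flush decision
(by tuple count or by accumulated bytes), while the access method only sees
a callback that receives a complete batch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits, kept tiny so the behaviour is easy to follow. */
#define BATCH_MAX_TUPLES 4
#define BATCH_MAX_BYTES  64

/* Stand-in for an AM-level multi-insert entry point (hypothetical). */
typedef void (*multi_insert_fn)(const char **tuples, const size_t *lens,
                                int ntuples, void *am_state);

/* Caller-side buffer: the upper layer owns the tuples and the flush policy. */
typedef struct BatchBuffer
{
    const char     *tuples[BATCH_MAX_TUPLES];
    size_t          lens[BATCH_MAX_TUPLES];
    int             ntuples;    /* tuples currently buffered */
    size_t          nbytes;     /* bytes currently buffered */
    multi_insert_fn insert;     /* AM callback, sees whole batches only */
    void           *am_state;
} BatchBuffer;

static void
batch_init(BatchBuffer *b, multi_insert_fn insert, void *am_state)
{
    memset(b, 0, sizeof(*b));
    b->insert = insert;
    b->am_state = am_state;
}

/* Hand the buffered tuples to the AM and reset the counters. */
static void
batch_flush(BatchBuffer *b)
{
    if (b->ntuples == 0)
        return;
    b->insert(b->tuples, b->lens, b->ntuples, b->am_state);
    b->ntuples = 0;
    b->nbytes = 0;
}

/* Buffer one tuple; flush once either limit is reached. */
static void
batch_add(BatchBuffer *b, const char *tuple, size_t len)
{
    assert(b->ntuples < BATCH_MAX_TUPLES);
    b->tuples[b->ntuples] = tuple;
    b->lens[b->ntuples] = len;
    b->ntuples++;
    b->nbytes += len;

    if (b->ntuples >= BATCH_MAX_TUPLES || b->nbytes >= BATCH_MAX_BYTES)
        batch_flush(b);
}

/* Toy "access method" that only records what it was given. */
typedef struct CountingAM
{
    int     calls;
    int     tuples;
    size_t  bytes;
} CountingAM;

static void
counting_multi_insert(const char **tuples, const size_t *lens,
                      int ntuples, void *am_state)
{
    CountingAM *am = (CountingAM *) am_state;

    (void) tuples;              /* contents are not inspected here */
    am->calls++;
    am->tuples += ntuples;
    for (int i = 0; i < ntuples; i++)
        am->bytes += lens[i];
}
```

In this sketch a second AM would only need its own callback; the buffering,
the thresholds and the final batch_flush() at end of input stay in one place
in the caller.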


> Generally the final implementation of table_multi_insert() should be
> able to know the sizes easily. One concern is that currently just COPY
> in the repo uses multi insert, so not sure if other callers in the
> future want their own logic (or set up a flag to allow customization
> but seems a bit over-designed?).

That is also a concern: it seems unlikely that we'll get the
interface right.


Greetings,

Andres Freund


