Thread: should INSERT SELECT use a BulkInsertState?

should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

08 May 2020, 07:25:45

Seems to me it should, at least conditionally.  At least if there's a function
scan or a relation or ..

I mentioned a bit about our use-case here:
https://www.postgresql.org/message-id/20200219173742.GA30939%40telsasoft.com
=> I'd prefer our loaders to write their own data rather than dirtying large
fractions of buffer cache and leaving it around for other backends to clean up.

commit 7f9e061363e58f30eee0cccc8a0e46f637bf137b
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Fri May 8 02:17:32 2020 -0500

    Make INSERT SELECT use a BulkInsertState

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..6da4325225 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -578,7 +578,7 @@ ExecInsert(ModifyTableState *mtstate,
             table_tuple_insert_speculative(resultRelationDesc, slot,
                                            estate->es_output_cid,
                                            0,
-                                           NULL,
+                                           mtstate->bistate,
                                            specToken);
 
             /* insert index entries for tuple */
@@ -617,7 +617,7 @@ ExecInsert(ModifyTableState *mtstate,
             /* insert the tuple normally */
             table_tuple_insert(resultRelationDesc, slot,
                                estate->es_output_cid,
-                               0, NULL);
+                               0, mtstate->bistate);
 
             /* insert index entries for tuple */
             if (resultRelInfo->ri_NumIndices > 0)
@@ -2332,6 +2332,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
     mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
     mtstate->mt_nplans = nplans;
+    mtstate->bistate = NULL;
+    if (operation == CMD_INSERT)
+    {
+        Plan *p = linitial(node->plans);
+        Assert(nplans == 1);
+        if (!IsA(p, Result) && !IsA(p, ValuesScan))
+            mtstate->bistate = GetBulkInsertState();
+    }
 
     /* set up epqstate with dummy subplan data for the moment */
     EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
@@ -2809,6 +2817,9 @@ ExecEndModifyTable(ModifyTableState *node)
      */
     for (i = 0; i < node->mt_nplans; i++)
         ExecEndNode(node->mt_plans[i]);
+
+    if (node->bistate)
+        FreeBulkInsertState(node->bistate);
 }
 
 void
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4fee043bb2..daf365f181 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -14,6 +14,7 @@
 #ifndef EXECNODES_H
 #define EXECNODES_H
 
+#include "access/heapam.h"
 #include "access/tupconvert.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
@@ -1177,6 +1178,7 @@ typedef struct ModifyTableState
     List      **mt_arowmarks;    /* per-subplan ExecAuxRowMark lists */
     EPQState    mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
     bool        fireBSTriggers; /* do we need to fire stmt triggers? */
+    BulkInsertState    bistate;    /* State for bulk insert like INSERT SELECT */
 
     /*
      * Slot for storing tuples in the root partitioned table's rowtype during

Re: should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

10 May 2020, 15:58:18

On Fri, May 08, 2020 at 02:25:45AM -0500, Justin Pryzby wrote:
> Seems to me it should, at least conditionally.  At least if there's a function
> scan or a relation or ..
> 
> I mentioned a bit about our use-case here:
> https://www.postgresql.org/message-id/20200219173742.GA30939%40telsasoft.com
> => I'd prefer our loaders to write their own data rather than dirtying large
> fractions of buffer cache and leaving it around for other backends to clean up.

Nobody suggested otherwise so I added here and cleaned up to pass tests.
https://commitfest.postgresql.org/28/2553/

-- 
Justin

Attachment

v1-0001-Make-INSERT-SELECT-use-a-BulkInsertState.patch

Re: should INSERT SELECT use a BulkInsertState?

From

Michael Paquier

Date:

11 May 2020, 06:19:34

On Fri, May 08, 2020 at 02:25:45AM -0500, Justin Pryzby wrote:
> Seems to me it should, at least conditionally.  At least if there's a function
> scan or a relation or ..
>
> I mentioned a bit about our use-case here:
> https://www.postgresql.org/message-id/20200219173742.GA30939%40telsasoft.com
> => I'd prefer our loaders to write their own data rather than dirtying large
> fractions of buffer cache and leaving it around for other backends to clean up.

Does it matter in terms of performance and for which cases does it
actually matter?

> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index 4fee043bb2..daf365f181 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -14,6 +14,7 @@
>  #ifndef EXECNODES_H
>  #define EXECNODES_H
>
> +#include "access/heapam.h"
>  #include "access/tupconvert.h"
>  #include "executor/instrument.h"
>  #include "fmgr.h"
> @@ -1177,6 +1178,7 @@ typedef struct ModifyTableState
>      List      **mt_arowmarks;    /* per-subplan ExecAuxRowMark lists */
>      EPQState    mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
>      bool        fireBSTriggers; /* do we need to fire stmt triggers? */
> +    BulkInsertState    bistate;    /* State for bulk insert like INSERT SELECT */

I think that this needs more thoughts.  You are introducing a
dependency between some generic execution-related nodes and heap, a
table AM.
--
Michael

Attachment

signature.asc

Re: should INSERT SELECT use a BulkInsertState?

From

Andres Freund

Date:

04 June 2020, 17:30:47

Hi,

On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> Seems to me it should, at least conditionally.  At least if there's a function
> scan or a relation or ..

Well, the problem is that this can cause very very significant
regressions. As in 10x slower or more. The ringbuffer can cause constant
XLogFlush() calls (due to the lsn interlock), and the eviction from
shared_buffers (regardless of actual available) will mean future vacuums
etc will be much slower.  I think this is likely to cause pretty
widespread regressions on upgrades.

Now, it sucks that we have this problem in the general facility that's
supposed to be used for this kind of bulk operation. But I don't really
see it realistic as expanding use of bulk insert strategies unless we
have some more fundamental fixes.

Regards,

Andres

Re: should INSERT SELECT use a BulkInsertState?

From

Daniel Gustafsson

Date:

12 July 2020, 20:04:21

> On 4 Jun 2020, at 19:30, Andres Freund <andres@anarazel.de> wrote:
> On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:

>> Seems to me it should, at least conditionally.  At least if there's a function
>> scan or a relation or ..
>
> Well, the problem is that this can cause very very significant
> regressions. As in 10x slower or more. The ringbuffer can cause constant
> XLogFlush() calls (due to the lsn interlock), and the eviction from
> shared_buffers (regardless of actual available) will mean future vacuums
> etc will be much slower.  I think this is likely to cause pretty
> widespread regressions on upgrades.
>
> Now, it sucks that we have this problem in the general facility that's
> supposed to be used for this kind of bulk operation. But I don't really
> see it realistic as expanding use of bulk insert strategies unless we
> have some more fundamental fixes.

Based on the above, and the lack of activity in the thread, it sounds like this
patch should be marked Returned with Feedback; but Justin: you set it to
Waiting on Author at the start of the commitfest, are you working on a new
version?

cheers ./daniel

Re: should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

13 July 2020, 01:57:00

On Thu, Jun 04, 2020 at 10:30:47AM -0700, Andres Freund wrote:
> On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> > Seems to me it should, at least conditionally.  At least if there's a function
> > scan or a relation or ..
> 
> Well, the problem is that this can cause very very significant
> regressions. As in 10x slower or more. The ringbuffer can cause constant
> XLogFlush() calls (due to the lsn interlock), and the eviction from
> shared_buffers (regardless of actual available) will mean future vacuums
> etc will be much slower.  I think this is likely to cause pretty
> widespread regressions on upgrades.
> 
> Now, it sucks that we have this problem in the general facility that's
> supposed to be used for this kind of bulk operation. But I don't really
> see it realistic as expanding use of bulk insert strategies unless we
> have some more fundamental fixes.

I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.

postgres=# \t on \\ \set QUIET \\ VACUUM FULL t; \dt+ t \\ begin ; \timing on \\ INSERT INTO t SELECT * FROM t;
rollback;SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; 
 
| public | t    | table | pryzbyj | 35 MB | 
|Time: 9497.318 ms (00:09.497)
|    33 |          1
|     3 |          2
|    18 |          3
|     5 |          4
|  4655 |          5
| 11670 |           

vs

postgres=# \t on \\ \set QUIET \\ VACUUM FULL t; \dt+ t \\ begin BULK ; \timing on \\ INSERT INTO t SELECT * FROM t;
rollback;SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; 
 
| public | t    | table | pryzbyj | 35 MB | 
|Time: 8268.780 ms (00:08.269)
|  2080 |          1
|     3 |          2
|    19 |          4
|   234 |          5
| 14048 |           

And:

postgres=# begin ; \x \\ \t \\ SELECT statement_timestamp(); \o /dev/null \\ SELECT 'INSERT INTO t VALUES(0)' FROM
generate_series(1,999999);\set ECHO errors \\ \set QUIET on \\ \o \\ \gexec \\ SELECT statement_timestamp(); abort; \x
\\SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; a
 
|statement_timestamp | 2020-07-12 20:31:43.717328-05
|statement_timestamp | 2020-07-12 20:36:16.692469-05
|
|    52 |          1
|    24 |          2
|    17 |          3
|     6 |          4
|  4531 |          5
| 11754 |           

vs

postgres=# begin BULK ; \x \\ \t \\ SELECT statement_timestamp(); \o /dev/null \\ SELECT 'INSERT INTO t VALUES(0)' FROM
generate_series(1,999999);\set ECHO errors \\ \set QUIET on \\ \o \\ \gexec \\ SELECT statement_timestamp(); abort; \x
\\SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; a
 
|statement_timestamp | 2020-07-12 20:43:47.089538-05
|statement_timestamp | 2020-07-12 20:48:04.798138-05
|
|  4456 |          1
|    22 |          2
|     1 |          3
|     7 |          4
|    79 |          5
| 11819 |

-- 
Justin

Attachment

v2-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch

Re: should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

19 September 2020, 13:32:15

On Sun, Jul 12, 2020 at 08:57:00PM -0500, Justin Pryzby wrote:
> On Thu, Jun 04, 2020 at 10:30:47AM -0700, Andres Freund wrote:
> > On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> > > Seems to me it should, at least conditionally.  At least if there's a function
> > > scan or a relation or ..
> > 
> > Well, the problem is that this can cause very very significant
> > regressions. As in 10x slower or more. The ringbuffer can cause constant
> > XLogFlush() calls (due to the lsn interlock), and the eviction from
> > shared_buffers (regardless of actual available) will mean future vacuums
> > etc will be much slower.  I think this is likely to cause pretty
> > widespread regressions on upgrades.
> > 
> > Now, it sucks that we have this problem in the general facility that's
> > supposed to be used for this kind of bulk operation. But I don't really
> > see it realistic as expanding use of bulk insert strategies unless we
> > have some more fundamental fixes.
> 
> I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.

@cfbot: rebased

Attachment

v3-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch

Re: should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

16 October 2020, 21:05:12

On Sat, Sep 19, 2020 at 08:32:15AM -0500, Justin Pryzby wrote:
> On Sun, Jul 12, 2020 at 08:57:00PM -0500, Justin Pryzby wrote:
> > On Thu, Jun 04, 2020 at 10:30:47AM -0700, Andres Freund wrote:
> > > On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> > > > Seems to me it should, at least conditionally.  At least if there's a function
> > > > scan or a relation or ..
> > > 
> > > Well, the problem is that this can cause very very significant
> > > regressions. As in 10x slower or more. The ringbuffer can cause constant
> > > XLogFlush() calls (due to the lsn interlock), and the eviction from
> > > shared_buffers (regardless of actual available) will mean future vacuums
> > > etc will be much slower.  I think this is likely to cause pretty
> > > widespread regressions on upgrades.
> > > 
> > > Now, it sucks that we have this problem in the general facility that's
> > > supposed to be used for this kind of bulk operation. But I don't really
> > > see it realistic as expanding use of bulk insert strategies unless we
> > > have some more fundamental fixes.
> > 
> > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
> 
> @cfbot: rebased

again

-- 
Justin

Attachment

v4-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch

Re: should INSERT SELECT use a BulkInsertState?

From

Simon Riggs

Date:

22 October 2020, 10:41:14

On Thu, 4 Jun 2020 at 18:31, Andres Freund <andres@anarazel.de> wrote:

> On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> > Seems to me it should, at least conditionally.  At least if there's a function
> > scan or a relation or ..
>
> Well, the problem is that this can cause very very significant
> regressions. As in 10x slower or more. The ringbuffer can cause constant
> XLogFlush() calls (due to the lsn interlock), and the eviction from
> shared_buffers (regardless of actual available) will mean future vacuums
> etc will be much slower.  I think this is likely to cause pretty
> widespread regressions on upgrades.
>
> Now, it sucks that we have this problem in the general facility that's
> supposed to be used for this kind of bulk operation. But I don't really
> see it realistic as expanding use of bulk insert strategies unless we
> have some more fundamental fixes.

Are you saying that *anything* that uses the BulkInsertState is
generally broken? We use it for VACUUM and COPY writes, so you are
saying they are broken??

When we put that in, the use of the ringbuffer for writes required a
much larger number of blocks to smooth out the extra XLogFlush()
calls, but overall it was a clear win in those earlier tests. Perhaps
the ring buffer needs to be increased, or made configurable. The
eviction behavior was/is deliberate, to avoid large data loads
spoiling cache - perhaps that could also be configurable for the case
where data fits in shared buffers.

Anyway, if we can discuss what you see as broken, we can fix that and
then extend the usage to other cases, such as INSERT SELECT.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: should INSERT SELECT use a BulkInsertState?

From

Simon Riggs

Date:

22 October 2020, 12:29:53

On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby@telsasoft.com> wrote:

> > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.

I think it would be better if this was self-tuning. So that we don't
allocate a bulkinsert state until we've done say 100 (?) rows
inserted.

If there are other conditions under which this is non-optimal
(Andres?), we can also autodetect that and avoid them.

You should also use table_multi_insert() since that will give further
performance gains by reducing block access overheads. Switching from
single row to multi-row should also only happen once we've loaded a
few rows, so we don't introduce overahads for smaller SQL statements.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: should INSERT SELECT use a BulkInsertState?

From

Justin Pryzby

Date:

30 October 2020, 04:51:38

On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
> On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby@telsasoft.com> wrote:
> 
> > > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
> 
> I think it would be better if this was self-tuning. So that we don't
> allocate a bulkinsert state until we've done say 100 (?) rows
> inserted.

I made it an optional, non-default behavior in response to the legitimate
concern for performance regression for the cases where a loader needs to be as
fast as possible - as compared with our case, where we want instead to optimize
for our reports by making the loaders responsible for their own writes, rather
than leaving behind many dirty pages, and clobbering the cache, too.

Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
INSERT .. VALUES () .. ON CONFLICT.  This would handle that case, which is
great, even though that wasn't a design goal.  It could also be an integer GUC
to allow configuring the size of the ring buffer.

> You should also use table_multi_insert() since that will give further
> performance gains by reducing block access overheads. Switching from
> single row to multi-row should also only happen once we've loaded a
> few rows, so we don't introduce overahads for smaller SQL statements.

Good idea...multi_insert (which reduces the overhead of individual inserts) is
mostly independent from BulkInsert state (which uses a ring-buffer to avoid
dirtying the cache).  I made this 0002.

This makes INSERT SELECT several times faster, and not clobber the cache too.

Time: 4700.606 ms (00:04.701)
   123 |          1
    37 |          2
    20 |          3
    11 |          4
  4537 |          5
 11656 |           

Time: 1125.302 ms (00:01.125)
  2171 |          1
    37 |          2
    20 |          3
    11 |          4
   111 |          5
 14034 |           

When enabled, this passes nearly all regression tests, and all but 2 of the
changes are easily understood.  The 2nd patch still needs work.

-- 
Justin

Attachment

Re: should INSERT SELECT use a BulkInsertState?

From

Luc Vlaming

Date:

02 November 2020, 06:53:45

On 30.10.20 05:51, Justin Pryzby wrote:
> On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
>> On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby@telsasoft.com> wrote:
>>
>>>>> I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
>>
>> I think it would be better if this was self-tuning. So that we don't
>> allocate a bulkinsert state until we've done say 100 (?) rows
>> inserted.
> 
> I made it an optional, non-default behavior in response to the legitimate
> concern for performance regression for the cases where a loader needs to be as
> fast as possible - as compared with our case, where we want instead to optimize
> for our reports by making the loaders responsible for their own writes, rather
> than leaving behind many dirty pages, and clobbering the cache, too.
> 
> Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
> INSERT .. VALUES () .. ON CONFLICT.  This would handle that case, which is
> great, even though that wasn't a design goal.  It could also be an integer GUC
> to allow configuring the size of the ring buffer.
> 
>> You should also use table_multi_insert() since that will give further
>> performance gains by reducing block access overheads. Switching from
>> single row to multi-row should also only happen once we've loaded a
>> few rows, so we don't introduce overahads for smaller SQL statements.
> 
> Good idea...multi_insert (which reduces the overhead of individual inserts) is
> mostly independent from BulkInsert state (which uses a ring-buffer to avoid
> dirtying the cache).  I made this 0002.
> 
> This makes INSERT SELECT several times faster, and not clobber the cache too.
> 
> Time: 4700.606 ms (00:04.701)
>     123 |          1
>      37 |          2
>      20 |          3
>      11 |          4
>    4537 |          5
>   11656 |
> 
> Time: 1125.302 ms (00:01.125)
>    2171 |          1
>      37 |          2
>      20 |          3
>      11 |          4
>     111 |          5
>   14034 |
> 
> When enabled, this passes nearly all regression tests, and all but 2 of the
> changes are easily understood.  The 2nd patch still needs work.
> 

Hi,

Came across this thread because I'm working on an improvement for the 
relation extension to improve the speed of the bulkinsert itself in 
(highly) parallel cases and would like to make sure that our approaches 
work nicely together.

Given what I've seen and tried so far with various benchmarks I would 
also really like to see a different approach here. The "BEGIN BULK" can 
be problematic for example if you mix small amounts of inserts and big 
amounts in the same transaction, or if your application possibly does a 
bulk insert but otherwise mostly OLTP transactions.

To me the idea from Simon sounds good to only use a bulk insert state 
after inserting e.g. a 1000 rows, and this also seems more applicable to 
most applications compared to requiring a change to any application that 
wishes to have faster ingest.

Another approach could be to combine this, for example, with a few extra 
requirements to limit the amount of regressions and first learn more how 
this behaves in the field.
We could, for example, only (just throwing out some ideas), require that:
- the relation has a certain size
- a BufferStrategy a maximum certain size is used
- there is a certain amount of lock waiters on relation extension. (like 
we do with bulk extend)
- we have extended the relation for at least e.g. 4 MB and not used the 
FSM anymore thereby proving that we are doing bulk operations instead of 
random small extensions everywhere into the relation that use the FSM.

Another thing is that we first try to improve the bulk operation 
facilities in general and then have another shot at this? Not sure if 
there is some benchmark / query that shows where such a 10x slowdown 
would appear but maybe that would be worth a look as well possibly.

Regards,
Luc

Re: should INSERT SELECT use a BulkInsertState?

From

Bharath Rupireddy

Date:

03 December 2020, 05:29:34

On Wed, Dec 2, 2020 at 10:24 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> One loose end in this patch is how to check for volatile default expressions.
>
> copyfrom.c is a utility statement, so it can look at the parser's column list:
> COPY table(c1,c2)...
>
> However, for INSERT, in nodeModifyTable.c, it looks like parsing, rewriting,
> and planning are done, at which point I don't know if there's a good way to
> find that.  The default expressions will have been rewritten into the planned
> statement.
>
> We need the list of columns whose default is volatile, excluding columns for
> which a non-default value is specified.
>
> INSERT INTO table (c1,c2) VALUES (1,default);
>
> We'd want the list of any column in the table with a volatile default,
> excluding columns c1, but not-excluding explicit default columns c2 or any
> implicit default columns (c3, etc).
>
> Any idea ?
>

I think we should be doing all the necessary checks in the planner and
have a flag in the planned stmt to indicate whether to go with multi
insert or not. For the required checks, we can have a look at how the
existing COPY decides to go with either CIM_MULTI or CIM_SINGLE.

Now, the question of how we can get to know whether a given relation
has default expressions or volatile expressions, it is worth to look
at build_column_default() and contain_volatile_functions().

I prefer to have the multi insert deciding code in COPY and INSERT
SELECT, in a single common function which can be reused. Though COPY
has somethings like default expressions and others ready unlike INSERT
SELECT, we can try to keep them under a common function and say for
COPY we can skip some code and for INSERT SELECT we can do extra work
to find default expressions.

Although unrelated, for parallel inserts in INSERT SELECT[1], in the
planner there are some checks to see if the parallelism is safe or
not. Check max_parallel_hazard_for_modify() in
v8-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch from
[1]. On the similar lines, we can also have multi insert deciding
code.

[1] https://www.postgresql.org/message-id/CAJcOf-fy3P%2BkDArvmbEtdQTxFMf7Rn2%3DV-sqCnMmKO3QKBsgPA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: should INSERT SELECT use a BulkInsertState?

From

Ibrar Ahmed

Date:

18 March 2021, 13:37:41

On Mon, Mar 8, 2021 at 2:18 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

> > > I am very interested in this patch, and I plan to do some
> > > experiments with the
> > patch.
> > > Can you please rebase the patch because it seems can not applied to
> > > the
> > master now.
> >
> > Thanks for your interest.
> >
> > I was sitting on a rebased version since the bulk FDW patch will cause
> > conflicts, and since this should maybe be built on top of the table-am patch
> (2871).
> > Have fun :)
> Hi,
>
> When I testing with the patch, I found I can not use "\d tablename".
> It reports the following error, it this related to the patch?

Sorry, solved by re-initdb.

Best regards,
houzj

One of the patch (v10-0001-INSERT-SELECT-to-use-BulkInsertState-and-multi_i.patch) from the patchset does not apply.

http://cfbot.cputube.org/patch_32_2553.log

1 out of 13 hunks FAILED -- saving rejects to file src/backend/commands/copyfrom.c.rej

It is a minor change, therefore I fixed that to make cfbot happy. Please take a look if that works for you.

Ibrar Ahmed

Attachment

v11-0001-INSERT-SELECT-to-use-BulkInsertState-and-multi_i.patch

Re: should INSERT SELECT use a BulkInsertState?

From

Zhihong Yu

Date:

18 March 2021, 15:29:50

Hi,

+ mtstate->ntuples > bulk_insert_ntuples &&
+ bulk_insert_ntuples >= 0)

I wonder why bulk_insert_ntuples == 0 is included in the above. It seems bulk_insert_ntuples having value of 0 should mean not enabling bulk insertions.

+ }
+ else
+ {

nit: the else should be on the same line as the right brace.

Cheers

On Thu, Mar 18, 2021 at 6:38 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

On Mon, Mar 8, 2021 at 2:18 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > > I am very interested in this patch, and I plan to do some
> > > experiments with the
> > patch.
> > > Can you please rebase the patch because it seems can not applied to
> > > the
> > master now.
> >
> > Thanks for your interest.
> >
> > I was sitting on a rebased version since the bulk FDW patch will cause
> > conflicts, and since this should maybe be built on top of the table-am patch
> (2871).
> > Have fun :)
> Hi,
>
> When I testing with the patch, I found I can not use "\d tablename".
> It reports the following error, it this related to the patch?

Sorry, solved by re-initdb.

Best regards,
houzj

One of the patch (v10-0001-INSERT-SELECT-to-use-BulkInsertState-and-multi_i.patch) from the patchset does not apply.

http://cfbot.cputube.org/patch_32_2553.log

1 out of 13 hunks FAILED -- saving rejects to file src/backend/commands/copyfrom.c.rej

It is a minor change, therefore I fixed that to make cfbot happy. Please take a look if that works for you.
--
Ibrar Ahmed

RE: should INSERT SELECT use a BulkInsertState?

From

"houzj.fnst@fujitsu.com"

Date:

22 March 2021, 09:44:49

Hi,

About the 0002-patch [Check for volatile defaults].

I wonder if we can check the volatile default value by traversing "query->targetList" in planner.

IMO, the column default expression was written into the targetList, and the current parallel-safety check
travere the "query->targetList" to determine whether it contains unsafe column default expression.
Like: standard_planner-> query_tree_walker
    if (walker((Node *) query->targetList, context))
        return true;
May be we can do the similar thing to check the volatile defaults, if so, we do not need to add a field to
TargetEntry.

Best regards,
houzj

Re: should INSERT SELECT use a BulkInsertState?

From

Daniel Gustafsson

Date:

04 November 2021, 12:24:21

> On 27 Sep 2021, at 02:08, Justin Pryzby <pryzby@telsasoft.com> wrote:

> I split off the simple part again.  If there's no interest in the 0001 patch
> alone, then I guess it should be closed in the CF.

Since the thread has stalled, maybe that's the best course of action here.  Any
objections from anyone on the thread?

--
Daniel Gustafsson        https://vmware.com/