Re: Parallel copy - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Parallel copy
Msg-id: CAA4eK1K30X9DpbNrrp2GccFNJNCKfeoXoBT-f=qU5-ApRhAsuQ@mail.gmail.com
In response to: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
List: pgsql-hackers
On Tue, Oct 27, 2020 at 7:06 PM vignesh C <vignesh21@gmail.com> wrote:
> [latest version]

I think the parallel-safety checks in this patch
(v9-0002-Allow-copy-from-command-to-process-data-from-file) are
incomplete and wrong. See the comments below.

1.
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+    if (contain_volatile_functions(cstate->whereClause))
+    {
+        if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+            return false;
+    }

I don't understand the above check. Why do we need to check the WHERE
clause for parallel-safety only when it contains volatile functions?
It should be checked otherwise as well, no? A similar comment applies
to the other checks in this function. Also, I don't think there is a
need to make this function inline.

2.
+/*
+ * IsParallelCopyAllowed
+ *
+ * Check if parallel copy can be allowed.
+ */
+bool
+IsParallelCopyAllowed(CopyState cstate)
{
..
+     * When there are BEFORE/AFTER/INSTEAD OF row triggers on the table. We do
+     * not allow parallelism in such cases because such triggers might query
+     * the table we are inserting into and act differently if the tuples that
+     * have already been processed and prepared for insertion are not there.
+     * Now, if we allow parallelism with such triggers the behaviour would
+     * depend on if the parallel worker has already inserted or not that
+     * particular tuples.
+     */
+    if (cstate->rel->trigdesc != NULL &&
+        (cstate->rel->trigdesc->trig_insert_after_statement ||
+         cstate->rel->trigdesc->trig_insert_new_table ||
+         cstate->rel->trigdesc->trig_insert_before_row ||
+         cstate->rel->trigdesc->trig_insert_after_row ||
+         cstate->rel->trigdesc->trig_insert_instead_row))
+        return false;
..

Why do we need to disable parallelism for before/after row triggers
unless they contain parallel-unsafe functions?
A few lines further down in this function you are checking the
parallel-safety of the trigger functions; what is the use of that check
if you are already disabling parallelism with the check above?

3. What if an index on the table has expressions that are
parallel-unsafe? What is your strategy for checking parallel-safety
for partitioned tables?

I suggest checking Greg's patch for parallel-safety of Inserts [1]. I
think you will find that most of those checks are required here as
well; see how we can use that patch (at least what is common). I feel
the first patch should contain just the parallel-safety checks, and we
can test that by trying to enable Copy with force_parallel_mode. We
can build the rest of the patch on top of it; in other words, let's
move all the parallel-safety work into a separate patch.

Few assorted comments:
========================
1.
+/*
+ * ESTIMATE_NODE_SIZE - Estimate the size required for node type in shared
+ * memory.
+ */
+#define ESTIMATE_NODE_SIZE(list, listStr, strsize) \
+{ \
+    uint32 estsize = sizeof(uint32); \
+    if ((List *)list != NIL) \
+    { \
+        listStr = nodeToString(list); \
+        estsize += strlen(listStr) + 1; \
+    } \
+ \
+    strsize = add_size(strsize, estsize); \
+}

This could probably be a function instead of a macro.

2.
+/*
+ * ESTIMATE_1BYTE_STR_SIZE - Estimate the size required for 1Byte strings in
+ * shared memory.
+ */
+#define ESTIMATE_1BYTE_STR_SIZE(src, strsize) \
+{ \
+    strsize = add_size(strsize, sizeof(uint8)); \
+    strsize = add_size(strsize, (src) ? 1 : 0); \
+}

This could be an inline function.

3.
+/*
+ * SERIALIZE_1BYTE_STR - Copy 1Byte strings to shared memory.
+ */
+#define SERIALIZE_1BYTE_STR(dest, src, copiedsize) \
+{ \
+    uint8 len = (src) ? 1 : 0; \
+    memcpy(dest + copiedsize, (uint8 *) &len, sizeof(uint8)); \
+    copiedsize += sizeof(uint8); \
+    if (src) \
+        dest[copiedsize++] = src[0]; \
+}

Similarly, this could be a function. I think keeping such things as
macros in between the code makes it difficult to read.
Please see if you can turn these and similar macros into functions,
unless they only perform a few memory instructions. Having functions
also makes the code easier to debug.

[1] - https://www.postgresql.org/message-id/CAJcOf-cgfjj0NfYPrNFGmQJxsnNW102LTXbzqxQJuziar1EKfQ%40mail.gmail.com

--
With Regards,
Amit Kapila.