Re: Parallel copy - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Parallel copy
Msg-id: CAA4eK1K30X9DpbNrrp2GccFNJNCKfeoXoBT-f=qU5-ApRhAsuQ@mail.gmail.com
In response to: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
List: pgsql-hackers
On Tue, Oct 27, 2020 at 7:06 PM vignesh C <vignesh21@gmail.com> wrote:
> [latest version]

I think the parallel-safety checks in this patch
(v9-0002-Allow-copy-from-command-to-process-data-from-file) are
incomplete and wrong. See the comments below.

1.
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+    if (contain_volatile_functions(cstate->whereClause))
+    {
+        if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+            return false;
+    }

I don't understand the above check. Why do we need to check the WHERE
clause for parallel-safety only when it contains volatile functions?
It should be checked otherwise as well, no? A similar comment applies
to the other checks in this function. Also, I don't think there is a
need to make this function inline.

2.
+/*
+ * IsParallelCopyAllowed
+ *
+ * Check if parallel copy can be allowed.
+ */
+bool
+IsParallelCopyAllowed(CopyState cstate)
{
..
+     * When there are BEFORE/AFTER/INSTEAD OF row triggers on the table. We do
+     * not allow parallelism in such cases because such triggers might query
+     * the table we are inserting into and act differently if the tuples that
+     * have already been processed and prepared for insertion are not there.
+     * Now, if we allow parallelism with such triggers the behaviour would
+     * depend on if the parallel worker has already inserted or not that
+     * particular tuples.
+     */
+    if (cstate->rel->trigdesc != NULL &&
+        (cstate->rel->trigdesc->trig_insert_after_statement ||
+         cstate->rel->trigdesc->trig_insert_new_table ||
+         cstate->rel->trigdesc->trig_insert_before_row ||
+         cstate->rel->trigdesc->trig_insert_after_row ||
+         cstate->rel->trigdesc->trig_insert_instead_row))
+        return false;
..

Why do we need to disable parallelism for before/after row triggers
unless they contain parallel-unsafe functions?
A few lines further down in this function you are checking the
parallel-safety of the trigger functions; what is the use of that check
if you are already disabling parallelism with the check above?

3. What if an index on the table has expressions that are
parallel-unsafe? What is your strategy for checking parallel-safety
for partitioned tables?

I suggest checking Greg's patch for parallel-safety of Inserts [1]. I
think you will find that most of those checks are required here as
well; see how we can use that patch (at least what is common). I feel
the first patch should contain just the parallel-safety checks, and we
can test that by trying to enable Copy with force_parallel_mode. We
can build the rest of the patch on top of it; in other words, let's
move all the parallel-safety work into a separate patch.

Few assorted comments:
========================
1.
+/*
+ * ESTIMATE_NODE_SIZE - Estimate the size required for node type in shared
+ * memory.
+ */
+#define ESTIMATE_NODE_SIZE(list, listStr, strsize) \
+{ \
+    uint32 estsize = sizeof(uint32); \
+    if ((List *)list != NIL) \
+    { \
+        listStr = nodeToString(list); \
+        estsize += strlen(listStr) + 1; \
+    } \
+ \
+    strsize = add_size(strsize, estsize); \
+}

This could probably be a function instead of a macro.

2.
+/*
+ * ESTIMATE_1BYTE_STR_SIZE - Estimate the size required for 1Byte strings in
+ * shared memory.
+ */
+#define ESTIMATE_1BYTE_STR_SIZE(src, strsize) \
+{ \
+    strsize = add_size(strsize, sizeof(uint8)); \
+    strsize = add_size(strsize, (src) ? 1 : 0); \
+}

This could be an inline function.

3.
+/*
+ * SERIALIZE_1BYTE_STR - Copy 1Byte strings to shared memory.
+ */
+#define SERIALIZE_1BYTE_STR(dest, src, copiedsize) \
+{ \
+    uint8 len = (src) ? 1 : 0; \
+    memcpy(dest + copiedsize, (uint8 *) &len, sizeof(uint8)); \
+    copiedsize += sizeof(uint8); \
+    if (src) \
+        dest[copiedsize++] = src[0]; \
+}

Similarly, this could be a function. I think keeping such things as
macros in between the code makes it difficult to read.
Please see if you can turn these and similar macros into functions,
unless they only perform a few memory instructions. Having functions
also makes the code easier to debug.

[1] - https://www.postgresql.org/message-id/CAJcOf-cgfjj0NfYPrNFGmQJxsnNW102LTXbzqxQJuziar1EKfQ%40mail.gmail.com

--
With Regards,
Amit Kapila.