Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead) - Mailing list pgsql-hackers

From Nikolay Shaplov
Subject Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead)
Date
Msg-id 10823031.l8oNqghpbg@x200m
Whole thread Raw
Responses Re: Problem with parallel_workers option (Was Re: [PATCH] get rid ofStdRdOptions, use individual binary reloptions representation for eachrelation kind instead)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
В письме от четверг, 3 января 2019 г. 17:15:08 MSK пользователь Alvaro Herrera
написал:

> I would have liked to get a StaticAssert in the definition, but I don't
> think it's possible.  A standard Assert() should be possible, though.

Asserts are cool thing. I found some unexpected stuff.

parallel_workers option is claimed to be heap-only option.

But in src/backend/optimizer/util/plancat.c in get_relation_info
RelationGetParallelWorkers is being called for both heap and toast tables (and
not only for them).

Because usually there are no reloptions for toast,  it returns default -1
value. If some reloptions were set for toast table, RelationGetParallelWorkers
will return a value from uninitialized memory.

This will happen because StdRdOptions structure is filled with values in
fillRelOptions function. And fillRelOptions first iterates on options list, and
then on elems. So if option is not in "option list" than it's value will not
be set.

And options list comes from parseRelOptions where only options with proper
relation kind is selected from [type]RelOpts[] arrays. So parallel_workers
will not be added to options in the case of toast table because it is claimed
to be applicable only to RELOPT_KIND_HEAP

Thus if toast has some options set, and rd_options in relation is not NULL,
get_relation_info will use value from uninitialized memory as number of
parallel workers.

To reproduce this Assert you can change RelationGetParallelWorkers macros in
src/include/utils/rel.h

#define IsHeapRelation(relation)                        \
    (relation->rd_rel->relkind == RELKIND_RELATION ||   \
     relation->rd_rel->relkind == RELKIND_MATVIEW  )

#define RelationGetParallelWorkers(relation, defaultpw)                 \
    (AssertMacro(IsHeapRelation(relation)),                             \
     ((relation)->rd_options ?                                          \
      ((StdRdOptions *) (relation)->rd_options)->parallel_workers :   \
            (defaultpw)))

and see how it asserts. It will happen just on the db_initiaisation phase of
make check.

If you add relation->rd_rel->relkind == RELKIND_TOASTEDVALUE to the Assertion,
it will Assert in cases when get_relation_info is called for partitioned
table. This case is not a problem for now, because partitioned table has no
options and you will always have NULL in rd_options and get default value. But
it will become a problem when somebody adds some options, especially it would
be a problem if this options do not use StdRdOptions structure.
Also it is called for foreign tables and sequences.

So my suggestion of a hotfix is to replcace
rel->rel_parallel_workers = RelationGetParallelWorkers(relation,-1);
from get_relation_info with the following code

    switch (relation->rd_rel->relkind)
    {
        case RELKIND_RELATION:
        case RELKIND_MATVIEW:
            rel->rel_parallel_workers =
                                    RelationGetParallelWorkers(relation,-1);
            break;
        case RELKIND_TOASTVALUE:
        case RELKIND_PARTITIONED_TABLE:
        case RELKIND_SEQUENCE:
        case REPLICA_IDENTITY_FULL: /* Foreign table */
            rel->rel_parallel_workers = -1;
            break;
        default:
            /* Other relkinds are not supported */
            Assert(false);
    }

But I am not familiar with get_relation_info and parallel_workers  specific. So
I suspect real fix may be quite different.

Also I would suggest to fix it in all supported stable branches that have
parallel_workers option, because this bug may give something unexpected when
some toast options are set.



pgsql-hackers by date:

Previous
From: Amit Khandekar
Date:
Subject: Re: pgsql: Remove WITH OIDS support, change oid catalog column visibility.
Next
From: Peter Eisentraut
Date:
Subject: Re: Using logical replication with older version subscribers