Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query - Mailing list pgsql-bugs
From | Andrew Gierth |
---|---|
Subject | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
Date | |
Msg-id | 87sh3g4h8n.fsf@news-spur.riddles.org.uk Whole thread Raw |
In response to | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> Marking the function parallel safe doesn't seem wrong to me. The >> non-parallel-safe part is that the input gets fed to it in different >> order in different workers. And I don't really think that to be the >> function's fault. Tom> So that basically opens the question of whether *any* window Tom> function calculation can safely be pushed down to parallel Tom> workers. Grepping the spec for the phrase "possibly non-deterministic" is quite enlightening. Leaving out non-determinisms caused by timezone or actual volatility, leaving out cases of non-determinism that we'd call "stable", and leaving out features like multisets that we don't support at all, here's the list of interesting cases (comments after each quoted paragraph are mine): 6.28 <value expression> d) An <array value constructor by query>. i.e. ARRAY(select) o) An <aggregate function> that specifies MIN or MAX and that simply contains a <value expression> whose declared type is based on a character string type, user-defined type, or datetime with time zone type. i.e. MIN(x) is non-deterministic if "x" can have distinguishable values that compare equal. PG doesn't have that for text or timestamptz, unlike the spec, but it does for citext or other user-defined types. q) An <array aggregate function>. i.e. array_agg() u) A <window function> that specifies ROW_NUMBER, FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE, LEAD, or LAG, or whose associated <window specification> specifies ROWS. This covers those cases where window functions don't treat peer rows together. 7.6 <table reference> 27) A <table reference> is possibly non-deterministic if the simply contained <table primary> or <joined table> is possibly non-deterministic or if <sample clause> is specified. i.e. TABLESAMPLE is non-deterministic 7.16 <query specification> a) The <set quantifier> DISTINCT is specified and one of the columns of T has a data type of character string, user-defined type, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE. c) The <select list>, <having clause>, or <window clause> contains a reference to a column C of T that has a data type of character string, user-defined type, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE, and the functional dependency G C, where G is the set consisting of the grouping columns of T, holds in T. For both the above two cases, if distinguishable values of a type compare equal, it's non-deterministic which gets into the result. 7.17 <query expression> a) The <query expression> contains a <result offset clause>. b) The <query expression> contains a <fetch first clause>. f) Both of the following are true: i) T contains a set operator UNION and ALL is not specified, or T contains either of the set operators EXCEPT or INTERSECT. ii) At least one of the following is true: 1) The first or second operand contains a column that has a declared type of character string. 2) The first or second operand contains a column that has a declared type of datetime with time zone. 3) The first or second operand contains a column that has a declared type that is a user-defined type. (I've left out the many clauses which just amount to "if $thing contains something which is possibly non-deterministic then it is possibly non-deterministic") -- Andrew (irc:RhodiumToad)
pgsql-bugs by date: