Re: BUG #15577: Query returns different results when executedmultiple times - Mailing list pgsql-bugs
From | David Rowley |
---|---|
Subject | Re: BUG #15577: Query returns different results when executedmultiple times |
Date | |
Msg-id | CAKJS1f8zFL8539hpy_gBV3JOcNRqhb1wVVhuVkp_H_E_qJU_iw@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15577: Query returns different results when executedmultiple times (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: BUG #15577: Query returns different results when executedmultiple times
|
List | pgsql-bugs |
On Wed, 9 Jan 2019 at 17:01, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > > On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > Now the number of results from the query in the script is not stable, > > varying between 235 or 246 results for me. Not sure what's going on > > yet. > > When there are multiple matches found by the Index Scan, sometimes > only the first one makes it through to the Nested Loop, in the leader. > It's not because of single_match being inappropriately set as I > speculated earlier (though that'd behave similarly I guess) but > because the quals are somehow broken in between invocations so that > ExecScan() filters the rest of the matches out, and then fixed again > on the next rescan. This happens whenever the leader takes a break to > read tuple from parallel workers in between executing the Nested Loop > node, which it often does after reading the first of several tuples > resulting from a single rescan. > > What exactly is being corrupted and how, I don't yet know, and I need > to leave this here for today, but that's what I've got so far. From what I can see it seems to be the parameter for the index scan that gets the wrong value. I changed int4eq to become: Datum int4eq(PG_FUNCTION_ARGS) { int32 arg1 = PG_GETARG_INT32(0); int32 arg2 = PG_GETARG_INT32(1); if (arg1 == 100112 || arg2 == 100112) elog(NOTICE, "%d %d", arg1, arg2); PG_RETURN_BOOL(arg1 == arg2); } When the query returns the correct number of rows I only see NOTICES with: NOTICE: 100112 100112 but when the wrong results are returned I see a few such as: NOTICE: 100112 57772 If I put a conditional breakpoint on the elog with the condition "arg2 != 100112" I see that the arg2 value is coming from the nested loop's parameter. If I modify the query to remove the redundant tba.target_id = tc.id from the WHERE clause, then I can't trigger the issue. i.e. AND ( (tba.target_id = tc.id AND tba.group_type = 'A') OR tba.source_id = tc.id ); becomes: AND ( tba.group_type = 'A' OR tba.source_id = tc.id ); So the parameter does not need to have its value checked again during the qual check. (If Bartosz is looking for a quick fix then he should be able to alter the query to that) The index scan should have already ensured that tba.target_id = tc.id. So the nested loop parameter must have had the correct value at the start of the index scan, but somehow it must have changed sometime after the first tuple was returned from the index scan and before we asked the index for the next tuple (hence why it looked like a semi-join's result) I've just not managed to find out how the parameter value gets changed. The value does not seem to have been stomped on as each time I see the incorrect parameter value, it's set to something else that occurs in the table. At the moment I'm a bit suspicious of the following code in ParallelQueryMain(). paramexec_space = dsa_get_address(area, fpes->param_exec); RestoreParamExecParams(paramexec_space, queryDesc->estate); On first look, it seems that the exec params are in shared memory? ... but I need more time to figure out how this all fits together. I've attached the updated script per Thomas' alterations to make it break. The breakage is very old. I found it existed when parallel joins were added, but not necessarily blaming that commit. 45be99f8cd Support parallel joins, and make related improvements. -- BAD a7de3dc5c3 Support multi-stage aggregation. -- GOOD -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-bugs by date: