Thread: postgres_fdw bug in 9.6

postgres_fdw bug in 9.6

From: Jeff Janes
Date: 08 December 2016, 17:07:24

I have a setup where a 9.6.1 server uses postgres_fdw to connect to a 9.4.9 hot standby server.

I have a DML statement which triggers the error:

ERROR: XX000: outer pathkeys do not match mergeclauses

LOCATION: create_mergejoin_plan, createplan.c:3722

The error first starts appearing with this commit (on the local side):

commit aa09cd242fa7e3a694a31f8aed521e80d1e626a4

Author: Robert Haas <rhaas@postgresql.org>

Date: Wed Mar 9 10:51:49 2016 -0500

postgres_fdw: Consider foreign joining and foreign sorting together.

The version of the remote side does not seem to matter. I've also promoted a test instance of the remote from hot standby to master and then upgraded to 9.6.1, and neither step fixes the issue.

The statement is like this:

explain update foo_local set col3=foo_remote.col3 from foo_remote where foo_local.id=foo_remote.id and foo_local.id in ('aaa','bbb','ccc','ddd');

Where foo_remote is a pretty complicated view (defined locally) over the join of 8 foreign tables.

I am having trouble producing a self-contained, disclosable test case for this. Small changes cause the error to go away. On the local side, it doesn't seem to depend on the contents of the table, only the structure. But on the remote side, truncating the central table for the query makes the error go away.

Any tips on investigating this further in situ? Or is the best option just to work harder on a minimal and disclosable test case?

Cheers,

Jeff

Re: postgres_fdw bug in 9.6

From: Tom Lane
Date: 08 December 2016, 17:51:00

Jeff Janes <jeff.janes@gmail.com> writes:
> I have a DML statement which triggers the error:
> ERROR:  XX000: outer pathkeys do not match mergeclauses
> LOCATION:  create_mergejoin_plan, createplan.c:3722

Hmm.

> Any tips on investigating this further in situ?  Or is the best option just
> to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce.  You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

BTW, if you're not doing this in a debug (--enable-cassert) build, it'd
be useful to try it in one.  I'm a little suspicious that the root cause
might be a memory-stomp type of problem, ie somebody scribbling on a
pathkey data structure without accounting for it being shared with another
path.  It's possible that cassert memory checking would help catch that.
        regards, tom lane
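
The two suggestions above could be tried in a single session roughly as follows. This is only a sketch: foo_local and foo_remote are the names from the original report, whose schema was not disclosed, so the statement will not run without an equivalent setup. Turning off enable_hashjoin and enable_nestloop does not forbid those join methods; it only makes the planner treat them as very expensive, so a merge join becomes more likely to be chosen, not guaranteed.

```sql
-- Sketch only: foo_local / foo_remote are the objects named in the report;
-- their definitions are not public, so substitute an equivalent setup.

-- Reports "on" when the server was built with --enable-cassert.
SHOW debug_assertions;

-- Discourage hash and nested-loop joins for this session so that
-- merge join paths are more likely to be picked as cheapest.
SET enable_hashjoin = off;
SET enable_nestloop = off;

-- The statement from the report; the error is raised at plan creation,
-- so EXPLAIN alone is enough to trigger it.
EXPLAIN
UPDATE foo_local
   SET col3 = foo_remote.col3
  FROM foo_remote
 WHERE foo_local.id = foo_remote.id
   AND foo_local.id IN ('aaa', 'bbb', 'ccc', 'ddd');

-- Put the planner settings back to their configured defaults.
RESET enable_hashjoin;
RESET enable_nestloop;
```

The settings are changed with SET, not in postgresql.conf, so they affect only the current session and leave other clients alone.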

Re: postgres_fdw bug in 9.6

From: Robert Haas
Date: 08 December 2016, 18:04:38

On Thu, Dec 8, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I have a DML statement which triggers the error:
>> ERROR:  XX000: outer pathkeys do not match mergeclauses
>> LOCATION:  create_mergejoin_plan, createplan.c:3722
>
> Hmm.
>
>> Any tips on investigating this further in situ?  Or is the best option just
>> to work harder on a minimal and disclosable test case?
>
> I think we need a test case --- not minimal necessarily, but something
> other people can reproduce.  You might find that setting enable_hashjoin
> and/or enable_nestloop to false makes it easier to provoke the error,
> since evidently this requires that we (a) generate a faulty mergejoin Path
> and then (b) choose it as the cheapest one, since the error occurs while
> converting it to a Plan.

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: postgres_fdw bug in 9.6

From: Tom Lane
Date: 08 December 2016, 18:28:13

Robert Haas <robertmhaas@gmail.com> writes:
> Maybe it would help for Jeff to use elog_node_display() to the nodes
> that are causing the problem - e.g. outerpathkeys and innerpathkeys
> and best_path->path_mergeclauses, or just best_path - at the point
> where the error is thrown. That might give us enough information to
> see what's broken.

I'll be astonished if that's sufficient evidence.  We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight.  But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.
        regards, tom lane
