Thread: BUG #15485: Order by of inlineable stable function incorrect

BUG #15485: Order by of inlineable stable function incorrect

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15485
Logged by:          Tommas Factorylab
Email address:      tommas@factorylab.nl
PostgreSQL version: 9.6.10
Operating system:   ubuntu 14.04 x64
Description:

Hi,

I am encountering a problem where the resultset is not correctly ordered
(and that results in incorrect behaviour of my application). The ordering is
done by a table function that is referenced in the "from clause" (function
"fA"). The query also left joins various other table functions. These table
functions do not perform any ordering.

I expect column "fA_c6" to be ordered ascending, but instead it has an
undefined order, or even a descending order.
What I have tried/discovered so far:
- The order is consistent over multiple runs
- The order is not ignored (as in, the order is different than a query
without any order specification would return)
- If I mark the functions as "volatile", the resultset is ordered
correctly
- If I remove certain columns from the "select clause", the resultset is
ordered correctly. I have not found any pattern, it is not like one specific
column causes the behaviour.
- If I only execute "fA", the resultset is ordered correctly

The way I understand the function volatility categories, my functions are
allowed to be "stable". From the documentation I understand that using table
functions in left joins like this is allowed. I am aware of the existence of
"row from", but am unable to pass arguments to the functions using that. If
using table functions in left joins like this is not allowed, I would expect
more defined behaviour.

The obfuscated SQL that reproduces the behaviour:
CREATE TYPE public.datatype AS ENUM (
    'a',
    'b',
    'c'
);

CREATE TABLE public.tA (
    c1 integer NOT NULL,
    c2 integer,
    c3 integer NOT NULL,
    c4 integer NOT NULL,
    c5 integer NOT NULL,
    c6 integer NOT NULL,
    c7 integer NOT NULL,
    c8 double precision,
    CONSTRAINT check1 CHECK ((((c7 = 1) AND (c8 IS NULL)) OR ((c7 > 1) AND
(c8 IS NOT NULL)))),
    CONSTRAINT check2 CHECK ((c7 <> 0))
);

CREATE TABLE public.tB (
    c1 integer NOT NULL,
    c2 integer NOT NULL,
    c3 character varying(200) NOT NULL,
    c4 integer NOT NULL
);

CREATE TABLE public.tC (
    c1 character varying(200) NOT NULL,
    c2 text
);

CREATE TABLE public.tD (
    c1 integer NOT NULL,
    c2 character varying(200) NOT NULL,
    c3 character varying(200) NOT NULL,
    c4 character varying(200) NOT NULL
);

CREATE TABLE public.tE (
    c1 character varying(200) NOT NULL,
    c2 text
);

CREATE TABLE public.tF (
    c1 character varying(200) NOT NULL,
    c2 character varying(20) NOT NULL
);

CREATE TABLE public.tG (
    c1 integer NOT NULL,
    c2 character varying(200) NOT NULL,
    c3 character varying(200) NOT NULL,
    c4 text
);

CREATE TABLE public.tH (
    c1 integer NOT NULL,
    c2 character varying(200) NOT NULL,
    c3 character varying(200) NOT NULL,
    c4 character varying(200) NOT NULL,
    c5 public.datatype NOT NULL,
    c6 character varying(200) NOT NULL
);


CREATE OR REPLACE FUNCTION public.fA(p1 integer) RETURNS TABLE(c1 integer,
c2 integer, c3 integer, c4 integer, c5 integer, c6 integer, c7 integer, c8
double precision)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2, c3, c4, c5, c6, c7, c8
    FROM tA
    WHERE c3 = p1
    ORDER BY c6 ASC
$$;

CREATE OR REPLACE FUNCTION public.fB(p1 integer) RETURNS TABLE(c1 integer,
c2 integer, c3 character varying, c4 integer)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2, c3, c4
    FROM tB
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fC(p1 character varying) RETURNS TABLE(c1
character varying, c2 text)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2
    FROM tC
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fD(p1 integer) RETURNS TABLE(c1 integer,
c2 character varying, c3 character varying, c4 character varying)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2, c3, c4
    FROM tD
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fE(p1 character varying) RETURNS TABLE(c1
character varying, c2 text)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2
    FROM tE
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fF(p1 character varying) RETURNS TABLE(c1
character varying, c2 character varying)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2
    FROM tF
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fG(p1 integer) RETURNS TABLE(c1 integer,
c2 character varying, c3 character varying, c4 text)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2, c3, c4
    FROM tG
    WHERE c1 = p1
$$;

CREATE OR REPLACE FUNCTION public.fH(p1 integer) RETURNS TABLE(c1 integer,
c2 character varying, c3 character varying, c4 character varying, c5
public.datatype, c6 character varying)
    LANGUAGE sql STABLE
    AS $$
    SELECT c1, c2, c3, c4, c5, c6
    FROM tH
    WHERE c1 = p1
$$;


INSERT INTO public.ta (c1, c2, c3, c4, c5, c6, c7, c8) VALUES (1, 1, 1, 1,
1, 0, 1, NULL);
INSERT INTO public.ta (c1, c2, c3, c4, c5, c6, c7, c8) VALUES (2, 2, 1, 1,
2, 1, 1, NULL);
INSERT INTO public.ta (c1, c2, c3, c4, c5, c6, c7, c8) VALUES (3, 3, 1, 1,
1, 2, 1, NULL);
INSERT INTO public.ta (c1, c2, c3, c4, c5, c6, c7, c8) VALUES (4, 4, 1, 1,
2, 3, 1, NULL);

INSERT INTO public.tb (c1, c2, c3, c4) VALUES (1, 1, 'nA', 1);
INSERT INTO public.tb (c1, c2, c3, c4) VALUES (2, 2, 'nA', 1);
INSERT INTO public.tb (c1, c2, c3, c4) VALUES (3, 1, 'nA', 2);
INSERT INTO public.tb (c1, c2, c3, c4) VALUES (4, 2, 'nA', 2);

INSERT INTO public.tc (c1, c2) VALUES ('nA', '');
INSERT INTO public.tc (c1, c2) VALUES ('nB', '');

INSERT INTO public.td (c1, c2, c3, c4) VALUES (1, 'msA', 'qA', '');
INSERT INTO public.td (c1, c2, c3, c4) VALUES (2, 'msA', 'qB', '');
INSERT INTO public.td (c1, c2, c3, c4) VALUES (3, 'msB', 'qA', '');
INSERT INTO public.td (c1, c2, c3, c4) VALUES (4, 'msB', 'qB', '');

INSERT INTO public.te (c1, c2) VALUES ('msA', '');
INSERT INTO public.te (c1, c2) VALUES ('msB', '');

INSERT INTO public.tf (c1, c2) VALUES ('qA', 'uA');
INSERT INTO public.tf (c1, c2) VALUES ('qB', 'uB');

INSERT INTO public.tg (c1, c2, c3, c4) VALUES (1, 'nA', 'nsA', '');

INSERT INTO public.th (c1, c2, c3, c4, c5, c6) VALUES (1, 'mA', 'mB', 'qA',
'a', 'fA');
INSERT INTO public.th (c1, c2, c3, c4, c5, c6) VALUES (2, 'mA', 'tB', 'qB',
'b', 'fB');


SELECT
    fA.c1 AS fA_c1, fA.c4 AS fA_c4, fA.c6 AS fA_c6, fA.c7 AS fA_c7, fA.c8 AS
fA_c8,
    fB.c1 AS fB_c1, fB.c4 AS fB_c4, 
    fC.c1 AS fC_c1, fC.c2 AS fC_c2,
    fD.c1 AS fD_c1, fD.c4 AS fD_c4,
    fE.c1 AS fE_c1, fE.c2 AS fE_c2,
    fF.c1 AS fF_c1, fF.c2 AS fF_c2,
    fG.c1 AS fG_c1, fG.c2 AS fG_c2, fG.c3 AS fG_c3, fG.c4 AS fG_c4,
    fH.c1 AS fH_c1, fH.c2 AS fH_c2, fH.c3 AS fH_c3, fH.c4 AS fH_c4, fH.c5 AS
fH_c5, fH.c6 AS fH_c6
FROM fA(1) AS fA
LEFT OUTER JOIN fB(fA.c2) AS fB ON true
LEFT OUTER JOIN fC(fB.c3) AS fC ON true
LEFT OUTER JOIN fD(fB.c2) AS fD ON true
LEFT OUTER JOIN fE(fD.c2) AS fE ON true
LEFT OUTER JOIN fF(fD.c3) AS fF ON true
LEFT OUTER JOIN fG(fA.c3) AS fG ON true
LEFT OUTER JOIN fH(fA.c5) AS fH ON true


Re: BUG #15485: Order by of inlineable stable function incorrect

From
Gavin Flower
Date:
On 04/11/2018 13:11, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      15485
> Logged by:          Tommas Factorylab
> Email address:      tommas@factorylab.nl
> PostgreSQL version: 9.6.10
> Operating system:   ubuntu 14.04 x64
> Description:
>
> Hi,
>
> I am encountering a problem where the resultset is not correctly ordered
> (and that results in incorrect behaviour of my application). The ordering is
> done by a table function that is referenced in the "from clause" (function
> "fA"). The query also left joins various other table functions. These table
> functions do not perform any ordering.
[...]
>
> SELECT
>     fA.c1 AS fA_c1, fA.c4 AS fA_c4, fA.c6 AS fA_c6, fA.c7 AS fA_c7, fA.c8 AS
> fA_c8,
>     fB.c1 AS fB_c1, fB.c4 AS fB_c4,
>     fC.c1 AS fC_c1, fC.c2 AS fC_c2,
>     fD.c1 AS fD_c1, fD.c4 AS fD_c4,
>     fE.c1 AS fE_c1, fE.c2 AS fE_c2,
>     fF.c1 AS fF_c1, fF.c2 AS fF_c2,
>     fG.c1 AS fG_c1, fG.c2 AS fG_c2, fG.c3 AS fG_c3, fG.c4 AS fG_c4,
>     fH.c1 AS fH_c1, fH.c2 AS fH_c2, fH.c3 AS fH_c3, fH.c4 AS fH_c4, fH.c5 AS
> fH_c5, fH.c6 AS fH_c6
> FROM fA(1) AS fA
> LEFT OUTER JOIN fB(fA.c2) AS fB ON true
> LEFT OUTER JOIN fC(fB.c3) AS fC ON true
> LEFT OUTER JOIN fD(fB.c2) AS fD ON true
> LEFT OUTER JOIN fE(fD.c2) AS fE ON true
> LEFT OUTER JOIN fF(fD.c3) AS fF ON true
> LEFT OUTER JOIN fG(fA.c3) AS fG ON true
> LEFT OUTER JOIN fH(fA.c5) AS fH ON true
>
I suspect that you still need to use an 'ORDER BY' clause.


Cheers,
Gavin



Re: BUG #15485: Order by of inlineable stable function incorrect

From
Andrew Gierth
Date:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

This isn't a bug; please don't abuse the bug report system.

 PG> I am encountering a problem where the resultset is not correctly
 PG> ordered (and that results in incorrect behaviour of my
 PG> application). The ordering is done by a table function that is
 PG> referenced in the "from clause" (function "fA"). The query also
 PG> left joins various other table functions. These table functions do
 PG> not perform any ordering.

Nothing has ever guaranteed that the order of rows returned by a table
function is preserved in the query result (in practice, it is preserved
ONLY if there are no joins, no grouping, no aggregates, no window
functions, no set operations, no DISTINCT, and (obviously) no other
ORDER BY clause, but even that should not normally be relied on since it
only happens by accident).

More generally: the ONLY thing that guarantees the order of rows in a
query result is an _explicit_ ORDER BY clause _at the topmost query
level_. No ORDER BY clause anywhere else in the query counts; the
planner is completely free to reorder rows as it sees fit.

To get the result you seem to want, you need to do:

SELECT
    ...
FROM fA(1) WITH ORDINALITY AS fA
     LEFT OUTER JOIN fB(fA.c2) AS fB ON true
     LEFT OUTER JOIN fC(fB.c3) AS fC ON true
     LEFT OUTER JOIN fD(fB.c2) AS fD ON true
     LEFT OUTER JOIN fE(fD.c2) AS fE ON true
     LEFT OUTER JOIN fF(fD.c3) AS fF ON true
     LEFT OUTER JOIN fG(fA.c3) AS fG ON true
     LEFT OUTER JOIN fH(fA.c5) AS fH ON true
ORDER BY fA.ordinality;

The planner knows that the result of the fA() call is already sorted by
its ordinality column, so it will only do an explicit additional sort if
it estimates that the query will be cheaper that way.

There is one slight complication with this approach, which is that the
use of WITH ORDINALITY will block inlining of the function (see
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions for details).
If you need to preserve inlinability of the function fA(), then instead
of using WITH ORDINALITY you will instead need to duplicate the actual
ORDER BY condition used inside the function in an ORDER BY at the top
query level (the planner will know that the function result is already
ordered this way IF the function was in fact inlined).

You can check if a FROM-clause function was inlined by looking at the
EXPLAIN output: a FunctionScan node for the function is present only if
the function was _not_ inlined.

 PG> I expect column "fA_c6" to be ordered ascending, but instead it has
 PG> an undefined order, or even a descending order.

 PG> What I have tried/discovered so far:
 PG> - The order is consistent over multiple runs

This is pure coincidence.

 PG> - The order is not ignored (as in, the order is different than a
 PG> query without any order specification would return)

This is also pure coincidence.

 PG> - If I mark the functions as "volatile", the resultset is ordered
 PG> correctly

This is also pure coincidence, caused by the fact that marking the
functions volatile prevents them from being inlined, forcing the planner
to use explicit FunctionScans and blocking various performance
optimizations that would incidentally affect the order of rows. You
still CANNOT assume that this will be true in the general case - the
planner is free to reorder the result as it chooses.

 PG> - If I remove certain columns from the "select clause", the
 PG> resultset is ordered correctly. I have not found any pattern, it is
 PG> not like one specific column causes the behaviour.

This is also pure coincidence.

 PG> - If I only execute "fA", the resultset is ordered correctly

If you do a simple "select * from fA(1)" then the planner simply has
nothing else to add to the query plan after constructing the function
call (whether as a FunctionScan or an inlined query), so the result
order is coincidentally unaffected. 

 PG> The way I understand the function volatility categories, my
 PG> functions are allowed to be "stable". From the documentation I
 PG> understand that using table functions in left joins like this is
 PG> allowed.

It is allowed. A table function (any function used as if it were a table
in a FROM-clause, whether declared RETURNS SETOF/TABLE or not) can be
used in any way that a base table can be.

Why did you think that (a) function volatility had anything to do with
this, and (b) that the result order of your function would be preserved?

 PG> I am aware of the existence of "row from", but am unable to pass
 PG> arguments to the functions using that.

ROWS FROM serves a different purpose: to call multiple functions (which
can have parameters, but not parameters referring to results of other
functions in the same construct) with their results returned as if
joined by a FULL JOIN USING (ordinality) - that is, the first result row
is the first row of each function result, the second result row is the
second row of each result, and so on.

-- 
Andrew (irc:RhodiumToad)