Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Date
Msg-id CAFjFpRcTr-A6x_XwQmv+DWEkVs_Ti_v6P3XLLOKCEOZOQmc3ag@mail.gmail.com
Whole thread Raw
In response to Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs  (Atri Sharma <atri.jiit@gmail.com>)
Responses Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
List pgsql-hackers


On Mon, Jan 5, 2015 at 8:42 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series).

This allows for eliminating sorts that can be based off the known existing preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since          existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
                  QUERY PLAN                  
-----------------------------------------------
 OrderCheck
   ->  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified



Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc.


This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data).

The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.


We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function.
 
In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: On partitioning
Next
From: Atri Sharma
Date:
Subject: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs