Inline non-SQL SRFs using SupportRequestSimplify - Mailing list pgsql-hackers

From Paul Jungwirth
Subject Inline non-SQL SRFs using SupportRequestSimplify
Date
Msg-id 09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
Whole thread Raw
List pgsql-hackers
Hi Hackers,

Here is a proof-of-concept patch to inline set-returning functions (SRFs) besides those written in 
SQL. We already try to inline SQL-language functions,[1] but that means you must have a static SQL 
query. There is no way to get an inline-able query by dynamically building the sql in, say, plpgsql.

We also have a SupportRequestSimplify request type for functions that use SUPPORT to declare a 
support function, and it can replace the FuncExpr with an arbitrary nodetree.[2] I think this was 
intended for constant-substitution, but we can also use it to let functions generate dynamic SQL and 
then inline it. In this patch, if a SRF replaces itself with a Query node, then 
inline_set_returning_function will use that.

So far there are no tests or docs; I'm hoping to hear feedback on the idea before going further.

Here is my concrete use-case: I wrote a function to do a temporal semijoin,[3] and I want it to be 
inlined. There is a support function that builds the same SQL and lets Postgres parse it into a 
Query.[4] (In practice I would rewrite the main function in C too, so it could share the 
SQL-building code there, but this is just a POC.) If you build and install that extension on its 
`inlined` branch,[5] then you can do this:

```
\i bench.sql
explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 
'valid_at') j(id bigint, valid_at daterange);
explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 
'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint;
```

Without this patch, you get `ERROR:  unrecognized node type: 58`. But with this patch you get these 
plans:

```
postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 
'employee_id', 'valid_at') j(id bigint, valid_at daterange);
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  ProjectSet  (cost=4918.47..6177.06 rows=22300 width=40)
    ->  Hash Join  (cost=4918.47..6062.77 rows=223 width=53)
          Hash Cond: (employees.id = j.employee_id)
          Join Filter: (employees.valid_at && j.valid_at)
          ->  Seq Scan on employees  (cost=0.00..1027.39 rows=44539 width=21)
          ->  Hash  (cost=4799.15..4799.15 rows=9545 width=40)
                ->  Subquery Scan on j  (cost=4067.61..4799.15 rows=9545 width=40)
                      ->  HashAggregate  (cost=4067.61..4703.70 rows=9545 width=40)
                            Group Key: positions.employee_id
                            Planned Partitions: 16
                            ->  Seq Scan on positions  (cost=0.00..897.99 rows=44099 width=21)
(11 rows)

postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 
'employee_id', 'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint;
                                                       QUERY PLAN 

----------------------------------------------------------------------------------------------------------------------
  ProjectSet  (cost=0.56..9.22 rows=100 width=40)
    ->  Nested Loop  (cost=0.56..8.71 rows=1 width=53)
          ->  GroupAggregate  (cost=0.28..4.39 rows=1 width=40)
                ->  Index Only Scan using idx_positions_on_employee_id on positions 
(cost=0.28..4.36 rows=5 width=21)
                      Index Cond: (employee_id = '10'::bigint)
          ->  Index Only Scan using employees_pkey on employees  (cost=0.28..4.30 rows=1 width=21)
                Index Cond: ((id = '10'::bigint) AND (valid_at && (range_agg(positions.valid_at))))
(7 rows)

```

In particular I'm excited to see in the second plan that the predicate gets pushed into the subquery.

If it seems good to let people use SupportRequestSimplify to make their SRFs be inlineable, I'm 
happy to add tests and docs. We should really document the idea of inlined functions in general, so 
I'll do that too.

Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and 
just calling it from inline_set_returning_function. I didn't like having two support requests that 
did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this:

```
postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 
'valid_at');
ERROR:  unrecognized node type: 66
```

I'll look into ways to fix that.

I think SupportRequestSimplify is a really cool feature. It is nearly like having macros.
I'm dreaming about other ways I can (ab)use it. Just making inline-able SRFs has many applications. 
 From my own client work, I could use this for a big permissions query or a query with complicated 
pricing logic.

The sad part though is that SUPPORT functions must be written in C. That means few people will use 
them, especially these days when so many are in the cloud. Since they take a Node and return a Node, 
maybe there is no other way. But I would love to have a different mechanism that receives the 
function's arguments (evaluated) and returns a string, which we parse as a SQL query and then 
inline. The arguments would have to be const-reducible to strings, of course. You could specify that 
function with a new INLINE keyword when you create your target function. That feature would be less 
powerful, but with broader reach.

I'd be glad to hear your thoughts!

[1] https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions (I couldn't find any mention in our 
docs though, so we should add that.)
[2] https://www.postgresql.org/docs/current/xfunc-optimization.html
[3] https://github.com/pjungwir/temporal_ops/blob/master/temporal_ops--1.0.0.sql
[4] https://github.com/pjungwir/temporal_ops/blob/inlined/temporal_ops.c
[5] https://github.com/pjungwir/temporal_ops/tree/inlined

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

pgsql-hackers by date:

Previous
From: Paul Jungwirth
Date:
Subject: Re: SQL:2011 application time
Next
From: "David E. Wheeler"
Date:
Subject: Re: Proposal: Document ABI Compatibility