Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries - Mailing list pgsql-performance

From Maxim Boguk
Subject Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Date
Msg-id CAK-MWwQMx11HP9capawJtzaY1h-HriTAYdmgzPmem=UDqajdcA@mail.gmail.com
In response to Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-performance


On Mon, Nov 28, 2011 at 9:50 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
On 11/25/2011 06:53 AM, Maxim Boguk wrote:
I understand that position.
However if assumption: " the definition of ORDER BY --- it happens after
computing the select list, according to the SQL standard"
is correct,
then plans like:

postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10
offset 1000;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2884.19..2913.03 rows=10 width=8) (actual
time=3.584..3.620 rows=10 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16
rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1)
 Total runtime: 3.663 ms
(3 rows)
should not be used at all.


`LIMIT' and `OFFSET' are explicitly defined to compute only that part of the SELECT list that is required. If they weren't specifically defined with that exception then you'd be right.

LIMIT and OFFSET aren't standard anyway, so Pg can define them to mean whatever is most appropriate. The SQL standard is adding new and (as usual) painfully clumsily worded features that work like LIMIT and OFFSET, but I don't know whether they have the same rules about whether execution of functions can be skipped or not.
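To make the point above concrete, here is a hedged sketch (the table `test` with columns `id` and `_data` follows the EXPLAIN example above; `slow_fn` is a hypothetical expensive function):

```sql
-- Because LIMIT/OFFSET are defined to compute only the part of the select
-- list that is actually returned, a plan that stops early (e.g. an index
-- scan on _data) may call slow_fn() only ~10 times here:
SELECT id, slow_fn(id) FROM test ORDER BY _data LIMIT 10;

-- But if the planner chooses a full sort instead, the select list may be
-- evaluated for every row before the sort, calling slow_fn() once per row.
```

Which of the two behaviours you get depends on the chosen plan, which is exactly the unpredictability complained about below.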


And it is not possible to predict in advance where and when you get hit
by that problem.

That's the biggest problem with statistics- and heuristics-based query planners in general, but this does seem to be a particularly difficult case.

Setting a cost on the function call that more accurately reflects how expensive it is so PostgreSQL will work harder to avoid calling it might help. See http://www.postgresql.org/docs/current/static/sql-createfunction.html .
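A sketch of that suggestion (hypothetical function name; `COST` is in units of `cpu_operator_cost`, with a default of 100 for PL functions and 1 for built-ins):

```sql
-- Declare the function as expensive at creation time so the planner
-- tries harder to avoid redundant calls:
CREATE FUNCTION slow_fn(x int) RETURNS int AS $$
  SELECT x;  -- placeholder body for illustration
$$ LANGUAGE sql COST 10000;

-- Or adjust an existing function:
ALTER FUNCTION slow_fn(int) COST 10000;
```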

--
Craig Ringer

Changing the cost of the functions is, in that case, simply ignored by the planner/executor.

I think it should be possible to always delay execution of functions/subqueries unrelated to the ORDER BY list until LIMIT/OFFSET have been applied (even in the worst case that would give the same performance as today), with no heuristics needed at all.
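The delayed evaluation described above can be forced by hand today (a sketch with hypothetical names; `slow_fn` is the expensive function). A subquery containing LIMIT cannot be flattened by the planner, so the function runs only for the surviving rows:

```sql
-- Sort and limit first on the cheap columns, then evaluate slow_fn()
-- only for the 10 rows that remain:
SELECT t.id, slow_fn(t.id)
FROM (
    SELECT id
    FROM test
    ORDER BY _data
    LIMIT 10 OFFSET 1000
) AS t;
```

The argument in this thread is essentially that the planner could apply this transformation automatically.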


Hm, one more idea: let's say I run the following SQL query -
'SELECT ..., very_long_toasted_field FROM ... ORDER BY (something other than very_long_toasted_field) LIMIT N'
which will use a sort as the top node.

Will detoasting of very_long_toasted_field be performed before or after applying ORDER BY ... LIMIT N?

If detoasting is performed before applying ORDER BY/LIMIT, then there exists a large class of queries where delayed/lazy detoasting could be a huge performance win.
If detoasting is performed after applying ORDER BY/LIMIT, then the same mechanics could be used to delay subquery/function execution.
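The detoasting case can likewise be worked around manually (a sketch with hypothetical table/column names): limit on a narrow key first, then join back to fetch the wide TOASTed column for only the N surviving rows.

```sql
-- Fetch very_long_toasted_field only for the 10 rows that survive the
-- ORDER BY ... LIMIT, instead of dragging it through the sort:
SELECT t2.id, t2.very_long_toasted_field
FROM (
    SELECT id
    FROM some_table
    ORDER BY created_at
    LIMIT 10
) AS t
JOIN some_table t2 USING (id);
```

An explicit outer ORDER BY can be added if the final row order must be guaranteed.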


PS: Yes, I know the standard response to my complaints: 'patches welcome'. But I have only just started studying the PostgreSQL source code and recovering my C coding skills.
Unfortunately, I don't think I will be ready to start hacking on the planner/executor code any time soon (the planner/executor is the most complicated and easiest-to-break part of the PostgreSQL code, definitely not a newbie task).

--
Maxim Boguk

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Next
From: Joost Kraaijeveld
Date:
Subject: PostgreSQL 9.1 : why is this query slow?