Thread: Efficiently selecting single row from a select with window functions row_number, lag and lead
Efficiently selecting single row from a select with window functions row_number, lag and lead
From
Andrew Bailey
Date:
I would like to do the following:
select id, row_number() over w as rownum, lag(id, 1) over w as prev, lead(id, 1) over w as next from route where id=1350 window w as (order by shortname, id asc rows between 1 preceding and 1 following) order by shortname, id ;
select * from (select id, row_number() over w as rownum,
lag(id, 1) over w as prev, lead(id, 1) over w as next
from route window w as (order by shortname, id
rows between 1 preceding and 1 following) order by shortname, id) as s where id=1350
1350;3;1815;1813
select id, row_number() over w as rownum, lag(id, 1) over w as prev, lead(id, 1) over w as next from route where id=1350 window w as (order by shortname, id asc rows between 1 preceding and 1 following) order by shortname, id ;
However this gives the result
1350;1;;
The following query gives the result I am expecting
select * from (select id, row_number() over w as rownum,
lag(id, 1) over w as prev, lead(id, 1) over w as next
from route window w as (order by shortname, id
rows between 1 preceding and 1 following) order by shortname, id) as s where id=1350
1350;3;1815;1813
The explain plan is
"Subquery Scan on s (cost=0.14..15.29 rows=1 width=32)"
" Filter: (s.id = 1350)"
" -> WindowAgg (cost=0.14..13.51 rows=143 width=12)"
" -> Index Only Scan using route_idx on route (cost=0.14..10.29 rows=143 width=12)"
"Subquery Scan on s (cost=0.14..15.29 rows=1 width=32)"
" Filter: (s.id = 1350)"
" -> WindowAgg (cost=0.14..13.51 rows=143 width=12)"
" -> Index Only Scan using route_idx on route (cost=0.14..10.29 rows=143 width=12)"
as it makes use of the index created as follows
CREATE INDEX route_idx
ON route
USING btree
(shortname COLLATE pg_catalog."default", id);
CREATE INDEX route_idx
ON route
USING btree
(shortname COLLATE pg_catalog."default", id);
I believe that the index has all the data that is needed to obtain the results in a single query.
Is it possible to write the query as a single select and if so how?
Thanks in advance
Andrew Bailey
Re: Efficiently selecting single row from a select with window functions row_number, lag and lead
From
Jim Nasby
Date:
On 1/1/16 9:39 PM, Andrew Bailey wrote: > select id, row_number() over w as rownum, lag(id, 1) over w as prev, > lead(id, 1) over w as next from route where id=1350 window w as (order > by shortname, id asc rows between 1 preceding and 1 following) order by > shortname, id ; > > However this gives the result > 1350;1;; > > The following query gives the result I am expecting > > select * from (select id, row_number() over w as rownum, > lag(id, 1) over w as prev, lead(id, 1) over w as next > from route window w as (order by shortname, id > rows between 1 preceding and 1 following) order by shortname, id) as s > where id=1350 > > 1350;3;1815;1813 > > The explain plan is > "Subquery Scan on s (cost=0.14..15.29 rows=1 width=32)" > " Filter: (s.id <http://s.id> = 1350)" > " -> WindowAgg (cost=0.14..13.51 rows=143 width=12)" > " -> Index Only Scan using route_idx on route (cost=0.14..10.29 > rows=143 width=12)" I'm pretty sure the issue here is that the WHERE clause is limiting your result set before the window can find what you're looking for. You could probably switch the WHERE in your original query to a HAVING and get the same results. I'm not sure the filter can actually be pushed past the window functions to get the result you want. That Index Only Scan could still be pulling every row in the table. BTW, if you switch the order by to id, shortname then it might be able to use the index, but of course the results would be different. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Efficiently selecting single row from a select with window functions row_number, lag and lead
From
David Rowley
Date:
On 2 January 2016 at 16:39, Andrew Bailey <hazlorealidad@gmail.com> wrote:
I would like to do the following:
select id, row_number() over w as rownum, lag(id, 1) over w as prev, lead(id, 1) over w as next from route where id=1350 window w as (order by shortname, id asc rows between 1 preceding and 1 following) order by shortname, id ;However this gives the result1350;1;;
This does not work due to the id=1350 is always applied before the rows make it into the window therefore you only have rows which match id=1350, which is not what you want in this case.
The following query gives the result I am expecting
select * from (select id, row_number() over w as rownum,
lag(id, 1) over w as prev, lead(id, 1) over w as next
from route window w as (order by shortname, id
rows between 1 preceding and 1 following) order by shortname, id) as s where id=1350
1350;3;1815;1813
This works because the id=1350 is not pushed down into the subquery which contain the windowing functions, this also means that the entire route table is processed and you may suffer from performance problems if the route table is, or gets big. You'll be able to confirm this by looking at the EXPLAIN output and noticing the lack of filter on the seqscan.
The explain plan is
"Subquery Scan on s (cost=0.14..15.29 rows=1 width=32)"
" Filter: (s.id = 1350)"
" -> WindowAgg (cost=0.14..13.51 rows=143 width=12)"
" -> Index Only Scan using route_idx on route (cost=0.14..10.29 rows=143 width=12)"as it makes use of the index created as follows
CREATE INDEX route_idx
ON route
USING btree
(shortname COLLATE pg_catalog."default", id);I believe that the index has all the data that is needed to obtain the results in a single query.Is it possible to write the query as a single select and if so how?
That should be much more efficient for a larger table as it should avoid the seqscan and allow the index to be used for all 3 numbers.
Re: Efficiently selecting single row from a select with window functions row_number, lag and lead
From
Vitaly Burovoy
Date:
On 1/1/16, Andrew Bailey <hazlorealidad@gmail.com> wrote: > I would like to do the following: > > select id, row_number() over w as rownum, lag(id, 1) over w as prev, > lead(id, 1) over w as next from route where id=1350 window w as (order by > shortname, id asc rows between 1 preceding and 1 following) order by > shortname, id ; > > However this gives the result > 1350;1;; > > The following query gives the result I am expecting > > select * from (select id, row_number() over w as rownum, > lag(id, 1) over w as prev, lead(id, 1) over w as next > from route window w as (order by shortname, id > rows between 1 preceding and 1 following) order by shortname, id) as s > where id=1350 > > 1350;3;1815;1813 > > The explain plan is > "Subquery Scan on s (cost=0.14..15.29 rows=1 width=32)" > " Filter: (s.id = 1350)" > " -> WindowAgg (cost=0.14..13.51 rows=143 width=12)" > " -> Index Only Scan using route_idx on route (cost=0.14..10.29 > rows=143 width=12)" > > as it makes use of the index created as follows > > CREATE INDEX route_idx > ON route > USING btree > (shortname COLLATE pg_catalog."default", id); > > I believe that the index has all the data that is needed to obtain the > results in a single query. > Is it possible to write the query as a single select and if so how? > > Thanks in advance > > Andrew Bailey > It works as expected. You can read it at http://www.postgresql.org/docs/current/static/tutorial-window.html --- The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. --- So your WHERE clause from the first query selects the only row to a "virtual table", and lead and lag works with the table contains the only one row and doesn't have any other before and after it. In the second query subselect selects ALL rows to the "virtual table", lead and lag fill values and WHERE in the external select gets a single row filled by subselect. It is also in the documentation: "If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select." Unfortunately it is impossible to give an access to window function to rows not selected by a current query. -- Best regards, Vitaly Burovoy