Thread: nth_value out of more than n values returns null

nth_value out of more than n values returns null

From
Guyren Howe
Date:
This query:

SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table

in a table where SELECT COUNT(*) returns a value a few thousand over 5 million, where id is the primary key, returns null.

The inclusion of the primary key should make the order by a total order. So there should be a 5 millionth row.

How can this happen?

Re: nth_value out of more than n values returns null

From
Adrian Klaver
Date:
On 11/4/24 15:17, Guyren Howe wrote:
> This query:
> 
> SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table
> 
> in a table where SELECT COUNT(*) returns a value a few thousand over 5 
> million, where id is the primary key, returns null.
> 
> The inclusion of the primary key should make the order by a total order. 
> So there should be a 5 millionth row.
> 
> How can this happen?

If I where to hazard a guess it has to do with this:

https://www.postgresql.org/docs/current/functions-window.html

"Note that first_value, last_value, and nth_value consider only the rows 
within the “window frame”, which by default contains the rows from the 
start of the partition through the last peer of the current row. This is 
likely to give unhelpful results for last_value and sometimes also 
nth_value. You can redefine the frame by adding a suitable frame 
specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section 
4.2.8 for more information about frame specifications."

Further I am pretty sure that this ORDER BY created_at, id ASC is 
creating a window frame over created_at, id and that there are duplicate 
created_at values which means the frame has less then 5000000 rows. Try 
a smaller number and see what happens.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: nth_value out of more than n values returns null

From
Erik Wienhold
Date:
On 2024-11-05 00:17 +0100, Guyren Howe wrote:
> This query:
> 
> SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table
> 
> in a table where SELECT COUNT(*) returns a value a few thousand over 5
> million, where id is the primary key, returns null.
> 
> The inclusion of the primary key should make the order by a total
> order. So there should be a 5 millionth row.
> 
> How can this happen?

https://www.postgresql.org/docs/current/functions-window.html explains
it:

    Note that first_value, last_value, and nth_value consider only the rows
    within the "window frame", which by default contains the rows from the
    start of the partition through the last peer of the current row. This is
    likely to give unhelpful results for last_value and sometimes also
    nth_value. You can redefine the frame by adding a suitable frame
    specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section
    4.2.8 for more information about frame specifications.

You probably want to extend the window frame with this:

    SELECT NTH_VALUE(id, 5000000) OVER (
        ORDER BY created_at, id ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) FROM table

-- 
Erik



Re: nth_value out of more than n values returns null

From
Guyren Howe
Date:
I’m trying to get the id of the 5,000,000th record, so I can join against it to get a name. I didn’t fully understand what the docs say there. What am I missing?
On 4 Nov 2024 at 15:36 -0800, Erik Wienhold <ewie@ewie.name>, wrote:
On 2024-11-05 00:17 +0100, Guyren Howe wrote:
This query:

SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table

in a table where SELECT COUNT(*) returns a value a few thousand over 5
million, where id is the primary key, returns null.

The inclusion of the primary key should make the order by a total
order. So there should be a 5 millionth row.

How can this happen?

https://www.postgresql.org/docs/current/functions-window.html explains
it:

Note that first_value, last_value, and nth_value consider only the rows
within the "window frame", which by default contains the rows from the
start of the partition through the last peer of the current row. This is
likely to give unhelpful results for last_value and sometimes also
nth_value. You can redefine the frame by adding a suitable frame
specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section
4.2.8 for more information about frame specifications.

You probably want to extend the window frame with this:

SELECT NTH_VALUE(id, 5000000) OVER (
ORDER BY created_at, id ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) FROM table

--
Erik

Re: nth_value out of more than n values returns null

From
"David G. Johnston"
Date:
On Monday, November 4, 2024, Guyren Howe <guyren@gmail.com> wrote:
I’m trying to get the id of the 5,000,000th record, so I can join against it to get a name. I didn’t fully understand what the docs say there. What am I missing?

So just use “offset 5_000_000 limit 1”.  Bringing in a window function here seems unhelpful.

David J.

Re: nth_value out of more than n values returns null

From
Guyren Howe
Date:
Wouldn’t it be offset 4_999_999?

I’d still like to understand why nth_value doesn’t work.
On 4 Nov 2024 at 15:44 -0800, David G. Johnston <david.g.johnston@gmail.com>, wrote:
On Monday, November 4, 2024, Guyren Howe <guyren@gmail.com> wrote:
I’m trying to get the id of the 5,000,000th record, so I can join against it to get a name. I didn’t fully understand what the docs say there. What am I missing?

So just use “offset 5_000_000 limit 1”.  Bringing in a window function here seems unhelpful.

David J.

Re: nth_value out of more than n values returns null

From
"David G. Johnston"
Date:
On Monday, November 4, 2024, Guyren Howe <guyren@gmail.com> wrote:
Wouldn’t it be offset 4_999_999?

Probably.  I tend to expect off-by-one for these kinds of things and test my way out.
 

I’d still like to understand why nth_value doesn’t work.

When you perform an order by in a window clause the frame you get by default ends at the current row.  Consider “count(*) over ()” versus “count(*) over (order by id)”.

You need to not use defaults for the window frame if this doesn’t suit you.

David J.

Re: nth_value out of more than n values returns null

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> So just use “offset 5_000_000 limit 1”.  Bringing in a window function here
> seems unhelpful.

Yeah, that.  A bite-size example might help clarify what the window
function is doing:

regression=# create table zed(f1 int) ;
CREATE TABLE
regression=# insert into zed select generate_series(1, 10);
INSERT 0 10
regression=# select f1, nth_value(f1, 5) over (order by f1) from zed;
 f1 | nth_value
----+-----------
  1 |
  2 |
  3 |
  4 |
  5 |         5
  6 |         5
  7 |         5
  8 |         5
  9 |         5
 10 |         5
(10 rows)

For the first four rows, the window frame doesn't include the row
you want, so you get NULL.  You can fix that with a non-default
window frame:

regression=# select f1, nth_value(f1, 5) over (order by f1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
fromzed; 
 f1 | nth_value
----+-----------
  1 |         5
  2 |         5
  3 |         5
  4 |         5
  5 |         5
  6 |         5
  7 |         5
  8 |         5
  9 |         5
 10 |         5
(10 rows)

So yeah, you can get the fifth (or five million'th) row this way, but
you'll get N copies of it, which I assume is not what you want.
Better

regression=# select f1 from zed order by f1 offset 4 limit 1;
 f1
----
  5
(1 row)

which gets you just the one row and is a lot cheaper too.

            regards, tom lane