Planing edge case for sorts with limit on non null column - Mailing list pgsql-hackers

From Mayrom Rabinovich
Subject Planing edge case for sorts with limit on non null column
Date
Msg-id CAH-Ro_22oaP1B8oYsaiukY3QzC3yDueB-rGGG_Rtb-QtG+hb+w@mail.gmail.com
Whole thread Raw
Responses Re: Planing edge case for sorts with limit on non null column
List pgsql-hackers
Hi,
I am not very familiar with mailing lists so forgive me if I am committing some sort of cardinal sin.
I found a weird edge case within this simple query:

```
-- setup table with an non null column and index on it
create table t(i serial primary key);

-- query by the reverse order of the index
explain select * from t order by i desc limit 1;
-- works as expected with the following plan:
-- "Limit  (cost=0.15..0.19 rows=1 width=4)"
-- "  ->  Index Only Scan Backward using t_pkey on t  (cost=0.15..82.41 rows=2550 width=4)"

-- same deal query by the reverse order of the index, but also specify the wrong null order
-- from my understanding this should not matter because we don't have any nulls on the table
-- due to the constraint.
explain select * from t order by i desc nulls last limit 1;
-- here is the issue, when I ran the following query I get this plan:
-- "Limit  (cost=48.25..48.25 rows=1 width=4)"
-- "  ->  Sort  (cost=48.25..54.63 rows=2550 width=4)"
-- "        Sort Key: i DESC NULLS LAST"
-- "        ->  Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)"
```

It seems that the planner ignores the fact that the column does not contain nulls, and looks for a match between order of the index nulls and the order of the nulls specified in the query, even though the nulls order is irrelevant in this case.

I think that patching `build_index_pathkeys` would lead to the smallest amount of changes, my concern with the patch is the fact that `list_member_ptr` iterates over all of the pathkeys in the planner info.

I did this weird step creating an alternative pathkey and testing if its relevant because the call sites to that function does some sort of deduplication of useless pathkey and when I tried adding both directions of `nulls_first` to the `retval` the last one was deduplicated (or at least that is what I think that is happening).

I am not very familiar with the Postgres codebase but I hacked a simple patch that from my testing, fixes the issue. But I don't know if it's the correct place to apply that sort of logic, and I haven't written any tests yet. The patch is very much work in progress, it's basically a toy example.

I would like to contribute if possible but I wanted to hear your opinion before digging further into it.

Thanks,
Mayrom Rabinovich
Attachment

pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Fix pg_stat_get_backend_wait_event() for aux processes
Next
From: Jim Jones
Date:
Subject: Re: Truncate logs by max_log_size