Re: Functionally dependent columns in SELECT DISTINCT - Mailing list pgsql-general

From Willow Chargin
Subject Re: Functionally dependent columns in SELECT DISTINCT
Date
Msg-id CAALRJs7PVbctiMzb9g5NrLET-jPEZpjToY1F-w8CmGbWid5J7g@mail.gmail.com
Whole thread Raw
In response to Re: Functionally dependent columns in SELECT DISTINCT  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Functionally dependent columns in SELECT DISTINCT
List pgsql-general
Thanks both for your suggestions so far.

On Fri, Sep 13, 2024 at 8:43 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Friday, September 13, 2024, Willow Chargin <postgresql@wchargin.com> wrote:
>>
>> In reality I really do want the ID columns of the
>> *most recent* items.
>
>
> Use a window function to rank them and pull out rank=1

Hmm, like this? noting that it's rank<=5, not rank=1:

    -- 1. rank all item-part combinations, densely since an item may
          have multiple parts
    -- 2. limit by rank, still retaining multiple copies of each item
    -- 3. de-duplicate IDs
    SELECT DISTINCT id FROM (
        SELECT id, dense_rank FROM (
            SELECT
                items.id,
                dense_rank() OVER (ORDER BY create_time DESC)
            FROM items JOIN parts ON items.id = parts.item_id
            WHERE part_id % 3 = 0
        ) q
        WHERE dense_rank <= 5
    ) q

I've done this before, but my experience is that it's usually far slower
because the rank is computed eagerly even for rows that don't match the
rank bound. And indeed here it takes 20% longer than even the slower
GROUP BY from before: https://explain.depesz.com/s/mQIi

> or use a lateral subquery to surgically (fetch first 1) retrieve the first row when sorted by recency descending.

I'm not sure that I see how to apply this when I need top-k, not top-1.



pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Functionally dependent columns in SELECT DISTINCT
Next
From: Rich Shepard
Date:
Subject: Re: DDL issue