Home > mailing lists

Re: Faster distinct query? - Mailing list pgsql-general

From	David G. Johnston
Subject	Re: Faster distinct query?
Date	September 22, 2021 20:26:55
Msg-id	CAKFQuwYWY-2ohhQ_K0Lcr1V1_aT5=Qvk8oeScmMWm54rZKynUg@mail.gmail.com Whole thread Raw
In response to	Re: Faster distinct query? (Michael Lewis <mlewis@entrata.com>)
Responses	Re: Faster distinct query? Re: Faster distinct query? Re: Faster distinct query?
List	pgsql-general

Tree view

On Wed, Sep 22, 2021 at 1:21 PM Michael Lewis <mlewis@entrata.com> wrote:

In the future, please share the plan returned by explain analyze, and some data about how many rows in the involved tables,

I believe we consider it acceptable to link to an explain viewer, which is what the OP did. Reading explain output in email has its own challenges, and I'd rather have the website than a text attachment.

How does the below work? It should do a very simple index scan only, then aggregate the relative few rows after the fact.

select station, array_agg(distinct(channel)) as channels
FROM(
SELECT station,channel FROM data GROUP BY station,channel
) AS sub
group by station;

Yeah, am pondering this too, though seems like the queries should be identical so the plan/execution should be the same either way.

If there is correlation between station & channel, then you might look at creating a multivariate statistics object and analyzing the table so the planner can make better choices

There is no where clause so I'm doubtful there is much to be gained going down this path. The Index-Only scan seems like an optimal way to obtain this data and the existing query already does that. The aggregation path might vary though it seems like that shouldn't be the case here.

David J.

pgsql-general by date:

From: Michael Lewis
Date: 22 September 2021, 20:20:46
Subject: Re: Faster distinct query?

From: Israel Brewster
Date: 22 September 2021, 20:41:15
Subject: Re: Faster distinct query?

Re: Faster distinct query? - Mailing list pgsql-general

Previous

Next