Home > mailing lists

Re: Faster distinct query? - Mailing list pgsql-general

From	Ryan Booz
Subject	Re: Faster distinct query?
Date	September 23, 2021 15:34:54
Msg-id	CADyMnEzz2QQJu1J9tDCO36HSrsGCpa1o3Lg1wDQCW_i-ShRt9Q@mail.gmail.com Whole thread Raw
In response to	Re: Faster distinct query? (hubert depesz lubaczewski <depesz@depesz.com>)
Responses	Re: Faster distinct query? (Israel Brewster <ijbrewster@alaska.edu>)
List	pgsql-general

Tree view

Heh, I honestly forgot about the recursive CTE. Certainly worth a try and wouldn't require installing other extensions.

This is what depesz is referring to: https://wiki.postgresql.org/wiki/Loose_indexscan

On Thu, Sep 23, 2021 at 3:04 AM hubert depesz lubaczewski <depesz@depesz.com> wrote:

On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote:
> I was wondering if there was any way to improve the performance of this query:
>
> SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;
>
> The explain execution plan can be found here:
> https://explain.depesz.com/s/mtxB#html <https://explain.depesz.com/s/mtxB#html>
>
> and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that is a list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels associated with each station). This query takes around 5 minutes to run.
>
> To work around the issue, I created a materialized view that I can update periodically, and of course I can query said view in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will also grow (correct me if I am wrong there).
>
> This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel)

It looks that there is ~ 170 stations, and ~ 800 million rows int he
table.

can you tell us how many rows has this:

select distinct station, channel from data;

If this is not huge, then you can make the query run much faster using
skip scan - recursive cte.

Best regards,

depesz

pgsql-general by date:

From: Tobias Meyer
Date: 23 September 2021, 13:49:54
Subject: Re: Remove duplicated row in pg_largeobject_metadata

From: "Clive Swan"
Date: 23 September 2021, 16:37:11
Subject: RE: Get COUNT results from two different columns

Re: Faster distinct query? - Mailing list pgsql-general

Previous

Next