Re: [GENERAL] Performance with high correlation in group by on PK - Mailing list pgsql-general

From Jeff Janes
Subject Re: [GENERAL] Performance with high correlation in group by on PK
Date
Msg-id CAMkU=1z_RZVMvXpag3NuA+mZS+1BPLjrZTrODAwvLMxit41E6w@mail.gmail.com
Whole thread Raw
In response to [GENERAL] Performance with high correlation in group by on PK  (Alban Hertroys <haramrae@gmail.com>)
Responses Re: [GENERAL] Performance with high correlation in group by on PK  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haramrae@gmail.com> wrote:
Hi all,

It's been a while since I actually got to use PG for anything serious,
but we're finally doing some experimentation @work now to see if it is
suitable for our datawarehouse. So far it's been doing well, but there
is a particular type of query I run into that I expect we will
frequently use and that's choosing a sequential scan - and I can't
fathom why.

This is on:


The query in question is:
select "VBAK_MANDT", max("VBAK_VBELN")
  from staging.etl00001_vbak
 group by "VBAK_MANDT";

This is the header-table for another detail table, and in this case
we're already seeing a seqscan. The thing is, there are 15M rows in
the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
"VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
we only have 1 at the moment!).

You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work.  There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan

Cheers,

Jeff

pgsql-general by date:

Previous
From: Gabriel Furstenheim Milerud
Date:
Subject: Re: [GENERAL] Extension coverage
Next
From: Jerry Regan
Date:
Subject: [GENERAL] Create Action for psql when NOTIFY Recieved