On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haramrae@gmail.com> wrote:
Hi all,
It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.
This is on:
The query in question is: select "VBAK_MANDT", max("VBAK_VBELN") from staging.etl00001_vbak group by "VBAK_MANDT";
This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).
You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work. There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan