Thread: Partial index plan/cardinality costing

Partial index plan/cardinality costing

From
James Coleman
Date:
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
     and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
     future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over s_pk_action_at. And on stock Postgres it does in fact use that index (with a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that change Postgres strongly prefers the index s_pk_action_at unless I both disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan = false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest cardinality (primary key) column is first, wouldn't an index scan effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the s_action_at_pk index over the s_pk_action_at index for this query? I realize changing the random page cost is part of what's driving this, but I still can't imagine reading the full s_pk_action_at index (assuming that's what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that happens (I realize you wouldn't write this by hand, but ORMs) when you have a where clause like:
    s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
    s.status IN ('A', 'P')
even though semantically those are identical.


Attachment

Re: Partial index plan/cardinality costing

From
James Coleman
Date:
Bump, and curious if anyone on hackers has any ideas here: of particular interest is why the (pk, created_at) index can possibly be more valuable than the (created_at, pk) variant since the former effectively implies having to scan the entire index.
On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc331@gmail.com> wrote:
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
     and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
     future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over s_pk_action_at. And on stock Postgres it does in fact use that index (with a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that change Postgres strongly prefers the index s_pk_action_at unless I both disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan = false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest cardinality (primary key) column is first, wouldn't an index scan effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the s_action_at_pk index over the s_pk_action_at index for this query? I realize changing the random page cost is part of what's driving this, but I still can't imagine reading the full s_pk_action_at index (assuming that's what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that happens (I realize you wouldn't write this by hand, but ORMs) when you have a where clause like:
    s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
    s.status IN ('A', 'P')
even though semantically those are identical.


Re: Partial index plan/cardinality costing

From
James Coleman
Date:
Bump, and curious if anyone on hackers has any ideas here: of particular interest is why the (pk, created_at) index can possibly be more valuable than the (created_at, pk) variant since the former effectively implies having to scan the entire index.
On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc331@gmail.com> wrote:
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
     and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
     future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over s_pk_action_at. And on stock Postgres it does in fact use that index (with a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that change Postgres strongly prefers the index s_pk_action_at unless I both disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan = false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest cardinality (primary key) column is first, wouldn't an index scan effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the s_action_at_pk index over the s_pk_action_at index for this query? I realize changing the random page cost is part of what's driving this, but I still can't imagine reading the full s_pk_action_at index (assuming that's what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that happens (I realize you wouldn't write this by hand, but ORMs) when you have a where clause like:
    s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
    s.status IN ('A', 'P')
even though semantically those are identical.


Re: Partial index plan/cardinality costing

From
James Coleman
Date:
Bump, and curious if anyone on hackers has any ideas here: of particular interest is why the (pk, created_at) index can possibly be more valuable than the (created_at, pk) variant since the former effectively implies having to scan the entire index.
On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc331@gmail.com> wrote:
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
     and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
     future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over s_pk_action_at. And on stock Postgres it does in fact use that index (with a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that change Postgres strongly prefers the index s_pk_action_at unless I both disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan = false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest cardinality (primary key) column is first, wouldn't an index scan effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the s_action_at_pk index over the s_pk_action_at index for this query? I realize changing the random page cost is part of what's driving this, but I still can't imagine reading the full s_pk_action_at index (assuming that's what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that happens (I realize you wouldn't write this by hand, but ORMs) when you have a where clause like:
    s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
    s.status IN ('A', 'P')
even though semantically those are identical.


Re: Partial index plan/cardinality costing

From
James Coleman
Date:
Bump, and curious if anyone on hackers has any ideas here: of particular interest is why the (pk, created_at) index can possibly be more valuable than the (created_at, pk) variant since the former effectively implies having to scan the entire index.
On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc331@gmail.com> wrote:
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
     and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
     future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over s_pk_action_at. And on stock Postgres it does in fact use that index (with a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that change Postgres strongly prefers the index s_pk_action_at unless I both disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan = false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest cardinality (primary key) column is first, wouldn't an index scan effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the s_action_at_pk index over the s_pk_action_at index for this query? I realize changing the random page cost is part of what's driving this, but I still can't imagine reading the full s_pk_action_at index (assuming that's what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that happens (I realize you wouldn't write this by hand, but ORMs) when you have a where clause like:
    s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
    s.status IN ('A', 'P')
even though semantically those are identical.


Re: Partial index plan/cardinality costing

From
Justin Pryzby
Date:
Please don't cross-post to lists.

>insert into s(status, action_at, m_fk)
>select
>  ( CASE WHEN series.n % 100 < 80 THEN
>      (ARRAY['E', 'C'])[(series.n % 2) + 1]
>    ELSE
>      (ARRAY['P', 'PD', 'A'])[((random() * 3)::integer % 3) + 1]
>    END
>  ),
>  (
>    CASE WHEN series.n % 100 < 80 THEN
>      '2018-09-07'::date + ((series.n % 365 - 365)::text || ' day')::interval
>    ELSE
>      '2018-09-07'::date + (((random() * 30)::integer % 30 - 4)::text || ' day')::interval
>    END
>  ),
>  (select m.pk from m limit 1)
>from generate_series(1, 500000) series(n);

> I have two partial indexes:
> - s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
> - s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

> - How is s_pk_action_at ever efficient to scan? Given that the highest
> cardinality (primary key) column is first, wouldn't an index scan
> effectively have to scan the entire index?

The index probably IS inefficient to scan (you could see that if you force an
bitmap index scan on s_pk_action_at)...but because of leading pkey column, the
HEAP is read sequentially, and the planner knows that the heap will be read in
order of its leading column.  Reading the entire index is less expensive than
reading most of the table (maybe nonsequentially).  This is the 2nd effect Jeff
Janes likes to point out: high correlation means 1) sequential reads; *and*, 2)
a smaller fraction of the table needs to be accessed to read a given number of
tuples.

> - Why does index scan on s_action_at_pk reads over 2x as many blocks as the
> bitmap heap scan with the same index?

Maybe because of heap pages accessed multiple times (not sequentially), since
correlation is small on this table loaded with "modulus"-style insertions.

pryzbyj=# SELECT attname, correlation FROM pg_stats WHERE tablename='s' ;
  attname  | correlation 
-----------+-------------
 pk        |           1
 status    |    0.340651
 action_at |  0.00224239
 m_fk      |           1

..so each index tuple is accessing a separate heap page.

If you create non-partial index and CLUSTER on action_at_idx, then:

pryzbyj=# SELECT attname, correlation FROM pg_stats WHERE tablename='s' ;
  attname  | correlation
-----------+-------------
 pk        |  0.00354867
 status    |    0.420806
                                            action_at |           1
 
 m_fk      |           1

 Nested Loop  (cost=1907.03..6780.65 rows=11038 width=8) (actual time=2.241..17.839 rows=8922 loops=1)
   Join Filter: (s.m_fk = m.pk)
   Buffers: shared hit=115 read=53
   ->  Seq Scan on m  (cost=0.00..1.01 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (status = 'A'::text)
         Buffers: shared hit=1
   ->  Bitmap Heap Scan on s  (cost=1907.03..6641.66 rows=11038 width=16) (actual time=2.222..9.032 rows=8922 loops=1)
         Recheck Cond: ((action_at <= '2018-09-06'::date) AND (status = ANY ('{P,PD,A}'::text[])))
         Filter: (status = ANY ('{A,P}'::text[]))
         Rows Removed by Filter: 4313
         Heap Blocks: exact=114
         Buffers: shared hit=114 read=53
         ->  Bitmap Index Scan on s_action_at_pk  (cost=0.00..1904.27 rows=82647 width=0) (actual time=2.185..2.186
rows=13235loops=1)
 
               Index Cond: (action_at <= '2018-09-06'::date)
               Buffers: shared read=53

Also, I don't think it matters here, but action_at and status are correlated.
Planner would think that they're independent.

I don't think it's related to other issues, but also note the rowcount estimate is off:
         ->  Bitmap Index Scan on s_action_at_pk  (cost=0.00..1258.02 rows=82347 width=0) (actual time=1.026..1.026
rows=13402loops=1)                             
 
               Index Cond: (action_at <= '2018-09-06'::date)
                                           
 

Justin



Re: Partial index plan/cardinality costing

From
Justin Pryzby
Date:
Please don't cross-post to lists.

>insert into s(status, action_at, m_fk)
>select
>  ( CASE WHEN series.n % 100 < 80 THEN
>      (ARRAY['E', 'C'])[(series.n % 2) + 1]
>    ELSE
>      (ARRAY['P', 'PD', 'A'])[((random() * 3)::integer % 3) + 1]
>    END
>  ),
>  (
>    CASE WHEN series.n % 100 < 80 THEN
>      '2018-09-07'::date + ((series.n % 365 - 365)::text || ' day')::interval
>    ELSE
>      '2018-09-07'::date + (((random() * 30)::integer % 30 - 4)::text || ' day')::interval
>    END
>  ),
>  (select m.pk from m limit 1)
>from generate_series(1, 500000) series(n);

> I have two partial indexes:
> - s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
> - s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

> - How is s_pk_action_at ever efficient to scan? Given that the highest
> cardinality (primary key) column is first, wouldn't an index scan
> effectively have to scan the entire index?

The index probably IS inefficient to scan (you could see that if you force an
bitmap index scan on s_pk_action_at)...but because of leading pkey column, the
HEAP is read sequentially, and the planner knows that the heap will be read in
order of its leading column.  Reading the entire index is less expensive than
reading most of the table (maybe nonsequentially).  This is the 2nd effect Jeff
Janes likes to point out: high correlation means 1) sequential reads; *and*, 2)
a smaller fraction of the table needs to be accessed to read a given number of
tuples.

> - Why does index scan on s_action_at_pk reads over 2x as many blocks as the
> bitmap heap scan with the same index?

Maybe because of heap pages accessed multiple times (not sequentially), since
correlation is small on this table loaded with "modulus"-style insertions.

pryzbyj=# SELECT attname, correlation FROM pg_stats WHERE tablename='s' ;
  attname  | correlation 
-----------+-------------
 pk        |           1
 status    |    0.340651
 action_at |  0.00224239
 m_fk      |           1

..so each index tuple is accessing a separate heap page.

If you create non-partial index and CLUSTER on action_at_idx, then:

pryzbyj=# SELECT attname, correlation FROM pg_stats WHERE tablename='s' ;
  attname  | correlation
-----------+-------------
 pk        |  0.00354867
 status    |    0.420806
                                            action_at |           1
 
 m_fk      |           1

 Nested Loop  (cost=1907.03..6780.65 rows=11038 width=8) (actual time=2.241..17.839 rows=8922 loops=1)
   Join Filter: (s.m_fk = m.pk)
   Buffers: shared hit=115 read=53
   ->  Seq Scan on m  (cost=0.00..1.01 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (status = 'A'::text)
         Buffers: shared hit=1
   ->  Bitmap Heap Scan on s  (cost=1907.03..6641.66 rows=11038 width=16) (actual time=2.222..9.032 rows=8922 loops=1)
         Recheck Cond: ((action_at <= '2018-09-06'::date) AND (status = ANY ('{P,PD,A}'::text[])))
         Filter: (status = ANY ('{A,P}'::text[]))
         Rows Removed by Filter: 4313
         Heap Blocks: exact=114
         Buffers: shared hit=114 read=53
         ->  Bitmap Index Scan on s_action_at_pk  (cost=0.00..1904.27 rows=82647 width=0) (actual time=2.185..2.186
rows=13235loops=1)
 
               Index Cond: (action_at <= '2018-09-06'::date)
               Buffers: shared read=53

Also, I don't think it matters here, but action_at and status are correlated.
Planner would think that they're independent.

I don't think it's related to other issues, but also note the rowcount estimate is off:
         ->  Bitmap Index Scan on s_action_at_pk  (cost=0.00..1258.02 rows=82347 width=0) (actual time=1.026..1.026
rows=13402loops=1)                             
 
               Index Cond: (action_at <= '2018-09-06'::date)
                                           
 

Justin