Thread: Surprised by index choice for count(*)

Surprised by index choice for count(*)

From

Rob Sargent

Date:

01 May 2018, 18:46:51

Should I be?  I would have thought the pk would have been chosen v. 
function index?

explain analyse select count(*) from bc.segment s;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Finalize Aggregate  (cost=4428009.24..4428009.25 rows=1 width=8) 
(actual time=14786.395..14786.395 rows=1 loops=1)
    ->  Gather  (cost=4428009.03..4428009.24 rows=2 width=8) (actual 
time=14786.358..14786.386 rows=3 loops=1)
          Workers Planned: 2
          Workers Launched: 2
          ->  Partial Aggregate  (cost=4427009.03..4427009.04 rows=1 
width=8) (actual time=14782.167..14782.167 rows=1 loops=3)
                ->  Parallel Index Only Scan using fpv on segment s  
(cost=0.57..4210177.14 rows=86732753 width=0) (actual 
time=0.061..11352.855 rows=69386204 loops=3)
                      Heap Fetches: 1780
  Planning time: 0.221 ms
  Execution time: 14815.939 ms
(9 rows)

\d bc.segment
               Table "bc.segment"
      Column     |  Type   |     Modifiers
----------------+---------+--------------------
  id             | uuid    | not null
  chrom          | integer | not null
  markerset_id   | uuid    | not null
  probandset_id  | uuid    | not null
  startbase      | integer | not null
  endbase        | integer | not null
  firstmarker    | integer | not null
  lastmarker     | integer | not null
  events_less    | bigint  | not null default 0
  events_equal   | bigint  | not null default 0
  events_greater | bigint  | not null default 0
Indexes:
     "segment_pkey" PRIMARY KEY, btree (id)
     "useg" UNIQUE, btree (probandset_id, chrom, startbase, endbase)
     "fpv" btree (pv(events_less, events_equal, events_greater, 0))
     "segment_markerset_id_probandset_id_idx" btree (markerset_id, 
probandset_id)

create or replace function public.pv(l bigint, e bigint, g bigint, o int)
returns float
as
$$
select 1.0*(g+e+o)/(l+e+g+o)::float;
$$
language sql
;

Re: Surprised by index choice for count(*)

From

"David G. Johnston"

Date:

01 May 2018, 19:07:49

On Tue, May 1, 2018 at 8:46 AM, Rob Sargent <robjsargent@gmail.com> wrote:

Should I be? I would have thought the pk would have been chosen v. function index?
Indexes:
    "segment_pkey" PRIMARY KEY, btree (id)
    "useg" UNIQUE, btree (probandset_id, chrom, startbase, endbase)
    "fpv" btree (pv(events_less, events_equal, events_greater, 0))
    "segment_markerset_id_probandset_id_idx" btree (markerset_id, probandset_id)

Haven't looked at the code but among those 4 indexes the one that was chosen is the one that comes first alphabetically. Ideally it would avoid multi-column indexes since they are larger; and I believe that a float is smaller than a uuid so not only alphabetically but the fpv index wins on size too.

That the index is functional is immaterial here.

David J.

Re: Surprised by index choice for count(*)

From

Tom Lane

Date:

01 May 2018, 19:11:12

Rob Sargent <robjsargent@gmail.com> writes:
> Should I be?  I would have thought the pk would have been chosen v. 
> function index?

If I'm reading this correctly, the PK index contains uuids while
the fpv index contains float4s, meaning the latter is probably half
the size.  So scanning it is a lot cheaper, at least according to
the planner's cost model.

            regards, tom lane

Re: Surprised by index choice for count(*)

From

Rob Sargent

Date:

01 May 2018, 19:13:20

Thank you both.  Simple, as expected.  And I’m easily surprised.
Version 10 (perhaps obviously) for those scoring at home.

> On May 1, 2018, at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Rob Sargent <robjsargent@gmail.com> writes:
>> Should I be?  I would have thought the pk would have been chosen v.
>> function index?
>
> If I'm reading this correctly, the PK index contains uuids while
> the fpv index contains float4s, meaning the latter is probably half
> the size.  So scanning it is a lot cheaper, at least according to
> the planner's cost model.
>
>             regards, tom lane