Re: hash aggregation - Mailing list pgsql-performance

From Sergey Konoplev
Subject Re: hash aggregation
Date
Msg-id CAL_0b1trANSSN2f3AhWGc1+DG96D2ig-n-ZEv3kLsvAiwNr2aA@mail.gmail.com
Whole thread Raw
In response to Re: hash aggregation  (Korisk <Korisk@yandex.ru>)
Responses Re: hash aggregation  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-performance
On Thu, Oct 11, 2012 at 9:14 PM, Korisk <Korisk@yandex.ru> wrote:
> Strange situation.
> After indexscan enabling the cost is seriously decreased.

AFAIK when the planner has to choose between index scans and seq scans
and both of this options are off it uses one of this strategies anyway
but puts 10000000000.00 as a lower cost for this (thanks Maxim Boguk
for the explanation in chat).

>    ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96
rows=25990002width=32) (act 
> ual time=0.166..3698.776 rows=25990002 loops=1)

So when you enabled one of these options it started using it as usual.

> hashes=# set enable_indexscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  hashcheck group by name order by name desc;

[cut]

>    ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=0.00..466660.96 rows=25990002
width=32)(actual time=0.129. 
> .3653.848 rows=25990002 loops=1)

What I can not understand is why the seq scan's estimated cost is
better the index scan's one. It depends on the number of pages in
index/relation. May be the index is heavily bloated?

Let's see the sizes:

select pg_total_relation_size('hashcheck')
select pg_total_relation_size('hashcheck_name_rev_idx');


> hashes=# set enable_seqscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  hashcheck group by name order by name desc;
>                                                                   QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------
> -----
>  Sort  (cost=565411.67..565412.17 rows=200 width=32) (actual time=21746.799..21747.026 rows=4001 loops=1)
>    Output: name, (count(name))
>    Sort Key: hashcheck.name
>    Sort Method: quicksort  Memory: 315kB
>    ->  HashAggregate  (cost=565402.03..565404.03 rows=200 width=32) (actual time=21731.551..21733.277 rows=4001
loops=1)
>          Output: name, count(name)
>          ->  Seq Scan on public.hashcheck  (cost=0.00..435452.02 rows=25990002 width=32) (actual
time=29.431..13383.812rows=25990002 loop 
> s=1)
>                Output: id, name, value
>  Total runtime: 21747.356 ms
> (9 rows)
>
>
>
>
>



--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +14158679984


pgsql-performance by date:

Previous
From: Ondrej Ivanič
Date:
Subject: Re: hash aggregation
Next
From: Korisk
Date:
Subject: Re: hash aggregation