Home > mailing lists

Re: hash aggregation - Mailing list pgsql-performance

From	Sergey Konoplev
Subject	Re: hash aggregation
Date	October 12, 2012 07:10:30
Msg-id	CAL_0b1trANSSN2f3AhWGc1+DG96D2ig-n-ZEv3kLsvAiwNr2aA@mail.gmail.com Whole thread
In response to	Re: hash aggregation (Korisk <Korisk@yandex.ru>)
Responses	Re: hash aggregation
List	pgsql-performance

Tree view

On Thu, Oct 11, 2012 at 9:14 PM, Korisk <Korisk@yandex.ru> wrote:
> Strange situation.
> After indexscan enabling the cost is seriously decreased.

AFAIK when the planner has to choose between index scans and seq scans
and both of this options are off it uses one of this strategies anyway
but puts 10000000000.00 as a lower cost for this (thanks Maxim Boguk
for the explanation in chat).

>    ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96
rows=25990002width=32) (act 
> ual time=0.166..3698.776 rows=25990002 loops=1)

So when you enabled one of these options it started using it as usual.

> hashes=# set enable_indexscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  hashcheck group by name order by name desc;

[cut]

>    ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=0.00..466660.96 rows=25990002
width=32)(actual time=0.129. 
> .3653.848 rows=25990002 loops=1)

What I can not understand is why the seq scan's estimated cost is
better the index scan's one. It depends on the number of pages in
index/relation. May be the index is heavily bloated?

Let's see the sizes:

select pg_total_relation_size('hashcheck')
select pg_total_relation_size('hashcheck_name_rev_idx');


> hashes=# set enable_seqscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  hashcheck group by name order by name desc;
>                                                                   QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------
> -----
>  Sort  (cost=565411.67..565412.17 rows=200 width=32) (actual time=21746.799..21747.026 rows=4001 loops=1)
>    Output: name, (count(name))
>    Sort Key: hashcheck.name
>    Sort Method: quicksort  Memory: 315kB
>    ->  HashAggregate  (cost=565402.03..565404.03 rows=200 width=32) (actual time=21731.551..21733.277 rows=4001
loops=1)
>          Output: name, count(name)
>          ->  Seq Scan on public.hashcheck  (cost=0.00..435452.02 rows=25990002 width=32) (actual
time=29.431..13383.812rows=25990002 loop 
> s=1)
>                Output: id, name, value
>  Total runtime: 21747.356 ms
> (9 rows)
>
>
>
>
>



--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +14158679984

pgsql-performance by date:

From: Ondrej Ivanič
Date: 12 October 2012, 04:33:00
Subject: Re: hash aggregation

From: Korisk
Date: 12 October 2012, 14:37:58
Subject: Re: hash aggregation

Re: hash aggregation - Mailing list pgsql-performance

Previous

Next