Андрей Репко wrote:
> RH> What happens if you use something like
> RH> SELECT DISTINCT alias_id FROM ma_data;
> sart_ma=# EXPLAIN ANALYZE SELECT DISTINCT alias_id FROM ma_data;
> QUERY PLAN
>
>
-------------------------------------------------------------------------------------------------------------------------------
> Unique (cost=65262.63..66770.75 rows=32 width=4) (actual time=16780.214..18250.761 rows=32 loops=1)
> -> Sort (cost=65262.63..66016.69 rows=301624 width=4) (actual time=16780.204..17255.129 rows=301624 loops=1)
> Sort Key: alias_id
> -> Seq Scan on ma_data (cost=0.00..37811.24 rows=301624 width=4) (actual time=6.896..15321.023 rows=301624
loops=1)
> Total runtime: 18292.542 ms
> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
> -> Seq Scan on ma_data (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624
loops=1)
> Total runtime: 15991.244 ms
OK - the planner thinks it's doing the right thing, your cost estimates
are way off. If you look back at where you got an index-scan, it's cost
was 1.1 million.
Index Scan using reference_9_fk on ma_data (cost=0.00..1139526.57
That's way above the numbers for seq-scan+hash/sort, so if the cost
estimate was right PG would be making the right choice. Looks like you
need to check your configuration settings. Have you read:
http://www.powerpostgresql.com/PerfList
or
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
--
Richard Huxton
Archonet Ltd