Re: Unacceptable postgres performance vs. Microsoft sqlserver - Mailing list pgsql-general
From: Gregory Stark
Subject: Re: Unacceptable postgres performance vs. Microsoft sqlserver
Msg-id: 87r6d8faco.fsf@oxford.xeocode.com
In response to: Re: Unacceptable postgres performance vs. Microsoft sqlserver (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses: Re: Unacceptable postgres performance vs. Microsoft sqlserver
List: pgsql-general
"Ivan Sergio Borgonovo" <mail@webthatworks.it> writes:

> But why once you add the index and count distinct are the performances
> still so far apart?
> I'd say that counting in this case is not the hardest thing to do,
> but rather the "distinct" part.

Your tests have been a bit unfortunate in finding a few particularly
soft spots in Postgres's underbelly.

a) Postgres can't do "index-only" scans, so the index doesn't really
help in this case. There's some discussion about improving that soon,
but it's still early days.

b) Postgres doesn't know how to pick just the distinct values out of an
index; it has to find a start point and read all the records from that
point forward. So even if this is a very low-cardinality column, it
would still have to read the whole index.

c) DISTINCT is one of the older features in Postgres, and a lot of the
planner code is smarter than the DISTINCT path. In particular, DISTINCT
doesn't know about using hash tables to find distinct values. You can
work around this by rewriting the query with GROUP BY, which goes
through a more recent code path.

d) The default settings in an untuned postgresql.conf are quite
conservative. That alone often makes Postgres look poor compared
against other databases. In particular, the planner is unlikely to pick
hash tables or do in-memory sorts unless you raise work_mem.
This is what you're doing now:

postgres=# explain select count(distinct aid) from accounts;
                              QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=2838.00..2838.01 rows=1 width=4)
   ->  Seq Scan on accounts  (cost=0.00..2588.00 rows=100000 width=4)
(2 rows)

This is what you're expecting it to do (on a freshly built clustered
index it might perform ok, but on a non-clustered index it will be
terrible):

postgres=# explain select count(*) from (select 1 from accounts group by aid) as a;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=5475.26..5475.27 rows=1 width=0)
   ->  Group  (cost=0.00..4225.26 rows=100000 width=4)
         ->  Index Scan using accounts_pkey on accounts  (cost=0.00..3975.26 rows=100000 width=4)
(3 rows)

postgres=# set work_mem = '128M';
ERROR:  invalid value for parameter "work_mem": "128M"
HINT:  Valid units for this parameter are "kB", "MB", and "GB".

(ARGH! DAMN YOU PETER!!!!)

This is probably the best you can get Postgres to do currently:

postgres=# set work_mem = '128MB';
SET
postgres=# explain select count(*) from (select 1 from accounts group by aid) as a;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=5088.00..5088.01 rows=1 width=0)
   ->  HashAggregate  (cost=2838.00..3838.00 rows=100000 width=4)
         ->  Seq Scan on accounts  (cost=0.00..2588.00 rows=100000 width=4)
(3 rows)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
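For anyone wanting to convince themselves the GROUP BY rewrite is a safe substitution, here is a minimal self-contained sketch of the equivalence. It uses SQLite via Python's sqlite3 module purely because that needs no server; the table and column names (accounts, aid) just mirror the example above, and nothing about Postgres's planner behaviour carries over to SQLite — this only checks that the two queries return the same answer.

```python
# Sketch: count(DISTINCT aid) vs. the GROUP BY rewrite return the same
# result. SQLite stands in for Postgres here only for portability;
# names mirror the accounts/aid example from the thread.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (aid INTEGER)")
# Insert duplicate aids so DISTINCT actually has work to do:
# 100000 rows, 1000 distinct values.
conn.executemany("INSERT INTO accounts (aid) VALUES (?)",
                 [(i % 1000,) for i in range(100000)])

(direct,) = conn.execute(
    "SELECT count(DISTINCT aid) FROM accounts").fetchone()
(rewritten,) = conn.execute(
    "SELECT count(*) FROM (SELECT 1 FROM accounts GROUP BY aid) AS a"
).fetchone()

assert direct == rewritten == 1000
```

The rewrite works because GROUP BY aid collapses the rows to one per distinct aid, so counting the grouped rows counts the distinct values.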