Re: Performance pb vs SQLServer. - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: Performance pb vs SQLServer.
Date
Msg-id 42FFE17A.6010208@arbash-meinel.com
Whole thread Raw
In response to Performance pb vs SQLServer.  (Stéphane COEZ <scoez@harrysoftware.com>)
Responses Re: Performance pb vs SQLServer.  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Re: Performance pb vs SQLServer.  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Re: Performance pb vs SQLServer.  (Stéphane COEZ <scoez@harrysoftware.com>)
List pgsql-performance
Stéphane COEZ wrote:

>Hi,
>
>I have a perfomance issue :
>
>I run PG (8.0.3) and SQLServer2000 on a Windows2000 Server (P4 1,5Ghz 512Mo)
>I have a table (3200000 rows) and I run this single query :
>
>select cod from mytable group by cod
>I have an index on cod (char(4) - 88 different values)
>
>PG = ~ 20 sec.
>SQLServer = < 8 sec
>
>
>the explain is :
>
>HashAggregate  (cost=64410.09..64410.09 rows=55 width=8)
>  ->  Seq Scan on mytable  (cost=0.00..56325.27 rows=3233927 width=8)
>
>
>if I switch to "enable_hashagg = false" (just for a try...)
>the planner will choose my index :
>
>Group  (cost=0.00..76514.01 rows=55 width=8)
>  ->  Index Scan using myindex on mytable  (cost=0.00..68429.20 rows=3233927
>width=8)
>
>but performance will be comparable to previous test.
>
>So with or without using Index I have the same result.
>
>

My guess is that this is part of a larger query. There isn't really much
you can do. If you want all 3.2M rows, then you have to wait for them to
be pulled in.

What you generally can do for performance, is to restructure things, so
that you *don't* have to touch all 3.2M rows.
If you are just trying to determine what the unique entries are for cod,
you probably are better off doing some normalization, and keeping a
separate table of cod values.

I'm guessing the reason your query is faster with SQLServer is because
of how postgres handles MVCC. Basically, it still has to fetch the main
page to determine if a row exists. While SQL server doesn't do MVCC, so
it can just look things up in the index.

You might also try a different query, something like:

SELECT DISTINCT cod FROM mytable ORDER BY cod GROUP BY cod;
(You may or may not want order by, or group by, try the different
combinations.)
It might be possible to have the planner realize that all you want is
unique rows, just doing a group by doesn't give you that.

John
=:->

>
>Thanks for help.
>
>Stéphane COEZ
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>
>
>



Attachment

pgsql-performance by date:

Previous
From: John Arbash Meinel
Date:
Subject: Re: How many views is ok?
Next
From: Tom Lane
Date:
Subject: Re: How many views is ok?