Gary Cowell wrote:
> --- lnd@hnit.is wrote: > You can roughly estimate time
> spent for just scaning
>
>>the table using
>>something like this:
>>
>> select sum(version) from ... where version is not
>>null
>>
>> and just
>>
>> select sum(version) from ...
>>
>>The results would be interesting to compare.
>
>
> To answer (I hope) everyones questions at once:
>
> 1) Oracle and postmaster were not running at the same
> time
> 2) The queries were run once, to cache as much as
> possible then run again to get the timing
>
> 3) Distinct vs. no distinct (i.e. sort performance).
>
> select length(version) from vers where version is not
> null;
>
> Time: 9748.174 ms
>
> select distinct(version) from vers where version is
> not null;
>
> Time: 67988.972 ms
>
> So about an extra 60 seconds with the distinct on.
Which is basically the sorting time...
> Here is the explain analyze output from psql:
>
> # explain analyze select distinct version from vers
> where version is not null;
>
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
> Unique (cost=117865.77..120574.48 rows=142
> width=132) (actual time=63623.428..68269.111 rows=536
> loops=1)
> -> Sort (cost=117865.77..119220.13 rows=541741
> width=132) (actual time=63623.417..66127.641
> rows=541741 loops=1)
> Sort Key: "version"
> -> Seq Scan on vers (cost=0.00..21367.41
> rows=541741 width=132) (actual time=0.218..7214.903
> rows=541741 loops=1)
> Filter: ("version" IS NOT NULL)
> Total runtime: 68324.215 ms
> (6 rows)
>
> Time: 68326.062 ms
Yep - the seq-scan takes 7214.903 ms, there's a huge setup time for the
sort (63623.417) and it's not finished until 66127.641ms have elapsed.
>
> And the non-default .conf parameters:
>
> tcpip_socket = true
> max_connections = 100
> password_encryption = true
> shared_buffers = 2000
> sort_mem = 16384
> vacuum_mem = 8192
> effective_cache_size = 4000
> syslog = 2
Well, I'd probably up vacuum_mem, and check how much RAM is being used
for disk cache - I'd guess it's more than 32MB (4000 * 8kb).
You might want to up the shared_buffers, but that's going to depend on
the load.
Try increasing sort_mem temporarily, and see if that makes a difference:
SET sort_mem = 64000;
EXPLAIN ANALYSE ...
The only thing I can think is that you're getting disk activity to get a
sort that slow. I'd be expecting a hash-sort if PG thought it could fit
the distinct values in memory.
--
Richard Huxton
Archonet Ltd