On Tue, 17 Dec 2002 14:40:58 +0100, Thomas Beutin
<tyrone@laokoon.IN-Berlin.DE> wrote:
>On Fri, Dec 13, 2002 at 06:37:59PM +0100, Manfred Koizar wrote:
>> and compare them to the results of SELECT COUNT(DISTINCT ...)?
>Here we are:
SELECT COUNT(DISTINCT ...) vs. SELECT COUNT(*) FROM subselect
8.86 : 9.67
22889.60 : 25726.64
37803.25 : 44169.09
5420.06 : 5242.39
Not much of a difference. From Tom's answer ("it's a sort/uniq
implementation") I conclude that what happens internally is basically
the same for both variants. EXPLAIN is more informative for the
subselect version. Looking at its output we see that most of the time
is spent in the sort step.
> -> Sort (actual time=39487.47..40079.07 rows=195265 loops=1)
> -> Seq Scan (actual time=624.73..5814.48 rows=195265 loops=1)
^^^^^^^
Compare these values!
Scan vs. Sort for N rows rel t
1.92 : 7.66 164 0.0068
5426.79 : 23798.54 113612 0.0139
5814.48 : 40079.07 195265 0.0144
473.04 : 4401.14 29937 0.0127
rel t is (Sort - Scan) / (N * ln N)
Make sure you have configured enough sort_mem to perform the sort in
memory for a reasonable number of tuples. The default is 1024 (1 MB)
which is not enough for 200000 tuples.
>Then i can start playing with the indicees. I think, i got an idea about the
>usage of indicees.
If on your production system WHERE clauses look like
m_id = ... and visit BETWEEN ... AND ...
then an index on (m_id, visit) should help. If the planner doesn't
choose that index while you think it should, post here again (after
you have verified with EXPLAIN ANALYZE with enable_seqscan on/off).
Servus
Manfred