Thread: WG: Re: Re: select ... where ='' does a seqscan [auf Viren geprueft]

WG: Re: Re: select ... where ='' does a seqscan [auf Viren geprueft]

From
Silvio Matthes
Date:

>Am Dienstag, 24. August 2004 11:59 schrieb Silvio Matthes:
>> So using the index does need more time than a sequential scan?

>It's possible.  If you want to prove the opposite, please post the output of
>EXPLAIN ANALYZE in both cases.

On my system, with PostgresQL 8.0Beta1, I could prove the opposite, but that's not the point for me.
Both cases are too slow:

set enable_seqscan=true  
explain analyze select count(*) from document_params where param_name='KONTO_NR' and param_value='';

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=241980.45..241980.45 rows=1 width=0) (actual time=47766.000..47766.000 rows=1 loops=1)
   ->  Seq Scan on document_params  (cost=0.00..241600.27 rows=152073 width=0) (actual time=0.000..47719.000 rows=6672 loops=1)
         Filter: (((param_name)::text = 'KONTO_NR'::text) AND ((param_value)::text = ''::text))
 Total runtime: 47938.000 ms
(4 rows)



set enable_seqscan=false
explain analyze select count(*) from document_params where param_name='KONTO_NR' and param_value='';

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=592164.03..592164.03 rows=1 width=0) (actual time=31828.000..31828.000 rows=1 loops=1)
   ->  Index Scan using idx_empty on document_params  (cost=0.00..591783.84 rows=152073 width=0) (actual time=140.000..31782.000 rows=6672 loops=1)
         Index Cond: ((param_name)::text = 'KONTO_NR'::text)
         Filter: ((param_value)::text = ''::text)
 Total runtime: 31828.000 ms
(5 rows)


In other tests the server cached the responses so I cannot compare the results, in other cases the server did need some 20s for an index-scan, probably due to too low configured memory.

This brings me to the following questions:

1.) What can I do to avoid seqscans if I want rows with empty param_value?
2.) Would it be wiser to change all the empty values to some placeholder (e.g. 'empty')?
    From the tests made up to now, this would be the fastest solution, but wouldn't be very nice in concern of readability...

a bit offtopic, but related for me:
3.) how can I prevent the server from caching or empty the cache under win32 (stopping and re-starting the service doesn't help) for comparing similar statements?
4.) how can I change the memory settings for the postmaster (v8.0beta) in win32?


As always, any help would be highly appreciated,

Ciao,

Silvio Matthes

Re: WG: Re: Re: select ... where ='' does a seqscan [auf Viren geprueft]

From
Tom Lane
Date:
Silvio Matthes <silvio.matthes@xcom.de> writes:
> 2.) Would it be wiser to change all the empty values to some placeholder
> (e.g. 'empty')?

You're missing the point entirely.  '' is being treated specially
because the planner can see from the column statistics that it occurs a
lot.  Substituting a different value would simply move the problem over
to that value (as soon as you'd done ANALYZE, anyway).

            regards, tom lane