Re: again on index usage - Mailing list pgsql-hackers
From | Daniel Kalchev |
---|---|
Subject | Re: again on index usage |
Date | |
Msg-id | 200201100939.LAA28676@dcave.digsys.bg Whole thread Raw |
In response to | Re: again on index usage (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: again on index usage
Re: again on index usage |
List | pgsql-hackers |
>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > In any case, if we need to hit 50k pages (assuming the indexeddata is > > randomly scattered in the file), and having to read these three times each , it > > will be less I/Othan having to read 1.7 million records.> > How do you arrive at that? Assuming 100 records per page (probably the> rightorder of magnitude), the seqscan alternative is 17k page reads.> Yes, you examine more tuples, but CPUs are lots fasterthan disks. I tried this: db=# select * into iplog_test from iplog_gate200112; SELECT db=# create index iplog_test_ipaddr_idx on iplog_test(ipaddr); CREATE db=# cluster iplog_test_ipaddr_idx on iplog_test; CLUSTER db=# create index iplog_test_ipdate_idx on iplog_test(ipdate); CREATE db=# vacuum verbose analyze iplog_test; NOTICE: --Relation iplog_test-- NOTICE: Pages 17722: Changed 0, reaped 0, Empty 0, New 0; Tup 1706202: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 80, MaxLen 88; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 1.48s/-1.86u sec. NOTICE: Index iplog_test_ipaddr_idx: Pages 5621; Tuples 1706202. CPU 0.51s/1.80u sec. NOTICE: Index iplog_test_ipdate_idx: Pages 4681; Tuples 1706202. CPU 0.36s/1.92u sec. NOTICE: --Relation pg_toast_253297758-- NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec. NOTICE: Index pg_toast_253297758_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec. NOTICE: Analyzing... VACUUM db=# explain db-# SELECT sum(input), sum(output) FROM iplog_test db-# WHERE db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02' AND db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02' AND db-# ipaddr <<= '193.68.240.0/20' AND 'uni-gw' ~ router; NOTICE: QUERY PLAN: Aggregate (cost=56112.97..56112.97 rows=1 width=16) -> Seq Scan on iplog_test (cost=0.00..56111.54 rows=284 width=16) EXPLAIN query runs for ca 3.5 minutes. db=# set enable_seqscan to off; the query plan is Aggregate (cost=100507.36..100507.36 rows=1 width=16) -> Index Scan using iplog_test_ipdate_idx on iplog_test (cost=0.00..100505.94 rows=284 width=16) query runs for ca 2.2 minutes. Moves closer to your point :-) Anyway, the platform is an dual Pentium III 550 MHz (intel) machine with 512 MB RAM, with 15000 RPM Cheetah for the database, running BSD/OS 4.2. The machine is reasonably loaded all the time, so this is very much real-time test. I agree, that with the 'wrong' clustering the index scan is not so much faster than the sequential scan. Perhaps I need to tune this machine's costs to prefer more disk intensive operations over CPU intensive operations? Let's see what 4.2 will result in. Daniel
pgsql-hackers by date: