performance discrepancy indexing one column versus two columns - Mailing list pgsql-performance
From | Mark Dilger |
---|---|
Subject | performance discrepancy indexing one column versus two columns |
Date | |
Msg-id | 4323E016.2060606@markdilger.com Whole thread Raw |
Responses |
Re: performance discrepancy indexing one column versus
|
List | pgsql-performance |
All, In the psql output below, I would expect the second query to run faster, because the b-tree index on two columns knows the values of 'b' for any given value of 'a', and hence does not need to fetch a row from the actual table. I am not seeing a speed-up, however, so I expect my understanding of the index mechanism is wrong. Could anyone enlighten me? Specifically, I would expect the first query to walk the b-tree looking for values of 'a' equal to 1, 100001, etc., and then a dereference over to the main table to fetch the value for column 'b'. But I would expect the second query to walk the b-tree looking for values of 'a' equal to 1, 100001, etc., and then find on that same page in the b-tree the value of 'b', thereby avoiding the dereference and extra page fetch. Is the problem that the two-column b-tree contains more data, is spread across more disk pages, and is hence slower to access, canceling out the performance gain of not having to fetch from the main table? Or is the query system not using the second column information from the index and doing the table fetch anyway? Or does the index store the entire row from the main table regardless of the column being indexed? I am running postgresql 8.0.3 on a Pentium 4 with ide hard drives and the default configuration file settings. Thanks in advance, mark mark=# create sequence test_id_seq; CREATE SEQUENCE mark=# create table test (a integer not null default nextval('test_id_seq'), b integer not null); CREATE TABLE mark=# create function testfunc () returns void as $$ mark$# declare mark$# i integer; mark$# begin mark$# for i in 1..1000000 loop mark$# insert into test (b) values (i); mark$# end loop; mark$# return; mark$# end; mark$# $$ language plpgsql; CREATE FUNCTION mark=# select * from testfunc(); testfunc ---------- (1 row) mark=# select count(*) from test; count --------- 1000000 (1 row) mark=# create index test_single_idx on test(a); CREATE INDEX mark=# vacuum full; VACUUM mark=# analyze; ANALYZE mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001, 400001, 500001, 600001, 700001, 800001, 900001); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx on test (cost=0.00..30.36 rows=10 width=4) (actual time=0.145..0.917 rows=10 loops=1) Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a = 400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a = 900001)) Total runtime: 1.074 ms (3 rows) mark=# drop index test_single_idx; DROP INDEX mark=# create index test_double_idx on test(a,b); CREATE INDEX mark=# vacuum full; VACUUM mark=# analyze; ANALYZE mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001, 400001, 500001, 600001, 700001, 800001, 900001); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx on test (cost=0.00..43.48 rows=10 width=4) (actual time=0.283..1.119 rows=10 loops=1) Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a = 400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a = 900001)) Total runtime: 1.259 ms (3 rows)
pgsql-performance by date: