performance discrepancy indexing one column versus two columns - Mailing list pgsql-performance

From Mark Dilger
Subject performance discrepancy indexing one column versus two columns
Date
Msg-id 4323E016.2060606@markdilger.com
Whole thread Raw
Responses Re: performance discrepancy indexing one column versus
List pgsql-performance
All,

In the psql output below, I would expect the second query to run faster, because
the b-tree index on two columns knows the values of 'b' for any given value of
'a', and hence does not need to fetch a row from the actual table.  I am not
seeing a speed-up, however, so I expect my understanding of the index mechanism
is wrong.  Could anyone enlighten me?

Specifically, I would expect the first query to walk the b-tree looking for
values of 'a' equal to 1, 100001, etc., and then a dereference over to the main
table to fetch the value for column 'b'.

But I would expect the second query to walk the b-tree looking for values of 'a'
equal to 1, 100001, etc., and then find on that same page in the b-tree the
value of 'b', thereby avoiding the dereference and extra page fetch.

Is the problem that the two-column b-tree contains more data, is spread across
more disk pages, and is hence slower to access, canceling out the performance
gain of not having to fetch from the main table?  Or is the query system not
using the second column information from the index and doing the table fetch
anyway?  Or does the index store the entire row from the main table regardless
of the column being indexed?

I am running postgresql 8.0.3 on a Pentium 4 with ide hard drives and the
default configuration file settings.

Thanks in advance,

mark



mark=# create sequence test_id_seq;
CREATE SEQUENCE
mark=# create table test (a integer not null default nextval('test_id_seq'), b
integer not null);
CREATE TABLE
mark=# create function testfunc () returns void as $$
mark$# declare
mark$# i integer;
mark$# begin
mark$# for i in 1..1000000 loop
mark$# insert into test (b) values (i);
mark$# end loop;
mark$# return;
mark$# end;
mark$# $$ language plpgsql;
CREATE FUNCTION
mark=# select * from testfunc();
  testfunc
----------

(1 row)

mark=# select count(*) from test;
   count
---------
  1000000
(1 row)

mark=# create index test_single_idx on test(a);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001,
400001, 500001, 600001, 700001, 800001, 900001);

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using test_single_idx, test_single_idx, test_single_idx,
test_single_idx, test_single_idx, test_single_idx, test_single_idx,
test_single_idx, test_single_idx, test_single_idx on test  (cost=0.00..30.36
rows=10 width=4) (actual time=0.145..0.917 rows=10 loops=1)
    Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a =
400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a =
900001))
  Total runtime: 1.074 ms
(3 rows)

mark=# drop index test_single_idx;
DROP INDEX
mark=# create index test_double_idx on test(a,b);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001,
400001, 500001, 600001, 700001, 800001, 900001);

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using test_double_idx, test_double_idx, test_double_idx,
test_double_idx, test_double_idx, test_double_idx, test_double_idx,
test_double_idx, test_double_idx, test_double_idx on test  (cost=0.00..43.48
rows=10 width=4) (actual time=0.283..1.119 rows=10 loops=1)
    Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a =
400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a =
900001))
  Total runtime: 1.259 ms
(3 rows)

pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: CHECK vs REFERENCES
Next
From: Gavin Sherry
Date:
Subject: Re: performance discrepancy indexing one column versus