Home > mailing lists

How can an index be larger than a table - Mailing list pgsql-performance

From	David Roussel
Subject	How can an index be larger than a table
Date	April 21, 2005 10:13:24
Msg-id	1114078390.26441.232391808@webmail.messagingengine.com Whole thread Raw
Responses	Re: How can an index be larger than a table
List	pgsql-performance

Tree view

Hi,

I have a series of tables with identical structure.  Some contain a few
thousand rows and some contain 3,000,000 rows.  Another applicate writes
the rows and my applicate reads then just by selecting where pk >
last_seen_pk limit 2000.

I've found that one of the tables, when selecting from it that one of
the tables is many times slower than the others.

For instance when reading data in batches of 2000 rows, it seems to take
26 seconds to query from dave_data_update_events  with 1593600, but only
1 or two seconds to query from jane_data_update_events with 3100000
rows!

This is ther SQL used....


|
|select
|   events.event_id, ctrl.real_name, events.tsds, events.value,
|   events.lds, events.correction, ctrl.type, ctrl.freq
|from dave_data_update_events events, dave_control ctrl
|where events.obj_id = ctrl.obj_id and
|events.event_id > 32128893::bigint
|order by events.event_id
|limit 2000
|

Here is the structure of the tables...

|
|CREATE TABLE dave_control (
|    obj_id numeric(6,0) NOT NULL,
|    real_name character varying(64) NOT NULL,
|    "type" numeric(2,0) NOT NULL,
|    freq numeric(2,0) NOT NULL
|);
|
|CREATE TABLE dave_data_update_events (
|    lds numeric(13,0) NOT NULL,
|    obj_id numeric(6,0) NOT NULL,
|    tsds numeric(13,0) NOT NULL,
|    value character varying(22) NOT NULL,
|    correction numeric(1,0) NOT NULL,
|    delta_lds_tsds numeric(13,0) NOT NULL,
|    event_id bigserial NOT NULL
|);
|
|CREATE UNIQUE INDEX dave_control_obj_id_idx ON dave_control USING btree
(obj_id);
|ALTER TABLE dave_control CLUSTER ON dave_control_obj_id_idx;
|
|CREATE UNIQUE INDEX dave_control_real_name_idx ON dave_control USING
btree (real_name);
|
|CREATE INDEX dave_data_update_events_lds_idx ON dave_data_update_events
USING btree (lds);
|
|CREATE INDEX dave_data_update_events_obj_id_idx ON
dave_data_update_events USING btree (obj_id);
|
|ALTER TABLE ONLY dave_control
|    ADD CONSTRAINT dave_control_obj_id_key UNIQUE (obj_id);
|
|ALTER TABLE ONLY dave_control
|    ADD CONSTRAINT dave_control_real_name_key UNIQUE (real_name);
|
|ALTER TABLE ONLY dave_data_update_events
|    ADD CONSTRAINT dave_data_update_events_event_id_key UNIQUE
(event_id);
|

There are several pairs of tables, but with names like rod, jane,
fredie, etc.. instead of dave.
The first thing to note about the scheme (not designed by me) is that
the control table is clustered on obj_id, but the data_update_events
table is not clustered.  Does that mean the rows will be stored in order
of insert?  That might be ok, because data_update_events table is like a
queue and I read it in the order the new rows are inserted.

What also seems weird to me is that the control table has some unique
indexes created on it, but the data_upate_events table just has a unique
constraint.  Will postgres use an index in the background to enforce
this constraint?

When looking at the indexes on the all the tables in DbVisualiser my
colleague noticed that the cardinality of the indexes on the rod, jane
and fredie tables was consistent, but for dave the cardinality was
strange...

|
|SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE
relname LIKE 'dave_data%';
|
|relname                                relkind reltuples relpages
|======================================= ======= ========= ========
|dave_data_update_events                r       1593600.0 40209
|dave_data_update_events_event_id_keyi1912320.0 29271
|dave_data_update_events_event_id_seqS       1.0       1
|dave_data_update_events_lds_idx        i       1593600.0 6139
|dave_data_update_events_obj_id_idx     i       1593600.0 6139
|iso_pjm_data_update_events_obj_id_idxi1593600.0 6139
|

Note that there are only 1593600 rows in the table, so why the 1912320
figure?

Of course I checked that the row count was correct...

|
|EXPLAIN ANALYZE
|select count(*) from iso_pjm_data_update_events
|
|QUERY PLAN
|Aggregate  (cost=60129.00..60129.00 rows=1 width=0) (actual
time=35933.292..35933.293 rows=1 loops=1)
|  ->  Seq Scan on iso_pjm_data_update_events  (cost=0.00..56145.00
rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600
loops=1)
|Total runtime: 35933.489 ms
|

and...

|
|select count(*) from iso_pjm_data_update_events
|
|count
|1593600
|

so it's not that there are any undeleted rows lying around.

So any comments on the index structure?  Any ideas why the cardinality
of the index is greater than the number of rows in the table?  Was it
because the table used to be larger?

Also any ideas on how else to track down the big performance difference
between tables of the same structure?

pgsql-performance by date:

From: Richard van den Berg
Date: 21 April 2005, 08:16:02
Subject: Re: When are index scans used over seq scans?

From: "Shoaib Burq (VPAC)"
Date: 21 April 2005, 11:50:03
Subject: two queries and dual cpu (perplexed)

How can an index be larger than a table - Mailing list pgsql-performance

Previous

Next