How can an index be larger than a table - Mailing list pgsql-performance
From | David Roussel |
---|---|
Subject | How can an index be larger than a table |
Date | |
Msg-id | 1114078390.26441.232391808@webmail.messagingengine.com Whole thread Raw |
Responses |
Re: How can an index be larger than a table
|
List | pgsql-performance |
Hi, I have a series of tables with identical structure. Some contain a few thousand rows and some contain 3,000,000 rows. Another applicate writes the rows and my applicate reads then just by selecting where pk > last_seen_pk limit 2000. I've found that one of the tables, when selecting from it that one of the tables is many times slower than the others. For instance when reading data in batches of 2000 rows, it seems to take 26 seconds to query from dave_data_update_events with 1593600, but only 1 or two seconds to query from jane_data_update_events with 3100000 rows! This is ther SQL used.... | |select | events.event_id, ctrl.real_name, events.tsds, events.value, | events.lds, events.correction, ctrl.type, ctrl.freq |from dave_data_update_events events, dave_control ctrl |where events.obj_id = ctrl.obj_id and |events.event_id > 32128893::bigint |order by events.event_id |limit 2000 | Here is the structure of the tables... | |CREATE TABLE dave_control ( | obj_id numeric(6,0) NOT NULL, | real_name character varying(64) NOT NULL, | "type" numeric(2,0) NOT NULL, | freq numeric(2,0) NOT NULL |); | |CREATE TABLE dave_data_update_events ( | lds numeric(13,0) NOT NULL, | obj_id numeric(6,0) NOT NULL, | tsds numeric(13,0) NOT NULL, | value character varying(22) NOT NULL, | correction numeric(1,0) NOT NULL, | delta_lds_tsds numeric(13,0) NOT NULL, | event_id bigserial NOT NULL |); | |CREATE UNIQUE INDEX dave_control_obj_id_idx ON dave_control USING btree (obj_id); |ALTER TABLE dave_control CLUSTER ON dave_control_obj_id_idx; | |CREATE UNIQUE INDEX dave_control_real_name_idx ON dave_control USING btree (real_name); | |CREATE INDEX dave_data_update_events_lds_idx ON dave_data_update_events USING btree (lds); | |CREATE INDEX dave_data_update_events_obj_id_idx ON dave_data_update_events USING btree (obj_id); | |ALTER TABLE ONLY dave_control | ADD CONSTRAINT dave_control_obj_id_key UNIQUE (obj_id); | |ALTER TABLE ONLY dave_control | ADD CONSTRAINT dave_control_real_name_key UNIQUE (real_name); | |ALTER TABLE ONLY dave_data_update_events | ADD CONSTRAINT dave_data_update_events_event_id_key UNIQUE (event_id); | There are several pairs of tables, but with names like rod, jane, fredie, etc.. instead of dave. The first thing to note about the scheme (not designed by me) is that the control table is clustered on obj_id, but the data_update_events table is not clustered. Does that mean the rows will be stored in order of insert? That might be ok, because data_update_events table is like a queue and I read it in the order the new rows are inserted. What also seems weird to me is that the control table has some unique indexes created on it, but the data_upate_events table just has a unique constraint. Will postgres use an index in the background to enforce this constraint? When looking at the indexes on the all the tables in DbVisualiser my colleague noticed that the cardinality of the indexes on the rod, jane and fredie tables was consistent, but for dave the cardinality was strange... | |SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'dave_data%'; | |relname relkind reltuples relpages |======================================= ======= ========= ======== |dave_data_update_events r 1593600.0 40209 |dave_data_update_events_event_id_keyi1912320.0 29271 |dave_data_update_events_event_id_seqS 1.0 1 |dave_data_update_events_lds_idx i 1593600.0 6139 |dave_data_update_events_obj_id_idx i 1593600.0 6139 |iso_pjm_data_update_events_obj_id_idxi1593600.0 6139 | Note that there are only 1593600 rows in the table, so why the 1912320 figure? Of course I checked that the row count was correct... | |EXPLAIN ANALYZE |select count(*) from iso_pjm_data_update_events | |QUERY PLAN |Aggregate (cost=60129.00..60129.00 rows=1 width=0) (actual time=35933.292..35933.293 rows=1 loops=1) | -> Seq Scan on iso_pjm_data_update_events (cost=0.00..56145.00 rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600 loops=1) |Total runtime: 35933.489 ms | and... | |select count(*) from iso_pjm_data_update_events | |count |1593600 | so it's not that there are any undeleted rows lying around. So any comments on the index structure? Any ideas why the cardinality of the index is greater than the number of rows in the table? Was it because the table used to be larger? Also any ideas on how else to track down the big performance difference between tables of the same structure?
pgsql-performance by date: