Thread: Use of indexes with table inheritance
I have a "master table" called events, and 10 subtables which are created using this: CREATE TABLE events001 ( ) INHERITS (events) WITHOUT OIDS; I then build all the indexes on it, including a column called "src_ip": CREATE INDEX events001_src_ip_idx ON events001(src_ip); Then I populate it with about 1M events per table, but none in the "master table". I'm using this for data partitioning basically. When I do a select count(event_id) from events... that does multiple sequential table scans, as expected. However, when I do: SELECT event_id FROM events WHERE src_ip = 3232235511; It claims to use indexes, but ti takes over 90 seconds to perform this (on a Athlong64 box, with 10K RPM SATA drive). This is after a VACUUM ANALYZE as well. The EXPLAIN results are here: bench2=# explain select count(event_id) from events where src_ip = 3232235512; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=113151.90..113151.90 rows=1 width=66) -> Append (cost=0.00..113078.85 rows=29218 width=66) -> Seq Scan on events (cost=0.00..0.00 rows=1 width=66) Filter: (src_ip = 3232235512::bigint) -> Index Scan using events001_src_ip_idx on events001 events (cost=0.00..11323.63 rows=2925 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events002_src_ip_idx on events002 events (cost=0.00..11365.13 rows=2937 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events003_src_ip_idx on events003 events (cost=0.00..11276.54 rows=2912 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events004_src_ip_idx on events004 events (cost=0.00..11222.15 rows=2898 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events005_src_ip_idx on events005 events (cost=0.00..11377.30 rows=2946 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events006_src_ip_idx on events006 events (cost=0.00..11365.05 rows=2935 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events007_src_ip_idx on events007 events (cost=0.00..11309.66 rows=2922 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events008_src_ip_idx on events008 events (cost=0.00..11371.45 rows=2937 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events009_src_ip_idx on events009 events (cost=0.00..11244.08 rows=2905 width=36) Index Cond: (src_ip = 3232235512::bigint) -> Index Scan using events010_src_ip_idx on events010 events (cost=0.00..11223.85 rows=2900 width=36) Index Cond: (src_ip = 3232235512::bigint) (24 rows) bench2=# select count(event_id) from events where src_ip = 3232235512; count ------- 40198 (1 row) If you do it against the derived tables it takes a while as well (roughly 1/10th as long). Any thoughts for where i'm missing something here? I would think an index lookup that is an equality, even with 1M entries, wouldn't take more than 1/2 second. Chris -- | Christopher Petrilli | petrilli@gmail.com
On Mon, Feb 07, 2005 at 01:59:16PM -0500, Christopher Petrilli wrote: > I have a "master table" called events, and 10 subtables which are > created using this: > > CREATE TABLE events001 ( ) INHERITS (events) WITHOUT OIDS; > > I then build all the indexes on it, including a column called "src_ip": > > CREATE INDEX events001_src_ip_idx ON events001(src_ip); > > Then I populate it with about 1M events per table, but none in the > "master table". I'm using this for data partitioning basically. Any reason you don't use a single table? Also, please post the EXPLAIN ANALYZE of your query. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Estoy de acuerdo contigo en que la verdad absoluta no existe... El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
On Mon, 7 Feb 2005 19:34:52 -0300, Alvaro Herrera <alvherre@dcc.uchile.cl> wrote: > Any reason you don't use a single table? Also, please post the EXPLAIN > ANALYZE of your query. My main reason was experimental evidence showed a massive drop in performance with 6 indexes (the number needed from a query perspective) when hitting 2-3M rows given the insert rate that I had to sustain. I'm using COPY to insert 2000 records/second at this point. This is an application that currently runs on Oracle and uses SQL*Loader to get data into the database. On the Oracle side we use Data Partioning to manage the data sets under it. We're talking a data set that can easily be 10B rows. Because of how Data Partitioning works, you can easily remove "chunks" of data when it needs to be archived. This is also why I'm using inheritance. bench2=# explain analyze select count(event_id) from events where src_ip = 3232235513; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=261090.73..261090.73 rows=1 width=66) (actual time=82950.432..82950.433 rows=1 loops=1) -> Append (cost=0.00..260917.98 rows=69097 width=66) (actual time=0.123..82854.881 rows=80171 loops=1) -> Seq Scan on events (cost=0.00..0.00 rows=1 width=66) (actual time=0.002..0.002 rows=0 loops=1) Filter: (src_ip = 3232235513::bigint) -> Index Scan using events001_src_ip_idx on events001 events (cost=0.00..11323.63 rows=2925 width=36) (actual time=0.118..2008.688 rows=7885 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events002_src_ip_idx on events002 events (cost=0.00..30803.25 rows=8001 width=36) (actual time=12.137..17262.378 rows=7972 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events003_src_ip_idx on events003 events (cost=0.00..11276.54 rows=2912 width=36) (actual time=10.995..17387.986 rows=7993 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events004_src_ip_idx on events004 events (cost=0.00..33378.92 rows=8667 width=36) (actual time=13.344..17220.166 rows=8039 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events005_src_ip_idx on events005 events (cost=0.00..29463.54 rows=7667 width=36) (actual time=8.827..17220.986 rows=8131 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Seq Scan on events006 events (cost=0.00..35228.00 rows=9667 width=36) (actual time=7.823..3185.068 rows=8055 loops=1) Filter: (src_ip = 3232235513::bigint) -> Index Scan using events007_src_ip_idx on events007 events (cost=0.00..11309.66 rows=2922 width=36) (actual time=0.115..1957.151 rows=8053 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events008_src_ip_idx on events008 events (cost=0.00..30820.39 rows=8001 width=36) (actual time=0.107..1976.218 rows=8076 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Index Scan using events009_src_ip_idx on events009 events (cost=0.00..32086.05 rows=8334 width=36) (actual time=0.117..2780.883 rows=7996 loops=1) Index Cond: (src_ip = 3232235513::bigint) -> Seq Scan on events010 events (cost=0.00..35228.00 rows=10000 width=36) (actual time=0.060..1716.214 rows=7971 loops=1) Filter: (src_ip = 3232235513::bigint) Total runtime: 82950.584 ms (25 rows) That's the second run of the same query, so that it can take advantage of caching. The first run was 80% more expensive in runtime. Chris -- | Christopher Petrilli | petrilli@gmail.com