Thread: Severe performance problems for simple query
Hi folks,

Here is the executive summary:

* RHEL5 (postgresql 8.1, .conf tweaked for performance [1])
* 2x Intel E5410 @ 2.33GHz (8 cores), 8GB RAM, 15KRPM SAS disks
* 4.9 million records in a table (IP address info)
* composite primary key: primary key(ipFrom, ipTo)
* ipFrom/ipTo are int8 (see below for the full schema [2])
* bad performance on queries of the form:
    select * from ipTable where ipFrom <= val and val <= ipTo

Part of the problem is that most of the time PostgreSQL decides to do a
seq scan on the table, so queries take many seconds (sometimes 3, 7,
even 20 sec). We ran ANALYZE and enabled statistics, and that sometimes
fixes the problem temporarily, but overnight (without the database being
used) it reverts to seq scans. For example:

perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on ipligenceipaddress  (cost=0.00..139903.80 rows=1209530 width=145) (actual time=1233.628..2100.891 rows=1 loops=1)
   Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto))
 Total runtime: 2100.928 ms
(3 rows)

Moreover, even when it is using the index, it is not all that fast:

perpedes_db=# SET enable_seqscan = off;
SET
perpedes_db=# EXPLAIN ANALYZE select * from static.ipligenceipaddress where 3507360727 between ipfrom and ipto;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ipligenceipaddress_pkey on ipligenceipaddress  (cost=0.00..148143.67 rows=806199 width=146) (actual time=351.316..351.320 rows=1 loops=1)
   Index Cond: ((3507360727::bigint >= ipfrom) AND (3507360727::bigint <= ipto))
 Total runtime: 351.355 ms
(3 rows)

So, my questions are:

* did we miss any obvious settings?
* why does it decide all of a sudden to do seq scans?
* adding a "limit 1" sometimes makes the query even slower, when in fact
  it should have helped the DB return faster, no?
* in the ideal case, what execution times should I be expecting? Is
  ~400ms reasonable? I would have hoped for <40ms...
* AFAICT, the (ipFrom, ipTo) intervals should be mutually exclusive, so
  the result should be at most one row. Can this info help the DB do a
  faster query? If so, how can I express that?
* the DB takes tens of minutes to do an ANALYZE on this table, which
  doesn't happen with the default configuration. Any idea how I can fix
  that?

Thank you!

====================================================================
[1] Changes from standard config:

--- /var/lib/pgsql/data/postgresql.conf.orig    2008-03-21 11:51:45.000000000 -0400
+++ /var/lib/pgsql/data/postgresql.conf 2008-03-21 21:04:38.000000000 -0400
@@ -90,19 +90,19 @@
 # - Memory -

-shared_buffers = 1000                  # min 16 or max_connections*2, 8KB each
-#temp_buffers = 1000                   # min 100, 8KB each
-#max_prepared_transactions = 5         # can be 0 or more
+shared_buffers = 50000                 # min 16 or max_connections*2, 8KB each
+temp_buffers = 10000                   # min 100, 8KB each
+max_prepared_transactions = 100        # can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).
-#work_mem = 1024                       # min 64, size in KB
-#maintenance_work_mem = 16384          # min 1024, size in KB
+work_mem = 2048                        # min 64, size in KB
+maintenance_work_mem = 131072          # min 1024, size in KB
 #max_stack_depth = 2048                # min 100, size in KB

 # - Free Space Map -

-#max_fsm_pages = 20000                 # min max_fsm_relations*16, 6 bytes each
-#max_fsm_relations = 1000              # min 100, ~70 bytes each
+max_fsm_pages = 200000                 # min max_fsm_relations*16, 6 bytes each
+max_fsm_relations = 10000              # min 100, ~70 bytes each

 # - Kernel Resource Usage -

@@ -111,11 +111,11 @@
 # - Cost-Based Vacuum Delay -

-#vacuum_cost_delay = 0                 # 0-1000 milliseconds
-#vacuum_cost_page_hit = 1              # 0-10000 credits
+vacuum_cost_delay = 200                # 0-1000 milliseconds
+vacuum_cost_page_hit = 6               # 0-10000 credits
 #vacuum_cost_page_miss = 10            # 0-10000 credits
 #vacuum_cost_page_dirty = 20           # 0-10000 credits
-#vacuum_cost_limit = 200               # 0-10000 credits
+vacuum_cost_limit = 100                # 0-10000 credits

 # - Background writer -

@@ -141,13 +141,13 @@
 #                               fsync_writethrough
 #                               open_sync
 #full_page_writes = on                 # recover from partial page writes
-#wal_buffers = 8                       # min 4, 8KB each
+wal_buffers = 128                      # min 4, 8KB each
 #commit_delay = 0                      # range 0-100000, in microseconds
 #commit_siblings = 5                   # range 1-1000

 # - Checkpoints -

-#checkpoint_segments = 3               # in logfile segments, min 1, 16MB each
+checkpoint_segments = 192              # in logfile segments, min 1, 16MB each
 #checkpoint_timeout = 300              # range 30-3600, in seconds
 #checkpoint_warning = 30               # in seconds, 0 is off

@@ -175,12 +175,12 @@
 # - Planner Cost Constants -

-#effective_cache_size = 1000           # typically 8KB each
-#random_page_cost = 4                  # units are one sequential page fetch
+effective_cache_size = 393216          # typically 8KB each
+random_page_cost = 2                   # units are one sequential page fetch
                                        # cost
-#cpu_tuple_cost = 0.01                 # (same)
-#cpu_index_tuple_cost = 0.001          # (same)
-#cpu_operator_cost = 0.0025            # (same)
+cpu_tuple_cost = 0.002                 # (same)
+cpu_index_tuple_cost = 0.0002          # (same)
+cpu_operator_cost = 0.0005             # (same)

 # - Genetic Query Optimizer -

@@ -329,10 +329,10 @@
 # - Query/Index Statistics Collector -

-#stats_start_collector = on
-#stats_command_string = off
-#stats_block_level = off
-#stats_row_level = off
+stats_start_collector = on
+stats_command_string = on
+stats_block_level = on
+stats_row_level = on
 #stats_reset_on_server_start = off

@@ -340,8 +340,8 @@
 # AUTOVACUUM PARAMETERS
 #---------------------------------------------------------------------------

-#autovacuum = off                      # enable autovacuum subprocess?
+autovacuum = on                        # enable autovacuum subprocess?
 #autovacuum_naptime = 60               # time between autovacuum runs, in secs
 #autovacuum_vacuum_threshold = 1000    # min # of tuple updates before vacuum
 #autovacuum_analyze_threshold = 500    # min # of tuple updates before

@@ -400,7 +400,7 @@
 #---------------------------------------------------------------------------

 #deadlock_timeout = 1000               # in milliseconds
-#max_locks_per_transaction = 64        # min 10
+max_locks_per_transaction = 512        # min 10
 # note: each lock table slot uses ~220 bytes of shared memory, and there are
 # max_locks_per_transaction * (max_connections + max_prepared_transactions)
 # lock table slots.

[2] Actual schema for the table:

create table ipligenceIpAddress
(
    ipFrom        int8 not null default 0,
    ipTo          int8 not null default 0,
    countryCode   varchar(10)  not null,
    countryName   varchar(255) not null,
    continentCode varchar(10)  not null,
    continentName varchar(255) not null,
    timeZone      varchar(10)  not null,
    regionCode    varchar(10)  not null,
    regionName    varchar(255) not null,
    owner         varchar(255) not null,
    cityName      varchar(255) not null,
    countyName    varchar(255) not null,
    latitude      float8 not null,
    longitude     float8 not null,
    createdTS     timestamp with time zone default current_timestamp,
    primary key(ipFrom, ipTo)
);

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.
On Mon, 7 Apr 2008, Dimi Paun wrote:
> * bad performance on queries of the form:
>     select * from ipTable where ipFrom <= val and val <= ipTo

This type of query is very hard for a normal B-tree index to answer. For
example, say val is half-way between the min and max values. If you have
an index on ipFrom, it will be able to restrict the entries to about
half of them, which is no real benefit over a sequential scan. Likewise,
an index on ipTo will be able to restrict the entries to half of them,
with no benefit. The intersection of these two halves may be just one
entry, but finding that out is non-trivial. An index bitmap scan would
do it if you can persuade Postgres to do that, but really you want an
R-tree index on the two columns, like I have requested in the past.

You can achieve that to some degree by using Postgres' geometric
indexes, but it's ugly. Note that the following is completely untested
and may not work with int8 values.

Firstly, create the index. It will contain fake "boxes" that stretch
from ipFrom to ipTo:

CREATE INDEX index_name ON table_name ((box '((ipFrom, 0), (ipTo, 1))'));

Then, instead of querying simply on ipFrom and ipTo, query on whether
the fake box contains a point representing val:

SELECT blah FROM table_name
WHERE (box '((ipFrom, 0), (ipTo, 2))') @> (point '(val, 1)');

Or alternatively you could adapt the "seg" GiST index to int8 values.

Hope you get this sorted out - it's something I'll have to do at some
point soon too.

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!
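To make the box trick concrete, here is a small sketch (Python, with made-up interval data; the function name is ours, not Postgres code) of why mapping the interval [ipFrom, ipTo] to a fake 2-D box makes the containment test answerable by a spatial index: the y-axis is a dummy dimension, so the box contains the point (val, 1) exactly when the interval contains val.

```python
# Sketch only: the box ((ip_from, 0), (ip_to, 2)) contains the point
# (val, 1) iff ip_from <= val <= ip_to. Names and data are illustrative.

def box_contains(ip_from, ip_to, val):
    """Point-in-box test for the fake box encoding an interval."""
    x_ok = ip_from <= val <= ip_to   # x-extent of the box covers val
    y_ok = 0 <= 1 <= 2               # always true: the y-axis is a dummy
    return x_ok and y_ok

intervals = [(0, 10), (11, 20), (21, 50)]
hits = [iv for iv in intervals if box_contains(iv[0], iv[1], 15)]
print(hits)  # [(11, 20)] - the one interval containing 15
```

An R-tree over such boxes can then answer the query without scanning half the table, which is the point of the geometric-index workaround.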
On Mon, 7 Apr 2008, Dimi Paun wrote:
> * bad performance on queries of the form:
>     select * from ipTable where ipFrom <= val and val <= ipTo

Oh yes, if you can guarantee that no two entries overlap at all, then
there is a simpler way. Just create a B-tree index on ipFrom as usual,
sort by ipFrom, and LIMIT to the first result:

SELECT blah FROM table_name
WHERE ipFrom <= 42 ORDER BY ipFrom DESC LIMIT 1

This should run *very* quickly. However, if any entries overlap at all,
then you will get incorrect results.

Matthew

--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)
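The idea behind this query can be sketched outside the database (Python, invented data): with non-overlapping intervals sorted by their start, a binary search for the greatest ipFrom <= val yields the only candidate row, and a single check of its ipTo confirms or rejects it.

```python
# Sketch of the ORDER BY ipFrom DESC LIMIT 1 trick, assuming the
# intervals never overlap. Data and names are illustrative.
import bisect

intervals = [(1, 10), (11, 20), (21, 50), (51, 60)]  # sorted, non-overlapping
starts = [lo for lo, hi in intervals]

def lookup(val):
    i = bisect.bisect_right(starts, val) - 1  # greatest ip_from <= val
    if i >= 0 and intervals[i][1] >= val:     # does its ip_to still cover val?
        return intervals[i]
    return None

print(lookup(15))   # (11, 20)
print(lookup(100))  # None: past the last interval
```

This is O(log n) and touches exactly one row, which is why the backward index scan below finishes in well under a millisecond.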
Matthew wrote:
> On Mon, 7 Apr 2008, Dimi Paun wrote:
>> * bad performance on queries of the form:
>>     select * from ipTable where ipFrom <= val and val <= ipTo
>
> This type of query is very hard for a normal B-tree index to answer.
> For example, say val is half-way between the min and max values. If you
> have an index on ipFrom, it will be able to restrict the entries to
> about half of them, which is no real benefit over a sequential scan.
> Likewise, an index on ipTo will be able to restrict the entries to half
> of them, with no benefit. The intersection of these two halves may be
> just one entry, but finding that out is non-trivial. An index bitmap
> scan would do it if you can persuade Postgres to do that, but really
> you want an R-tree index on the two columns, like I have requested in
> the past.

If I understood the original post correctly, the ipFrom and ipTo columns
actually split a single linear IP address space into non-overlapping
chunks. Something like this:

ipFrom  ipTo
     1    10
    10    20
    20    50
    50    60
   ...

In that case, a regular index on (ipFrom, ipTo) should work just fine,
and that's what he's got. Actually, an index on just ipFrom would
probably work just as well. The problem is that the planner doesn't know
about that special relationship between ipFrom and ipTo. Perhaps it
could be hinted by explicitly specifying "AND ipTo > ipFrom" in the
query?

I don't know why the single index lookup took > 300ms, though. That does
seem high to me.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Mon, 2008-04-07 at 17:32 +0100, Heikki Linnakangas wrote:
> If I understood the original post correctly, the ipFrom and ipTo
> columns actually split a single linear ip address space into
> non-overlapping chunks. Something like this:
>
> ipFrom  ipTo
>      1    10
>     10    20
>     20    50
>     50    60
>    ...

Indeed.

> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well.

No, it doesn't:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=1 width=145) (actual time=1519.526..1519.527 rows=1 loops=1)
   ->  Index Scan using temp1 on ipligenceipaddress  (cost=0.00..84796.50 rows=1209308 width=145) (actual time=1519.524..1519.524 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
         Filter: (2130706433 <= ipto)
 Total runtime: 1519.562 ms
(5 rows)

This is huge, I'd say...

> The problem is that the planner doesn't know about that special
> relationship between ipFrom and ipTo. Perhaps it could be hinted by
> explicitly specifying "AND ipTo > ipFrom" in the query?
Unfortunately, it still does a seq scan:

perpedes_db=# SET enable_seqscan = on;
SET
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto AND ipTo > ipFrom limit 1;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=145) (actual time=1245.293..1245.294 rows=1 loops=1)
   ->  Seq Scan on ipligenceipaddress  (cost=0.00..142343.80 rows=403103 width=145) (actual time=1245.290..1245.290 rows=1 loops=1)
         Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto) AND (ipto > ipfrom))
 Total runtime: 1245.335 ms
(4 rows)

> I don't know why the single index lookup took > 300ms, though. That
> does seem high to me.

That is my feeling. I would have expected order of magnitude faster
execution times; the DB runs on fairly decent hardware...

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.
On Mon, 2008-04-07 at 17:27 +0100, Matthew wrote:
> Oh yes, if you can guarantee that no two entries overlap at all, then
> there is a simpler way. Just create a B-tree index on ipFrom as usual,
> sort by ipFrom, and LIMIT to the first result:
>
> SELECT blah FROM table_name
> WHERE ipFrom <= 42 ORDER BY ipFrom DESC LIMIT 1
>
> This should run *very* quickly. However, if any entries overlap at all
> then you will get incorrect results.

Thanks Matthew, this seems to be indeed a lot faster:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 ORDER BY ipFrom DESC LIMIT 1;
                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=145) (actual time=0.060..0.060 rows=1 loops=1)
   ->  Index Scan Backward using temp1 on ipligenceipaddress  (cost=0.00..83453.92 rows=2685155 width=145) (actual time=0.057..0.057 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
 Total runtime: 0.094 ms
(4 rows)

However, it is rather disappointing that the DB can't figure out how to
execute such a simple query in a half decent manner (a seq scan on an
indexed table for a BETWEEN filter doesn't qualify :)).

Many thanks!

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.
On Mon, 7 Apr 2008, Heikki Linnakangas wrote:
> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well. The problem is that the planner doesn't
> know about that special relationship between ipFrom and ipTo. Perhaps
> it could be hinted by explicitly specifying "AND ipTo > ipFrom" in the
> query?

Actually, the problem is that the database doesn't know that the entries
don't overlap. For all it knows, you could have data like this:

0          10
10         20
20         30
... ten million rows later ...
100000030  100000040
100000040  100000050
0          100000050

So say you wanted to search for the value of 50,000,000. The index on
ipFrom would select five million rows, all of which then have to be
filtered by the constraint on ipTo. Likewise, an index on ipTo would
return five million rows, all of which then have to be filtered by the
constraint on ipFrom. If you just read the index and took the closest
entry to the value, then you would miss the last entry, which overlaps
with the whole range. An R-tree on both fields will correctly find the
small set of entries that are relevant.

It would be very cool to be able to create an R-tree index that would
just make the original query run fast without needing alteration. I had
a look at this a while back, but it is not currently possible in GiST,
because only one field is handed to the index at a time. So all the
current R-tree implementations require that you generate an object
containing the two values, like the box, and then index that. Something
for 8.4?

Matthew

--
$ rm core
Segmentation Fault (core dumped)
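The failure mode described above can be sketched in a few lines of Python (data invented to match the example): one huge interval that overlaps everything defeats the "greatest ipFrom <= val" shortcut, because the covering row is nowhere near val in ipFrom order.

```python
# Sketch of the pathological overlapping case. A full scan finds the
# covering interval; the B-tree LIMIT 1 shortcut does not.
import bisect

intervals = sorted([(0, 10), (10, 20), (20, 30),
                    (100000030, 100000040), (100000040, 100000050),
                    (0, 100000050)])   # the last row spans the whole range
starts = [lo for lo, hi in intervals]

def limit1_lookup(val):
    """The shortcut: only the row with the greatest ip_from <= val."""
    i = bisect.bisect_right(starts, val) - 1
    lo, hi = intervals[i]
    return (lo, hi) if lo <= val <= hi else None

def full_scan(val):
    """Ground truth: every interval that actually contains val."""
    return [iv for iv in intervals if iv[0] <= val <= iv[1]]

val = 50_000_000
print(limit1_lookup(val))  # None - the shortcut lands on (20, 30) and misses
print(full_scan(val))      # [(0, 100000050)]
```

With non-overlapping data the shortcut is exact, which is why it was safe in the earlier reply; the R-tree is what handles the general case.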
* Dimi Paun:

> * 4.9 million records in a table (IP address info)

You should use the ip4r type for that.