Thread: Severe performance problems for simple query

Severe performance problems for simple query

From: Dimi Paun

Hi folks,

Here is the executive summary:
  * RHEL5 (postgresql 8.1, .conf tweaked for performance [1])
  * 2x Intel E5410 @ 2.33GHz (8 cores), 8GB RAM, 15KRPM SAS disks
  * 4.9 million records in a table (IP address info)
  * composite primary key: primary key(ipFrom, ipTo)
  * ipFrom/ipTo are int8 (see below for full schema info [2])
  * bad performance on queries of the form:
    select * from ipTable where  ipFrom <= val and val <= ipTo

Part of the problem is that most of the time PostgreSQL decides to
use seq scans on the table, resulting in queries taking many seconds
(sometimes 3, 7, 20 sec). We did ANALYZE and enabled statistics, and
that sometimes fixes the problem temporarily, but overnight (without
the database being used), it reverts to seq scans. For example:

perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on ipligenceipaddress  (cost=0.00..139903.80 rows=1209530 width=145) (actual time=1233.628..2100.891 rows=1 loops=1)
   Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto))
 Total runtime: 2100.928 ms
(3 rows)



Moreover, even when it is using the index, it is not all that fast:
perpedes_db=# SET enable_seqscan = off;
SET
perpedes_db=# EXPLAIN ANALYZE select * from static.ipligenceipaddress where 3507360727 between ipfrom and ipto;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ipligenceipaddress_pkey on ipligenceipaddress  (cost=0.00..148143.67 rows=806199 width=146) (actual time=351.316..351.320 rows=1 loops=1)
   Index Cond: ((3507360727::bigint >= ipfrom) AND (3507360727::bigint <= ipto))
 Total runtime: 351.355 ms
(3 rows)


So, my questions are:
  * did we miss any obvious settings?
  * why does it decide all of a sudden to do seq scans?
  * adding a "limit 1" sometimes causes the query to be even slower,
    when in fact it should have helped the DB to return faster, no?
  * in the ideal case, what execution times should I be expecting?
    Is ~400ms reasonable? I would have hoped this to be <40ms...
  * AFAICT, the (ipFrom, ipTo) intervals should be mutually exclusive,
    so the result should be at most one row. Can this info help the
    DB do a faster query? If so, how can I express that?
  * the DB takes tens of minutes to do an ANALYZE on this table,
    which doesn't happen with the default configuration. Any idea
    how I can fix that?

Thank you!

====================================================================
[1] Changes from standard config:
--- /var/lib/pgsql/data/postgresql.conf.orig    2008-03-21 11:51:45.000000000 -0400
+++ /var/lib/pgsql/data/postgresql.conf 2008-03-21 21:04:38.000000000 -0400
@@ -90,19 +90,19 @@

 # - Memory -

-shared_buffers = 1000                  # min 16 or max_connections*2, 8KB each
-#temp_buffers = 1000                   # min 100, 8KB each
-#max_prepared_transactions = 5         # can be 0 or more
+shared_buffers = 50000                 # min 16 or max_connections*2, 8KB each
+temp_buffers = 10000                   # min 100, 8KB each
+max_prepared_transactions = 100                # can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).
-#work_mem = 1024                       # min 64, size in KB
-#maintenance_work_mem = 16384          # min 1024, size in KB
+work_mem = 2048                                # min 64, size in KB
+maintenance_work_mem = 131072          # min 1024, size in KB
 #max_stack_depth = 2048                        # min 100, size in KB

 # - Free Space Map -

-#max_fsm_pages = 20000                 # min max_fsm_relations*16, 6 bytes each
-#max_fsm_relations = 1000              # min 100, ~70 bytes each
+max_fsm_pages = 200000                 # min max_fsm_relations*16, 6 bytes each
+max_fsm_relations = 10000              # min 100, ~70 bytes each

 # - Kernel Resource Usage -

@@ -111,11 +111,11 @@

 # - Cost-Based Vacuum Delay -

-#vacuum_cost_delay = 0                 # 0-1000 milliseconds
-#vacuum_cost_page_hit = 1              # 0-10000 credits
+vacuum_cost_delay = 200                        # 0-1000 milliseconds
+vacuum_cost_page_hit = 6               # 0-10000 credits
 #vacuum_cost_page_miss = 10            # 0-10000 credits
 #vacuum_cost_page_dirty = 20           # 0-10000 credits
-#vacuum_cost_limit = 200               # 0-10000 credits
+vacuum_cost_limit = 100                        # 0-10000 credits

 # - Background writer -

@@ -141,13 +141,13 @@
                                        #   fsync_writethrough
                                        #   open_sync
 #full_page_writes = on                 # recover from partial page writes
-#wal_buffers = 8                       # min 4, 8KB each
+wal_buffers = 128                      # min 4, 8KB each
 #commit_delay = 0                      # range 0-100000, in microseconds
 #commit_siblings = 5                   # range 1-1000

 # - Checkpoints -

-#checkpoint_segments = 3               # in logfile segments, min 1, 16MB each
+checkpoint_segments = 192              # in logfile segments, min 1, 16MB each
 #checkpoint_timeout = 300              # range 30-3600, in seconds
 #checkpoint_warning = 30               # in seconds, 0 is off

@@ -175,12 +175,12 @@

 # - Planner Cost Constants -

-#effective_cache_size = 1000           # typically 8KB each
-#random_page_cost = 4                  # units are one sequential page fetch
+effective_cache_size = 393216          # typically 8KB each
+random_page_cost = 2                   # units are one sequential page fetch
                                        # cost
-#cpu_tuple_cost = 0.01                 # (same)
-#cpu_index_tuple_cost = 0.001          # (same)
-#cpu_operator_cost = 0.0025            # (same)
+cpu_tuple_cost = 0.002                 # (same)
+cpu_index_tuple_cost = 0.0002          # (same)
+cpu_operator_cost = 0.0005             # (same)

 # - Genetic Query Optimizer -

@@ -329,10 +329,10 @@

 # - Query/Index Statistics Collector -

-#stats_start_collector = on
-#stats_command_string = off
-#stats_block_level = off
-#stats_row_level = off
+stats_start_collector = on
+stats_command_string = on
+stats_block_level = on
+stats_row_level = on
 #stats_reset_on_server_start = off


@@ -340,8 +340,8 @@
 # AUTOVACUUM PARAMETERS
 #---------------------------------------------------------------------------

-#autovacuum = off                      # enable autovacuum subprocess?
+autovacuum = on                                # enable autovacuum subprocess?
 #autovacuum_naptime = 60               # time between autovacuum runs, in secs
 #autovacuum_vacuum_threshold = 1000    # min # of tuple updates before
                                        # vacuum
 #autovacuum_analyze_threshold = 500    # min # of tuple updates before
@@ -400,7 +400,7 @@
 #---------------------------------------------------------------------------

 #deadlock_timeout = 1000               # in milliseconds
-#max_locks_per_transaction = 64                # min 10
+max_locks_per_transaction = 512                # min 10
 # note: each lock table slot uses ~220 bytes of shared memory, and there are
 # max_locks_per_transaction * (max_connections + max_prepared_transactions)
 # lock table slots.


[2] Actual schema for the table:
create table ipligenceIpAddress
(
    ipFrom int8 not null default 0,
    ipTo int8 not null default 0,
    countryCode varchar(10) not null,
    countryName varchar(255) not null,
    continentCode varchar(10) not null,
    continentName varchar(255) not null,
    timeZone varchar(10) not null,
    regionCode varchar(10) not null,
    regionName varchar(255) not null,
    owner varchar(255) not null,
    cityName varchar(255) not null,
    countyName varchar(255) not null,
    latitude float8 not null,
    longitude float8 not null,
    createdTS timestamp with time zone default current_timestamp,
    primary key(ipFrom, ipTo)
);

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.


Re: Severe performance problems for simple query

From: Matthew

On Mon, 7 Apr 2008, Dimi Paun wrote:
>  * bad performance on queries of the form:
>    select * from ipTable where  ipFrom <= val and val <= ipTo

This type of query is very hard for a normal B-tree index to answer. For
example, say val is half-way between min and max values. If you have an
index on ipFrom, it will be able to restrict the entries to about half of
them, which is no real benefit over a sequential scan. Likewise, an index
on ipTo will be able to restrict the entries to half of them, with no
benefit. The intersection of these two halves may be just one entry, but
finding that out is non-trivial. An index bitmap scan would do it if you
can persuade Postgres to do that, but really you want an R-tree index on
the two columns, like I have requested in the past.
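(For reference, a minimal sketch of the two-index setup that a bitmap AND scan
would need -- the index names are made up, and whether the planner actually
combines them this way depends entirely on the statistics:)

-- two single-column btree indexes; the planner *may* combine them
-- with a BitmapAnd for "ipFrom <= val AND val <= ipTo"
CREATE INDEX ipligenceipaddress_ipfrom_idx ON ipligenceIpAddress (ipFrom);
CREATE INDEX ipligenceipaddress_ipto_idx   ON ipligenceIpAddress (ipTo);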

You can achieve that to some degree by using Postgres' geometric indexes,
but it's ugly. Note that the following is completely untested and may not
work with int8 values.

Firstly, you need to create the index. The index will contain fake "boxes"
that stretch from ipFrom to ipTo. It has to be a GiST index on an expression
over the two columns:

CREATE INDEX index_name ON table_name
   USING gist ((box(point(ipFrom, 0), point(ipTo, 1))));

Then, instead of querying simply on ipFrom and ipTo, query on whether the
fake box contains a zero-size box representing val. The expression has to
match the indexed expression exactly for the index to be usable:

SELECT blah FROM table_name
   WHERE box(point(ipFrom, 0), point(ipTo, 1))
         @> box(point(val, 0.5), point(val, 0.5));

Or alternatively you could adapt the "seg" GiST index to int8 values.

Hope you get this sorted out - it's something I'll have to do at some
point soon too.

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

Re: Severe performance problems for simple query

From: Matthew

On Mon, 7 Apr 2008, Dimi Paun wrote:
>  * bad performance on queries of the form:
>    select * from ipTable where  ipFrom <= val and val <= ipTo

Oh yes, if you can guarantee that no two entries overlap at all, then
there is a simpler way. Just create a B-tree index on ipFrom as usual,
sort by ipFrom, and LIMIT to the first result:

SELECT blah FROM table_name
   WHERE ipFrom <= 42 ORDER BY ipFrom DESC LIMIT 1

This should run *very* quickly. However, if any entries overlap at all
then you will get incorrect results.
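(A small refinement, assuming the no-overlap property really does hold: adding
the upper bound as a plain filter makes the query return no row, rather than
the nearest range below, when the value falls into a gap between ranges -- at
the price of a longer index scan in that gap case:)

SELECT blah FROM table_name
   WHERE ipFrom <= 42 AND ipTo >= 42
   ORDER BY ipFrom DESC LIMIT 1;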

Matthew

--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)

Re: Severe performance problems for simple query

From: Heikki Linnakangas

Matthew wrote:
> On Mon, 7 Apr 2008, Dimi Paun wrote:
>>  * bad performance on queries of the form:
>>    select * from ipTable where  ipFrom <= val and val <= ipTo
>
> This type of query is very hard for a normal B-tree index to answer. For
> example, say val is half-way between min and max values. If you have an
> index on ipFrom, it will be able to restrict the entries to about half
> of them, which is no real benefit over a sequential scan. Likewise, an
> index on ipTo will be able to restrict the entries to half of them, with
> no benefit. The intersection of these two halves may be just one entry,
> but finding that out is non-trivial. An index bitmap scan would do it if
> you can persuade Postgres to do that, but really you want an R-tree
> index on the two columns, like I have requested in the past.

If I understood the original post correctly, the ipFrom and ipTo columns
actually split a single linear ip address space into non-overlapping
chunks. Something like this:

ipFrom    ipTo
1    10
10    20
20    50
50    60
...

In that case, a regular index on (ipFrom, ipTo) should work just fine,
and that's what he's got. Actually, an index on just ipFrom would
probably work just as well. The problem is that the planner doesn't know
about that special relationship between ipFrom and ipTo. Perhaps it
could be hinted by explicitly specifying "AND ipTo > ipFrom" in the query?

I don't know why the single index lookup took > 300ms, though. That does
seem high to me.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Severe performance problems for simple query

From: Dimi Paun

On Mon, 2008-04-07 at 17:32 +0100, Heikki Linnakangas wrote:
> If I understood the original post correctly, the ipFrom and ipTo
> columns actually split a single linear ip address space into
> non-overlapping  chunks. Something like this:
>
> ipFrom  ipTo
> 1       10
> 10      20
> 20      50
> 50      60
> ...
>

Indeed.

> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well.

No, it doesn't:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto limit 1;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=1 width=145) (actual time=1519.526..1519.527 rows=1 loops=1)
   ->  Index Scan using temp1 on ipligenceipaddress  (cost=0.00..84796.50 rows=1209308 width=145) (actual time=1519.524..1519.524 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
         Filter: (2130706433 <= ipto)
 Total runtime: 1519.562 ms
(5 rows)

This is huge, I'd say...

> The problem is that the planner doesn't know  about that special
> relationship between ipFrom and ipTo. Perhaps it could be hinted by
> explicitly specifying "AND ipTo > ipFrom" in the query?

Unfortunately, it still does a seq scan:

perpedes_db=# SET enable_seqscan = on;
SET
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto AND ipTo > ipFrom limit 1;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=145) (actual time=1245.293..1245.294 rows=1 loops=1)
   ->  Seq Scan on ipligenceipaddress  (cost=0.00..142343.80 rows=403103 width=145) (actual time=1245.290..1245.290 rows=1 loops=1)
         Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto) AND (ipto > ipfrom))
 Total runtime: 1245.335 ms
(4 rows)


> I don't know why the single index lookup took > 300ms, though. That
> does seem high to me.

That is my feeling. I would have expected execution times an order of
magnitude faster; the DB runs on fairly decent hardware...

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.


Re: Severe performance problems for simple query

From: Dimi Paun

On Mon, 2008-04-07 at 17:27 +0100, Matthew wrote:
> Oh yes, if you can guarantee that no two entries overlap at all, then
> there is a simpler way. Just create a B-tree index on ipFrom as usual,
> sort by ipFrom, and LIMIT to the first result:
>
> SELECT blah FROM table_name
>    WHERE ipFrom <= 42 ORDER BY ipFrom DESC LIMIT 1
>
> This should run *very* quickly. However, if any entries overlap at all
> then you will get incorrect results.

Thanks Matthew, this seems to be indeed a lot faster:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 ORDER BY ipFrom DESC LIMIT 1;
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=145) (actual time=0.060..0.060 rows=1 loops=1)
   ->  Index Scan Backward using temp1 on ipligenceipaddress  (cost=0.00..83453.92 rows=2685155 width=145) (actual time=0.057..0.057 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
 Total runtime: 0.094 ms
(4 rows)


However, it is rather disappointing that the DB can't figure out
how to execute such a simple query in a half decent manner (seq scan
on an indexed table for a BETWEEN filter doesn't qualify :)).

Many thanks!

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.


Re: Severe performance problems for simple query

From: Matthew

On Mon, 7 Apr 2008, Heikki Linnakangas wrote:
> In that case, a regular index on (ipFrom, ipTo) should work just fine, and
> that's what he's got. Actually, an index on just ipFrom would probably work
> just as well. The problem is that the planner doesn't know about that special
> relationship between ipFrom and ipTo. Perhaps it could be hinted by
> explicitly specifying "AND ipTo > ipFrom" in the query?

Actually, the problem is that the database doesn't know that the entries
don't overlap. For all it knows, you could have data like this:

0               10
10              20
20              30
... ten million rows later
100000030       100000040
100000040       100000050
0               100000050

So say you wanted to search for the value of 50,000,000. The index on
ipFrom would select five million rows, all of which then have to be
filtered by the constraint on ipTo. Likewise, an index on ipTo would
return five million rows, all of which then have to be filtered by the
constraint on ipFrom. If you just read the index and took the closest
entry to the value, then you would miss out on the last entry which
overlaps with the whole range. An R-tree on both fields will correctly
find the small set of entries that are relevant.

It would be very cool to be able to create an R-tree index that would just
make the original query run fast without needing alteration. I had a look
at this a while back, but it is not currently possible in GiST, because
only one field is handed to the index at a time. So all the current R-tree
implementations require that you generate an object containing the two
values, like the box, and then index that.

Something for 8.4?

Matthew

--
$ rm core
Segmentation Fault (core dumped)

Re: Severe performance problems for simple query

From: Florian Weimer

* Dimi Paun:

>   * 4.9 million records in a table (IP address info)

You should use the ip4r type for that.
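
A minimal sketch of what that could look like (assuming the ip4r contrib
module is installed; the ip_range column and index name are illustrative, and
the existing int8 bounds would have to be converted when populating it):

-- single range column of type ip4r instead of two int8 bounds
ALTER TABLE ipligenceIpAddress ADD COLUMN ip_range ip4r;

-- GiST index so containment searches can use the index
CREATE INDEX ipligenceipaddress_ip_range_idx
    ON ipligenceIpAddress USING gist (ip_range);

-- ">>=" is ip4r's "contains" operator: find the range holding an address
SELECT * FROM ipligenceIpAddress
 WHERE ip_range >>= '127.0.0.1'::ip4;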