Thread: Normal case or bad query plan?

Normal case or bad query plan?

From

Gabriele Bartolini

Date:

11 October 2004, 22:06:10

Hi guys,

    please consider this scenario. I have this table:

CREATE TABLE ip2location (
     ip_address_from BIGINT NOT NULL,
     ip_address_to BIGINT NOT NULL,
     id_location BIGINT NOT NULL,
     PRIMARY KEY (ip_address_from, ip_address_to)
);

I created a cluster on its primary key, by running:
CLUSTER ip2location_ip_address_from_key ON ip2location;

This allowed me to organise data in a more efficient way: the data that is
contained are ranges of IP addresses with empty intersections; for every IP
class there is a related location's ID. The total number of entries is 1392443.

For every IP address I have, an application retrieves the corresponding
location's id from the above table, by running a query like:

SELECT id_location FROM ip2location WHERE '11020000111' >= ip_address_from
AND '11020000111' <= ip_address_to;

For instance, by running the 'EXPLAIN ANALYSE' command, I get this "funny"
result:


                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
  Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
(actual time=5338.120..40237.283 rows=1 loops=1)
    Filter: ((1040878301::bigint >= ip_address_from) AND
(1040878301::bigint <= ip_address_to))
  Total runtime: 40237.424 ms


With other data, that returns an empty set, I get:

explain SELECT id_location FROM ip2location WHERE '11020000111' >=
ip_address_from AND '11020000111' <= ip_address_to;

                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------
  Index Scan using ip2location_ip_address_from_key on
ip2location  (cost=0.00..419.16 rows=140 width=8)
    Index Cond: ((11020000111::bigint >= ip_address_from) AND
(11020000111::bigint <= ip_address_to))


I guess the planner chooses the best of the available options for the first
case, the sequential scan. This is not confirmed though by the fact that,
after I ran "SET enable_scan TO off", I got this:
                                                                          QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using ip2location_ip_address_from_key on
ip2location  (cost=0.00..31505.73 rows=124781 width=8) (actual
time=2780.172..2780.185 rows=1 loops=1)
    Index Cond: ((1040878301::bigint >= ip_address_from) AND
(1040878301::bigint <= ip_address_to))
  Total runtime: 2780.359 ms


Is this a normal case or should I worry? What am I missing? Do you have any
suggestion or comment to do (that would be extremely appreciated)? Is the
CLUSTER I created worthwhile or not?

Thank you,
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004

Re: Normal case or bad query plan?

From

Tom Lane

Date:

11 October 2004, 22:34:10

Gabriele Bartolini <angusgb@tin.it> writes:
>                                                       QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>   Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> (actual time=5338.120..40237.283 rows=1 loops=1)
>     Filter: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
>   Total runtime: 40237.424 ms

> Is this a normal case or should I worry? What am I missing?

The striking thing about that is the huge difference between estimated
rowcount (124781) and actual (1).  The planner would certainly have
picked an indexscan if it thought the query would select only one row.

I suspect that you haven't ANALYZEd this table in a long time, if ever.
You really need reasonably up-to-date ANALYZE stats if you want the
planner to do an adequate job of planning range queries.  It may well be
that you need to increase the analyze statistics target for this table,
also --- in BIGINT terms the distribution is probably pretty irregular,
which will mean you need finer-grain statistics to get good estimates.

(BTW, have you looked at the inet datatype to see if that would fit your
needs?)

            regards, tom lane

Re: Normal case or bad query plan?

From

Kris Jurka

Date:

12 October 2004, 05:09:09

On Mon, 11 Oct 2004, Gabriele Bartolini wrote:

> ---------------------------------------------------------------------------------------------------------------------
>   Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> (actual time=5338.120..40237.283 rows=1 loops=1)
>     Filter: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
>   Total runtime: 40237.424 ms
>

I believe the problem is that pg's lack of cross-column statistics is
producing the poor number of rows estimate.  The number of rows mataching
just the first 1040878301::bigint >= ip_address_from condition is 122774
which is roughtly 10% of the table.  I imagine the query planner
believes that the other condition alone will match the other 90% of the
table.  The problem is that it doesn't know that these two ranges'
intersection is actually tiny.  The planner assumes a complete or nearly
complete overlap so it thinks it will need to fetch 10% of the rows from
both the index and the heap and chooses a seqscan.

Kris Jurka

Re: Normal case or bad query plan?

From

Gabriele Bartolini

Date:

12 October 2004, 06:26:31

Hi Tom,

    thanks for your interest.

At 23.33 11/10/2004, Tom Lane wrote:

>Gabriele Bartolini <angusgb@tin.it> writes:
> >                                                       QUERY PLAN
> >
> ---------------------------------------------------------------------------------------------------------------------
> >   Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> > (actual time=5338.120..40237.283 rows=1 loops=1)
> >     Filter: ((1040878301::bigint >= ip_address_from) AND
> > (1040878301::bigint <= ip_address_to))
> >   Total runtime: 40237.424 ms
>
> > Is this a normal case or should I worry? What am I missing?
>
>The striking thing about that is the huge difference between estimated
>rowcount (124781) and actual (1).  The planner would certainly have
>picked an indexscan if it thought the query would select only one row.
>
>I suspect that you haven't ANALYZEd this table in a long time, if ever.
>You really need reasonably up-to-date ANALYZE stats if you want the
>planner to do an adequate job of planning range queries.

That's the thing ... I had just peformed a VACUUM ANALYSE   :-(


>  It may well be that you need to increase the analyze statistics target
> for this table,
>also --- in BIGINT terms the distribution is probably pretty irregular,
>which will mean you need finer-grain statistics to get good estimates.

You mean ... SET STATISTICS for the two columns, don't you?

>(BTW, have you looked at the inet datatype to see if that would fit your
>needs?)

Yes, I know. In other cases I use it. But this is a type of data coming
from an external source (www.ip2location.com) and I can't change it.

Thank you so much. I will try to play with the grain of the statistics,
otherwise - if worse comes to worst - I will simply disable the seq scan
after connecting.

-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004

Re: Normal case or bad query plan?

From

Tom Lane

Date:

12 October 2004, 06:46:40

Gabriele Bartolini <angusgb@tin.it> writes:
> Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> (actual time=5338.120..40237.283 rows=1 loops=1)
> Filter: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
> Total runtime: 40237.424 ms
>>
>> I suspect that you haven't ANALYZEd this table in a long time, if ever.
>> You really need reasonably up-to-date ANALYZE stats if you want the
>> planner to do an adequate job of planning range queries.

> That's the thing ... I had just peformed a VACUUM ANALYSE   :-(

In that case I think Kris Jurka had it right: the problem is the planner
doesn't know enough about the relationship of the ip_address_from and
ip_address_to columns to realize that this is a very selective query.
But actually, even *had* it realized that, it would have had little
choice but to use a seqscan, because neither of the independent
conditions is really very useful as an index condition by itself.

Assuming that this problem is representative of your query load, you
really need to recast the data representation to make it more readily
searchable.  I think you might be able to get somewhere by combining
ip_address_from and ip_address_to into a single CIDR column and then
using the network-overlap operator to probe for matches to your query
address.  (This assumes that the from/to pairs are actually meant to
represent CIDR subnets; if not you need some other idea.)  Another
possibility is to convert to a geometric type and use an rtree index
with an "overlaps" operator.  I'm too tired to work out the details,
but try searching for "decorrelation" in the list archives to see some
related problems.

            regards, tom lane

Re: Normal case or bad query plan?

From

"Aaron Werman"

Date:

12 October 2004, 13:21:15

Makes sense. See DB2 8.2 info on their new implementation of cross column
statistics. If this is common and you're willing to change code, you can
fake that by adding a operation index on some hash function of both columns,
and search for both columns and the hash.

----- Original Message -----
From: "Kris Jurka" <books@ejurka.com>
To: "Gabriele Bartolini" <angusgb@tin.it>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, October 11, 2004 5:17 PM
Subject: Re: [PERFORM] Normal case or bad query plan?


>
>
> On Mon, 11 Oct 2004, Gabriele Bartolini wrote:
>
>
> --------------------------------------------------------------------------
-------------------------------------------
> >   Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> > (actual time=5338.120..40237.283 rows=1 loops=1)
> >     Filter: ((1040878301::bigint >= ip_address_from) AND
> > (1040878301::bigint <= ip_address_to))
> >   Total runtime: 40237.424 ms
> >
>
> I believe the problem is that pg's lack of cross-column statistics is
> producing the poor number of rows estimate.  The number of rows mataching
> just the first 1040878301::bigint >= ip_address_from condition is 122774
> which is roughtly 10% of the table.  I imagine the query planner
> believes that the other condition alone will match the other 90% of the
> table.  The problem is that it doesn't know that these two ranges'
> intersection is actually tiny.  The planner assumes a complete or nearly
> complete overlap so it thinks it will need to fetch 10% of the rows from
> both the index and the heap and chooses a seqscan.
>
> Kris Jurka
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>

Re: Normal case or bad query plan?

From

"Gabriele Bartolini"

Date:

12 October 2004, 15:29:47

Hi Kris,

>I believe the problem is that pg's lack of cross-column statistics is
>producing the poor number of rows estimate.  The number of rows mataching

I got your point now. I had not understood it last night but it makes really
sense.

>which is roughtly 10% of the table.  I imagine the query planner
>believes that the other condition alone will match the other 90% of the

>table.  The problem is that it doesn't know that these two ranges'
>intersection is actually tiny.  The planner assumes a complete or nearly
>complete overlap so it thinks it will need to fetch 10% of the rows from

Yep, because it performs those checks separately and it gets 10% for one
check and 90% for the other.

As Tom says, I should somehow make PostgreSQL see my data as a single entity
in order to perform a real range check. I will study some way to obtain
it.

However, I got better results by specifying the grane of the statistics
through "ALTER TABLE ... SET STATISTICS".

FYI I set it to 1000 (the maximum) and I reduced the query's estimated time
by the 90% (from 40000ms to 4000ms) although much slower than the index
scan (200ms).

I will play a bit with data types as Tom suggested.

For now, thanks anyone who tried and helped me.

Ciao,
-Gabriele

Re: Normal case or bad query plan?

From

"Steinar H. Gunderson"

Date:

12 October 2004, 15:41:12

On Tue, Oct 12, 2004 at 04:29:36PM +0200, Gabriele Bartolini wrote:
> FYI I set it to 1000 (the maximum) and I reduced the query's estimated time
> by the 90% (from 40000ms to 4000ms) although much slower than the index
> scan (200ms).

Note that the estimated times are _not_ in ms. They are in multiples of a
disk fetch (?). Thus, you can't compare estimated and real times like that.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Normal case or bad query plan?

From

Mischa Sandberg

Date:

19 October 2004, 22:03:21

This may sound more elaborate than it's worth, but I don't know of
a better way to avoid a table scan.

You want to index on a computed value that is a common prefix of your
FROM and TO fields.

The next step is to search on a fixed SET of prefixes of different
lengths. For example, some of your ranges might be common in the first 3
bytes of ipaddr, some in two, some in only one.

You create and index on one common prefix of either 1,2 or 3 bytes, for
each row.

Your query then looks something like (pardon my ignorance in PGSQL)

    select    *
    from    ip2location
    where    ip2prefix in (
        network(:myaddr || '/8'),
        network(:myaddr || '/16'),
        network(:myaddr || '/24'),
        :myaddr --- assuming single-address ranges are possible
        )
    and :myaddr between ip_address_from and ip_address_to

Although this looks a little gross, it hits very few records.
It also adapts cleanly to a join between ip2location and a table of
ip addrs.

Gabriele Bartolini wrote:
> Hi guys,
>
>    please consider this scenario. I have this table:
>
> CREATE TABLE ip2location (
>     ip_address_from BIGINT NOT NULL,
>     ip_address_to BIGINT NOT NULL,
>     id_location BIGINT NOT NULL,
>     PRIMARY KEY (ip_address_from, ip_address_to)
> );
>
> I created a cluster on its primary key, by running:
> CLUSTER ip2location_ip_address_from_key ON ip2location;
>
> This allowed me to organise data in a more efficient way: the data that
> is contained are ranges of IP addresses with empty intersections; for
> every IP class there is a related location's ID. The total number of
> entries is 1392443.
>
> For every IP address I have, an application retrieves the corresponding
> location's id from the above table, by running a query like:
>
> SELECT id_location FROM ip2location WHERE '11020000111' >=
> ip_address_from AND '11020000111' <= ip_address_to;
>
> For instance, by running the 'EXPLAIN ANALYSE' command, I get this
> "funny" result:
>
>
>                                                      QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------

>
>  Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8)
> (actual time=5338.120..40237.283 rows=1 loops=1)
>    Filter: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
>  Total runtime: 40237.424 ms
>
>
> With other data, that returns an empty set, I get:
>
> explain SELECT id_location FROM ip2location WHERE '11020000111' >=
> ip_address_from AND '11020000111' <= ip_address_to;
>
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>
>  Index Scan using ip2location_ip_address_from_key on ip2location
> (cost=0.00..419.16 rows=140 width=8)
>    Index Cond: ((11020000111::bigint >= ip_address_from) AND
> (11020000111::bigint <= ip_address_to))
>
>
> I guess the planner chooses the best of the available options for the
> first case, the sequential scan. This is not confirmed though by the
> fact that, after I ran "SET enable_scan TO off", I got this:
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>  Index Scan using ip2location_ip_address_from_key on ip2location
> (cost=0.00..31505.73 rows=124781 width=8) (actual
> time=2780.172..2780.185 rows=1 loops=1)
>    Index Cond: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
>  Total runtime: 2780.359 ms
>
>
> Is this a normal case or should I worry? What am I missing? Do you have
> any suggestion or comment to do (that would be extremely appreciated)?
> Is the CLUSTER I created worthwhile or not?
>
> Thank you,
> -Gabriele
>
> --
> Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member,
> ht://Check maintainer
> Current Location: Prato, Toscana, Italia
> angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
>  > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy,
> The Inferno
>
>
> ------------------------------------------------------------------------
>
>
> ---
> Outgoing mail is certified Virus Free.
> Checked by AVG anti-virus system (http://www.grisoft.com).
> Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004
>
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)