Thread: Normal case or bad query plan?
Hi guys, please consider this scenario. I have this table: CREATE TABLE ip2location ( ip_address_from BIGINT NOT NULL, ip_address_to BIGINT NOT NULL, id_location BIGINT NOT NULL, PRIMARY KEY (ip_address_from, ip_address_to) ); I created a cluster on its primary key, by running: CLUSTER ip2location_ip_address_from_key ON ip2location; This allowed me to organise data in a more efficient way: the data that is contained are ranges of IP addresses with empty intersections; for every IP class there is a related location's ID. The total number of entries is 1392443. For every IP address I have, an application retrieves the corresponding location's id from the above table, by running a query like: SELECT id_location FROM ip2location WHERE '11020000111' >= ip_address_from AND '11020000111' <= ip_address_to; For instance, by running the 'EXPLAIN ANALYSE' command, I get this "funny" result: QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) (actual time=5338.120..40237.283 rows=1 loops=1) Filter: ((1040878301::bigint >= ip_address_from) AND (1040878301::bigint <= ip_address_to)) Total runtime: 40237.424 ms With other data, that returns an empty set, I get: explain SELECT id_location FROM ip2location WHERE '11020000111' >= ip_address_from AND '11020000111' <= ip_address_to; QUERY PLAN ------------------------------------------------------------------------------------------------------- Index Scan using ip2location_ip_address_from_key on ip2location (cost=0.00..419.16 rows=140 width=8) Index Cond: ((11020000111::bigint >= ip_address_from) AND (11020000111::bigint <= ip_address_to)) I guess the planner chooses the best of the available options for the first case, the sequential scan. This is not confirmed though by the fact that, after I ran "SET enable_scan TO off", I got this: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using ip2location_ip_address_from_key on ip2location (cost=0.00..31505.73 rows=124781 width=8) (actual time=2780.172..2780.185 rows=1 loops=1) Index Cond: ((1040878301::bigint >= ip_address_from) AND (1040878301::bigint <= ip_address_to)) Total runtime: 2780.359 ms Is this a normal case or should I worry? What am I missing? Do you have any suggestion or comment to do (that would be extremely appreciated)? Is the CLUSTER I created worthwhile or not? Thank you, -Gabriele -- Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer Current Location: Prato, Toscana, Italia angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004
Gabriele Bartolini <angusgb@tin.it> writes: > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------- > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > (actual time=5338.120..40237.283 rows=1 loops=1) > Filter: ((1040878301::bigint >= ip_address_from) AND > (1040878301::bigint <= ip_address_to)) > Total runtime: 40237.424 ms > Is this a normal case or should I worry? What am I missing? The striking thing about that is the huge difference between estimated rowcount (124781) and actual (1). The planner would certainly have picked an indexscan if it thought the query would select only one row. I suspect that you haven't ANALYZEd this table in a long time, if ever. You really need reasonably up-to-date ANALYZE stats if you want the planner to do an adequate job of planning range queries. It may well be that you need to increase the analyze statistics target for this table, also --- in BIGINT terms the distribution is probably pretty irregular, which will mean you need finer-grain statistics to get good estimates. (BTW, have you looked at the inet datatype to see if that would fit your needs?) regards, tom lane
On Mon, 11 Oct 2004, Gabriele Bartolini wrote: > --------------------------------------------------------------------------------------------------------------------- > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > (actual time=5338.120..40237.283 rows=1 loops=1) > Filter: ((1040878301::bigint >= ip_address_from) AND > (1040878301::bigint <= ip_address_to)) > Total runtime: 40237.424 ms > I believe the problem is that pg's lack of cross-column statistics is producing the poor number of rows estimate. The number of rows mataching just the first 1040878301::bigint >= ip_address_from condition is 122774 which is roughtly 10% of the table. I imagine the query planner believes that the other condition alone will match the other 90% of the table. The problem is that it doesn't know that these two ranges' intersection is actually tiny. The planner assumes a complete or nearly complete overlap so it thinks it will need to fetch 10% of the rows from both the index and the heap and chooses a seqscan. Kris Jurka
Hi Tom, thanks for your interest. At 23.33 11/10/2004, Tom Lane wrote: >Gabriele Bartolini <angusgb@tin.it> writes: > > QUERY PLAN > > > --------------------------------------------------------------------------------------------------------------------- > > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > > (actual time=5338.120..40237.283 rows=1 loops=1) > > Filter: ((1040878301::bigint >= ip_address_from) AND > > (1040878301::bigint <= ip_address_to)) > > Total runtime: 40237.424 ms > > > Is this a normal case or should I worry? What am I missing? > >The striking thing about that is the huge difference between estimated >rowcount (124781) and actual (1). The planner would certainly have >picked an indexscan if it thought the query would select only one row. > >I suspect that you haven't ANALYZEd this table in a long time, if ever. >You really need reasonably up-to-date ANALYZE stats if you want the >planner to do an adequate job of planning range queries. That's the thing ... I had just peformed a VACUUM ANALYSE :-( > It may well be that you need to increase the analyze statistics target > for this table, >also --- in BIGINT terms the distribution is probably pretty irregular, >which will mean you need finer-grain statistics to get good estimates. You mean ... SET STATISTICS for the two columns, don't you? >(BTW, have you looked at the inet datatype to see if that would fit your >needs?) Yes, I know. In other cases I use it. But this is a type of data coming from an external source (www.ip2location.com) and I can't change it. Thank you so much. I will try to play with the grain of the statistics, otherwise - if worse comes to worst - I will simply disable the seq scan after connecting. -Gabriele -- Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer Current Location: Prato, Toscana, Italia angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004
Gabriele Bartolini <angusgb@tin.it> writes: > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > (actual time=5338.120..40237.283 rows=1 loops=1) > Filter: ((1040878301::bigint >= ip_address_from) AND > (1040878301::bigint <= ip_address_to)) > Total runtime: 40237.424 ms >> >> I suspect that you haven't ANALYZEd this table in a long time, if ever. >> You really need reasonably up-to-date ANALYZE stats if you want the >> planner to do an adequate job of planning range queries. > That's the thing ... I had just peformed a VACUUM ANALYSE :-( In that case I think Kris Jurka had it right: the problem is the planner doesn't know enough about the relationship of the ip_address_from and ip_address_to columns to realize that this is a very selective query. But actually, even *had* it realized that, it would have had little choice but to use a seqscan, because neither of the independent conditions is really very useful as an index condition by itself. Assuming that this problem is representative of your query load, you really need to recast the data representation to make it more readily searchable. I think you might be able to get somewhere by combining ip_address_from and ip_address_to into a single CIDR column and then using the network-overlap operator to probe for matches to your query address. (This assumes that the from/to pairs are actually meant to represent CIDR subnets; if not you need some other idea.) Another possibility is to convert to a geometric type and use an rtree index with an "overlaps" operator. I'm too tired to work out the details, but try searching for "decorrelation" in the list archives to see some related problems. regards, tom lane
Makes sense. See DB2 8.2 info on their new implementation of cross column statistics. If this is common and you're willing to change code, you can fake that by adding a operation index on some hash function of both columns, and search for both columns and the hash. ----- Original Message ----- From: "Kris Jurka" <books@ejurka.com> To: "Gabriele Bartolini" <angusgb@tin.it> Cc: <pgsql-performance@postgresql.org> Sent: Monday, October 11, 2004 5:17 PM Subject: Re: [PERFORM] Normal case or bad query plan? > > > On Mon, 11 Oct 2004, Gabriele Bartolini wrote: > > > -------------------------------------------------------------------------- ------------------------------------------- > > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > > (actual time=5338.120..40237.283 rows=1 loops=1) > > Filter: ((1040878301::bigint >= ip_address_from) AND > > (1040878301::bigint <= ip_address_to)) > > Total runtime: 40237.424 ms > > > > I believe the problem is that pg's lack of cross-column statistics is > producing the poor number of rows estimate. The number of rows mataching > just the first 1040878301::bigint >= ip_address_from condition is 122774 > which is roughtly 10% of the table. I imagine the query planner > believes that the other condition alone will match the other 90% of the > table. The problem is that it doesn't know that these two ranges' > intersection is actually tiny. The planner assumes a complete or nearly > complete overlap so it thinks it will need to fetch 10% of the rows from > both the index and the heap and chooses a seqscan. > > Kris Jurka > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
Hi Kris, >I believe the problem is that pg's lack of cross-column statistics is >producing the poor number of rows estimate. The number of rows mataching I got your point now. I had not understood it last night but it makes really sense. >which is roughtly 10% of the table. I imagine the query planner >believes that the other condition alone will match the other 90% of the >table. The problem is that it doesn't know that these two ranges' >intersection is actually tiny. The planner assumes a complete or nearly >complete overlap so it thinks it will need to fetch 10% of the rows from Yep, because it performs those checks separately and it gets 10% for one check and 90% for the other. As Tom says, I should somehow make PostgreSQL see my data as a single entity in order to perform a real range check. I will study some way to obtain it. However, I got better results by specifying the grane of the statistics through "ALTER TABLE ... SET STATISTICS". FYI I set it to 1000 (the maximum) and I reduced the query's estimated time by the 90% (from 40000ms to 4000ms) although much slower than the index scan (200ms). I will play a bit with data types as Tom suggested. For now, thanks anyone who tried and helped me. Ciao, -Gabriele
On Tue, Oct 12, 2004 at 04:29:36PM +0200, Gabriele Bartolini wrote: > FYI I set it to 1000 (the maximum) and I reduced the query's estimated time > by the 90% (from 40000ms to 4000ms) although much slower than the index > scan (200ms). Note that the estimated times are _not_ in ms. They are in multiples of a disk fetch (?). Thus, you can't compare estimated and real times like that. /* Steinar */ -- Homepage: http://www.sesse.net/
This may sound more elaborate than it's worth, but I don't know of a better way to avoid a table scan. You want to index on a computed value that is a common prefix of your FROM and TO fields. The next step is to search on a fixed SET of prefixes of different lengths. For example, some of your ranges might be common in the first 3 bytes of ipaddr, some in two, some in only one. You create and index on one common prefix of either 1,2 or 3 bytes, for each row. Your query then looks something like (pardon my ignorance in PGSQL) select * from ip2location where ip2prefix in ( network(:myaddr || '/8'), network(:myaddr || '/16'), network(:myaddr || '/24'), :myaddr --- assuming single-address ranges are possible ) and :myaddr between ip_address_from and ip_address_to Although this looks a little gross, it hits very few records. It also adapts cleanly to a join between ip2location and a table of ip addrs. Gabriele Bartolini wrote: > Hi guys, > > please consider this scenario. I have this table: > > CREATE TABLE ip2location ( > ip_address_from BIGINT NOT NULL, > ip_address_to BIGINT NOT NULL, > id_location BIGINT NOT NULL, > PRIMARY KEY (ip_address_from, ip_address_to) > ); > > I created a cluster on its primary key, by running: > CLUSTER ip2location_ip_address_from_key ON ip2location; > > This allowed me to organise data in a more efficient way: the data that > is contained are ranges of IP addresses with empty intersections; for > every IP class there is a related location's ID. The total number of > entries is 1392443. > > For every IP address I have, an application retrieves the corresponding > location's id from the above table, by running a query like: > > SELECT id_location FROM ip2location WHERE '11020000111' >= > ip_address_from AND '11020000111' <= ip_address_to; > > For instance, by running the 'EXPLAIN ANALYSE' command, I get this > "funny" result: > > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------- > > Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) > (actual time=5338.120..40237.283 rows=1 loops=1) > Filter: ((1040878301::bigint >= ip_address_from) AND > (1040878301::bigint <= ip_address_to)) > Total runtime: 40237.424 ms > > > With other data, that returns an empty set, I get: > > explain SELECT id_location FROM ip2location WHERE '11020000111' >= > ip_address_from AND '11020000111' <= ip_address_to; > > QUERY PLAN > ------------------------------------------------------------------------------------------------------- > > Index Scan using ip2location_ip_address_from_key on ip2location > (cost=0.00..419.16 rows=140 width=8) > Index Cond: ((11020000111::bigint >= ip_address_from) AND > (11020000111::bigint <= ip_address_to)) > > > I guess the planner chooses the best of the available options for the > first case, the sequential scan. This is not confirmed though by the > fact that, after I ran "SET enable_scan TO off", I got this: > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Index Scan using ip2location_ip_address_from_key on ip2location > (cost=0.00..31505.73 rows=124781 width=8) (actual > time=2780.172..2780.185 rows=1 loops=1) > Index Cond: ((1040878301::bigint >= ip_address_from) AND > (1040878301::bigint <= ip_address_to)) > Total runtime: 2780.359 ms > > > Is this a normal case or should I worry? What am I missing? Do you have > any suggestion or comment to do (that would be extremely appreciated)? > Is the CLUSTER I created worthwhile or not? > > Thank you, > -Gabriele > > -- > Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, > ht://Check maintainer > Current Location: Prato, Toscana, Italia > angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447 > > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, > The Inferno > > > ------------------------------------------------------------------------ > > > --- > Outgoing mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004 > > > ------------------------------------------------------------------------ > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)