Thread: Force a merge join?
Hello, [7.2.1, Debian/woody kernel 2.2.20] I have an extremely slow running query which uses a nested loop. I would like to force it to use a merge join instead, but I have been unable to figure out how to fool the optimizer to do that. Alternately, I would welcome suggestions on how to speed up the query. Yes, I have tried turning off enable_nestloop and playing with the tuple costs somewhat. When I say extremely slowly, I mean it. We're talking 280 seconds for a query which returns 3000 rows out of a possible ~800,000 using two parts of the same table (joined together) of sizes 500 and 15000 or so. Please find below some details on my query, the explain, the table and the indices. Some columns not referenced have been omitted. Note that I just increased the RAM on the database server to 2 gigs and allocated almost 1.4 gigs to shared buffers. All my tables and indices total size (for now) is about 600 megs, so everything in theory could be running from memory. I have a continuous vmstat running and show no swapping, and no "blocks in" activity indicating it is indeed running fully from memory. The INTENT of the query is to take a set of e-mail addresses (list A) and another set (list B) and find the intersection - that is, all entries in A which have a corresponding entry in B based upon having the same e-mail address. This is the last one part of a larger query (which is otherwise very fast) which selects parts of list A and then removes (via combining with EXCEPT) the remaining ones in the query below. These are then inserted into yet another table so the whole query looks like INSERT ... SELECT ... EXCEPT SELECT ... where the last SELECT is the slow query I show below. By breaking the query into a bunch (about 40) of queries without any ORs and UNIONing them, I can get about a 3-4 times improvement, but that still means this is an 80 second query, and is not scalable to the case where it can't be broken into multiple queries. Query: -- Note the odd 745 is because this is used in an INSERT ... SELECT statement -- Also note that there is no difference if I use IN (1,2,3,4) syntax SELECT 745, a.list_entry_id FROM list_entries a, list_entries b WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418) AND LOWER(a.email) = LOWER(b.email); psql's \d on the table name and indices: (use fixed-width font) Table "list_entries" Column | Type | Modifiers ---------------+--------------------------+-------------------------------------------------------------------- list_id | integer | not null list_entry_id | integer | not null default nextval('"list_entries_list_entry_id_seq"'::text) email | character varying(64) | (about a dozen more mostly varchar fields not shown) Indexes: list_id_idx, lower_email_idx Primary key: list_entries_pkey Triggers: RI_ConstraintTrigger_1572852 Index "list_id_idx" Column | Type ---------+--------- list_id | integer btree Index "lower_email_idx" Column | Type --------+------ lower | text btree Query analysis: -> Nested Loop (cost=0.00..219086859.00 rows=1585343 width=52) (actual time=0.27..294568.88 rows=2970 loops=1) -> Index Scan using list_id_idx, list_id_idx, list_id_idx, list_id_idx, list_id_idx on list_entries a (cost=0.00..29025.99 rows=14115 width=28) (actual time=0.05..176.79 rows=15859 loops=1) -> Index Scan using lower_email_idx on list_entries b (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0 loops=15859) Database optimizer settings: NOTICE: cpu_operator_cost is 0.0025 NOTICE: cpu_tuple_cost is 0.05 NOTICE: cpu_index_tuple_cost is 0.001 All enable_X options are ON Many thanks, Doug
On Wed, May 15, 2002 at 03:31:30PM -0400, Doug Fields wrote: [Much snipped about mergejoins] > AND LOWER(a.email) = LOWER(b.email); There's your problem. You're not comparing the two columns, you're comparing the two columns after running through a function, so it can't use the index. Try creating an index on LOWER(email) instead of just email. HTH, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Canada, Mexico, and Australia form the Axis of Nations That > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
Martijn van Oosterhout <kleptog@svana.org> writes: > On Wed, May 15, 2002 at 03:31:30PM -0400, Doug Fields wrote: > [Much snipped about mergejoins] >> AND LOWER(a.email) = LOWER(b.email); > There's your problem. You're not comparing the two columns, you're comparing > the two columns after running through a function, so it can't use the index. > Try creating an index on LOWER(email) instead of just email. I don't think that will help :-( ... (in fact, it kinda looked like he'd done that already, though surely the version he's using isn't saying so explicitly). The current version of the optimizer is not bright enough to do either merge or hash joins on join expressions more complex than var1 = var2. Improving this is on the TODO list ...but in the meantime I wonder why you couldn't force an email-address column to lower case when you store it, so as to simplify the join problem. The RFCs nominally allow the local-part of an email address to be case sensitive, but in practice there is no one who really expects a case-sensitive email address to work. regards, tom lane
At 12:24 AM 5/16/02 -0400, Tom Lane wrote: >Improving this is on the TODO list ...but in the meantime I wonder why >you couldn't force an email-address column to lower case when you store >it, so as to simplify the join problem. The RFCs nominally allow the >local-part of an email address to be case sensitive, but in practice >there is no one who really expects a case-sensitive email address to >work. Could use an extra column - so you can keep one column for display and one for matching/searches. So it would be easier to treat the following addresses as identical without having to do it all in an SQL function: Doug Fields <dfields-pg-general@pexicom.com> Doug Fields <dfields-pg-general@pexicom.com.> (D Fields ) dfields-pg-general@PEXICOM.COM Cheerio, Link.
At 09:00 PM 5/15/2002, Martijn van Oosterhout wrote: >[Much snipped about mergejoins] > > > AND LOWER(a.email) = LOWER(b.email); > >There's your problem. You're not comparing the two columns, you're comparing >the two columns after running through a function, so it can't use the index. > >Try creating an index on LOWER(email) instead of just email. Thanks. It actually is, already. I noticed that the \d lower_email_idx displays: pexitest=# \d lower_email_idx Index "lower_email_idx" Column | Type --------+------ lower | text btree But if I look at a pg_dump, you can see I covered this base already (I indexed every column used in the majority of searches by turning on query debug and then sort | uniq them, a few months ago): -- Name: "lower_email_idx" Type: INDEX Owner: dfields -- CREATE INDEX lower_email_idx ON list_entries USING btree (lower(email)); So - that's not the problem. Although I did run a test on a new table, where I created an additional column called lower_email and set it accordingly - and it does do the merge join if you set enable_nestloop=off (but not if it is on). However, I don't want to store the same data twice... Other ideas, please? Cheers, Doug
I am having the exact same problem, although setting enable-nestloop = off fixes my problem. Does it not affect yours? Anyway, I occasionally recreate and reload my entire database. When I do this, the planner is flying blind and chooses mergejoin. As soon as I vaccum analyze, it chooses nested loop and certain queries take 1.5 days to complete. If I setenable_nestloop off, they take seconds. Does yours act like this? That is, if you can reload the data in the affected tables so the planner uses default values,does it choose merge join? Tom had indicated he would be interested in why this happens. I can forward my schemaand another example to the group if anyone wants. Ian A. Harding Programmer/Analyst II Tacoma-Pierce County Health Department (253) 798-3549 mailto: iharding@tpchd.org >>> Doug Fields <dfields-pg-general@pexicom.com> 05/15/02 12:31PM >>> Hello, [7.2.1, Debian/woody kernel 2.2.20] I have an extremely slow running query which uses a nested loop. I would like to force it to use a merge join instead, but I have been unable to figure out how to fool the optimizer to do that. Alternately, I would welcome suggestions on how to speed up the query. Yes, I have tried turning off enable_nestloop and playing with the tuple costs somewhat. When I say extremely slowly, I mean it. We're talking 280 seconds for a query which returns 3000 rows out of a possible ~800,000 using two parts of the same table (joined together) of sizes 500 and 15000 or so. Please find below some details on my query, the explain, the table and the indices. Some columns not referenced have been omitted. Note that I just increased the RAM on the database server to 2 gigs and allocated almost 1.4 gigs to shared buffers. All my tables and indices total size (for now) is about 600 megs, so everything in theory could be running from memory. I have a continuous vmstat running and show no swapping, and no "blocks in" activity indicating it is indeed running fully from memory. The INTENT of the query is to take a set of e-mail addresses (list A) and another set (list B) and find the intersection - that is, all entries in A which have a corresponding entry in B based upon having the same e-mail address. This is the last one part of a larger query (which is otherwise very fast) which selects parts of list A and then removes (via combining with EXCEPT) the remaining ones in the query below. These are then inserted into yet another table so the whole query looks like INSERT ... SELECT ... EXCEPT SELECT ... where the last SELECT is the slow query I show below. By breaking the query into a bunch (about 40) of queries without any ORs and UNIONing them, I can get about a 3-4 times improvement, but that still means this is an 80 second query, and is not scalable to the case where it can't be broken into multiple queries. Query: -- Note the odd 745 is because this is used in an INSERT ... SELECT statement -- Also note that there is no difference if I use IN (1,2,3,4) syntax SELECT 745, a.list_entry_id FROM list_entries a, list_entries b WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418) AND LOWER(a.email) = LOWER(b.email); psql's \d on the table name and indices: (use fixed-width font) Table "list_entries" Column | Type | Modifiers ---------------+--------------------------+-------------------------------------------------------------------- list_id | integer | not null list_entry_id | integer | not null default nextval('"list_entries_list_entry_id_seq"'::text) email | character varying(64) | (about a dozen more mostly varchar fields not shown) Indexes: list_id_idx, lower_email_idx Primary key: list_entries_pkey Triggers: RI_ConstraintTrigger_1572852 Index "list_id_idx" Column | Type ---------+--------- list_id | integer btree Index "lower_email_idx" Column | Type --------+------ lower | text btree Query analysis: -> Nested Loop (cost=0.00..219086859.00 rows=1585343 width=52) (actual time=0.27..294568.88 rows=2970 loops=1) -> Index Scan using list_id_idx, list_id_idx, list_id_idx, list_id_idx, list_id_idx on list_entries a (cost=0.00..29025.99 rows=14115 width=28) (actual time=0.05..176.79 rows=15859 loops=1) -> Index Scan using lower_email_idx on list_entries b (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0 loops=15859) Database optimizer settings: NOTICE: cpu_operator_cost is 0.0025 NOTICE: cpu_tuple_cost is 0.05 NOTICE: cpu_index_tuple_cost is 0.001 All enable_X options are ON Many thanks, Doug ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Hi Ian, At 01:06 PM 5/17/2002, you wrote: >I am having the exact same problem, although setting enable-nestloop = off >fixes my problem. Does it not affect yours? No, in my case, I'm using LOWER(x) = LOWER(y) which precludes merge joins - however, when I refactor the DB in a test situation to do x = y, as another person has mentioned, it can be forced into merge joins. >Anyway, I occasionally recreate and reload my entire database. When I do >this, the planner is flying blind and chooses merge join. As soon as I >vaccum analyze, it chooses nested loop and certain queries take 1.5 days >to complete. If I set enable_nestloop off, they take seconds. > >Does yours act like this? That is, if you can reload the data in the >affected tables so the planner uses default values, does it choose merge >join? Tom had indicated he would be interested in why this happens. I >can forward my schema and another example to the group if anyone wants. In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO ... to copy my real table to a testing table so I could refactor it. Then I did the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) and voila - nested loops and half second queries turning into five minute nightmares. Then enable_nestloop would fix the problem again after that. I played with some of the CPU TUPLE parameters but couldn't get it to force merge joins without giving really ridiculous values which would doubtlessly screw other things up. Cheers, Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO ... > to copy my real table to a testing table so I could refactor it. Then I did > the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I > did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) and > voila - nested loops and half second queries turning into five minute > nightmares. Then enable_nestloop would fix the problem again after that. Could we see the usual details here? Before and after EXPLAIN ANALYZE, and the schemas and pg_stats rows for the tables involved. BTW, you don't really have to reload a table to get back to the "unanalyzed" condition for testing purposes. You can just manually delete the rows from pg_statistic: delete from pg_statistic where starelid = (select oid from pg_class where relname = 'mytable'); regards, tom lane
At 04:19 PM 5/18/2002, Tom Lane wrote: >Doug Fields <dfields-pg-general@pexicom.com> writes: > > In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO > ... > > to copy my real table to a testing table so I could refactor it. Then I > did > > the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I > > did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) > and > > voila - nested loops and half second queries turning into five minute > > nightmares. Then enable_nestloop would fix the problem again after that. > >Could we see the usual details here? Before and after EXPLAIN ANALYZE, >and the schemas and pg_stats rows for the tables involved. Hi Tom, I gave some of this information in my first post, but I will repeat it here with the pg_stats and the exact before and after information. Thanks. delete from pg_statistic where starelid = (select oid from pg_class where relname = 'test_list_entries'); EXPLAIN ANALYZE SELECT 745, a.list_entry_id FROM test_list_entries a, test_list_entries b WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418) AND a.lower_email = b.lower_email; NOTICE: QUERY PLAN: Merge Join (cost=1926.12..1940.21 rows=247 width=140) (actual time=448.32..521.99 rows=3674 loops=1) -> Sort (cost=739.66..739.66 rows=176 width=72) (actual time=437.18..447.71 rows=15859 loops=1) -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries a (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32 rows=15859 loops=1) -> Sort (cost=1186.45..1186.45 rows=280 width=68) (actual time=11.12..13.67 rows=3783 loops=1) -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries b (cost=0.00..1175.08 rows=280 width=68) (actual time=0.06..4.75 rows=573 loops=1) Total runtime: 528.83 msec ANALYZE; EXPLAIN SELECT 745, a.list_entry_id FROM test_list_entries a, test_list_entries b WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418) AND a.lower_email = b.lower_email; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..1176.19 rows=4 width=140) -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries a (cost=0.00..454.33 rows=84 width=72) -> Index Scan using test_lower_email_idx on test_list_entries b (cost=0.00..8.52 rows=1 width=68) EXPLAIN ANALYZE same thing... NOTICE: QUERY PLAN: Nested Loop (cost=0.00..1176.19 rows=4 width=140) (actual time=11.52..388320.86 rows=3674 loops=1) -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries a (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30 rows=15859 loops=1) -> Index Scan using test_lower_email_idx on test_list_entries b (cost=0.00..8.52 rows=1 width=68) (actual time=14.86..24.47 rows=0 loops=15859) Total runtime: 388330.10 msec select attname, null_frac, avg_width, n_distinct, most_common_freqs, correlation from pg_stats where tablename = 'test_list_entries'; -- Note, I removed some irrelevant rows attname | null_frac | avg_width | n_distinct | most_common_freqs | correlation ---------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------- list_id | 0 | 4 | 184 | {0.383667,0.115667,0.0163333,0.0126667,0.0113333,0.01,0.00966667,0.00933333,0.009,0.009} | 0.842899 list_entry_id | 0 | 4 | -1 | | 0.719433 lower_email | 0.387667 | 68 | -0.38239 | {0.0156667,0.000666667,0.000666667,0.000666667,0.000666667} | 0.00150877 (27 rows) \d test_list_entries -- Note I removed some irrelevant columns Table "test_list_entries" Column | Type | Modifiers ---------------+--------------------------+----------- list_id | integer | list_entry_id | integer | lower_email | character(64) | Indexes: test_list_entries_pkey, test_list_id_idx, test_lower_email_idx Cheers, Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > [ unanalyzed ] > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries > a (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32 > rows=15859 loops=1) > [ after analyze ] > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries > a (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30 > rows=15859 loops=1) The major problem clearly is the horribly bad estimate on the selectivity of the clause WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) This is showing that the planner estimated 84 matching rows (vs. 176 with no stats!) whereas it was really 15859. > select attname, null_frac, avg_width, n_distinct, most_common_freqs, > correlation from pg_stats where tablename = 'test_list_entries'; Could we see the whole pg_stats row for list_id? In particular I was wondering if any of the list_id values being selected for appear in most_common_vals. regards, tom lane
At 06:31 PM 5/18/2002, Tom Lane wrote: [Analysis omitted] >The major problem clearly is the horribly bad estimate on the >selectivity of the clause > WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR > a.list_id=147 OR a.list_id=144) >This is showing that the planner estimated 84 matching rows (vs. 176 >with no stats!) whereas it was really 15859. I find that this is actually fairly typical, where the row estimates and actual rows are off by orders of magnitudes. Some info on this table: pexitest=# select count(distinct list_id) from test_list_entries; select count(*) from test_list_ count ------- 308 (1 row) pexitest=# select count(*) from test_list_entries; count -------- 800576 (1 row) Indicating that the safest assumption based upon no information is that each list_id has about 2600 records associated with it. >Could we see the whole pg_stats row for list_id? In particular I was >wondering if any of the list_id values being selected for appear in >most_common_vals. Absolutely: select * from pg_stats where tablename = 'test_list_entries' and attname = 'list_id'; tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation -------------------+---------+-----------+-----------+------------+------------------------------------------+--------------------------------------------------------------------------------------------------+---------------------------------------------+------------- test_list_entries | list_id | 0 | 4 | 189 | {38,192,369,330,332,501,229,493,319,424} | {0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333} | {5,138,154,224,296,315,342,371,439,460,505} | 0.839262 (1 row) Many thanks, Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > At 04:19 PM 5/18/2002, Tom Lane wrote: >> Doug Fields <dfields-pg-general@pexicom.com> writes: > In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO >> ... > to copy my real table to a testing table so I could refactor it. Then I >> did > the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I > did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) >> and > voila - nested loops and half second queries turning into five minute > nightmares. Then enable_nestloop would fix the problem again after that. >> >> Could we see the usual details here? Before and after EXPLAIN ANALYZE, >> and the schemas and pg_stats rows for the tables involved. > Hi Tom, > I gave some of this information in my first post, but I will repeat it here > with the pg_stats and the exact before and after information. Thanks. > delete from pg_statistic where > starelid = (select oid from pg_class where relname = > 'test_list_entries'); > EXPLAIN ANALYZE SELECT 745, a.list_entry_id > FROM test_list_entries a, test_list_entries b > WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR > a.list_id=147 OR a.list_id=144) > AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR > b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR > b.list_id=418) > AND a.lower_email = b.lower_email; > NOTICE: QUERY PLAN: > Merge Join (cost=1926.12..1940.21 rows=247 width=140) (actual > time=448.32..521.99 rows=3674 loops=1) > -> Sort (cost=739.66..739.66 rows=176 width=72) (actual > time=437.18..447.71 rows=15859 loops=1) > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries > a (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32 > rows=15859 loops=1) > -> Sort (cost=1186.45..1186.45 rows=280 width=68) (actual > time=11.12..13.67 rows=3783 loops=1) > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx on test_list_entries > b (cost=0.00..1175.08 rows=280 width=68) (actual time=0.06..4.75 rows=573 > loops=1) > Total runtime: 528.83 msec > ANALYZE; > EXPLAIN SELECT 745, a.list_entry_id > FROM test_list_entries a, test_list_entries b > WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR > a.list_id=147 OR a.list_id=144) > AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR > b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR > b.list_id=418) > AND a.lower_email = b.lower_email; > NOTICE: QUERY PLAN: > Nested Loop (cost=0.00..1176.19 rows=4 width=140) > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries > a (cost=0.00..454.33 rows=84 width=72) > -> Index Scan using test_lower_email_idx on test_list_entries > b (cost=0.00..8.52 rows=1 width=68) > EXPLAIN ANALYZE same thing... > NOTICE: QUERY PLAN: > Nested Loop (cost=0.00..1176.19 rows=4 width=140) (actual > time=11.52..388320.86 rows=3674 loops=1) > -> Index Scan using test_list_id_idx, test_list_id_idx, > test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries > a (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30 > rows=15859 loops=1) > -> Index Scan using test_lower_email_idx on test_list_entries > b (cost=0.00..8.52 rows=1 width=68) (actual time=14.86..24.47 rows=0 > loops=15859) > Total runtime: 388330.10 msec > select attname, null_frac, avg_width, n_distinct, most_common_freqs, > correlation from pg_stats where tablename = 'test_list_entries'; > -- Note, I removed some irrelevant rows > attname | null_frac | avg_width | n_distinct > | most_common_freqs > | correlation > ---------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------- > list_id | 0 | 4 | 184 | > {0.383667,0.115667,0.0163333,0.0126667,0.0113333,0.01,0.00966667,0.00933333,0.009,0.009} > | 0.842899 > list_entry_id | 0 | 4 | -1 > | > | 0.719433 > lower_email | 0.387667 | 68 | -0.38239 | > {0.0156667,0.000666667,0.000666667,0.000666667,0.000666667} > | 0.00150877 > (27 rows) > \d test_list_entries > -- Note I removed some irrelevant columns > Table "test_list_entries" > Column | Type | Modifiers > ---------------+--------------------------+----------- > list_id | integer | > list_entry_id | integer | > lower_email | character(64) | > Indexes: test_list_entries_pkey, > test_list_id_idx, > test_lower_email_idx > Cheers, > Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > I find that this is actually fairly typical, where the row estimates and > actual rows are off by orders of magnitudes. One would like to think that 7.2 is better than previous releases, especially for relatively simple queries such as this. > pexitest=# select count(distinct list_id) from test_list_entries; > count > ------- > 308 > (1 row) As opposed to the pg_stats estimate of 189 ... not too bad, really. Is the most-common-values distribution shown in the pg_stats output reasonably correct? (Specifically, 38 has nearly 40% of the entries, 192 another 12%, and everything else 1.5% or less) Another question is exactly what version you are running. I tried plugging the stats values you gave into pg_statistic by hand, and got this plan from current sources: Merge Join (cost=29720.79..29843.05 rows=423 width=140) Merge Cond: ("outer".lower_email = "inner".lower_email) -> Sort (cost=11371.23..11393.77 rows=9016 width=72) Sort Key: a.lower_email -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx ontest_list_entries a (cost=0.00..10565.78 rows=9016 width=72) Index Cond: ((list_id = 148) OR (list_id = 146) OR (list_id = 145) OR (list_id = 147) OR (list_id = 144)) -> Sort (cost=18349.56..18385.50 rows=14377 width=68) Sort Key: b.lower_email -> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx,test_list_id_idx, test_list_id_idx on test_list_entries b (cost=0.00..17013.92 rows=14377 width=68) Index Cond: ((list_id = 247) OR (list_id = 433) OR (list_id = 249) OR (list_id = 434) OR (list_id = 238) OR(list_id = 340) OR (list_id = 339) OR (list_id = 418)) (10 rows) ie, it's estimating about 1800 matches per list_id value, which seems pretty reasonable given that it knows none of these values are in the most_common list. Now I don't see anything in the CVS logs to suggest that the estimation of this query would've changed since late in 7.2 beta cycle, so I'm confused why you don't get similar results. regards, tom lane
> > 308 > >As opposed to the pg_stats estimate of 189 ... not too bad, really. >Is the most-common-values distribution shown in the pg_stats output >reasonably correct? (Specifically, 38 has nearly 40% of the entries, >192 another 12%, and everything else 1.5% or less) The output is actually pretty in terms of which should be in the top 10, but not which exactly: pexitest=# select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC; list_id | count ---------+-------- 192 | 173290 330 | 16174 501 | 14054 369 | 12659 229 | 12654 332 | 11429 342 | 10982 454 | 10404 493 | 9835 424 | 9778 460 | 9707 38 | 9454 331 | 9355 323 | 9232 319 | 9164 So that correlates pretty well with it's guesses as to the top 10: {38,192,369,330,332,501,229,493,319,424} But not so well as to their relative distributions: {0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333} Some other stats: select avg(count) from (select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC) as a; avg ----------------- 2599.2727272727 (1 row) pexitest=# select stddev(count) from (select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC) as a; stddev ------------------ 10160.4314402693 (1 row) >Another question is exactly what version you are running. I tried >plugging the stats values you gave into pg_statistic by hand, and >got this plan from current sources: I'm running 7.2.1 as packaged and distributed in the Debian/woody 7.2.1-2 distribution. >ie, it's estimating about 1800 matches per list_id value, which seems >pretty reasonable given that it knows none of these values are in the >most_common list. Now I don't see anything in the CVS logs to suggest >that the estimation of this query would've changed since late in 7.2 >beta cycle, so I'm confused why you don't get similar results. It would be nice if I could get it to use a query such as the ones I gave above to put exact values into the analysis. Or, if I could tell it to do a more detailed sampling during ANALYZE. I could also tell it to keep more than the top 10 in the statistics table (SET STATISTICS), but I'm not sure what it would buy me, other than forcing a larger sample (but not knowing how much larger). How much would I slow the ANALYZE statement, and more importantly, the query optimizer, if I told it to keep statistics on the top 200 instead of the default 10 values? In the mean time, I've surrounded my code with SET ENABLE_NESTLOOP=OFF and ON blocks, which should force merge joins. I appreciate your insight on this matter and others Tom. Thanks! Cheers, Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > So that correlates pretty well with it's guesses as to the top 10: > {38,192,369,330,332,501,229,493,319,424} > But not so well as to their relative distributions: > {0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333} Very curious. I'd have expected it to recognize 192 as the most common value, given that actual distribution. Is the analyze result repeatable? > Or, if I could tell it to do a > more detailed sampling during ANALYZE. I could also tell it to keep more > than the top 10 in the statistics table (SET STATISTICS), but I'm not sure > what it would buy me, other than forcing a larger sample The sample size scales linearly with the SET STATISTICS target (more specifically, with the largest target among the columns being analyzed). I was just about to suggest that you try setting a larger target and see if the stats get better. > How much would I slow the ANALYZE statement, and more importantly, the > query optimizer, if I told it to keep statistics on the top 200 instead of > the default 10 values? 200 seems like overkill... 20 or 30 might well be enough. FWIW, I went back and loaded the test case into 7.2.1, and I still get the same estimates I showed from current sources. So there's something very fishy about your results. Don't know where to look for the cause of the discrepancy at the moment. regards, tom lane PS: sorry about the blank post before...
OK, I posted the current schema and before and after EXPLAIN output. The problem query is "select * from vtb_allocpayroll". It is messy, but works fine until I analyze things... http://www.pakrat.com/pgtrouble I will look into the pg_stats thing I saw mentioned, but am not sure if it applies since I am way back at 7.1.3. planning=# select version(); version -------------------------------------------------------------- PostgreSQL 7.1.3 on i386--netbsd, compiled by GCC egcs-1.1.2 (1 row) Ian A. Harding Programmer/Analyst II Tacoma-Pierce County Health Department (253) 798-3549 mailto: iharding@tpchd.org >>> Neil Conway <nconway@klamath.dyndns.org> 05/17/02 04:08PM >>> On Fri, 17 May 2002 10:06:32 -0700 "Ian Harding" <ianh@tpchd.org> wrote: > I can forward my schema and another example to the group if anyone wants. Yes, I'd be interested in seeing that, as well as the output of EXPLAIN ANALYZE for both the good and bad query plans. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC