Thread: Queries joining views
Is there a trick to make this work a bit faster? We have a number of views that join tables, and we have queries that join those views. Some relatively large tables are involved. We added indexes that match our query constraints as much as possible, and that does work if we explicitly query the tables with all the involved joins, instead of the views. However, if we query the views, the planner starts using a filter instead of the desired index... What we see basically is that adding one view to the query makes it go from 12ms to 130ms... zorgweb_solaris=> explain analyze SELECT insrel.owner,insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.otype,insrel.number,medical_care_container.number,product.number FROM mm_medical_care_container_table medical_care_container,mm_insrel insrel,mm_product_table product WHERE medical_care_container.number=558332 AND (medical_care_container.number=insrel.dnumber AND product.number=insrel.snumber AND insrel.dir<>1); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=114.23..203.24 rows=3 width=42) (actual time=10.137..12.171 rows=1 loops=1) -> Index Scan using mm_medical_care_container_table_pkey on mm_medical_care_container_table medical_care_container (cost=0.00..5.64 rows=1 width=4) (actual time=0.048..0.052 rows=1 loops=1) Index Cond: (number = 558332) -> Nested Loop (cost=114.23..197.57 rows=3 width=38) (actual time=10.077..12.106 rows=1 loops=1) -> Merge Join (cost=114.23..186.13 rows=3 width=24) (actual time=10.025..12.049 rows=1 loops=1) Merge Cond: ("outer".number = "inner".snumber) -> Index Scan using mm_product_table_pkey on mm_product_table product (cost=0.00..67.90 rows=1571 width=4) (actual time=0.025..9.460 rows=1571 loops=1) -> Sort (cost=114.23..114.31 rows=30 width=20) (actual time=0.144..0.145 rows=2 loops=1) Sort Key: mm_insrel_table.snumber -> Bitmap Heap Scan on mm_insrel_table (cost=2.11..113.50 rows=30 width=20) (actual time=0.092..0.103 rows=2 loops=1) Recheck Cond: ((dnumber = 558332) AND (dir <> 1)) -> Bitmap Index Scan on mm_insrel_dnumber_dir_not_one_idx (cost=0.00..2.11 rows=30 width=0) (actual time=0.070..0.070 rows=2 loops=1) Index Cond: (dnumber = 558332) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=18) (actual time=0.042..0.046 rows=1 loops=1) Index Cond: ("outer".number = mm_object.number) Total runtime: 12.765 ms zorgweb_solaris=> explain analyze SELECT insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.number,medical_care_container.number,product.number FROM mm_medical_care_container medical_care_container,mm_insrel insrel,mm_product product WHERE medical_care_container.number=558332 AND (medical_care_container.number=insrel.dnumber AND product.number=insrel.snumber AND insrel.dir<>1); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..172.69 rows=1 width=28) (actual time=53.987..129.419 rows=1 loops=1) -> Nested Loop (cost=0.00..168.88 rows=1 width=28) (actual time=53.940..129.365 rows=1 loops=1) -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1) Merge Cond: ("outer".number = "inner".number) -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1) -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1) -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20) (actual time=43.975..117.246 rows=2 loops=1) Index Cond: (dnumber = 558332) -> Index Scan using mm_medical_care_container_table_pkey on mm_medical_care_container_table (cost=0.00..5.64 rows=1 width=4) (actual time=0.044..0.045 rows=1 loops=2) Index Cond: (558332 = number) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actual time=0.041..0.043 rows=1 loops=2) Index Cond: (mm_object.number = "outer".snumber) -> Index Scan using mm_product_table_pkey on mm_product_table (cost=0.00..67.90 rows=1571 width=4) (actual time=0.023..9.443 rows=1571 loops=1) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actual time=0.040..0.042 rows=1 loops=1) Index Cond: ("outer".number = mm_object.number) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actual time=0.038..0.042 rows=1 loops=1) Index Cond: (number = 558332) Total runtime: 130.149 ms zorgweb_solaris=> \d mm_insrel; View "public.mm_insrel" Column | Type | Modifiers ---------+---------+----------- number | integer | otype | integer | owner | text | snumber | integer | dnumber | integer | rnumber | integer | dir | integer | View definition: SELECT mm_object.number, mm_object.otype, mm_object."owner", mm_insrel_table.snumber, mm_insrel_table.dnumber, mm_insrel_table.rnumber, mm_insrel_table.dir FROM mm_insrel_table JOIN mm_object USING (number); zorgweb_solaris=> \d mm_medical_care_container View "public.mm_medical_care_container" Column | Type | Modifiers --------+---------+----------- number | integer | otype | integer | owner | text | View definition: SELECT mm_object.number, mm_object.otype, mm_object."owner" FROM mm_medical_care_container_table JOIN mm_object USING (number); zorgweb_solaris=> \d mm_product View "public.mm_product" Column | Type | Modifiers ------------------------+---------+----------- number | integer | otype | integer | owner | text | created | bigint | lastmodified | bigint | start_time | bigint | end_time | bigint | title | text | details | text | only_collectively | boolean | term_of_notice | text | max_number_paying_kids | integer | contract_term | text | advance_declarations | text | free_care_choice | text | export_to_rivm | boolean | export_to_kwiz | boolean | export_to_independer | boolean | show_in_frontend | boolean | path | text | type_notes | text | View definition: SELECT mm_object.number, mm_object.otype, mm_object."owner", mm_product_table.created, mm_product_table.lastmodified, mm_product_table.start_time, mm_product_table.end_time, mm_p roduct_table.title, mm_product_table.details, mm_product_table.only_collectively, mm_product_table.term_of_notice, mm_product_table.max_number_paying_kids, mm_product_table.contra ct_term, mm_product_table.advance_declarations, mm_product_table.free_care_choice, mm_product_table.export_to_rivm, mm_product_table.export_to_kwiz, mm_product_table.export_to_ind epender, mm_product_table.show_in_frontend, mm_product_table.path, mm_product_table.type_notes FROM mm_product_table JOIN mm_object USING (number); -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys wrote: > We have a number of views that join tables, and we have queries that > join those views. Some relatively large tables are involved. > We added indexes that match our query constraints as much as possible, > and that does work if we explicitly query the tables with all the > involved joins, instead of the views. However, if we query the views, > the planner starts using a filter instead of the desired index... Anecdotally, I had a situation recently where I got different plans depending on whether I queried a join of a view against itself, or "macro expanded" the view by hand. I was =very= surprised at this. - John D. Burger MITRE
Alban Hertroys <alban@magproductions.nl> writes: > Is there a trick to make this work a bit faster? Have you really shown us the right queries for those explain results? I don't see where the second plan is testing "dir <> 1" at all. It looks like the first one is faster because it's using a partial index that has predicate dir <> 1, while the second one is using a much larger full index. But I don't see where the second plan is applying that restriction, so I wonder if you forgot it in the query. regards, tom lane
"John D. Burger" <john@mitre.org> writes: > Anecdotally, I had a situation recently where I got different plans > depending on whether I queried a join of a view against itself, or > "macro expanded" the view by hand. I was =very= surprised at this. Me too, at least if you didn't do any hand optimization but just stuck the view definition in as a sub-select. Can you provide a reproducible case? regards, tom lane
Tom Lane wrote: >> Anecdotally, I had a situation recently where I got different plans >> depending on whether I queried a join of a view against itself, or >> "macro expanded" the view by hand. I was =very= surprised at this. > > Me too, at least if you didn't do any hand optimization but just stuck > the view definition in as a sub-select. Can you provide a reproducible > case? No surprise, I cannot reproduce this, especially since the DDL has evolved since then. But you are almost certainly right, my "macro expansion" must have done something more than simply dropping in the view definition as is. Presumably something minor (to me) but a show-stopper for the planner. Sorry for the distraction. :( - John D. Burger MITRE
Alban Hertroys <alban ( at ) magproductions ( dot ) nl> writes:
> Is there a trick to make this work a bit faster?
Have you really shown us the right queries for those explain results?
I don't see where the second plan is testing "dir <> 1" at all.
It looks like the first one is faster because it's using a partial
index that has predicate dir <> 1, while the second one is using
a much larger full index. But I don't see where the second plan
is applying that restriction, so I wonder if you forgot it in the
query.
regards, tom lane
He has really shown the right queries. But I see the table definition if mm_insrel_table (including the indexes) is not in the e-mail, so you don't see why the dir <> 1 is not in the query plan. Here is the table definition, with the indexes. As you can see we tried some indexes, to see if we could get the queries on the views to become faster.
zorgweb_solaris=> \d mm_insrel_table
Table "public.mm_insrel_table"
Column | Type | Modifiers
---------+---------+-----------
number | integer | not null
snumber | integer | not null
dnumber | integer | not null
rnumber | integer | not null
dir | integer |
Indexes:
"mm_insrel_table_pkey" PRIMARY KEY, btree (number)
"mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1
"mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1
"mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <> 1
"mm_insrel_relation_idx" btree (snumber, dnumber, rnumber)
Foreign-key constraints:
"mm_insrel_table_dnumber_fkey" FOREIGN KEY (dnumber) REFERENCES mm_object(number)
"mm_insrel_table_rnumber_fkey" FOREIGN KEY (rnumber) REFERENCES mm_object(number)
"mm_insrel_table_snumber_fkey" FOREIGN KEY (snumber) REFERENCES mm_object(number)
Regards,
Wessel van Norel
DelGurth <delgurth@gmail.com> writes: > As you can see we tried some indexes, to see if we could > get the queries on the views to become faster. > Indexes: > "mm_insrel_table_pkey" PRIMARY KEY, btree (number) > "mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1 > "mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1 > "mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <> > 1 > "mm_insrel_relation_idx" btree (snumber, dnumber, rnumber) Hmph ... it certainly appears to be choosing the wrong index in the second case. I wonder why --- can you show the relpages and reltuples stats from pg_class for these indexes? It might be interesting also to examine the output of just explain select * from mm_insrel_table where dnumber=558332 and dir<>1 with different subsets of these indexes in place. I'd like to see what it's deriving as the cost estimates for these indexes. If you can get an EXPLAIN selecting each one of the indexes, that would help diagnose what's going on. Dunno if you know this trick already, but it's possible to experiment with different index subsets without physically dropping and recreating the indexes. Try begin; drop [unwanted indexes] explain ... rollback; This will hold exclusive lock on the table until you rollback, so if it's a production database you want to be quick about it --- maybe put the whole thing in a SQL script. But it sure beats rebuilding indexes. BTW, what PG version is this exactly? regards, tom lane
BTW, what PG version is this exactly?
Our PG version is the version downloadable from http://www.sunfreeware.com/programlistsparc10.html#postgresql , so 8.0.1 for solaris sparc.
Sorry I was wrong on this point, it's 8.1.4
-bash-3.00$ pg_config --version
PostgreSQL 8.1.4
And it's the version from blastwave.org: http://www.blastwave.org/packages.php/postgresql
Regards,
Wessel van Norel
On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm personally not aware how to do that, perhaps Alban will (tell me how to) do that tomorrow.
Ok. Did that (with your trick, thanks!). The output is attached to this e-mail. The script I used to drop the indexes dropped them in the order it was using them (partially by accident, partially because I assumed it would work in that order).
I'm not sure if you want to see more permutations, if so please tell me.
Our PG version is the version downloadable from http://www.sunfreeware.com/programlistsparc10.html#postgresql , so 8.0.1 for solaris sparc.
(it's a sun T2000 test machine).
Regards,
Wessel van Norel
Hmph ... it certainly appears to be choosing the wrong index in the
second case. I wonder why --- can you show the relpages and reltuples
stats from pg_class for these indexes?
I'm personally not aware how to do that, perhaps Alban will (tell me how to) do that tomorrow.
It might be interesting also to examine the output of just
explain select * from mm_insrel_table where dnumber=558332 and dir<>1
with different subsets of these indexes in place. I'd like to see what
it's deriving as the cost estimates for these indexes. If you can get
an EXPLAIN selecting each one of the indexes, that would help diagnose
what's going on.
Ok. Did that (with your trick, thanks!). The output is attached to this e-mail. The script I used to drop the indexes dropped them in the order it was using them (partially by accident, partially because I assumed it would work in that order).
I'm not sure if you want to see more permutations, if so please tell me.
BTW, what PG version is this exactly?
Our PG version is the version downloadable from http://www.sunfreeware.com/programlistsparc10.html#postgresql , so 8.0.1 for solaris sparc.
(it's a sun T2000 test machine).
regards, tom lane
Regards,
Wessel van Norel
Attachment
DelGurth <delgurth@gmail.com> writes: > On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It might be interesting also to examine the output of just >> explain select * from mm_insrel_table where dnumber=558332 and dir<>1 >> with different subsets of these indexes in place. > Ok. Did that (with your trick, thanks!). The output is attached to > this e-mail. Well, no smoking gun there, it clearly knows that mm_insrel_full_idx is much more expensive for this query than the other two... Looking back at Alban's original post, I finally see what the planner is up to: -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1) Merge Cond: ("outer".number = "inner".number) -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1) -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1) -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20)(actual time=43.975..117.246 rows=2 loops=1) Index Cond: (dnumber = 558332) The reason it's choosing this indexscan is that that will give it data sorted by mm_insrel_table.number, which it can feed into the mergejoin without an extra sort step. Now sorting 30 rows is not going to take nearly as much time as the indexscan eats up, so this still doesn't make sense --- until you notice that it's estimating the top merge join at considerably less than the cost of its inputs (165.07, vss 2796.82 just for this input). That means it thinks it won't have to run the inputs to completion in order to finish the mergejoin, and so it's picking a sub-plan that has zero start cost. What this means is that the planner thinks the range of "number" values in mm_product_table (the other side of the mergejoin) is much less than the range in mm_insrel_table. Is that the case? Perhaps your ANALYZE stats for these tables are out of date. If not I'd like to see the pg_stats entries for the two "number" columns. regards, tom lane
Tom Lane wrote: > DelGurth <delgurth@gmail.com> writes: >> As you can see we tried some indexes, to see if we could >> get the queries on the views to become faster. > >> Indexes: >> "mm_insrel_table_pkey" PRIMARY KEY, btree (number) >> "mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1 >> "mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1 >> "mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <> >> 1 >> "mm_insrel_relation_idx" btree (snumber, dnumber, rnumber) > > Hmph ... it certainly appears to be choosing the wrong index in the > second case. I wonder why --- can you show the relpages and reltuples > stats from pg_class for these indexes? Here they are: relname | relpages | reltuples -----------------------------------+----------+----------- mm_insrel_dir_not_one_idx | 899 | 323628 mm_insrel_dnumber_dir_not_one_idx | 899 | 323628 mm_insrel_table_pkey | 1237 | 323628 mm_insrel_relation_idx | 1849 | 323628 mm_insrel_full_idx | 1260 | 323628 -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Tom Lane wrote: > DelGurth <delgurth@gmail.com> writes: >> On 8/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> It might be interesting also to examine the output of just >>> explain select * from mm_insrel_table where dnumber=558332 and dir<>1 >>> with different subsets of these indexes in place. > >> Ok. Did that (with your trick, thanks!). The output is attached to >> this e-mail. > > Well, no smoking gun there, it clearly knows that mm_insrel_full_idx > is much more expensive for this query than the other two... > > Looking back at Alban's original post, I finally see what the planner > is up to: > > -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1) > Merge Cond: ("outer".number = "inner".number) > -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1) > -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1) > -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20)(actual time=43.975..117.246 rows=2 loops=1) > Index Cond: (dnumber = 558332) > > The reason it's choosing this indexscan is that that will give it data > sorted by mm_insrel_table.number, which it can feed into the mergejoin > without an extra sort step. Now sorting 30 rows is not going to take > nearly as much time as the indexscan eats up, so this still doesn't > make sense --- until you notice that it's estimating the top merge join > at considerably less than the cost of its inputs (165.07, vss 2796.82 > just for this input). That means it thinks it won't have to run the > inputs to completion in order to finish the mergejoin, and so it's > picking a sub-plan that has zero start cost. I see. Obviously that's not right for our case, so we'll have to figure out why it thinks that. > What this means is that the planner thinks the range of "number" values > in mm_product_table (the other side of the mergejoin) is much less than > the range in mm_insrel_table. Is that the case? Perhaps your ANALYZE Very much so. The mm_product_table only contains about 1500 products, while the mm_insrel_table contains a record for every relation in the application (This is MMBase; it handles all relations with a relation table). That's about 330,000 records. > stats for these tables are out of date. If not I'd like to see the > pg_stats entries for the two "number" columns. Especially during optimization sessions like this one we tend to analyse rather frequently. All these indices were created yesterday (except for the primary key index) and the corresponding tables were analyzed after each index creation (aboutish). Data hasn't changed since a while - we're in the middle of a test migration from mysql[1]. As for the stats, I included the one for mm_object as well, as every view contains at least a join with that table - thus it contains 1284556 records... I suspect we're in for another few painful surprises there. zorgweb_solaris=> select * from pg_stats where attname = 'number' and tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object'); -[ RECORD 1 ]-----+------------------------------------------------------------------------------------ schemaname | public tablename | mm_product_table attname | number null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} correlation | 0.993398 -[ RECORD 2 ]-----+------------------------------------------------------------------------------------ schemaname | public tablename | mm_insrel_table attname | number null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685} correlation | 0.664637 -[ RECORD 3 ]-----+------------------------------------------------------------------------------------ schemaname | public tablename | mm_object attname | number null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338} correlation | 0.858558 As a side note, I'm in the progress of rewriting MMBase code to use explicit joins where applicable. AFAIK the planner can handle those better. > regards, tom lane [1]: With MySQL(4) there was no way for us to enhance performance any more. There are a number of tree-like structures in our data model, and MySQL just lacks the features to cope with that. With PostgreSQL we at least can make use of the ltree contrib package - quite an improvement in performance so far. Even though the ltrees are applied on text columns with text2ltree() conversions in both the queries and the GIST index. We also figured we could use some triggers to generate data that could improve query performance (moving conversions from SELECT-time to INSERT-time), but unfortunately MMBase's caches are in the way there. Regards, Alban. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Tom Lane wrote: > Looking back at Alban's original post, I finally see what the planner > is up to: > > -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1) > Merge Cond: ("outer".number = "inner".number) > -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1) > -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1) > -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20)(actual time=43.975..117.246 rows=2 loops=1) > Index Cond: (dnumber = 558332) > > The reason it's choosing this indexscan is that that will give it data > sorted by mm_insrel_table.number, which it can feed into the mergejoin > without an extra sort step. Now sorting 30 rows is not going to take > nearly as much time as the indexscan eats up, so this still doesn't > make sense --- until you notice that it's estimating the top merge join > at considerably less than the cost of its inputs (165.07, vss 2796.82 > just for this input). That means it thinks it won't have to run the > inputs to completion in order to finish the mergejoin, and so it's > picking a sub-plan that has zero start cost. I'm thinking that removing the indexes it's erroneously using now could help performance, as it can no longer use that index. It may however pick the primary key index (likely), or - if we remove even that one - a sequential scan... Experimenting will answer that. Thanks for your answers so far, at least now we know what's going on. Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys wrote: > Tom Lane wrote: > I'm thinking that removing the indexes it's erroneously using now could > help performance, as it can no longer use that index. It may however > pick the primary key index (likely), or - if we remove even that one - a > sequential scan... Experimenting will answer that. Well, look at the attachhed explain output of: zorgweb_solaris=> BEGIN; DROP INDEX mm_insrel_full_idx; DROP INDEX mm_insrel_relation_idx; EXPLAIN ANALYZE SELECT insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.number,medical_care_container.number,product.number FROM mm_medical_care_container medical_care_container, mm_insrel insrel, mm_product product WHERE medical_care_container.number=558332 AND (medical_care_container.number=insrel.dnumber AND product.number=insrel.snumber AND insrel.dir<>1); ROLLBACK; Only 13ms as opposed to 130-ish :) -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=118.34..211.13 rows=1 width=28) (actual time=10.943..12.988 rows=1 loops=1) -> Nested Loop (cost=118.34..205.49 rows=1 width=28) (actual time=10.903..12.942 rows=1 loops=1) -> Nested Loop (cost=118.34..201.67 rows=1 width=28) (actual time=10.857..12.891 rows=1 loops=1) -> Merge Join (cost=118.34..190.23 rows=3 width=28) (actual time=10.807..12.837 rows=1 loops=1) Merge Cond: ("outer".number = "inner".snumber) -> Index Scan using mm_product_table_pkey on mm_product_table (cost=0.00..67.90 rows=1571 width=4)(actual time=0.040..9.542 rows=1571 loops=1) -> Sort (cost=118.34..118.41 rows=30 width=24) (actual time=0.867..0.868 rows=2 loops=1) Sort Key: mm_insrel_table.snumber -> Nested Loop (cost=2.11..117.60 rows=30 width=24) (actual time=0.804..0.827 rows=2 loops=1) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actualtime=0.055..0.061 rows=1 loops=1) Index Cond: (number = 558332) -> Bitmap Heap Scan on mm_insrel_table (cost=2.11..113.50 rows=30 width=20) (actual time=0.723..0.734rows=2 loops=1) Recheck Cond: ((dnumber = 558332) AND (dir <> 1)) -> Bitmap Index Scan on mm_insrel_dnumber_dir_not_one_idx (cost=0.00..2.11 rows=30width=0) (actual time=0.705..0.705 rows=2 loops=1) Index Cond: (dnumber = 558332) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actual time=0.040..0.043rows=1 loops=1) Index Cond: ("outer".number = mm_object.number) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.80 rows=1 width=4) (actual time=0.038..0.040 rows=1loops=1) Index Cond: ("outer".number = mm_object.number) -> Index Scan using mm_medical_care_container_table_pkey on mm_medical_care_container_table (cost=0.00..5.64 rows=1width=4) (actual time=0.033..0.036 rows=1 loops=1) Index Cond: (558332 = number) Total runtime: 13.799 ms (22 rows)
Alban Hertroys <alban@magproductions.nl> writes: > zorgweb_solaris=> select * from pg_stats where attname = 'number' and > tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object'); > tablename | mm_product_table > histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} > tablename | mm_insrel_table > {615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685} > tablename | mm_object > histogram_bounds | > {287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338} OK, so here's our problem: according to those stats, the ranges of "number" in mm_product_table and mm_insrel_table don't overlap at all. So the cost model for mergejoin predicts that a mergejoin on "number" will have to read all of mm_product_table but only the first record from mm_insrel_table, and given the difference in size of the two tables, that looks like a pretty good deal. Given that the plan is not actually very fast, I suppose that the histogram is not telling the whole truth --- probably there are a few outlying records in one table or the other causing there to be a more significant overlap than the planner expects. If so, you can probably fix it by increasing the statistics target for that table. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >> zorgweb_solaris=> select * from pg_stats where attname = 'number' and >> tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object'); > >> tablename | mm_product_table >> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} > >> tablename | mm_insrel_table >> {615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685} > >> tablename | mm_object >> histogram_bounds | >> {287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338} > > OK, so here's our problem: according to those stats, the ranges of > "number" in mm_product_table and mm_insrel_table don't overlap at all. That's correct, the numbers are generated by a global sequence. Insrel.number can never match a product.number. However, mm_product.number always matches either mm_insrel.snumber or mm_insrel.dnumber (source and destination respectively). The other way around this isn't the case; then snumber and dnumber match number-fields in other tables (they always do). > So the cost model for mergejoin predicts that a mergejoin on "number" > will have to read all of mm_product_table but only the first record from > mm_insrel_table, and given the difference in size of the two tables, > that looks like a pretty good deal. > > Given that the plan is not actually very fast, I suppose that the > histogram is not telling the whole truth --- probably there are a few > outlying records in one table or the other causing there to be a more > significant overlap than the planner expects. If so, you can probably > fix it by increasing the statistics target for that table. That's a bit odd, as the number fields of the different tables are globally unique by definition. > regards, tom lane Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > However, mm_product.number always matches either mm_insrel.snumber or > mm_insrel.dnumber (source and destination respectively). The other way > around this isn't the case; then snumber and dnumber match number-fields > in other tables (they always do). Oh, then we are looking at the wrong things: we should be comparing the histograms of the fields that are being used as the join keys in this query. I had thought they were both "number", but I must be confused. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >> However, mm_product.number always matches either mm_insrel.snumber or >> mm_insrel.dnumber (source and destination respectively). The other way >> around this isn't the case; then snumber and dnumber match number-fields >> in other tables (they always do). > > Oh, then we are looking at the wrong things: we should be comparing the > histograms of the fields that are being used as the join keys in this > query. I had thought they were both "number", but I must be confused. The design is certainly a bit confusing until you get used to it. It usually takes new devs here a while to find their way around MMBase (www.mmbase.org) and its peculiarities. It doesn't help that the documentation is in rather bad English. > regards, tom lane So this is what we're looking for, right? I can't say I understand how to interpret this, let alone come to conclusions. I'm afraid I totally depend on your interpretation here... zorgweb_solaris=> select * from pg_stats where (attname in ('snumber', 'dnumber') and tablename = 'mm_insrel_table') or (attname = 'number' and tablename = 'mm_product_table'); -[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------- schemaname | public tablename | mm_product_table attname | number null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} correlation | 0.993398 -[ RECORD 2 ]-----+----------------------------------------------------------------------------------------------------------- schemaname | public tablename | mm_insrel_table attname | snumber null_frac | 0 avg_width | 4 n_distinct | 14336 most_common_vals | {4300,5210,5366,2994,3724,4118,2982,3058,3072,3460} most_common_freqs | {0.00266667,0.002,0.002,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333} histogram_bounds | {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034} correlation | 0.083602 -[ RECORD 3 ]-----+----------------------------------------------------------------------------------------------------------- schemaname | public tablename | mm_insrel_table attname | dnumber null_frac | 0 avg_width | 4 n_distinct | 11028 most_common_vals | {1117583,279,415,291,343,389,635,839,1043,319} most_common_freqs | {0.00433333,0.00333333,0.003,0.00266667,0.00266667,0.00266667,0.00266667,0.00266667,0.00266667,0.00233333} histogram_bounds | {147,717,3770,263126,327054,429524,461026,490094,518872,544098,1117603} correlation | 0.0571927 Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > tablename | mm_product_table > attname | number > histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} > tablename | mm_insrel_table > attname | snumber > histogram_bounds | > {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034} Hmm ... if I'm not still confused, these are the two columns being mergejoined in your slow query (would you double-check that?). But the numbers don't seem to add up. Given those stats the estimate should be that something over 20% of the mm_insrel_table has to be scanned to complete the join (since 6070 falls into the third decile of the other histogram). But we saw from Alban's original post that the planner must be estimating well under 10% of the table needs to be scanned. Either we're still confused about which columns are being joined, or there's some weird bug in the computation. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >> tablename | mm_product_table >> attname | number >> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070} > >> tablename | mm_insrel_table >> attname | snumber >> histogram_bounds | >> {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034} > > Hmm ... if I'm not still confused, these are the two columns being > mergejoined in your slow query (would you double-check that?). That's correct. I read up some on the meaning of the pg_stats values, and I noticed that mm_insrel_tables' snumber and dnumber columns seem to have a rather bad correlation. I think this could be improved by clustering on an index over (number, snumber, dnumber); is that correct? > But the numbers don't seem to add up. Given those stats the estimate > should be that something over 20% of the mm_insrel_table has to be > scanned to complete the join (since 6070 falls into the third decile > of the other histogram). But we saw from Alban's original post that > the planner must be estimating well under 10% of the table needs to > be scanned. Either we're still confused about which columns are being > joined, or there's some weird bug in the computation. Since the start of this thread the insrel table has grown to 339195 records (it was closer to 330,000), maybe that changed the statistics a bit. To be sure, attached is the query plan of the problematic query again at this moment. The other table involved, mm_medical_care_container_table, has the following stats on number: zorgweb_solaris=> select * from pg_stats where attname = 'number' and tablename = 'mm_medical_care_container_table'; -[ RECORD 1 ]-----+------------------------------------------------------------------------------- schemaname | public tablename | mm_medical_care_container_table attname | number null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {418768,436686,455444,473600,490610,508680,527182,545038,562786,578528,595132} correlation | 0.339138 I sure hope we get this mystery unveiled... Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // zorgweb_solaris=> explain analyze SELECT zorgweb_solaris-> insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.number,medical_care_container.number,product.number zorgweb_solaris-> FROM mm_medical_care_container medical_care_container,mm_insrel zorgweb_solaris-> insrel,mm_product product WHERE medical_care_container.number=558332 AND zorgweb_solaris-> (medical_care_container.number=insrel.dnumber AND zorgweb_solaris(> product.number=insrel.snumber AND insrel.dir<>1); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..178.61 rows=1 width=28) (actual time=56.089..137.304 rows=1 loops=1) -> Nested Loop (cost=0.00..174.67 rows=1 width=28) (actual time=56.041..137.250 rows=1 loops=1) -> Merge Join (cost=0.00..170.73 rows=1 width=28) (actual time=55.986..137.189 rows=1 loops=1) Merge Cond: ("outer".number = "inner".number) -> Nested Loop (cost=0.00..2966.75 rows=30 width=28) (actual time=46.161..125.315 rows=2 loops=1) -> Nested Loop (cost=0.00..2848.50 rows=30 width=24) (actual time=46.108..125.205 rows=2 loops=1) -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2678.20 rows=30 width=20)(actual time=46.042..125.067 rows=2 loops=1) Index Cond: (dnumber = 558332) -> Index Scan using mm_medical_care_container_table_pkey on mm_medical_care_container_table (cost=0.00..5.67 rows=1 width=4) (actual time=0.048..0.049 rows=1 loops=2) Index Cond: (558332 = number) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.93 rows=1 width=4) (actual time=0.041..0.043rows=1 loops=2) Index Cond: (mm_object.number = "outer".snumber) -> Index Scan using mm_product_table_pkey on mm_product_table (cost=0.00..67.90 rows=1571 width=4) (actualtime=0.024..9.462 rows=1571 loops=1) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.93 rows=1 width=4) (actual time=0.043..0.045 rows=1loops=1) Index Cond: ("outer".number = mm_object.number) -> Index Scan using mm_object_pkey on mm_object (cost=0.00..3.93 rows=1 width=4) (actual time=0.038..0.041 rows=1 loops=1) Index Cond: (number = 558332) Total runtime: 138.132 ms (18 rows)
Alban Hertroys <alban@magproductions.nl> writes: > I sure hope we get this mystery unveiled... I'm confused too. Would it be possible for you to send me a dump of your database? I need the full schema definitions of these tables, indexes, and views, but you could null out all but the various "number" columns being used in the joins and query conditions, so there shouldn't be any privacy issues. regards, tom lane
Alban Hertroys <alban@magproductions.nl> writes: >> I'm confused too. Would it be possible for you to send me a dump of >> your database? > Attached is a cleaned out database, the full schema is included, but > only the relevant tables contain any data. Thanks. After digging through it a bit, I understand what's happening, but I'm not seeing any simple fix. The problem is basically that you've got create or replace view mm_product as SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number); and then the problem query has WHERE mm_product.number = insrel.snumber which causes the planner to conclude that mm_product_table.number, mm_object.number, and mm_insrel_table.snumber are all basically interchangeable. In particular it ends up performing the join between mm_product_table.number and mm_object.number as though mm_product_table.number were being joined to mm_insrel_table.snumber. Which is fine, except that it's thinking that the statistics for mm_object.number are applicable in this context, and they're completely misleading. After the join to mm_insrel_table, the statistics of the variable are really like mm_insrel_table.number --- in particular the fraction of the table that has to be visited is much larger than it would've been for mm_object as a whole. This is a problem we've understood in a generic form for awhile: a join or selection might change the statistics of a variable, and so the info stored in the catalogs ought to be modified somehow to predict what will happen at upper join levels. We've not seen it in this particular form before, though. I'm not sure if there's a whole lot you can do about it in the near term other than refactor your schema to avoid having different tables joining to different subranges of mm_object.number. (You don't necessarily have to get rid of mm_object --- just try assigning its keys from a serial, or something, so that there's no correlation to the ranges of keys in other tables.) We might be able to do something about actually solving the statistical problem in 8.3, but I fear it's too late to think about it for 8.2. regards, tom lane
I wrote: > Thanks. After digging through it a bit, I understand what's happening, > but I'm not seeing any simple fix. I forgot to mention that although I could reproduce your bad plan in 8.1, CVS HEAD doesn't fall into the trap. I don't believe we've done anything to fix the fundamental problem however --- it may just be a side effect of the changes in the indexscan cost model that cause it to not go for the bogus plan. regards, tom lane
Tom Lane wrote: > We might be able to do something about actually solving the statistical > problem in 8.3, but I fear it's too late to think about it for 8.2. I take it you mean you already have a very concrete idea on how to solve it. Come on, illuminate us poor dumb souls. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> We might be able to do something about actually solving the statistical >> problem in 8.3, but I fear it's too late to think about it for 8.2. > I take it you mean you already have a very concrete idea on how to solve > it. Come on, illuminate us poor dumb souls. No, I don't :-( ... that was intended to suggest that we might think of a solution given months to work on it rather than days. regards, tom lane
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >>> I'm confused too. Would it be possible for you to send me a dump of >>> your database? > >> Attached is a cleaned out database, the full schema is included, but >> only the relevant tables contain any data. > > Thanks. After digging through it a bit, I understand what's happening, > but I'm not seeing any simple fix. The problem is basically that > you've got > > create or replace view mm_product as > SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number); > > and then the problem query has WHERE mm_product.number = insrel.snumber > which causes the planner to conclude that mm_product_table.number, > mm_object.number, and mm_insrel_table.snumber are all basically > interchangeable. In particular it ends up performing the join between > mm_product_table.number and mm_object.number as though > mm_product_table.number were being joined to mm_insrel_table.snumber. It's even worse, I guess, as the mm_insrel view joins mm_insrel_table with mm_object again. So basically the query performs a self-join on mm_object with a detour through mm_insrel_table and mm_product_table... > Which is fine, except that it's thinking that the statistics for > mm_object.number are applicable in this context, and they're completely > misleading. After the join to mm_insrel_table, the statistics of the > variable are really like mm_insrel_table.number --- in particular the > fraction of the table that has to be visited is much larger than it > would've been for mm_object as a whole. I don't entirely understand what you're saying here. Mm_object is always larger than any other table in the database, as every table joins with (different) records in it to determine it's otype and owner. So I don't understand how a fraction of any of those tables could be larger than mm_object as a whole... In fact, originally the schema used inheritance; every table inherited (directly or indirectly) from mm_object. As this resulted in unions, which caused much more performance problems than the current view-approach, I implemented the current approach. In fact, this approach was lent from what MMBase uses for the MSSQL layer. Well, as I implemented the way the views are defined, there is room for changes in that area. Suggestions are welcome. > This is a problem we've understood in a generic form for awhile: > a join or selection might change the statistics of a variable, > and so the info stored in the catalogs ought to be modified somehow > to predict what will happen at upper join levels. We've not seen > it in this particular form before, though. > > I'm not sure if there's a whole lot you can do about it in the near term > other than refactor your schema to avoid having different tables joining > to different subranges of mm_object.number. (You don't necessarily have > to get rid of mm_object --- just try assigning its keys from a serial, > or something, so that there's no correlation to the ranges of keys in > other tables.) Unfortunately the number key is required to correlate to the number keys in other tables. That's the whole point of that table. It's also already generated from a sequence... I am looking at a view options at the moment: 1.) Cluster mm_object on an index over otype - I'm not sure how that would influence the statistics; if it doesn't then this wouldn't change much. 2.) Change mm_object into a view over the tables that now join with it. I'll have to devise some way to get the otype and owner columns into the other tables. 3.) An extension to option 2; Creating seperate tables, only containing the relevant sections from mm_object, combining them into a view-version of mm_object. Like this: CREATE TABLE mm_product_object ( number integer PRIMARY KEY, otype integer, owner text ); CREATE TABLE mm_insrel_object ( number integer PRIMARY KEY, otype integer, owner text ); (I recall seeing an inheritance-like statement that makes copies of table definitions - seems useful in this case) CREATE OR REPLACE VIEW mm_object AS SELECT * FROM mm_product_object UNION ALL SELECT * FROM mm_insrel_object; It remains to be seen that MMBase can handle mm_object being a view, but (if not) it probably will work if it's an updatable view. I'm leaning to option 3, but if option 1 helps it saves me a lot of trouble. > We might be able to do something about actually solving the statistical > problem in 8.3, but I fear it's too late to think about it for 8.2. Well, I had hoped for a suitable workaround, and I believe I may have a few options now. Waiting for the next PostgreSQL release never really was an option for us (deadline is somewhere next week). So it doesn't really matter to us that there won't be a solution until 8.3, or maybe even later. Thanks for the help so far, glad to be able to point out an actual problem. Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > Mm_object is always larger than any other table in the database, as > every table joins with (different) records in it to determine it's otype > and owner. So I don't understand how a fraction of any of those tables > could be larger than mm_object as a whole... No, I said a larger fraction, not a larger absolute number of tuples. The problem is that because mm_product contains only very small values of "number", a mergejoin looks like a great way to join it to mm_object: only the first 5% of mm_object will need to be scanned. The bug consists in applying that 5% number to mm_insrel, for which it's not correct. regards, tom lane