Thread: View vs. Statement Query Plan
It seems that my server is happy to use some indices to optimize access when I do a specific query involving a UNION, but when I make a view and then query on that view, it doesn't use the indices any more. I have two tables that look like this: CREATE TABLE data ( rec_no INT PRIMARY KEY, day DATE NOT NULL, user_id INT NOT NULL, value INT NOT NULL ) WITHOUT OIDS; CREATE INDEX data_day ON data (day); CREATE INDEX data_user_id ON data (user_id); CREATE INDEX data_value ON data (value); data_4 has about 10 Mrows, data_4a has about 100 Krows. I created a view, data, combining these two tables: CREATE VIEW data AS SELECT * FROM data_4 UNION ALL SELECT * FROM data_4a But for some reason this view doesn't use the indices that an equivalant query uses: test=# explain select * from data_4 where user_id = 12345 union all select * from data_4a where user_id = 12345; NOTICE: QUERY PLAN: Append (cost=0.00..4334.59 rows=1080 width=16) -> Subquery Scan *SELECT* 1 (cost=0.00..4325.05 rows=1078 width=16) -> Index Scan using data_4_user_id on data_4 (cost=0.00..4325.05 rows=1078 width=16) -> Subquery Scan *SELECT* 2 (cost=0.00..9.54 rows=2 width=16) -> Index Scan using data_4a_user_id on data_4a (cost=0.00..9.54 rows=2 width=16) EXPLAIN test=# explain select * from data where user_id = 12345; NOTICE: QUERY PLAN: Subquery Scan data (cost=0.00..1638580.00 rows=100100000 width=16) -> Append (cost=0.00..1638580.00 rows=100100000 width=16) -> Subquery Scan *SELECT* 1 (cost=0.00..1636943.00 rows=100000000 width=16) -> Seq Scan on data_4 (cost=0.00..1636943.00 rows=100000000 width=16) -> Subquery Scan *SELECT* 2 (cost=0.00..1637.00 rows=100000 width=16) -> Seq Scan on data_4a (cost=0.00..1637.00 rows=100000 width=16) Any idea why this is? Should I be creating the view in a different way? cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> writes: > But for some reason this view doesn't use the indices that an > equivalant query uses: You're essentially supposing that select * from (select * from a union select * from b) where foo; may be transformed into (select * from a where foo) union (select * from b where foo); I don't doubt that this transformation is valid in some cases ... but I do doubt that it is valid in all cases. If someone can supply a rigorous proof about when it is valid, I'd be willing to look into doing the necessary programming. regards, tom lane
On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > But for some reason this view doesn't use the indices that an > > equivalant query uses: > > You're essentially supposing that > > select * from (select * from a union select * from b) where foo; > > may be transformed into > > (select * from a where foo) union (select * from b where foo); > > I don't doubt that this transformation is valid in some cases ... but > I do doubt that it is valid in all cases. If someone can supply a > rigorous proof about when it is valid, I'd be willing to look into > doing the necessary programming. IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what is being selected is anything other than a simple statement, it'll be very hard to prove equivalence (i guess this is what the iscachable is for). HTH, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote: >> I don't doubt that this transformation is valid in some cases ... but >> I do doubt that it is valid in all cases. If someone can supply a >> rigorous proof about when it is valid, I'd be willing to look into >> doing the necessary programming. > IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what > is being selected is anything other than a simple statement, it'll be very > hard to prove equivalence (i guess this is what the iscachable is > for). Yeah, the UNION vs. UNION ALL difference is one of the things that would need to be thought about. I think it's more likely that the transformation would work for UNION ALL than for UNION ... but I have not had the time to try to work it out. regards, tom lane
On Tue, Jun 04, 2002 at 07:15:59PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote: > >> I don't doubt that this transformation is valid in some cases ... but > >> I do doubt that it is valid in all cases. If someone can supply a > >> rigorous proof about when it is valid, I'd be willing to look into > >> doing the necessary programming. > > > IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what > > is being selected is anything other than a simple statement, it'll be very > > hard to prove equivalence (i guess this is what the iscachable is > > for). > > Yeah, the UNION vs. UNION ALL difference is one of the things that would > need to be thought about. I think it's more likely that the > transformation would work for UNION ALL than for UNION ... but I have > not had the time to try to work it out. One thing I don't know and that is how closely SQL follows relational algebra. Is it close enough that you can prove results in relational algebra and have them work in SQL. Or there enough special cases to make that tricky. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.
On Wed, 5 Jun 2002, Martijn van Oosterhout wrote: > One thing I don't know and that is how closely SQL follows relational > algebra. Is it close enough that you can prove results in relational algebra > and have them work in SQL. Or there enough special cases to make that > tricky. Ha! Nowhere near close enough. SQL actually breaks the relational model in many ways; C. J. Date's books (particularly _The Third Manifesto_ and his _Introduction to Database Systems_) go into this in detail. I've not really had time to sit down and think about the characteristics of the problem of whether or not a WHERE clause on a pair of SELECT statements joined by UNION is distributitive or not. The easiest thing would be if someone could find a case that proves by example that the WHERE clause is not distributitive. Providing a formal proof that it is distributive seems to me rather difficult, because I can't even really see how to translate SQL into relational algebra.... cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
On Wed, Jun 05, 2002 at 06:35:10PM +0900, Curt Sampson wrote: > On Wed, 5 Jun 2002, Martijn van Oosterhout wrote: > > > One thing I don't know and that is how closely SQL follows relational > > algebra. Is it close enough that you can prove results in relational algebra > > and have them work in SQL. Or there enough special cases to make that > > tricky. > > Ha! Nowhere near close enough. SQL actually breaks the relational model > in many ways; C. J. Date's books (particularly _The Third Manifesto_ and > his _Introduction to Database Systems_) go into this in detail. Interesting. I don't know these books, I may see if I can find them. > I've not really had time to sit down and think about the characteristics > of the problem of whether or not a WHERE clause on a pair of SELECT > statements joined by UNION is distributitive or not. The easiest > thing would be if someone could find a case that proves by example > that the WHERE clause is not distributitive. Well, I think UNION ALL may be doable, but straight UNION could be tricky. You need something stricter than iscachable do be able to use functions. I've got a nifty example below. > Providing a formal proof that it is distributive seems to me rather > difficult, because I can't even really see how to translate SQL > into relational algebra.... Most places I've looked at on the web that relate SQL and relational algebra seem good, but tend to ignore outer joins and group by, which kinda blows a hole in most things. Nifty example: select lower(f) from (select f from a union select f from b); (select lower(f) from a) union (select lower(f) from b); If table a has "HELLO" and table b has "hello", the first will produce two rows of output, whereas the second produce only one. To work, you require the function to be *invertable* (1-to-1 mapping) which is pretty strict. Off the top of my head I'd say the vast majority would fail that test. UNION ALL would be fine I think, as long as the functions ware cachable and there were no aggregates. Basically, as long all you're pushing down is a cacheable projection. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.
Martijn van Oosterhout sez: [...] } Nifty example: } } select lower(f) from (select f from a union select f from b); } (select lower(f) from a) union (select lower(f) from b); } } If table a has "HELLO" and table b has "hello", the first will produce two } rows of output, whereas the second produce only one. To work, you require } the function to be *invertable* (1-to-1 mapping) which is pretty strict. Off } the top of my head I'd say the vast majority would fail that test. } } UNION ALL would be fine I think, as long as the functions ware cachable and } there were no aggregates. Basically, as long all you're pushing down is a } cacheable projection. As long as you aren't messing with the columns (no projections, no value modifications, no aggregation), it is $\sigma_p(A) \cup \sigma_p(B)$ vs. $\sigma_p(A \cup B)$. These are equivalent, regardless of whether it is a proper UNION or a UNION ALL. } Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ --Greg
On Wed, Jun 05, 2002 at 10:49:46AM -0400, Gregory Seidman wrote: > As long as you aren't messing with the columns (no projections, no value > modifications, no aggregation), it is $\sigma_p(A) \cup \sigma_p(B)$ vs. > $\sigma_p(A \cup B)$. These are equivalent, regardless of whether it is a > proper UNION or a UNION ALL. Well, i think in this users case he was using a view. Unless you are using all the columns from the view you're going to have a problem, since dropping columns is also a projection. select a from (select a,b from tableA union select a,b from tableB) (select a from tableA) union (select a from tableB) are not equivalent. I don't think it's possible to simplify the first statement at all with UNION. OTOH, UNION ALL would be fine. Anyhow, for a programmtical point of view, all that would to be done would be to allow certain types of projections to be distributed across Append nodes. Basically, projections that drop fields and only involve cachable functions. But you need to think harder about what can be passed through Distinct nodes. Proving that it works for straight Append is easy. First it does one query and then the other. So any projections done just after can also be done just before with the same effect. Although fields get renamed, don't they. So when people ask why their view is slow, ask whether they mean UNION or UNION ALL and change it if it makes no difference. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.