Thread: Views with unions
Hello,

While testing multi-select views I found some problems. Here are the
details. I have 3 tables and I created a view on them:

create view view123 as
    select key, value from tab1 where key=1
    union all
    select key, value from tab2 where key=2
    union all
    select key, value from tab3 where key=3;

When querying with no conditions, I get this plan:

test_db=# explain analyze select key, value from view123;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Subquery Scan view123  (cost=0.00..3.19 rows=15 width=11) (actual time=0.15..1.00 rows=15 loops=1)
   ->  Append  (cost=0.00..3.19 rows=15 width=11) (actual time=0.14..0.80 rows=15 loops=1)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.06 rows=5 width=11) (actual time=0.13..0.30 rows=5 loops=1)
               ->  Seq Scan on tab1  (cost=0.00..1.06 rows=5 width=11) (actual time=0.11..0.22 rows=5 loops=1)
                     Filter: ("key" = 1)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.06 rows=5 width=11) (actual time=0.07..0.22 rows=5 loops=1)
               ->  Seq Scan on tab2  (cost=0.00..1.06 rows=5 width=11) (actual time=0.05..0.15 rows=5 loops=1)
                     Filter: ("key" = 2)
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1.06 rows=5 width=11) (actual time=0.06..0.22 rows=5 loops=1)
               ->  Seq Scan on tab3  (cost=0.00..1.06 rows=5 width=11) (actual time=0.05..0.15 rows=5 loops=1)
                     Filter: ("key" = 3)
 Total runtime: 1.57 msec
(12 rows)

But with "key = 3":

test_db=# explain analyze select key, value from view123 where key=3;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Subquery Scan view123  (cost=0.00..3.22 rows=7 width=11) (actual time=0.40..0.65 rows=5 loops=1)
   ->  Append  (cost=0.00..3.22 rows=7 width=11) (actual time=0.38..0.58 rows=5 loops=1)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.07 rows=1 width=11) (actual time=0.18..0.18 rows=0 loops=1)
               ->  Seq Scan on tab1  (cost=0.00..1.07 rows=1 width=11) (actual time=0.17..0.17 rows=0 loops=1)
                     Filter: (("key" = 1) AND ("key" = 3))
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.07 rows=1 width=11) (actual time=0.11..0.11 rows=0 loops=1)
               ->  Seq Scan on tab2  (cost=0.00..1.07 rows=1 width=11) (actual time=0.11..0.11 rows=0 loops=1)
                     Filter: (("key" = 2) AND ("key" = 3))
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1.07 rows=5 width=11) (actual time=0.08..0.25 rows=5 loops=1)
               ->  Seq Scan on tab3  (cost=0.00..1.07 rows=5 width=11) (actual time=0.06..0.18 rows=5 loops=1)
                     Filter: (("key" = 3) AND ("key" = 3))
 Total runtime: 1.22 msec
(12 rows)

I would expect that always-false filters, like (("key" = 1) AND ("key" = 3)),
would make a full table scan unnecessary. So I expected a plan like:

test_db=# explain analyze select key, value from view123 where key=3;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Subquery Scan view123  (cost=0.00..3.22 rows=7 width=11) (actual time=0.40..0.65 rows=5 loops=1)
   ->  Append  (cost=0.00..3.22 rows=7 width=11) (actual time=0.38..0.58 rows=5 loops=1)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.07 rows=1 width=11) (actual time=0.18..0.18 rows=0 loops=1)
               ->  Result  (cost=0.00..0.00 rows=0 width=11) (actual time=0.01..0.01 rows=0 loops=1)
                   ^^^^^^ my change
                     Filter: (("key" = 1) AND ("key" = 3))  [always false]
                                                            ^^^^^^^^^^^^^^ my change
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.07 rows=1 width=11) (actual time=0.11..0.11 rows=0 loops=1)
               ->  Result  (cost=0.00..0.00 rows=0 width=11) (actual time=0.01..0.01 rows=0 loops=1)
                   ^^^^^^ my change
                     Filter: (("key" = 2) AND ("key" = 3))  [always false]
                                                            ^^^^^^^^^^^^^^ my change
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1.07 rows=5 width=11) (actual time=0.08..0.25 rows=5 loops=1)
               ->  Seq Scan on tab3  (cost=0.00..1.07 rows=5 width=11) (actual time=0.06..0.18 rows=5 loops=1)
                     Filter: (("key" = 3) AND ("key" = 3))
 Total runtime: 1.22 msec
(12 rows)

That is, no "Seq Scan" on tables where the filter is always false.
I realize that's how it works now, but:

a) is there any way to avoid such scans?
b) is it possible (or in the TODO) to optimize for such cases?

Regards,

Mariusz Czułada
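For anyone who wants to reproduce the example: the original post does not
show the table definitions, but a minimal setup consistent with the plans
above might look like this (column types are an assumption):

```sql
-- Assumed DDL; the post does not include the table definitions.
CREATE TABLE tab1 (key integer, value varchar(10));
CREATE TABLE tab2 (key integer, value varchar(10));
CREATE TABLE tab3 (key integer, value varchar(10));
-- Each table would hold 5 rows for its own key value, matching the
-- rows=5 estimates in the plans above.
```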
Mariusz,

> While testing multi-select views I found some problems. Here are the
> details. I have 3 tables and I created a view on them:

What version of PostgreSQL are you using? UNION views were optimized
extremely poorly through 7.2.4; things have been improved in 7.3.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Hi,

On Sunday, 16 February 2003, at 04:54, Josh Berkus wrote:

> What version of PostgreSQL are you using? UNION views were optimized
> extremely poorly through 7.2.4; things have been improved in 7.3.

PostgreSQL 7.3 on sparc-sun-solaris2.9, compiled by GCC gcc (GCC) 3.2
(self-compiled on a SunBlade 100).

Mariusz
On Sat, 15 Feb 2003, Josh Berkus wrote:

> Mariusz,
>
> > While testing multi-select views I found some problems. Here are the
> > details. I have 3 tables and I created a view on them:
>
> What version of PostgreSQL are you using? UNION views were optimized
> extremely poorly through 7.2.4; things have been improved in 7.3.

Yeah, but I think what he's hoping is that it'll notice that "key=1 and
key=3" is a false condition, so that it doesn't scan those tables, since a
row presumably can't satisfy both. The question would be: is the expense of
checking the condition for all queries greater than the potential gain for
these sorts of queries?

In addition, you'd have to be careful to make it work correctly with
operator overloading, since someone could make operators whose semantics in
cross-datatype comparisons are weird.
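[Editor's note: the contradiction check discussed here can be sketched
generically. This is only an illustration of the idea, not PostgreSQL
planner code; the function name is invented.]

```python
# Illustrative sketch (not PostgreSQL planner code): detecting that a
# conjunction of equality predicates on one column is unsatisfiable.
def equality_conjunction_satisfiable(predicates):
    """predicates: list of (column, constant) pairs joined by AND.
    Returns False when two predicates pin the same column to different
    constants, e.g. key = 1 AND key = 3."""
    pinned = {}
    for column, constant in predicates:
        if column in pinned and pinned[column] != constant:
            return False  # contradiction: column cannot equal both values
        pinned[column] = constant
    return True

# ("key" = 1) AND ("key" = 3) is always false:
print(equality_conjunction_satisfiable([("key", 1), ("key", 3)]))  # False
# ("key" = 3) AND ("key" = 3) is satisfiable:
print(equality_conjunction_satisfiable([("key", 3), ("key", 3)]))  # True
```

As the thread notes, the real planner would also have to cost this check
against the (usually small) class of queries it helps, and restrict it to
operators with trustworthy equality semantics.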
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Yeah, but I think what he's hoping is that it'll notice that "key=1 and
> key=3" is a false condition, so that it doesn't scan those tables, since
> a row presumably can't satisfy both. The question would be: is the
> expense of checking the condition for all queries greater than the
> potential gain for these sorts of queries?

Yes, this is the key point: we won't put in an optimization that wins on a
small class of queries unless there is no material cost added for planning
cases where it doesn't apply.

> In addition, you'd have to be careful to make it work correctly with
> operator overloading, since someone could make operators whose semantics
> in cross-datatype comparisons are weird.

In practice we would restrict such deductions to mergejoinable = operators,
which are sufficiently semantics-constrained that I think you can treat
equality at face value.

Actually, in CVS tip we are on the hairy edge of being able to do this:
generate_implied_equalities() actually detects that the given conditions
imply that two constants are equal. But it doesn't do anything with the
knowledge, because I couldn't figure out just what to do. It's not always
correct to add a "WHERE false" constraint to the top level, but where
exactly do we add it? Exactly which relations are guaranteed to produce
zero rows in such a case? (When there are outer joins in the picture, zero
rows out of some relations doesn't mean zero rows out overall.) And how do
we exploit that knowledge once we've got it? It'd be a pretty considerable
amount of work to optimize a plan tree fully for this sort of thing (e.g.,
suppressing unnecessary joins), and I doubt it's worth the trouble.

			regards, tom lane
On Sunday, 16 February 2003, at 19:51, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > Yeah, but I think what he's hoping is that it'll notice that "key=1
> > and key=3" is a false condition, so that it doesn't scan those tables,
> > since a row presumably can't satisfy both.

Yes, that is what I expected.

> Yes, this is the key point: we won't put in an optimization that wins on
> a small class of queries unless there is no material cost added for
> planning cases where it doesn't apply.
>
> It'd be a pretty considerable amount of work to optimize a plan tree
> fully for this sort of thing (e.g., suppressing unnecessary joins), and
> I doubt it's worth the trouble.

OK, perhaps I should give some explanation of my case. We are gathering
lots of log data in a few tables. Each table grows by some 300,000 to
500,000 rows a day. With an average row size of 100 bytes, we get up to
50 MB of data per day. Keeping data for just 1 year gives us some 18 GB per
table. Also, in each table there is a field with very low cardinality
(5..20 unique values). This field appears in the 'where' clause of most of
our queries on the table (mostly key_field = 5, sometimes key_field in
(1,2,3)).

What I was thinking of is implementing some kind of horizontal table
partitioning: I wanted to split the physical storage of the data across a
few smaller tables. In my case it could become 12 subtables of 1..2 GB
each. Now, with a 'union all' view (and lots of rules, of course) I could
simulate a partitioned table as Oracle implements it. IMHO, when querying
this view (the supertable) for one or a few 'key_field' values, it should
be much faster to scan 5 GB in 3 partitions (subtables) than 18 GB in one
big table.

I realize it is not the only solution.
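[Editor's note: a minimal sketch of the setup described above, with
hypothetical table, column, and rule names; only two partitions are shown
where the real scheme would have 12.]

```sql
-- Hypothetical sketch of the union-all partitioning scheme; all names
-- are illustrative.  One physical subtable per key_field value.
CREATE TABLE log_part_1 (key_field integer, logged_at timestamp, msg text);
CREATE TABLE log_part_2 (key_field integer, logged_at timestamp, msg text);

-- The "supertable" view unions the partitions; the per-branch WHERE
-- clauses are what the planner would ideally use to skip subtables.
CREATE VIEW log_all AS
    SELECT key_field, logged_at, msg FROM log_part_1 WHERE key_field = 1
    UNION ALL
    SELECT key_field, logged_at, msg FROM log_part_2 WHERE key_field = 2;

-- One INSERT rule per partition routes new rows to the proper subtable
-- (similar rules would be needed for UPDATE and DELETE).
CREATE RULE log_all_ins_1 AS ON INSERT TO log_all
    WHERE new.key_field = 1
    DO INSTEAD INSERT INTO log_part_1
        VALUES (new.key_field, new.logged_at, new.msg);
```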
Perhaps it could instead be implemented with a function taking a key_field
value and returning all rows from the proper table (plus functions for
insert/update/delete). Or perhaps the application code (log feeder and
report module) could be recoded to know about the split tables. Still, I
think the view approach is 'elegant' and clear.

I await your comments,

Mariusz Czulada
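[Editor's note: the function-based alternative might be sketched as a
set-returning PL/pgSQL function (available as of PostgreSQL 7.3) that
dispatches on the key_field value; the subtable, type, and function names
below are invented for illustration.]

```sql
-- Hypothetical sketch: dispatch by key_field value inside a
-- set-returning PL/pgSQL function; names are illustrative only.
CREATE TYPE log_row AS (key_field integer, msg text);

CREATE FUNCTION fetch_logs(integer) RETURNS SETOF log_row AS '
DECLARE
    r RECORD;
BEGIN
    -- Scan only the subtable that holds the requested key_field value.
    IF $1 = 1 THEN
        FOR r IN SELECT key_field, msg FROM log_part_1 LOOP
            RETURN NEXT r;
        END LOOP;
    ELSIF $1 = 2 THEN
        FOR r IN SELECT key_field, msg FROM log_part_2 LOOP
            RETURN NEXT r;
        END LOOP;
    END IF;
    RETURN;
END;
' LANGUAGE plpgsql;

-- Usage: SELECT * FROM fetch_logs(1);
```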