Thread: Re: Functional dependencies and GROUP BY - for subqueries
Hi All,
If group by clause has primary key, the targetlist may have columns which are not part of the aggregate and not part of group by clause. The relevant commit is e49ae8d3bc588294d07ce1a1272b31718cfca5ef and relevant mail thread has subject Functional dependencies and GROUP BY.CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products p LEFT JOIN sales s USING (product_id) GROUP BY product_id;
But, if I rewrite the query using views as follows
create view sel_product as SELECT p.product_id, p.name, p.price FROM products p;
create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s;
create view sel_product as SELECT p.product_id, p.name, p.price FROM products p;
create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s;
SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM sel_product p LEFT JOIN sel_sales s using(product_id) GROUP BY p.product_id;
The last SELECT statement gives error
ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s...
ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s...
The reason being, it doesn't look into the subqueries (in FROM clause) to infer that p.product_id is essentially product.product_id which is a primary key.
Attached find a crude patch to infer the same by traversing subqueries.
As I said the patch is crude and needs a better shape. If community is willing to accept the extension, I can work on it further.
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Attachment
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > The reason being, it doesn't look into the subqueries (in FROM clause) to > infer that p.product_id is essentially product.product_id which is a > primary key. Right. > Attached find a crude patch to infer the same by traversing subqueries. I think this is probably a bad idea. We could spend an infinite amount of code this way, with ever-increasing runtime cost and ever-decreasing usefulness, and where would we stop? I'm also a bit concerned about allowing illegal queries due to bugs of omission in the ever-more-complex checking code, which could be quite hard to find, and when we did find them we'd be faced with a backwards compatibility break if we fix them. A larger point is that the patch as proposed doesn't fix the stated problem, because it only descends into written-out subqueries. It would only succeed at looking into views if we applied it after rewriting, rather than in the parser. That's really not going to work. It would be a complete disaster if the dependencies of a query that references a view depend on the view's contents. regards, tom lane
On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
That's correct. I tested it only with the written-out subqueries, to see if the idea works. But it started with the case involving views.
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:Right.
> The reason being, it doesn't look into the subqueries (in FROM clause) to
> infer that p.product_id is essentially product.product_id which is a
> primary key.I think this is probably a bad idea. We could spend an infinite amount
> Attached find a crude patch to infer the same by traversing subqueries.
of code this way, with ever-increasing runtime cost and ever-decreasing
usefulness, and where would we stop? I'm also a bit concerned about
allowing illegal queries due to bugs of omission in the
ever-more-complex checking code, which could be quite hard to find, and
when we did find them we'd be faced with a backwards compatibility break
if we fix them.
A larger point is that the patch as proposed doesn't fix the stated
problem, because it only descends into written-out subqueries.
That's correct. I tested it only with the written-out subqueries, to see if the idea works. But it started with the case involving views.
It
would only succeed at looking into views if we applied it after
rewriting, rather than in the parser. That's really not going to work.
It would be a complete disaster if the dependencies of a query that
references a view depend on the view's contents.
Can you please elaborate, why would it be a disaster?
Will we extend this functionality for written-out subqueries queries and/or views or none?
I am not touchy about the approach, I have taken. I am interested in the feature extension, whichever way it gets implemented.
regards, tom lane
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A larger point is that the patch as proposed doesn't fix the stated >> problem, because it only descends into written-out subqueries. It >> would only succeed at looking into views if we applied it after >> rewriting, rather than in the parser. That's really not going to work. >> It would be a complete disaster if the dependencies of a query that >> references a view depend on the view's contents. > Can you please elaborate, why would it be a disaster? Consider that we've done create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; create view v2 as select * from v1 group by id; Currently, v2 would be rejected but you would like to make it legal. Now consider alter table t1 drop primary key; This ALTER would have to be rejected, or else (with CASCADE) lead to dropping v2 but not v1. That's pretty ugly action-at-a-distance if you ask me. But worse, consider create or replace view v1 as select * from t2; where t2 exposes the same columns as t1 but lacks a primary-key constraint on id. This likewise would need to invalidate v2. We lack any dependency mechanism that could enforce that, and it seems seriously ugly that such a view redefinition could fail at all. (Note for instance that there's no place to put a CASCADE/RESTRICT option in CREATE OR REPLACE VIEW.) So quite aside from the implementation difficulties of looking into views for such constraints, I don't think the behavior would be pleasant if we did do it. Views are not supposed to expose properties of the underlying tables. regards, tom lane
> Can you please elaborate, why would it be a disaster?Consider that we've done
create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;
create view v2 as select * from v1 group by id;
Currently, v2 would be rejected but you would like to make it legal.
Now consider
alter table t1 drop primary key;
This ALTER would have to be rejected, or else (with CASCADE) lead to
dropping v2 but not v1. That's pretty ugly action-at-a-distance
if you ask me. But worse, consider
create or replace view v1 as select * from t2;
where t2 exposes the same columns as t1 but lacks a primary-key
constraint on id. This likewise would need to invalidate v2. We lack
any dependency mechanism that could enforce that, and it seems seriously
ugly that such a view redefinition could fail at all. (Note for
instance that there's no place to put a CASCADE/RESTRICT option in
CREATE OR REPLACE VIEW.)
So quite aside from the implementation difficulties of looking into
views for such constraints, I don't think the behavior would be pleasant
if we did do it. Views are not supposed to expose properties of the
underlying tables.
Thanks for the explanation.
Is there any reason why do we want to check the functional dependencies at the time of parsing and not after rewrite? Obviously, by doing so, we will allow creation of certain views which will start throwing errors after the underlying table changes the primary key. Is it mandatory that we throw "functional dependency" related errors at the time of creation of views?
regards, tom lane
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > Is there any reason why do we want to check the functional dependencies at > the time of parsing and not after rewrite? Obviously, by doing so, we will > allow creation of certain views which will start throwing errors after the > underlying table changes the primary key. Is it mandatory that we throw > "functional dependency" related errors at the time of creation of views? From a usability standpoint, I would think so. And really the only excuse for the functional-dependency feature to exist at all is usability; it adds nothing you couldn't do without it. If we wanted to do something like this, I think the clean way to do it would be to invent a notion of unique/not-null/pkey constraints on views, so that the creator of a view could declaratively say that he wants such a property exposed. That is, the example would become something like create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; alter view v1 add primary key(id); create view v2 as select * from v1 group by id; The pkey constraint on v1 is just a catalog entry with a dependency on t1's pkey constraint; there's no actual index there. But now, v2 can be built with a dependency on v1's pkey, not t1's, and the action-at- a-distance problem goes away. For example, a "CREATE OR REPLACE v1" command could check that the new view definition still provides something for v1's pkey to depend on, and throw error or not without any need to examine the contents of other views. Dropping various elements of this schema would work unsurprisingly, too. This would, of course, require a significant chunk of new code, and personally I do not think the feature would be worth it. But it would be a clean and usable design. regards, tom lane
On Mon, Apr 29, 2013 at 7:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:From a usability standpoint, I would think so. And really the only
> Is there any reason why do we want to check the functional dependencies at
> the time of parsing and not after rewrite? Obviously, by doing so, we will
> allow creation of certain views which will start throwing errors after the
> underlying table changes the primary key. Is it mandatory that we throw
> "functional dependency" related errors at the time of creation of views?
excuse for the functional-dependency feature to exist at all is
usability; it adds nothing you couldn't do without it.
If we wanted to do something like this, I think the clean way to do it
would be to invent a notion of unique/not-null/pkey constraints on
views, so that the creator of a view could declaratively say that he
wants such a property exposed. That is, the example would become
something likealter view v1 add primary key(id);
create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;create view v2 as select * from v1 group by id;The pkey constraint on v1 is just a catalog entry with a dependency on
t1's pkey constraint; there's no actual index there. But now, v2 can
be built with a dependency on v1's pkey, not t1's, and the action-at-
a-distance problem goes away. For example, a "CREATE OR REPLACE v1"
command could check that the new view definition still provides
something for v1's pkey to depend on, and throw error or not without any
need to examine the contents of other views. Dropping various elements
of this schema would work unsurprisingly, too.
This would, of course, require a significant chunk of new code, and
personally I do not think the feature would be worth it. But it
would be a clean and usable design.
Yes, this looks better design. But I do not see any interest as such. So, if I have to spend time here, there is higher chance it would go waste.
Will it be useful to have primary key grouping functionality extended to the subqueries? For example,
CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM (select * from products) p LEFT JOIN (select * from sales) s USING (product_id) GROUP BY product_id;
This subquery gives error (p.name should be part of group by clause), but functionally it's grouping based on primary key. Is there a better way to use the functional dependency for grouping?
CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM (select * from products) p LEFT JOIN (select * from sales) s USING (product_id) GROUP BY product_id;
This subquery gives error (p.name should be part of group by clause), but functionally it's grouping based on primary key. Is there a better way to use the functional dependency for grouping?
regards, tom lane
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company