Re: Idea: Avoid JOINs by using path expressions to follow FKs - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Idea: Avoid JOINs by using path expressions to follow FKs
Date
Msg-id CAFj8pRBPFCu9DhewNBKxm351C0MFOxKvwQpbZen4ViDq6+CgsQ@mail.gmail.com
Whole thread Raw
In response to Re: Idea: Avoid JOINs by using path expressions to follow FKs  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Idea: Avoid JOINs by using path expressions to follow FKs  ("Joel Jacobson" <joel@compiler.org>)
Re: Idea: Avoid JOINs by using path expressions to follow FKs  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers


I agree, it's not very comfortable. Maybe we can think of ways to improve the comfort?

Here are two such ideas:
 
Idea #1
=======

Initial semi-automated script-assisted renaming of existing foreign keys.

In my experiences, multiple foreign keys per primary table is quite common,
but not multiple foreign keys referencing the same foreign table from the same primary table.

If so, then a script can be written to rename most existing foreign keys:

--
-- Script to rename foreign keys to the name of the foreign table.
-- Tables with multiple foreign keys referencing the same foreign table are skipped.
--
DO
$_$
DECLARE
sql_cmd text;
BEGIN
FOR sql_cmd IN
  SELECT * FROM
  (
    SELECT
      format
      (
        'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',
        conrel_nsp.nspname,
        conrel.relname,
        pg_constraint.conname,
        confrel.relname
      ) AS sql_cmd,
      COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)
      AS count_foreign_keys_to_same_table
    FROM pg_constraint
    JOIN pg_class AS conrel
      ON conrel.oid = pg_constraint.conrelid
    JOIN pg_class AS confrel
      ON confrel.oid = pg_constraint.confrelid
    JOIN pg_namespace AS conrel_nsp
      ON conrel_nsp.oid = conrel.relnamespace
    WHERE pg_constraint.contype = 'f'
  ) AS x
  WHERE count_foreign_keys_to_same_table = 1
LOOP
  RAISE NOTICE '%', sql_cmd;
  EXECUTE sql_cmd;
END LOOP;
END
$_$;

For our example data model, this would produce:

ALTER TABLE public.orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

you fix one issue, but you lost interesting informations


To clarify what I mean with multiple foreign keys to the same table, here is an example:

CREATE TABLE p (
a int,
b int,
PRIMARY KEY (a),
UNIQUE (a,b)
);

CREATE TABLE f1 (
a int,
b int,
FOREIGN KEY (a) REFERENCES p
);

CREATE TABLE f2 (
a int,
b int,
FOREIGN KEY (a) REFERENCES p,
FOREIGN KEY (a,b) REFERENCES p(a,b)
);

For this example, only f1's foreign key constraint would be renamed:

ALTER TABLE public.f1 RENAME CONSTRAINT f1_a_fkey TO p;

Idea #2
=======

Allow user to define the default format for new foreign key constraint name.

The format could use template patterns similar to how e.g. to_char() works.
If a conflict is found, it would do the same as today, try appending an increasing integer.

Users could then decide on a company-wide consistent naming convention
on how foreign keys are usually named, which would reduce the need to manually name them
using the CONSTRAINT keyword.

Finally, just for fun, here is an example of how we could write the query above,
if we would have real foreign keys on the catalogs:

  SELECT
    format
    (
      'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',
      pg_constraint.conrel.pg_namespace.nspname,
      pg_constraint.conrel.relname,
      pg_constraint.conname,
      pg_constraint.confrel.relname,
    ) AS sql_cmd,
    COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)
    AS count_foreign_keys_to_same_table
  FROM pg_constraint
  WHERE pg_constraint.contype = 'f'

In this example the foreign key constraint names have been
derived from the column names since both conrelid and confrelid,
reference pg_class.

I think this is a good example of where this improves the situation the most,
when you have multiple joins of the same table, forcing you to come up with multiple aliases
for the same table, keeping them all in memory while writing and reading such queries.

I do not have an opinion about this, I am sorry.  I cannot imagine so this can work. In some complex cases, the graphic query designer can work better. The invention of new syntax, or new tool should be better just than checking correct usage of foreign constraints. I have worked with SQL for over 25 years, and there were a lot of tools, and people don't use it too much. So I am not good at dialog in this area, because I am a little bit too sceptical :).

I remember multiple self joins only when developers used an EAV model. This is an antipattern, and today we have better tools, and we don't need it.  It is scary, because it is completely against the relational model. If I want to fix it, then I will invent a new different syntax type that can be used for optimization of this case. But I have no idea how to do it well. Maybe:

SELECT * FROM EAVTOENTITY( FROM data  GROUP BY objid COLUMN name varchar WHEN attrname = 'name',  surname varchar WHEN attrname = 'surname',  ...)

 
On second hand, it can be very nice to have some special strict mode in Postgres - maybe slower, not compatible, that disallow some dangerous or unsafe queries. But it is possible to solve in extensions, but nobody did it. Something like plpgsql_check for SQL - who will write sql_check?

Not a bad idea, this is a real problem, such a tool would be useful even with this proposed new syntax, as normal JOINs would continue to co-exist, for which nonsensical joins would still be possible.

Maybe some similar what we have in plpgsql - extra checks - with three levels, off, warnings, errors.



/Joel

pgsql-hackers by date:

Previous
From: Markus Wanner
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: wal stats questions