Re: Improving inferred query column names - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Improving inferred query column names
Date
Msg-id CADkLM=fJFR0tWuo+GprqXw6sDnK8YHZ=81OMd7wRnGXAH9pUqg@mail.gmail.com
Whole thread Raw
In response to Re: Improving inferred query column names  (Vladimir Churyukin <vladimir@churyukin.com>)
List pgsql-hackers
On Sat, Feb 11, 2023 at 3:47 PM Vladimir Churyukin <vladimir@churyukin.com> wrote:
For backwards compatibility I guess you can have a GUC flag controlling that behavior that can be set into backwards compatibility mode if required.
The previous functionality can be declared deprecated and removed (with the flag) once the current version becomes unsupported. 

Seems more like a per-session setting than a GUC.

Here's a suggestion off the top of my head.

We create a session setting inferred_column_name_template.

The template takes a formatting directive %N which is just a counter

SET inferred_column_name_template = 'col_%N'

which would give you col_1, col_2, regardless of what kind of expression the columns were

We could introduce another directive, %T

SET inferred_column_name_template = '%T_%N'

which prints the datatype short name of the column. In this case, %N would increment per datatype, so text_1, integer_1, text_2, timestamptz_1, text_3

Getting fancier, we could introduce something less datatype centric, %F

SET inferred_column_name_template = '%F_%N'

Which would walk the following waterfall and stop on the first match
   1. The datatype short name if the expression is explicitly casted (either CAST or ::)
   2. the name of the function if the outermost expression was a function (aggregate, window, or scalar),  so sum_1, substr_1
   3. 'case' if the outermost expression was  case
   4. 'expr' if the expression was effectively an operator (  SELECT 3+4, 'a'  || 'b' etc)
   5. the datatype short name for anything that doesn't match any of the previous, and for explicit casts

Keeping track of all the %N counters could get silly, so maybe a %P which is simply the numeric column position of the column, so your result set would go like:  id, name, col_3, last_login, col_5.

We would have to account for the case where the user left either %N or %P out of the template, so one of them would be an implied suffix if both were absent, or we maybe go with 

SET inferred_column_name_prefix = '%F_';
SET inferred_column_name_counter = 'position';   /* position, counter, per_type_counter */

Or we just cook up a few predefined naming schemes, and let the user pick from those.

One caution I have is that I have seen several enterprise app database designs that have lots of user-customizable columns with names like varchar1, numeric4, etc. Presumably the user would know their environment and not pick a confusing template.

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Next
From: Mark Dilger
Date:
Subject: Re: Transparent column encryption