Thread: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

Hi,


--------------------------------------------------------------------------------------------------------------
Actual column names used while creation of foreign table are not allowed to be an
empty string, but when we use column_name as an empty string in OPTIONS during
CREATE or ALTER of foreign tables, it is allowed.

EXAMPLES:-
1) CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
ERROR:  zero-length delimited identifier at or near """"
LINE 1: CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5))...

2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name ''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
CREATE FOREIGN TABLE

postgres@43832=#\d test_fdw
                          Foreign table "public.test_fdw"
 Column |         Type          | Collation | Nullable | Default |   FDW options    
--------+-----------------------+-----------+----------+---------+------------------
 name   | character varying(15) |           |          |         | (column_name '')
 id     | character varying(5)  |           |          |         |
Server: localhost_fdw
FDW options: (schema_name 'public', table_name 'test')

--------------------------------------------------------------------------------------------------------------

Due to the above, when we try to simply select a remote table, the remote query uses
the empty column name from the FDW column option and the select fails.

EXAMPLES:-
1) select * from test_fdw;
ERROR:  zero-length delimited identifier at or near """"
CONTEXT:  remote SQL command: SELECT "", id FROM public.test

2) explain verbose select * from test_fdw;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Foreign Scan on public.test_fdw  (cost=100.00..297.66 rows=853 width=72)
   Output: name, id
   Remote SQL: SELECT "", id FROM public.test
(3 rows)

--------------------------------------------------------------------------------------------------------------

We can fix this issue either during fetching of FDW column option names while
building remote query or we do not allow at CREATE or ALTER of foreign tables itself.
We think it would be better to disallow adding the column_name option as empty in
CREATE or ALTER itself as we do not allow empty actual column names for a foreign
table. Unless I missed to understand the purpose of allowing column_name as empty.

THE PROPOSED SOLUTION OUTPUT:-
1) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS (column_name ''), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
ERROR:  column generic option name cannot be empty

2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test');
CREATE FOREIGN TABLE

ALTER FOREIGN TABLE test_fdw ALTER COLUMN id OPTIONS (column_name '');
ERROR:  column generic option name cannot be empty

--------------------------------------------------------------------------------------------------------------

PFA, the fix and test cases patches attached. I ran "make check world" and do
not see any failure related to patches. But, I do see an existing failure
t/001_pgbench_with_server.pl


Regards,
Nishant.

P.S
Thanks to Jeevan Chalke and Suraj Kharage for their inputs for the proposal.
Nishant Sharma <nishant.sharma@enterprisedb.com> writes:
> Actual column names used while creation of foreign table are not allowed to
> be an
> empty string, but when we use column_name as an empty string in OPTIONS
> during
> CREATE or ALTER of foreign tables, it is allowed.

Is this really a bug?  The valid remote names are determined by
whatever underlies the FDW, and I doubt we should assume that
SQL syntax restrictions apply to every FDW.  Perhaps it would
be reasonable to apply such checks locally in SQL-based FDWs,
but I object to assuming such things at the level of
ATExecAlterColumnGenericOptions.

More generally, I don't see any meaningful difference between
this mistake and the more common one of misspelling the remote
column name, which is something we're not going to be able
to check for (at least not in anything like this way).  If
you wanted to move the ease-of-use goalposts materially,
you should be looking for a way to do that.

            regards, tom lane



On Fri, Aug 16, 2024 at 8:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Nishant Sharma <nishant.sharma@enterprisedb.com> writes:
> > Actual column names used while creation of foreign table are not allowed to
> > be an
> > empty string, but when we use column_name as an empty string in OPTIONS
> > during
> > CREATE or ALTER of foreign tables, it is allowed.
>
> Is this really a bug?  The valid remote names are determined by
> whatever underlies the FDW, and I doubt we should assume that
> SQL syntax restrictions apply to every FDW.  Perhaps it would
> be reasonable to apply such checks locally in SQL-based FDWs,
> but I object to assuming such things at the level of
> ATExecAlterColumnGenericOptions.

I agree.

>
> More generally, I don't see any meaningful difference between
> this mistake and the more common one of misspelling the remote
> column name, which is something we're not going to be able
> to check for (at least not in anything like this way).  If
> you wanted to move the ease-of-use goalposts materially,
> you should be looking for a way to do that.

I think this check should be delegated to an FDW validator.

--
Best Wishes,
Ashutosh Bapat



On Wed, Oct 9, 2024 at 7:12 AM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
> I have included them in v3. v3 does not allow empty schema_name &
> table_name options along with column_name.

I was looking at these patches today and trying to understand whether
the proposed error message is consistent with what we have done
elsewhere.

The proposed error message was "cannot use empty value for option
\"%s\". I looked for error messages that contained the phrase "empty"
or "zero". I did not find a case exactly like this one. The closet
analogues I found were things like this:

invalid cursor name: must not be empty
output cannot be empty string
DSM segment name cannot be empty
row path filter must not be empty string

These messages aren't quite as consistent as one might wish, so it's a
little hard to judge here. Nonetheless, I propose that the error
message we use here should end with either "cannot" or "must not"
followed by either "be empty" or "be empty string". I think my
preferred variant would be "value for option \"%s\" must not be empty
string". But there's certainly oodles of room for bikeshedding.

Apart from that, I see very little to complain about here. Once we
agree on the error message text I think something can be committed.
For everyone's convenience, I suggest merging the two patches into
one. I highly approve of separating patches by topic, but there's
usually no point in separating them by directory.

--
Robert Haas
EDB: http://www.enterprisedb.com



Hi,


Summary of this thread-

Tom had the following concern:-
Regarding location of check in
ATExecAlterColumnGenericOptions. And spell check issue
comparison. Which I believe was addressed by v2 and
its response.

Ashutosh had the suggestion:-
Check should be delegated to an FDW validator. Which I
believe was addressed in v2.

Michael had the following concern:-
We should also care about empty values in schema_name
and table_name. Which I believe I have addressed in v3
patch.

Robert had the following concern:-
Regarding error message and single patch for this. Which I
believe I have addressed in v4 patch.

Tom, Ashutosh, Michael, Robert please let me know if I was
able to address your concerns or if you feel differently.

Assuming Tom, Ashutosh, Michael and Robert feel as though
I have addressed the concerns mentioned above, does anybody
have any additional feedback or concerns with this patch? If not,
I would request we move to commit phase.
Thank you!


Regards,
Nishant Sharma (EDB).
On Thu, Dec 19, 2024 at 7:08 AM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
> Thanks Robert for your response and the review comments!
>
> I have addressed both your suggestions, by changing the error
> message and merging both the patches to one.
>
> PFA, patch set v4.

Cool. I don't want to commit this right before my vacation but I'll
look into it in the new year if nobody beats me to it.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Thu, Dec 19, 2024 at 8:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Cool. I don't want to commit this right before my vacation but I'll
> look into it in the new year if nobody beats me to it.

Nobody has beaten me to it, but I thought of one potential issue.
Suppose that somebody has a foreign table that exists today with one
of these properties set to the empty string. Such a foreign table is
not usable for queries, because the queries won't make it past scan.l
on the remote side, but it's allowed to exist. With this patch, it's
not allowed to exist any more. That would be fine if no such objects
exist in the wild, but if they do, people will get a pg_dump or
pg_upgrade failure when they try to upgrade to a release that includes
this change. Does that mean that we shouldn't commit this patch?

I'm inclined to think that it's probably still OK, because (1) few
users are likely to have such objects, (2) the workaround is easy,
just drop the object or fix it to do something useful, just as users
already need to do in plenty of other, similar cases and (3) unclear
error messages are not great and future users might benefit from this
error message being better. However, one could take the opposite
position and say that (C1) it is unlikely that anyone will try to do
this in the first place and (C2) the error message that is currently
generated when you try to do this isn't really all that unclear, and
therefore it's not worth creating even a minor dump-and-reload hazard;
and that position also seems pretty defensible to me.

Other opinions?

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> Nobody has beaten me to it, but I thought of one potential issue.
> Suppose that somebody has a foreign table that exists today with one
> of these properties set to the empty string. Such a foreign table is
> not usable for queries, because the queries won't make it past scan.l
> on the remote side, but it's allowed to exist.

Hmm.  A query not using that column would work just fine, so it's
at least arguable that there are such beasts in reality and the
user hasn't noticed the bug.  (I doubt this, but if we're worrying
about improbable failures...)

> With this patch, it's
> not allowed to exist any more. That would be fine if no such objects
> exist in the wild, but if they do, people will get a pg_dump or
> pg_upgrade failure when they try to upgrade to a release that includes
> this change. Does that mean that we shouldn't commit this patch?

I was down on it to start with, on the grounds that it wasn't adding
enough ease-of-use to justify even a relatively small amount of added
code.  I'm not sure that the dump-reload failure angle is much of a
concern in reality, but I would have voted not to commit before and
I still do.

            regards, tom lane



On Thu, Feb 27, 2025 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was down on it to start with, on the grounds that it wasn't adding
> enough ease-of-use to justify even a relatively small amount of added
> code.  I'm not sure that the dump-reload failure angle is much of a
> concern in reality, but I would have voted not to commit before and
> I still do.

OK, fair. Anyone want to argue the other side?

--
Robert Haas
EDB: http://www.enterprisedb.com