Thread: libpq: unexpected return code from PQexecParams with a DO INSTEAD rule present

Let's say I have a table "_users", and also a view "users" that excludes
all soft-deleted records from that table:

> CREATE SCHEMA test_libpq;
> CREATE TABLE test_libpq._users (
>   id SERIAL PRIMARY KEY,
>   name VARCHAR(255) UNIQUE,
>   deleted_at TIMESTAMP
> );
> 
> CREATE VIEW test_libpq.users AS
> SELECT * FROM test_libpq._users
> WHERE deleted_at IS NULL;
Let's also add a DO INSTEAD rule that translates DELETE statements on
the view into soft-delete UPDATE statements:

> CREATE RULE soft_delete AS
> ON DELETE TO test_libpq.users DO INSTEAD (
>   UPDATE test_libpq._users SET deleted_at = NOW()
>   WHERE id = OLD.id AND deleted_at IS NULL
>   RETURNING *
> );

Cool, now if I go into psql command line and do a few inserts and
deletes, everything works as intended:

> postgres=# INSERT INTO test_libpq.users (name) VALUES ('Joe');
> INSERT 0 1
> postgres=# DELETE FROM test_libpq.users WHERE name = 'Joe' RETURNING id;
>  id 
> ----
>   1
> (1 row)
> 
> DELETE 0
> postgres=# SELECT * FROM test_libpq.users;
>  id | name | deleted_at 
> ----+------+------------
> (0 rows)
> 
> postgres=# SELECT * FROM test_libpq._users;
>  id | name |         deleted_at         
> ----+------+----------------------------
>   1 | Joe  | 2024-07-14 14:55:05.585433
> (1 row)

The problems begin when I translate that into C code using libpq. That
DELETE statement should return with a status code PGRES_TUPLES_OK and I
should be able to read a list of deleted ids. And indeed it works like
that with PQexec(). But not PQexecParams()!

With PQexecParams() it instead returns PGRES_COMMAND_OK and doesn't
return the list of deleted ids. This only happens with that DO INSTEAD
rule present, without it the function works as expected. But obviously
does a hard delete instead of a soft one.

This behavior isn't documented. 34.3.1. Main Functions [1] says this
about PQexecParams:

> PQexecParams is like PQexec, but offers additional
> functionality: parameter values can be specified separately from
> the command string proper, and query results can be requested in
> either text or binary format.
> [...]
> Unlike PQexec, PQexecParams allows at most one SQL command in
> the given string. (There can be semicolons in it, but not more
> than one nonempty command.) This is a limitation of the
> underlying protocol, but has some usefulness as an extra defense
> against SQL-injection attacks.
[1]: https://www.postgresql.org/docs/16/libpq-exec.html#LIBPQ-PQEXECPARAMS

The manual mentions some limitations in the protocol regarding
semicolons, but nothing about rules.

Here is a complete C++ program to reproduce the issue:
https://paste.sr.ht/~uh/80d023b772bcfe8c1eda8f6b69b5b4e2c0352dc1
(also pasted at the end of the email). I tested it with a postgresql
instance launched like this:

> docker run -it --rm -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:16.3
I also noticed that this same problem affects another unofficial
postgresql library, but for Go: https://github.com/lib/pq. So this
problem is not local to libpq, but affects other clients as well.

The C++ program that I linked to above:
> #include <cstdio>
> #include <cstdlib>
> #include <cstring>
> #include <functional>
> #include <iostream>
> #include <libpq-fe.h>
> #include <sstream>
> #include <stdexcept>
> 
> #define TEST(EXPR)                                  \
>   try {                                             \
>     EXPR;                                           \
>     std::cout << "PASSED: " #EXPR "\n";             \
>   } catch (const std::runtime_error& e) {           \
>     std::cout << "FAILED: " #EXPR "\n" << e.what(); \
>     exit(EXIT_FAILURE);                             \
>   }
> 
> void assertRes(PGconn* conn, PGresult* res, ExecStatusType expected) {
>   auto status = PQresultStatus(res);
>   if (status != expected) {
>     auto err = std::stringstream()
>       << "query failed with status " << PQresStatus(status)
>       << "\nbut expected: " << PQresStatus(expected)
>       << "\nmessage: " << PQerrorMessage(conn);
>     throw std::runtime_error(err.str());
>   }
> }
> 
> PGconn* setup() {
>   PGconn* conn = PQconnectdb("postgresql://postgres:postgres@localhost:5432/postgres");
>   if (PQstatus(conn) != CONNECTION_OK) {
>     fprintf(stderr, "connection error: %s\n", PQerrorMessage(conn));
>     exit(EXIT_FAILURE);
>   }
> 
>   const char* query = R"(
>     DROP SCHEMA IF EXISTS test_libpq CASCADE;
>     CREATE SCHEMA test_libpq;
> 
>     CREATE TABLE test_libpq._users (
>       id SERIAL PRIMARY KEY,
>       name VARCHAR(255) UNIQUE,
>       deleted_at TIMESTAMP
>     );
> 
>     INSERT INTO test_libpq._users
>       (name)
>     VALUES
>       ('a user');
> 
>     -- A view that only contains non-deleted users:
>     CREATE VIEW test_libpq.users AS
>     SELECT * FROM test_libpq._users
>     WHERE deleted_at IS NULL;
> 
>     -- DELETE on this view does a soft delete instead:
>     CREATE RULE soft_delete AS
>     ON DELETE TO test_libpq.users DO INSTEAD (
>       UPDATE test_libpq._users SET deleted_at = NOW()
>       WHERE id = OLD.id AND deleted_at IS NULL
>       RETURNING *
>     );
>   )";
> 
>   PGresult* res = PQexec(conn, query);
>   assertRes(conn, res, PGRES_COMMAND_OK);
>   PQclear(res);
> 
>   return conn;
> }
> 
> static const char* softDelete = R"(
>   DELETE FROM test_libpq.users
>   RETURNING id
> )";
> 
> static const char* hardDelete = R"(
>   DELETE FROM test_libpq._users
>   RETURNING id
> )";
> 
> using execFn = std::function<PGresult*(PGconn* conn, const char* query)>;
> 
> void testDelete(execFn execQuery, const char* query) {
>   PGconn* conn = setup();
>   PGresult* res = execQuery(conn, query);
>   assertRes(conn, res, PGRES_TUPLES_OK);
>   PQclear(res);
>   PQfinish(conn);
> }
> 
> PGresult* usingPQExec(PGconn* conn, const char* query) {
>   return PQexec(conn, query); 
> }
> 
> PGresult* usingPQExecParams(PGconn* conn, const char* query) {
>   return PQexecParams(conn, query,
>                       0, nullptr, nullptr,
>                       nullptr, nullptr, 0);
> }
> 
> int main() {
>   TEST(testDelete(usingPQExec,       hardDelete));
>   TEST(testDelete(usingPQExecParams, hardDelete));
>   TEST(testDelete(usingPQExec,       softDelete));
>   TEST(testDelete(usingPQExecParams, softDelete)); // this fails
> }
Cheers!



Hi,

> Let's say I have a table "_users", and also a view "users" that excludes
> all soft-deleted records from that table:
>
> [...]

Thanks for the report. Here is what I discovered so far.

Took me some time to figure out exact steps to run the example (they
are different comparing to C code):

```
export PRFX=/home/eax/pginstall
g++ -g -Wall -o test_libpq -I$PRFX/include
-L$PRFX/lib/aarch64-linux-gnu/ test_libpq_soft_delete.cpp -lpq
LD_LIBRARY_PATH=$PRFX/lib/aarch64-linux-gnu/ ./test_libpq
```

Here is what is passed over the network in
`TEST(testDelete(usingPQExec, softDelete));` case:

```
192.168.88.36:34294 -> 192.168.88.36:5432, 53 (0x35) bytes

        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

0000    51 00 00 00 34 0A 20 20 44 45 4C 45 54 45 20 46    Q...4.  DELETE F
0010    52 4F 4D 20 74 65 73 74 5F 6C 69 62 70 71 2E 75    ROM test_libpq.u
0020    73 65 72 73 0A 20 20 52 45 54 55 52 4E 49 4E 47    sers.  RETURNING
0030    20 69 64 0A 00                                      id..

192.168.88.36:5432 -> 192.168.88.36:34294, 60 (0x3c) bytes

        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

0000    54 00 00 00 1B 00 01 69 64 00 00 00 00 00 00 00    T......id.......
0010    00 00 00 17 00 04 FF FF FF FF 00 00 44 00 00 00    ............D...
0020    0B 00 01 00 00 00 01 31 43 00 00 00 0D 44 45 4C    .......1C....DEL
0030    45 54 45 20 30 00 5A 00 00 00 05 49                ETE 0.Z....I
```

Note the second packet with PqMsg_RowDescription ('T') and
PqMsg_DataRow('D') messages. They contain the RETURNING ids (in this
case "1").

And here is `TEST(testDelete(usingPQExecParams, softDelete))`:

```
192.168.88.36:55166 -> 192.168.88.36:5432, 93 (0x5d) bytes

        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

0000    50 00 00 00 37 00 0A 20 20 44 45 4C 45 54 45 20    P...7..  DELETE
0010    46 52 4F 4D 20 74 65 73 74 5F 6C 69 62 70 71 2E    FROM test_libpq.
0020    75 73 65 72 73 0A 20 20 52 45 54 55 52 4E 49 4E    users.  RETURNIN
0030    47 20 69 64 0A 00 00 00 42 00 00 00 0E 00 00 00    G id....B.......
0040    00 00 00 00 01 00 00 44 00 00 00 06 50 00 45 00    .......D....P.E.
0050    00 00 09 00 00 00 00 00 53 00 00 00 04             ........S....

192.168.88.36:5432 -> 192.168.88.36:55166, 35 (0x23) bytes

        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

0000    31 00 00 00 04 32 00 00 00 04 6E 00 00 00 04 43    1....2....n....C
0010    00 00 00 0D 44 45 4C 45 54 45 20 30 00 5A 00 00    ....DELETE 0.Z..
0020    00 05 49                                           ..I
```

In this case libpq chooses extended query protocol (not 'P' and 'B'
packages - PqMsg_Parse and PqMsg_Bind messages). Then go "describe
portal", "execute" and "sync".

The server answers "parse complete" ('1'), "bind complete" ('2'), and
most importantly PqMsg_NoData ('n').

So far it seems to be an error somewhere in the protocol
implementation. Instead of NoData we should reply with DataRow.

-- 
Best regards,
Aleksander Alekseev



Hi Vasilii,

> > Let's say I have a table "_users", and also a view "users" that excludes
> > all soft-deleted records from that table:
> >
> > [...]
>
> Thanks for the report. Here is what I discovered so far.
>
> [...]

Ok, this has to do with different code paths of the different protocols.

PQexec() uses the simple query protocol. Corresponding function
exec_simple_query() just creates a receiver using CreateDestReceiver()
and passes it to PortalRun(). Corresponding callbacks of the receiver
printtup_startup() / printtup() have the access to TupleDesc and most
importantly don't rely on the value of portal->tupDesc. This is why
PQexec always works as you expect.

PQexecParams() uses the extended query protocol that works
differently. PqMsg_NoData message you get is an answer on
PqMsg_Describe ("describe the portal"). Note that you are sending
PqMsg_Describe when the portal is not yet executed. This PqMsg_NoData
response comes from the corresponding message handler,
exec_describe_portal_message(). This handler *does* rely on
portal->tupDesc, because it's all it has access to. And this value is
different depending on whether you access a regular table or a VIEW.

For a regular table ChoosePortalStrategy() returns
PORTAL_ONE_RETURNING and PortalStart() sets portal->tupDesc. For VIEWs
ChoosePortalStrategy() returns PORTAL_MULTI_QUERY and portal->tupDesc
is set to NULL. This is why you get different results for a regular
table and a VIEW when using PQexecParams() and the extended query
protocol.

It seems to me that we have several options now:

1. Claim that this is not a bug. To my knowledge, simple query
protocol never committed to behave identically to the extended query
protocol.

2. Teach the code to process this case differently than it does now.
In theory, during the Bind phase, when PortalStart() and
ChoosePortalStrategy() are called, we already have a query plan where
the rewrite rules are applied. So we should be able to deduce what
will actually happen when querying the VIEW.

3. Claim that we could do (2) but it isn't worth an effort and invest
our resources into working on something else.

Thoughts?

-- 
Best regards,
Aleksander Alekseev



Aleksander Alekseev <aleksander@timescale.com> writes:
> Ok, this has to do with different code paths of the different protocols.
> ...
> For a regular table ChoosePortalStrategy() returns
> PORTAL_ONE_RETURNING and PortalStart() sets portal->tupDesc. For VIEWs
> ChoosePortalStrategy() returns PORTAL_MULTI_QUERY and portal->tupDesc
> is set to NULL. This is why you get different results for a regular
> table and a VIEW when using PQexecParams() and the extended query
> protocol.

Thanks for doing some investigation here!

I think the actual issue is that the application of the rule generates
an UPDATE query that is not marked canSetTag.  Then, after we throw
away the original DELETE because the rule is marked DO INSTEAD, we
are left with a single non-canSetTag query, which causes
ChoosePortalStrategy to select PORTAL_MULTI_QUERY mode though perhaps
PORTAL_ONE_RETURNING would be more apropos.  It makes sense that
we wouldn't be willing to provide a result tupdesc in MULTI_QUERY
mode (there might be multiple candidate tupdescs, or none); so that
part of it isn't wrong.

I wonder whether ChoosePortalStrategy should be willing to select
PORTAL_ONE_RETURNING if there is one INSERT/UPDATE/DELETE/MERGE
RETURNING query, even if it's not canSetTag.  A brief experiment
with forcing that crashed because QueryListGetPrimaryStmt failed,
but perhaps it'd be okay for that to accept a non-canSetTag
statement too, if it's the only one.  I didn't push further than
that.

An alternative that might be proposed is to allow canSetTag to
be set on the single surviving statement, but I think we do not
want that, because it would change the command completion tag.
Right now all these alternatives produce "DELETE" as the command
tag, but that would cause them to produce "UPDATE".

Another problem: you would still get the same funny results if there
were also a DO ALSO rule (because now there's definitely multiple
queries in the portal), even if that rule didn't return anything
and so apparently shouldn't change the protocol-level behavior.

On the whole, given that this behavior has stood for a couple
of decades without complaints, I'm pretty leery of changing it.

> It seems to me that we have several options now:

> 1. Claim that this is not a bug. To my knowledge, simple query
> protocol never committed to behave identically to the extended query
> protocol.

It explicitly disclaims that, in fact.  I think this is basically
a consequence of the decision that extended protocol can return
only one tuple set, while simple protocol can return more.
We decided that that tuple set would always be the one returned
by the original query, not by any rule.  IIRC, that decision
predated our invention of RETURNING, and maybe we needed to think
harder about the interaction at that time, but we didn't.

The protocol documentation fails to say this AFAICS, which we
should improve if we elect not to change the behavior.  The
CREATE RULE man page has some related text but doesn't say it
either.

> 2. Teach the code to process this case differently than it does now.
> ...
> 3. Claim that we could do (2) but it isn't worth an effort and invest
> our resources into working on something else.

Yeah, that.  Rules are such a backwater that it's hard to justify
putting time into them.  I think if we wanted a cleaner answer here,
we might have to split canSetTag into two flags, one to set the
command tag and one to mark the primary tuple-returning query.
That would definitely not be back-patchable, and I'm unconvinced
that it's worth the development effort.

            regards, tom lane