Re: Drop or alter column under load give ERROR #42804 structure of query does not match function result type: - Mailing list pgsql-general

From Bill Moran
Subject Re: Drop or alter column under load give ERROR #42804 structure of query does not match function result type:
Date
Msg-id 20151009072750.6e5676880baecd18618f08b1@potentialtech.com
Whole thread Raw
In response to Drop or alter column under load give ERROR #42804 structure of query does not match function result type:  (Victor Blomqvist <vb@viblo.se>)
List pgsql-general
On Fri, 9 Oct 2015 14:32:44 +0800
Victor Blomqvist <vb@viblo.se> wrote:

> I have a heavily used PostgreSQL 9.3.5 database on CentOS 6. Sometimes I
> need to add/remove columns, preferably without any service interruptions,
> but I get temporary errors.
>
> I follow the safe operations list from
> https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql
> but many operations cause troubles anyway when the more busy tables are
> updated.

I seriously doubt if Paul did enough research to be sure that "safe" is an
absolute term for that list.

> Typically I have user defined functions for all operations, and my table
> and functions follow this pattern:
>
> CREATE TABLE users (
>   id integer PRIMARY KEY,
>   name varchar NOT NULL,
>   to_be_removed integer NOT NULL
> );
>
> CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
> $$
> BEGIN
>   RETURN QUERY SELECT * FROM users WHERE id = id_;
> END;
> $$ LANGUAGE plpgsql;
>
> Then the actual queries are run by our application as
>
> SELECT id, name FROM select_users(18);
>
> As you can see the column to_be_removed is not selected.

Sure it is ... the function does SELECT *, which absolutely includes the
to_be_removed column. The fact that you ignore that column in a
subsequent superselect doesn't mean that the query in the function knows
to do so.

> Then to remove the
> column I use:
>
> ALTER TABLE users DROP COLUMN to_be_removed;
>
> However, while the system is under load sometimes (more frequently and
> persistent the more load the system is experiencing) I get errors like
> these:
>
>     ERROR #42804 structure of query does not match function result type:
> Number of returned columns (2) does not match expected column count (3).

This is a bit surprising to me. I would expect Postgres to have one or the
other definition of that row within a single transaction, but what seems to
be happening is that the ALTER causes the row definition to be changed in
the middle of the transaction, thus the the function may return 3 columns,
but when the outer query checks the type, it sees that it should only
have 2.

> The same error can happen when columns are added. Can this be avoided
> somehow, or do I need to take the system offline during these kind of
> changes?
>
> For reference, there was a similar but not same issue posted to psql-bugs a
> long time ago:
> http://www.postgresql.org/message-id/8254631e-61a5-4c03-899f-22fdcf369a46@e23g2000vbe.googlegroups.com
>
> I posted this same question at dba.stackexchange and got the advice to
> repost here:
> http://dba.stackexchange.com/questions/117511/postgresql-drop-column-under-load-give-wrong-number-of-columns-errors

This has a lot to do with internals. You should wait a bit to see if you
get a good answer, but if not you might need to post to the hackers list
so the developers can chime in. My opinion is that this is a bug, but it's
an obscure enough bug that it's not surprising that it's gone unfixed for
a while.

--
Bill Moran


pgsql-general by date:

Previous
From: Albe Laurenz
Date:
Subject: Re: Version management for extensions
Next
From: Adrian Klaver
Date:
Subject: Re: Drop or alter column under load give ERROR #42804 structure of query does not match function result type: