I have a heavily used PostgreSQL 9.3.5 database on CentOS 6. Sometimes I need to add/remove columns, preferably without any service interruptions, but I get temporary errors.
I follow the safe operations list from
https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql but many operations cause troubles anyway when the more busy tables are updated.
Typically I have user defined functions for all operations, and my table and functions follow this pattern:
CREATE TABLE users (
id integer PRIMARY KEY,
name varchar NOT NULL,
to_be_removed integer NOT NULL
);
CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
$$
BEGIN
RETURN QUERY SELECT * FROM users WHERE id = id_;
END;
$$ LANGUAGE plpgsql;
Then the actual queries are run by our application as
SELECT id, name FROM select_users(18);
As you can see the column to_be_removed is not selected. Then to remove the column I use:
ALTER TABLE users DROP COLUMN to_be_removed;
However, while the system is under load sometimes (more frequently and persistent the more load the system is experiencing) I get errors like these:
ERROR #42804 structure of query does not match function result type: Number of returned columns (2) does not match expected column count (3).
The same error can happen when columns are added. Can this be avoided somehow, or do I need to take the system offline during these kind of changes?