pg_upgrade improvements - Mailing list pgsql-hackers

From Harold Giménez
Subject pg_upgrade improvements
Date
Msg-id CABQCq-Q9dotchbQsCdyYLBCEo1Z-ZnD_3-cgCt8ajeb5wGuLUQ@mail.gmail.com
Responses Re: pg_upgrade improvements  (Stephen Frost <sfrost@snowman.net>)
Re: pg_upgrade improvements  (Peter Eisentraut <peter_e@gmx.net>)
Re: pg_upgrade improvements  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Hi all,

I've written a pg_upgrade wrapper for upgrading our users (Heroku) to PostgreSQL 9.1. In the process I encountered a specific issue that could easily be improved. The process has worked consistently for many users, both internal and external, with the exception of a few for whom it fails and requires manual handholding.

Before it performs the upgrade, pg_upgrade starts the old cluster, runs various checks, and then attempts to stop it. Occasionally stopping the cluster fails; I've posted the command output in a gist [1]. Running pg_upgrade manually shortly afterwards succeeds. We believe the stop times out because other connections to the cluster are established in that small window. Incoming connections can have several sources: the user or the user's applications reestablishing connections, or something like collectd on localhost attempting to connect.

Possible workarounds on the current version:

* Add an iptables rule to temporarily reject connections from the outside. This is not viable because in a multitenant environment a process may write an iptables rule, and meanwhile another process may permanently save rules, including the temporary one. We can defend against that, but it does add a lot of complexity.
* Rewrite pg_hba.conf temporarily while the pg_upgrade script runs to disallow any other connections.
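
The pg_hba.conf workaround could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `temporary_pg_hba`, the `.pre_upgrade` backup suffix, and the rule text passed in are all hypothetical, and the right authentication method for the rule depends on the server version. Since pg_upgrade starts the old cluster itself, the rewritten file would be read on that start.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def temporary_pg_hba(data_dir, rules):
    """Replace pg_hba.conf with `rules` for the duration of the with-block.

    `rules` is the full replacement file content chosen by the caller
    (hypothetical example: a single line allowing only local socket
    connections for the superuser).
    """
    hba_path = os.path.join(data_dir, "pg_hba.conf")
    backup_path = hba_path + ".pre_upgrade"  # hypothetical backup name
    shutil.copy2(hba_path, backup_path)
    try:
        with open(hba_path, "w") as f:
            f.write(rules)
        yield hba_path
    finally:
        # Restore the original file even if the upgrade step raises.
        shutil.move(backup_path, hba_path)
```

A wrapper would run pg_upgrade inside `with temporary_pg_hba(old_data_dir, rules):` so the original rules come back whether or not the upgrade succeeds.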

A possible solution is for pg_upgrade to use the --force flag when stopping the cluster, so that connections are kicked out. There is no reason to be polite in this case. Another idea that my colleagues and I kicked around was to start the cluster in single-user mode, or to only allow Unix socket connections in a directory somewhere under /tmp. Anything that rejects other connections would be helpful.
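
As a rough sketch of what these two ideas might look like from a wrapper's point of view, the helpers below only build pg_ctl argument lists. The function names are hypothetical. The sketch assumes that a forced stop would map onto pg_ctl's documented shutdown modes (`-m smart`, `-m fast`, `-m immediate`), with `fast` disconnecting clients, and that disabling TCP via `listen_addresses` is an acceptable stand-in for "only allow Unix socket connections"; neither is confirmed as what pg_upgrade does or should do.

```python
def pg_ctl_stop_command(data_dir, mode="fast", timeout_seconds=60):
    """Build a pg_ctl stop invocation that does not wait for clients to leave.

    Assumption: "fast" is the shutdown mode a forced stop would use.
    """
    if mode not in ("smart", "fast", "immediate"):
        raise ValueError("unknown shutdown mode: %r" % (mode,))
    return [
        "pg_ctl", "stop",
        "-D", data_dir,
        "-m", mode,                    # shutdown mode
        "-w",                          # wait for the shutdown to complete
        "-t", str(timeout_seconds),    # give up after this many seconds
    ]


def pg_ctl_start_local_only_command(data_dir):
    """Build a pg_ctl start invocation with TCP listening disabled.

    An empty listen_addresses leaves only Unix-domain socket connections.
    """
    return [
        "pg_ctl", "start",
        "-D", data_dir,
        "-w",
        "-o", "-c listen_addresses=''",  # options passed through to postgres
    ]
```

Disabling TCP alone would not keep out a local process that connects through the default socket, so moving the socket directory would still be needed for that case; the setting name for it is left out here because it is not something the text above specifies.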

It would also be nice if the invocation of pg_ctl didn't pipe its output to /dev/null. I'm sure it would contain information that would directly point at the root cause and could've saved some debugging and hand waving time.
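
To illustrate the alternative to discarding output, here is a minimal sketch of running a command with its stdout and stderr appended to a log file. The helper name `run_logged` and the log path are hypothetical, and this is written from a wrapper's perspective rather than as a description of pg_upgrade's internals.

```python
import subprocess


def run_logged(argv, log_path):
    """Run argv, appending its stdout and stderr to log_path.

    The output goes straight to a file rather than a pipe, so a
    long-lived child that inherits the descriptors (such as a server
    started by the command) cannot block the caller waiting for EOF.
    Returns the command's exit status.
    """
    with open(log_path, "ab") as log:
        log.write(("$ " + " ".join(argv) + "\n").encode())
        log.flush()  # keep the header ahead of the child's own output
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

On a failed stop, the log would then hold whatever pg_ctl reported, instead of nothing.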

Finally, just a note that while we haven't performed a huge number of upgrades yet, we have upgraded a few production systems and for the most part it has worked great.

Regards,

-Harold
