pg_receivewal makes a bad daemon - Mailing list pgsql-hackers
From: Robert Haas
Subject: pg_receivewal makes a bad daemon
Msg-id: CA+TgmobgkyqMOwn64_5t3qQ6GdhOOMFki5w9f6278NhU5r7=oA@mail.gmail.com
List: pgsql-hackers
You might want to use pg_receivewal to save all of your WAL segments somewhere instead of relying on archive_command. It has, at the least, the advantage of working at the byte level rather than the segment level. But it seems to me that it is not entirely suitable as a substitute for archiving, for a couple of reasons. One is that as soon as it runs into a problem, it exits, which is not really what you want out of a daemon that's critical to the future availability of your system. Another is that you can't monitor it except by looking at what it prints out, which is also not really what you want for a piece of critical infrastructure.

The first problem seems somewhat more straightforward. Suppose we add a new command-line option, perhaps --daemon, though we can bikeshed the name. If this option is specified, then pg_receivewal tries to keep going when it hits a problem, rather than just giving up. There's some fuzziness in my mind about exactly what this should mean. If the problem we hit is that we lost the connection to the remote server, then we should try to reconnect. But if the problem is something like a failure inside open_walfile() or close_walfile(), such as a failed open() or fsync() or close(), it's a little less clear what to do. Maybe one idea would be to have a parent process and a child process, where the child process does all the work and the parent process just keeps re-launching it if it dies. It's not entirely clear that this is a suitable way of recovering from, say, an fsync() failure, given previous discussions claiming that - and I might be exaggerating a bit here - there is essentially no way to recover from a failed fsync(), because the kernel might have already thrown out your data and you might as well just set the data center on fire. But perhaps a retry system that can't cope with certain corner cases is better than not having one at all, and perhaps we could revise the logic here and there so that the process doing the work takes some action other than exiting when that's an intelligent approach. (There's a rough sketch of the sort of supervision loop I have in mind below.)

The second problem is a bit more complex. If you were transferring WAL to another PostgreSQL instance rather than to a frontend process, you could log to some place other than standard output, such as a file that you periodically rotate, or to syslog or the Windows event log. Even better, you could connect to PostgreSQL and run SQL queries against monitoring views and see what results you get. If the existing monitoring views don't give users what they need, we can improve them, but the whole infrastructure needed for this kind of thing is altogether lacking for any frontend program. It does not seem very appealing to reinvent log rotation, connection management, and monitoring views inside pg_receivewal, let alone in every frontend process where similar monitoring might be useful. But at least for me, without such capabilities, it is a little hard to take pg_receivewal seriously.

I wonder first of all whether other people agree with these concerns, and secondly what they think we ought to do about it. One option is - do nothing. This could be based either on the idea that pg_receivewal is hopeless, or else on the idea that pg_receivewal can be restarted by some external system when required and monitored well enough as things stand.
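Just to make the retry idea a little more concrete, here is a rough and entirely untested sketch of the parent/child arrangement I mean. The worker function, the fixed five-second backoff, and the way errors are reported are all placeholders rather than a proposal for the actual code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Stand-in for pg_receivewal's real streaming loop.  Returns 0 when it was
 * told to shut down cleanly and nonzero when it hit a problem.
 */
static int
stream_wal_until_failure(void)
{
    return 0;
}

/*
 * Keep a worker child running.  If the child exits nonzero or is killed by
 * a signal, wait a bit and launch a new one; if it exits zero, treat that
 * as a deliberate shutdown and stop.
 */
static void
run_as_daemon(void)
{
    for (;;)
    {
        pid_t   pid;
        int     status;

        pid = fork();
        if (pid < 0)
        {
            perror("fork");
            sleep(5);
            continue;
        }
        if (pid == 0)
        {
            /* Child: do the actual work, then report how it went. */
            _exit(stream_wal_until_failure() == 0 ? 0 : 1);
        }

        /* Parent: wait for the child and decide whether to relaunch it. */
        if (waitpid(pid, &status, 0) < 0)
        {
            perror("waitpid");
            sleep(5);
            continue;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;
        fprintf(stderr, "worker exited abnormally, restarting in 5 seconds\n");
        sleep(5);
    }
}

int
main(void)
{
    run_as_daemon();
    return 0;
}

The interesting policy questions - which failures the child should treat as retryable and handle itself, and whether something like an fsync() failure should instead force a full restart from a known-good state - live in the worker, not in this loop.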
A second option is to start building out capabilities in pg_receivewal to turn it into something closer to what you'd expect of a normal daemon, with the addition of a retry capability as probably the easiest improvement.

A third option is to somehow move towards a world where you can use the server to move WAL around even if you don't really want to run the server. Imagine a server running with no data directory and only a minimal set of running processes: just (1) a postmaster, (2) a walreceiver that writes to an archive directory, and (3) non-database-connected backends that are just smart enough to handle queries for status information. This has the same problem that I mentioned on the thread about monitoring the recovery process, namely that we haven't got pg_authid. But against that, you get a lot of infrastructure for free: configuration files, process management, connection management, an existing wire protocol, memory contexts, rich error reporting, etc.

I am curious to hear what other people think about the usefulness (or lack thereof) of pg_receivewal as things stand today, as well as ideas about future directions.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com