Hot Standby conflict resolution handling - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Hot Standby conflict resolution handling |
Date | |
Msg-id | CABOikdP48wU561h5FvWL_qvvAz=GAiAjmMsfk30YVXU_MqED0Q@mail.gmail.com Whole thread Raw |
Responses |
Re: Hot Standby conflict resolution handling
|
List | pgsql-hackers |
<br />I was trying some simple queries on a Hot Standby with streaming replication.<br /><br />On standby, I do this:<br/>postgres=# begin transaction isolation level repeatable read;<br />BEGIN<br />postgres=# explain verbose selectcount(b) from test WHERE a > 100000;<br /><br clear="all" />On master, I insert a bunch of tuples in the table andrun VACUUM ANALYZE.<br />postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo', 1);<br />INSERT 010000<br />postgres=# VACUUM ANALYZE test;<br /> VACUUM<br /><br />After max_standby_streaming_delay, the standby startscancelling the queries. I get an error like this on the standby:<br />postgres=# explain verbose select count(b) fromtest WHERE a > 100000;<br />FATAL: terminating connection due to conflict with recovery<br /> DETAIL: User querymight have needed to see row versions that must be removed.<br />HINT: In a moment you should be able to reconnectto the database and repeat your command.<br />server closed the connection unexpectedly<br /> This probablymeans the server terminated abnormally<br /> before or while processing the request.<br />The connection to theserver was lost. Attempting reset: Succeeded.<br /><br />So I've couple questions/concerns here<br /><br />1. Why to throwa FATAL error here ? A plain ERROR should be enough to abort the transaction. There are four places in ProcessInterrupts()where we throw these kind of errors and three of them are FATAL.<br /><br />2911 if (DoingCommandRead)<br/> 2912 ereport(FATAL,<br />2913 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br/>2914 errmsg("terminating connection due to conflictwith recovery"),<br /> 2915 errdetail_recovery_conflict(),<br />2916 errhint("Ina moment you should be able to reconnect to the"<br />2917 " database and repeat yourcommand.")));<br /> 2918 else <br />2919 ereport(ERROR,<br />2920 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2921 errmsg("canceling statementdue to conflict with recovery"),<br /> 2922 errdetail_recovery_conflict()));<br /><br />Inthis particular test case, the backend is DoingCommandRead and that forces a FATAL error. I'm not sure why is that required. And even if its necessary, IMHO we should add a comment explaining that. In the other two places where we throwFATAL, one looks legitimate, but I'm not sure about the other.<br /><br />2836 else if (RecoveryConflictPending&& RecoveryConflictRetryable)<br />2837 {<br />2838 pgstat_report_recovery_conflict(RecoveryConflictReason);<br/>2839 ereport(FATAL,<br /> 2840 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2841 errmsg("terminating connectiondue to conflict with recovery"),<br />2842 errdetail_recovery_conflict()));<br /> 2843 }<br />2844 else if (RecoveryConflictPending)<br />2845 {<br />2846 /* Currentlythere is only one non-retryable recovery conflict */<br />2847 Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);<br/> 2848 pgstat_report_recovery_conflict(RecoveryConflictReason);<br />2849 ereport(FATAL,<br />2850 (errcode(ERRCODE_DATABASE_DROPPED),<br />2851 errmsg("terminating connection due to conflict with recovery"),<br /> 2852 errdetail_recovery_conflict()));<br/>2853 }<br /><br />AFAICS the first of these should be ereport(ERROR). Otherwiseirrespective of whether RecoveryConflictRetryable is true or false, we will always ereport(FATAL).<br /><br />2.For my test, the error message itself looks wrong because I did not actually remove any rows on the master. VACUUM probablymarked a bunch of pages as all-visible and that should have triggered a conflict on the standby in order to supportindex-only scans. IMHO we should improve the error message to avoid any confusion. Or we can add a new ProcSignalReasonto differentiate between a cancel due to clean up vs visibilitymap_set() operation.<br /><br />BTW, my currentset up is using 9.2.1, but I don't see any code changes in the master. So I would assume the issues will exist theretoo.<br /><br />Thanks,<br />Pavan<br /><br />-- <br />Pavan Deolasee<br /><a href="http://www.linkedin.com/in/pavandeolasee"target="_blank">http://www.linkedin.com/in/pavandeolasee</a><br />
pgsql-hackers by date: