Thread: Re: default_isolation_level='serializable' crashes on Windows

Re: default_isolation_level='serializable' crashes on Windows

From
"Kevin Grittner"
Date:
> "Kevin Grittner" wrote:
> Heikki Linnakangas wrote:
>> On 14.08.2012 14:25, Kevin Grittner wrote:

Attached is version 3.

>>> Oh, further testing this morning shows that while *queries* on
>>> the HS seem OK, streaming replication is now broken. I probably
>>> need to override transaction isolation on the recovery process.
>>> I'll take a look at that.
>>
>> Hmm, seems to work for me. Do you get an unexpected error or what?
>
> No, I wasn't getting errors in the clients or the logs, but I
> wasn't seeing data pop up on the replica when I expected. Perhaps I
> messed up my streaming replication configuration somehow.

Yeah, setup error. I accidentally dropped the primary_conninfo
setting from my recovery.conf file. Now that I've put that back, it
works just fine.

>> I didn't, I meant to point out that you can set
>> transaction_isolation just for the one transaction. But your
>> suggested hint is OK as well - you suggest to set it for the whole
>> session, which will also work around the problem. The "before the
>> first query in the transaction" isn't necessary in that case
>> though.

>> Yet another option is to suggest the "BEGIN ISOLATION LEVEL
>> REPEATABLE READ" syntax, instead of SET.
>
> I'm inclined toward hinting at a session override of the default.
> If you're typing away in psql, that's a lot less work. :-)

I tinkered with the messages (including correcting a reverse-logic
bug in which was displayed under what circumstances) and tweaked a
comment. I struggled with how to capitalize and quote. Let me know
what you think.

>>>>> Since the existing behavior is so bad, I'm inclined to think
>>>>> this merits backpatching to 9.1. Thoughts on that?
>>>>
>>>> Yes, we have to somehow fix the crash and the assertion failure
>>>> on 9.1.
>>>
>>> Should the check_transaction_read_only() stuff be back-patched to
>>> 9.1, too? So far as we know, that's fragile, not broken, right?
>>> Could the fix you envision there cause a behavioral change that
>>> could break anything that users might have in place?
>>
>> Good question. I don't see how it could cause a behavioral change,
>> but I've been wrong before.
>
> If we don't know of any actual existing bugs I'm inclined to not
> back-patch that part to 9.1, although it makes sense for 9.2 since
> we shouldn't be risking breakage of any production systems. I'm
> really cautious about giving anybody any excuse not to apply a
> minor update.

How about we fix the serializable versus HS & Windows bugs in one
patch, and then look at the other as a separate patch? If that's OK,
I think this is ready, unless my message text can be improved. (And
I will have a shot at my first back-patching....)

-Kevin



Attachment

Re: default_isolation_level='serializable' crashes on Windows

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> How about we fix the serializable versus HS & Windows bugs in one
> patch, and then look at the other as a separate patch? If that's OK,
> I think this is ready, unless my message text can be improved. (And
> I will have a shot at my first back-patching....)

I poked around this area a bit.  I notice that
check_transaction_read_only has got the same fundamental error: it
thinks it can safely consult RecoveryInProgress(), which in general
it cannot.

The particular cases we have discussed so far cannot lead to a crash
there, because in startup scenarios XactReadOnly wouldn't be on already.
However I think that a background process not connected to shared memory
could be crashed if somebody changed transaction_read_only from true to
false in postgresql.conf.  That's sufficiently far-fetched that maybe we
shouldn't worry about it, but still it seems ugly.

On reflection I think maybe the best solution is for
check_transaction_read_only to test IsTransactionState(), and just
allow the change always if outside a transaction.  That would make its
IsSubTransaction test a bit saner/safer as well.  We could do the same
thing in check_XactIsoLevel.  We still need a test in
GetSerializableTransactionSnapshot, or someplace else in the vicinity
of that, to cover the default_transaction_isolation = serializable case;
but if we leave the error check in place in check_XactIsoLevel, you'll
get an immediate rather than delayed error from trying to crank the
level up manually in hot standby, which seems more user-friendly.

Will send an updated patch as soon as I'm done testing this idea.

One other point: I notice that Kevin used ERRCODE_FEATURE_NOT_SUPPORTED
in the error messages he added, which seems sane in isolation.
However, the GUC-based code is (by default) throwing
ERRCODE_INVALID_PARAMETER_VALUE when it rejects due to
RecoveryInProgress.  I'm not totally sure whether that was thought about
or just fell out of not thinking.  Which one do we want here?
        regards, tom lane



Re: default_isolation_level='serializable' crashes on Windows

From
Tom Lane
Date:
I wrote:
> I poked around this area a bit.  I notice that
> check_transaction_read_only has got the same fundamental error: it
> thinks it can safely consult RecoveryInProgress(), which in general
> it cannot.

After rereading the whole thread I saw that Heikki had already pointed
this out, and come to the same conclusion about how to fix it:

> On reflection I think maybe the best solution is for
> check_transaction_read_only to test IsTransactionState(), and just
> allow the change always if outside a transaction.

Attached is a version of the patch that does it like that.  I've checked
that this fixes the problem (as well as Robert's earlier report) in an
EXEC_BACKEND build, which is as close as I can get to the Windows
environment.

I tweaked Kevin's error message to keep the same capitalization as the
existing text for the message in check_XactIsoLevel --- if we change
that it will cause work for the translators, and I don't think it's
enough of an improvement to justify that.

I also back-propagated the use of ERRCODE_FEATURE_NOT_SUPPORTED into
the GUC check hooks.  On reflection this seems more appropriate than
ERRCODE_INVALID_PARAMETER_VALUE.

Lastly, I simplified the added code in InitPostgres down to just a
bare assignment to XactIsoLevel.  It doesn't seem worthwhile to add
all the cycles involved in SetConfigOption(), when we have no desire
to change the GUC permanently.  (I think Kevin's code was wrong anyway
in that it was using PGC_S_OVERRIDE, which would impact the reset
state for the GUC.)

I think this is ready to go.  Kevin, do you want to apply it?  You
had mentioned wanting some practice with back-patches.

            regards, tom lane

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 0e9eb3a64f5db66dd2fca68252a597e68defdbf8..5d894ba77f74d02637008deedd4ca2919d8af6b8 100644
*** a/src/backend/commands/variable.c
--- b/src/backend/commands/variable.c
*************** show_log_timezone(void)
*** 533,543 ****
   * read-only may be changed to read-write only when in a top-level transaction
   * that has not yet taken an initial snapshot.    Can't do it in a hot standby
   * slave, either.
   */
  bool
  check_transaction_read_only(bool *newval, void **extra, GucSource source)
  {
!     if (*newval == false && XactReadOnly)
      {
          /* Can't go to r/w mode inside a r/o transaction */
          if (IsSubTransaction())
--- 533,548 ----
   * read-only may be changed to read-write only when in a top-level transaction
   * that has not yet taken an initial snapshot.    Can't do it in a hot standby
   * slave, either.
+  *
+  * If we are not in a transaction at all, just allow the change; it means
+  * nothing since XactReadOnly will be reset by the next StartTransaction().
+  * The IsTransactionState() test protects us against trying to check
+  * RecoveryInProgress() in contexts where shared memory is not accessible.
   */
  bool
  check_transaction_read_only(bool *newval, void **extra, GucSource source)
  {
!     if (*newval == false && XactReadOnly && IsTransactionState())
      {
          /* Can't go to r/w mode inside a r/o transaction */
          if (IsSubTransaction())
*************** check_transaction_read_only(bool *newval
*** 556,561 ****
--- 561,567 ----
          /* Can't go to r/w mode while recovery is still active */
          if (RecoveryInProgress())
          {
+             GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
              GUC_check_errmsg("cannot set transaction read-write mode during recovery");
              return false;
          }
*************** check_transaction_read_only(bool *newval
*** 569,574 ****
--- 575,582 ----
   *
   * We allow idempotent changes at any time, but otherwise this can only be
   * changed in a toplevel transaction that has not yet taken a snapshot.
+  *
+  * As in check_transaction_read_only, allow it if not inside a transaction.
   */
  bool
  check_XactIsoLevel(char **newval, void **extra, GucSource source)
*************** check_XactIsoLevel(char **newval, void *
*** 598,604 ****
      else
          return false;

!     if (newXactIsoLevel != XactIsoLevel)
      {
          if (FirstSnapshotSet)
          {
--- 606,612 ----
      else
          return false;

!     if (newXactIsoLevel != XactIsoLevel && IsTransactionState())
      {
          if (FirstSnapshotSet)
          {
*************** check_XactIsoLevel(char **newval, void *
*** 616,621 ****
--- 624,630 ----
          /* Can't go to serializable mode while recovery is still active */
          if (newXactIsoLevel == XACT_SERIALIZABLE && RecoveryInProgress())
          {
+             GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
              GUC_check_errmsg("cannot use serializable mode in a hot standby");
              GUC_check_errhint("You can use REPEATABLE READ instead.");
              return false;
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index b22faf9607d87f1a2bc7ce306e923dc12b8cfcc7..8b5a95ceaa0ff460d34231dadc2d3562fa7b0c63 100644
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
*************** GetSerializableTransactionSnapshot(Snaps
*** 1572,1577 ****
--- 1572,1590 ----
      Assert(IsolationIsSerializable());

      /*
+      * Can't use serializable mode while recovery is still active, as it is,
+      * for example, on a hot standby.  We could get here despite the check
+      * in check_XactIsoLevel() if default_transaction_isolation is set to
+      * serializable, so phrase the hint accordingly.
+      */
+     if (RecoveryInProgress())
+         ereport(ERROR,
+                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                  errmsg("cannot use serializable mode in a hot standby"),
+                  errdetail("\"default_transaction_isolation\" is set to \"serializable\"."),
+                  errhint("You can use \"SET default_transaction_isolation = 'repeatable read'\" to change the
default.")));
+
+     /*
       * A special optimization is available for SERIALIZABLE READ ONLY
       * DEFERRABLE transactions -- we can wait for a suitable snapshot and
       * thereby avoid all SSI overhead once it's running.
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 6a3fc6f693f66304a487032e2099d1660590d836..bef301a79c8fea1f602f5bd8b340bb4950d285fa 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 584,589 ****
--- 584,598 ----
          /* statement_timestamp must be set for timeouts to work correctly */
          SetCurrentStatementStartTimestamp();
          StartTransactionCommand();
+
+         /*
+          * transaction_isolation will have been set to the default by the
+          * above.  If the default is "serializable", and we are in hot
+          * standby, we will fail if we don't change it to something lower.
+          * Fortunately, "read committed" is plenty good enough.
+          */
+         XactIsoLevel = XACT_READ_COMMITTED;
+
          (void) GetTransactionSnapshot();
      }


Re: default_isolation_level='serializable' crashes on Windows

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I tweaked Kevin's error message to keep the same capitalization as
> the existing text for the message in check_XactIsoLevel --- if we
> change that it will cause work for the translators, and I don't
> think it's enough of an improvement to justify that.
That's one of the reasons I agonized over it -- I think the way I
left it is a little better and more consistent with other messages,
but didn't know whether the difference was worth translator effort. 
I'm happy to trust your judgment on that.
> Lastly, I simplified the added code in InitPostgres down to just a
> bare assignment to XactIsoLevel.  It doesn't seem worthwhile to
> add all the cycles involved in SetConfigOption(), when we have no
> desire to change the GUC permanently.  (I think Kevin's code was
> wrong anyway in that it was using PGC_S_OVERRIDE, which would
> impact the reset state for the GUC.)
Point taken on PGC_S_OVERRIDE.  And that probably fixes the issue
that caused me to hold up when I was about ready to pull the trigger
this past weekend.  A final round of testing showed a "SET" line on
psql start, which is clearly wrong.  I suspected that I needed to go
to a lower level in setting that, but hadn't had a chance to sort
out just what the right path was.  In retrospect, just directly
assigning the value seems pretty obvious.
> I think this is ready to go.
With your changes, I agree.
> Kevin, do you want to apply it?  You had mentioned wanting some
> practice with back-patches.
I'm getting on a plane to Istanbul in less than 48 hours for the
VLDB conference, and scrambling to tie up loose ends.  I don't want
to be under that kind of time-pressure when I back-patch for the
first time, for fear of making a mess of things and not being around
to clean up the mess; so my first back-patch is probably best left
for another time.
I'll run through my tests again tonight, against your patch, not
that I expect any problems with it.  Unfortunately I can't test
Windows, as I don't have a build environment for that.
Thanks for going over this.
-Kevin



Re: default_isolation_level='serializable' crashes on Windows

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I'll run through my tests again tonight, against your patch, not
> that I expect any problems with it.  Unfortunately I can't test
> Windows, as I don't have a build environment for that.

FWIW, you can approximate Windows close enough for this type of problem
by building with EXEC_BACKEND defined.  I usually add the #define to
pg_config.h after configuring, though of course there's more than one
way to do it.
        regards, tom lane



Re: default_isolation_level='serializable' crashes on Windows

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Lastly, I simplified the added code in InitPostgres down to just a
>> bare assignment to XactIsoLevel.  It doesn't seem worthwhile to
>> add all the cycles involved in SetConfigOption(), when we have no
>> desire to change the GUC permanently.  (I think Kevin's code was
>> wrong anyway in that it was using PGC_S_OVERRIDE, which would
>> impact the reset state for the GUC.)
> Point taken on PGC_S_OVERRIDE.  And that probably fixes the issue
> that caused me to hold up when I was about ready to pull the trigger
> this past weekend.  A final round of testing showed a "SET" line on
> psql start, which is clearly wrong.  I suspected that I needed to go
> to a lower level in setting that, but hadn't had a chance to sort
> out just what the right path was.  In retrospect, just directly
> assigning the value seems pretty obvious.

Hm, I'm not sure why using SetConfigOption() would result in anything
happening client-side.  (If this GUC were GUC_REPORT, that would result
in a parameter-change report to the client; but it isn't, and anyway
that shouldn't cause psql to print "SET".)  Might be worth looking
closer at what was happening there.

>> Kevin, do you want to apply it?  You had mentioned wanting some
>> practice with back-patches.
> I'm getting on a plane to Istanbul in less than 48 hours for the
> VLDB conference, and scrambling to tie up loose ends.

OK, I've committed it.
        regards, tom lane