Thread: Arbitary file size limit in twophase.c

Arbitary file size limit in twophase.c

From
Gavin Sherry
Date:
There's an apparently arbitary limit of 10,000,000 bytes in twophase.c
on the size of a two phase commit file. I can't see why this limit
exists.

I hit this limit by creating a prepared transaction which included
dropping a schema with about 25,000 objects in it and then trying to
commit it. The result is that ReadTwoPhaseFile() returns NULL and
FinishPreparedTransaction() says it detected a corrupted file.

Thoughts?

Thanks,

Gavin


Re: Arbitary file size limit in twophase.c

From
Tom Lane
Date:
Gavin Sherry <swm@alcove.com.au> writes:
> There's an apparently arbitary limit of 10,000,000 bytes in twophase.c
> on the size of a two phase commit file. I can't see why this limit
> exists.

The comment seems perfectly clear about why the limit exists:
    * Check file length.  We can determine a lower bound pretty easily. We    * set an upper bound mainly to avoid
palloc()failure on a corrupt file.
 

although certainly the specific value has been picked out of the air.

Perhaps it'd be better to use malloc() than palloc(), so that we'd not
lose control on out-of-memory, and then deem the file "too big" only
if we couldn't malloc the space.

Or we could try to fix things so that the file doesn't have to be all in
memory, but that seems pretty invasive.
        regards, tom lane


Re: Arbitary file size limit in twophase.c

From
Gavin Sherry
Date:
On Tue, May 13, 2008 at 10:34:23AM -0400, Tom Lane wrote:
> Gavin Sherry <swm@alcove.com.au> writes:
> > There's an apparently arbitary limit of 10,000,000 bytes in twophase.c
> > on the size of a two phase commit file. I can't see why this limit
> > exists.
> 
> The comment seems perfectly clear about why the limit exists:
> 
>      * Check file length.  We can determine a lower bound pretty easily. We
>      * set an upper bound mainly to avoid palloc() failure on a corrupt file.

Oops. Where was my brain?

> Perhaps it'd be better to use malloc() than palloc(), so that we'd not
> lose control on out-of-memory, and then deem the file "too big" only
> if we couldn't malloc the space.

That seems better.

Thanks,

Gavin


Re: Arbitary file size limit in twophase.c

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> Gavin Sherry <swm@alcove.com.au> writes:
>> There's an apparently arbitary limit of 10,000,000 bytes in twophase.c
>> on the size of a two phase commit file. I can't see why this limit
>> exists.
> 
> The comment seems perfectly clear about why the limit exists:
> 
>      * Check file length.  We can determine a lower bound pretty easily. We
>      * set an upper bound mainly to avoid palloc() failure on a corrupt file.
> 
> although certainly the specific value has been picked out of the air.

I'm surprised the twophase file grows that big. If you exceed 10MB by 
dropping 25000 objects, that's 400 bytes per object. A 
TwoPhaseLockRecord is only 20-24 bytes + TwoPhaseRecordOnDisk header 
which is 8 bytes, so what's taking so much space?

[tests...] It looks like it's the shared cache invalidation messages 
that make up the rest of the bulk. They could probably be stored in a 
more dense format, we currently store each invalidation message as a 
separate twophase-record. I don't think I want to spend time on that 
myself, but if someone cares enough to write a patch I'll be happy to 
review it.

> Perhaps it'd be better to use malloc() than palloc(), so that we'd not
> lose control on out-of-memory, and then deem the file "too big" only
> if we couldn't malloc the space.

That doesn't seem much better to me than just letting the palloc fail.

If we're going to check for file length, we should definitely check the 
file length when we write it, so that we fail at PREPARE time, and not 
at COMMIT time.

I'll write up a patch along the lines of:
1. increase the limit to, say, 100 MB
2. Check the file length at PREPARE time.

> Or we could try to fix things so that the file doesn't have to be all in
> memory, but that seems pretty invasive.

Yep. And definitely not back-patchable.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Arbitary file size limit in twophase.c

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> If we're going to check for file length, we should definitely check the 
> file length when we write it, so that we fail at PREPARE time, and not 
> at COMMIT time.

I think this is mere self-delusion, unfortunately.  You can never be
certain at prepare time that a large alloc will succeed sometime later
in a different process.

Gavin's complaint is essentially that a randomly chosen hard limit is
bad, and I agree with that.  Choosing a larger hard limit doesn't make
it less random.

It might be worth checking at prepare that the file size doesn't exceed
MaxAllocSize, but any smaller limit strikes me as (a) unnecessarily
restrictive and (b) not actually creating any useful guarantee.
        regards, tom lane


Re: Arbitary file size limit in twophase.c

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> If we're going to check for file length, we should definitely check the
>> file length when we write it, so that we fail at PREPARE time, and not
>> at COMMIT time.
>
> I think this is mere self-delusion, unfortunately.  You can never be
> certain at prepare time that a large alloc will succeed sometime later
> in a different process.
>
> Gavin's complaint is essentially that a randomly chosen hard limit is
> bad, and I agree with that.  Choosing a larger hard limit doesn't make
> it less random.
>
> It might be worth checking at prepare that the file size doesn't exceed
> MaxAllocSize, but any smaller limit strikes me as (a) unnecessarily
> restrictive and (b) not actually creating any useful guarantee.

Hmm, I guess you're right.

Patch attached. I can't commit it myself right now, but will do so as
soon as I can, unless there's objections.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.42
diff -c -r1.42 twophase.c
*** src/backend/access/transam/twophase.c    12 May 2008 00:00:45 -0000    1.42
--- src/backend/access/transam/twophase.c    16 May 2008 09:56:56 -0000
***************
*** 56,61 ****
--- 56,62 ----
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
+ #include "utils/memutils.h"


  /*
***************
*** 866,871 ****
--- 867,881 ----
      hdr->total_len = records.total_len + sizeof(pg_crc32);

      /*
+      * If the file size exceeds MaxAllocSize, we won't be able to read it in
+      * ReadTwoPhaseFile. Check for that now, rather than fail at commit time.
+      */
+     if (hdr->total_len > MaxAllocSize)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("two-phase state file maximum length exceeed")));
+
+     /*
       * Create the 2PC state file.
       *
       * Note: because we use BasicOpenFile(), we are responsible for ensuring
***************
*** 1044,1051 ****
      }

      /*
!      * Check file length.  We can determine a lower bound pretty easily. We
!      * set an upper bound mainly to avoid palloc() failure on a corrupt file.
       */
      if (fstat(fd, &stat))
      {
--- 1054,1060 ----
      }

      /*
!      * Check file length.  We can determine a lower bound pretty easily.
       */
      if (fstat(fd, &stat))
      {
***************
*** 1059,1066 ****

      if (stat.st_size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
                          MAXALIGN(sizeof(TwoPhaseRecordOnDisk)) +
!                         sizeof(pg_crc32)) ||
!         stat.st_size > 10000000)
      {
          close(fd);
          return NULL;
--- 1068,1074 ----

      if (stat.st_size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
                          MAXALIGN(sizeof(TwoPhaseRecordOnDisk)) +
!                         sizeof(pg_crc32)))
      {
          close(fd);
          return NULL;

Re: Arbitary file size limit in twophase.c

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> It might be worth checking at prepare that the file size doesn't exceed
>> MaxAllocSize, but any smaller limit strikes me as (a) unnecessarily
>> restrictive and (b) not actually creating any useful guarantee.

> Patch attached. I can't commit it myself right now, but will do so as 
> soon as I can, unless there's objections.

Two bugs: "exceeed" -> "exceeded", please; and on the read side, you
should still have an upper-bound check, but it should be MaxAllocSize.
        regards, tom lane


Re: Arbitary file size limit in twophase.c

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> It might be worth checking at prepare that the file size doesn't exceed
>>> MaxAllocSize, but any smaller limit strikes me as (a) unnecessarily
>>> restrictive and (b) not actually creating any useful guarantee.
>
>> Patch attached. I can't commit it myself right now, but will do so as
>> soon as I can, unless there's objections.
>
> Two bugs: "exceeed" -> "exceeded", please;

Thanks.

> and on the read side, you
> should still have an upper-bound check, but it should be MaxAllocSize.

That seems like a highly unlikely failure scenario, where a two-phase
file is corrupt file so that it becomes larger than 1GB. It's not like
the check costs anything either, though, so I'll just put it back like
you suggested.

Updated patch attached. I think it's ok now, but I'll air this as a
patch before committing since I got it wrong before...

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
? GNUmakefile
? config.log
? config.status
? readFile-close-1.patch
? remove-twophase-file-size-limit-2.patch
? contrib/pg_standby/.deps
? contrib/pg_standby/pg_standby
? contrib/spi/.deps
? doc/src/sgml/cvsmsg
? src/Makefile.global
? src/backend/postgres
? src/backend/access/common/.deps
? src/backend/access/gin/.deps
? src/backend/access/gist/.deps
? src/backend/access/hash/.deps
? src/backend/access/heap/.deps
? src/backend/access/index/.deps
? src/backend/access/nbtree/.deps
? src/backend/access/transam/.deps
? src/backend/bootstrap/.deps
? src/backend/catalog/.deps
? src/backend/catalog/postgres.bki
? src/backend/catalog/postgres.description
? src/backend/catalog/postgres.shdescription
? src/backend/commands/.deps
? src/backend/executor/.deps
? src/backend/lib/.deps
? src/backend/libpq/.deps
? src/backend/main/.deps
? src/backend/nodes/.deps
? src/backend/optimizer/geqo/.deps
? src/backend/optimizer/path/.deps
? src/backend/optimizer/plan/.deps
? src/backend/optimizer/prep/.deps
? src/backend/optimizer/util/.deps
? src/backend/parser/.deps
? src/backend/port/.deps
? src/backend/postmaster/.deps
? src/backend/regex/.deps
? src/backend/rewrite/.deps
? src/backend/snowball/.deps
? src/backend/snowball/snowball_create.sql
? src/backend/storage/buffer/.deps
? src/backend/storage/file/.deps
? src/backend/storage/freespace/.deps
? src/backend/storage/ipc/.deps
? src/backend/storage/large_object/.deps
? src/backend/storage/lmgr/.deps
? src/backend/storage/page/.deps
? src/backend/storage/smgr/.deps
? src/backend/tcop/.deps
? src/backend/tsearch/.deps
? src/backend/utils/.deps
? src/backend/utils/probes.h
? src/backend/utils/adt/.deps
? src/backend/utils/cache/.deps
? src/backend/utils/error/.deps
? src/backend/utils/fmgr/.deps
? src/backend/utils/hash/.deps
? src/backend/utils/init/.deps
? src/backend/utils/mb/.deps
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/.deps
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/.deps
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/.deps
? src/backend/utils/mb/conversion_procs/latin_and_mic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_big5/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_johab/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_win/.deps
? src/backend/utils/misc/.deps
? src/backend/utils/mmgr/.deps
? src/backend/utils/resowner/.deps
? src/backend/utils/sort/.deps
? src/backend/utils/time/.deps
? src/bin/initdb/.deps
? src/bin/initdb/initdb
? src/bin/pg_config/.deps
? src/bin/pg_config/pg_config
? src/bin/pg_controldata/.deps
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_ctl/.deps
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_dump/.deps
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_resetxlog/.deps
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/psql/.deps
? src/bin/psql/d.c
? src/bin/psql/psql
? src/bin/scripts/.deps
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/createlang
? src/bin/scripts/createuser
? src/bin/scripts/dropdb
? src/bin/scripts/droplang
? src/bin/scripts/dropuser
? src/bin/scripts/reindexdb
? src/bin/scripts/vacuumdb
? src/include/pg_config.h
? src/include/stamp-h
? src/interfaces/ecpg/compatlib/.deps
? src/interfaces/ecpg/compatlib/exports.list
? src/interfaces/ecpg/compatlib/libecpg_compat.so.3.1
? src/interfaces/ecpg/ecpglib/.deps
? src/interfaces/ecpg/ecpglib/exports.list
? src/interfaces/ecpg/ecpglib/libecpg.so.6.1
? src/interfaces/ecpg/include/ecpg_config.h
? src/interfaces/ecpg/pgtypeslib/.deps
? src/interfaces/ecpg/pgtypeslib/exports.list
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.3.1
? src/interfaces/ecpg/preproc/.deps
? src/interfaces/ecpg/preproc/ecpg
? src/interfaces/libpq/.deps
? src/interfaces/libpq/exports.list
? src/interfaces/libpq/libpq.so.5.2
? src/pl/plpgsql/src/.deps
? src/port/.deps
? src/port/pg_config_paths.h
? src/test/regress/.deps
? src/test/regress/log
? src/test/regress/pg_regress
? src/test/regress/results
? src/test/regress/testtablespace
? src/test/regress/tmp_check
? src/test/regress/expected/constraints.out
? src/test/regress/expected/copy.out
? src/test/regress/expected/create_function_1.out
? src/test/regress/expected/create_function_2.out
? src/test/regress/expected/largeobject.out
? src/test/regress/expected/largeobject_1.out
? src/test/regress/expected/misc.out
? src/test/regress/expected/tablespace.out
? src/test/regress/sql/constraints.sql
? src/test/regress/sql/copy.sql
? src/test/regress/sql/create_function_1.sql
? src/test/regress/sql/create_function_2.sql
? src/test/regress/sql/largeobject.sql
? src/test/regress/sql/misc.sql
? src/test/regress/sql/tablespace.sql
? src/timezone/.deps
? src/timezone/zic
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.42
diff -c -r1.42 twophase.c
*** src/backend/access/transam/twophase.c    12 May 2008 00:00:45 -0000    1.42
--- src/backend/access/transam/twophase.c    16 May 2008 15:53:20 -0000
***************
*** 56,61 ****
--- 56,62 ----
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
+ #include "utils/memutils.h"


  /*
***************
*** 866,871 ****
--- 867,881 ----
      hdr->total_len = records.total_len + sizeof(pg_crc32);

      /*
+      * If the file size exceeds MaxAllocSize, we won't be able to read it in
+      * ReadTwoPhaseFile. Check for that now, rather than fail at commit time.
+      */
+     if (hdr->total_len > MaxAllocSize)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("two-phase state file maximum length exceeded")));
+
+     /*
       * Create the 2PC state file.
       *
       * Note: because we use BasicOpenFile(), we are responsible for ensuring
***************
*** 1045,1051 ****

      /*
       * Check file length.  We can determine a lower bound pretty easily. We
!      * set an upper bound mainly to avoid palloc() failure on a corrupt file.
       */
      if (fstat(fd, &stat))
      {
--- 1055,1063 ----

      /*
       * Check file length.  We can determine a lower bound pretty easily. We
!      * set an upper bound to avoid palloc() failure on a corrupt file, though
!      * we can't guarantee that we won't get an out of memory error anyway,
!      * even on a valid file.
       */
      if (fstat(fd, &stat))
      {
***************
*** 1060,1066 ****
      if (stat.st_size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
                          MAXALIGN(sizeof(TwoPhaseRecordOnDisk)) +
                          sizeof(pg_crc32)) ||
!         stat.st_size > 10000000)
      {
          close(fd);
          return NULL;
--- 1072,1078 ----
      if (stat.st_size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
                          MAXALIGN(sizeof(TwoPhaseRecordOnDisk)) +
                          sizeof(pg_crc32)) ||
!         stat.st_size > MaxAllocSize)
      {
          close(fd);
          return NULL;