Thread: A couple of patches for PostgreSQL 64bit support

A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
Hi all,

Here're a couple of patches for PostgreSQL 64bit support.  There're just
two extension to 64bit, size of shared memory and transaction ID.

Please take a look at overview.txt for this proposal and pathces, based
upon 8.0.3.

Any discussions are welcome.

--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Proposal: 64bit Extension in PostgreSQL 8.0.x

                                                              July 7th, 2005
                     Koichi Suzuki (NTT DATA Intellilink)

1. Background and Purpose

64-bit architecture CPU is getting more and more popular in Intel-based CPU, such as 
EM64T, AMD64 and IA64.    Servers based upon such CPUs can provide much more memory 
available.   Tens of gigabytes of memory is available in a node, typically 16gigabytes 
or so.

Obviously, 32bit-based Linux and its applications can run on such a machine.   However, 
from the point of each process' view, size of avilable memory is limited by process user 
space, typically 1GB for kernel and 3GB for each process.   PostgreSQL's kernel uses 
shared memory to hold shared data and much more memory should be available as PostgreSQL 
is going to handle bigger database.   For this purpose, we need to extend PostgreSQL to 
64-bit program and make shared memory management to handle shared memory beyond 32bit 
limitation.

On the other hand, PostgreSQL is going to be used in mission-critical systems in 
enterprise environment.   In such application, we need to provide long period operation 
without stopping service.   Currently, PostgreSQL's transaction ID is limited by 32bit 
integer and when it is about to run out, we have to stop the database operation and run 
vacuum freeze to reuse the older transaction ID value.   Vacuum freeze operation scans 
all the database space and together with the bigger database size in oparation, it's 
going to take longer.   Because PostgreSQL is going to be used in busier systems, 
transaction IDs tend to run out more earlier.   To provide longer continuous operation, 
we need to make transaction ID 64bit-based.


2. How we can do?

It is basically very simple to do these two extensions.   We can locate their definitions 
and change them into 64-bit based ones.   However, there are much much more to be done.   
We have to find all the lines which deals with their value and have to modify them so that 
there will not be any loss of calculation precision.   Detailed result will be given 
later.

3. Environment

Our code will assume the following:

1) PostgreSQL server (i.e. postmaster and postgres processes) should run only on 64-bit 
   CPU based server machies, that is, EM64T, AMD64 and IA64.
2) Both 64bit (EM64T, AMD64 and IA64) and 32bit CPU(IA32) are allowed as clients.
3) Currently, we support only Linux 2.6.x kernel.

4. Specification changes

Due to these two changes, PostgreSQL's spec will change as follows:

4.1 Shared memory size

Shared memory size are specified in shared_buffers entry in postgresql.conf file.   
In 32-bit environment, it is limited to INTMAX/BLOCKSIZE.   Now new limitation is 
INTMAX/2.   This value specifies the number of blocks, so actual memory which can be 
specified by this parameter will be (INITMAX/2)*BLOCKSIZE.   Typical BLOCKSIZE is 8KB.   
Therefore, typicak maximum shared memory will be 8TB.   This is beyond the limitation of 
Linux user space for EM64T (512GB) and should be sufficient.

The reason why shared memory size specification is limited to INITMAX/2 is as follows:  
In 8.0.x buffer management, (buffer_number)*2 is used to produce buffer IDs and whole 
these subsystem is still based on 32bit calculation.   We'd like to keep the change as 
minimum as possible and having this limit will not have bad influence to the maxumum 
value of actual shared memory size.

4.2 Transaction ID (XID)

Type of transaction ID is pre-defined in the catalog and there are no way for users to 
redefine its type.   In this extension, transaction IDs are handled as follows:

1) On catalog, transaction ID's (such as XMIN and XMAX) are give the type "XID", as 
   in 32-bit environment.
2) If applications on 32-bit based environment trys to read the value of the type "XID", 
   it has to read this value as "unsigned long long" value.
3) In the case of 64-bit based environment, the value of �gXID�h can be handled as 
   "unsigned long" value, depending on compilers.
4) The length of the type "XID" is stored in the catalog "pg_type".

4.3 Configure

W have added two configuration options:
--enable-64bit-shared-memory
    enables 64bit shared memory by defining USE_64BIT_SHARED.
--enable-64bit-transaction-id
    enables 64-bit transaction id by defining USE_64BIT_XID.

5. Acknowledgement

I'd like to thank Mr.Tom Lane, Mr.Bruce Momjan and Mr.Jan Wieck for their stimulating discussions.  Many people from
FujitsuLtd., NTT DATA and NTT DATA Intellilink helped me.
 
Index: configure.in
===================================================================
--- configure.in    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ configure.in    (.../trunk/postgres-src)    (revision 55)
@@ -208,6 +208,36 @@
 AC_SUBST(enable_debug)

 #
+# 64-bit shared memory (--enable-64bit-shared-memory)
+#
+AC_MSG_CHECKING([whether to build with 64-bit shared memory.])
+PGAC_ARG_BOOL(enable, 64bit-shared-memory, no, [  --enable-64bit-shared-memory  enable 64-bit shared memory support],
+              [AC_DEFINE([USE_64BIT_SHARED], 1,
+                         [Define to 1 if you want 64-bit shared memory support. (--enable-64bit-shared-memory)])])
+AC_MSG_RESULT([$enable_64bit_shared_memory])
+
+# 64-bit shared memory support can be used only on the x86_64 linux.
+# spinlocks must be enabled when enable 64-bit shared memory support.
+if test "$enable_64bit_shared_memory" = yes ; then
+  check_64bit_shared_memory=0
+  case $host_os in
+    linux*)
+      if test "$host_cpu" = "x86_64" -o "$host_cpu" = "ia64" ; then
+        check_64bit_shared_memory=1
+      fi
+      ;;
+  esac
+
+  if test "$check_64bit_shared_memory" = 0 ; then
+    AC_MSG_ERROR([64-bit shared memory support can be used only on x86_64 linux or ia64 linux.])
+  else
+    if test "$enable_spinlocks" != yes ; then
+      AC_MSG_ERROR([spinlocks must be enabled when enable 64-bit shared memory support.])
+    fi
+  fi
+fi
+
+#
 # C compiler
 #

Index: src/include/pg_config.h.in
===================================================================
--- src/include/pg_config.h.in    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/pg_config.h.in    (.../trunk/postgres-src)    (revision 55)
@@ -660,6 +660,10 @@
 /* Define to select unnamed POSIX semaphores. */
 #undef USE_UNNAMED_POSIX_SEMAPHORES

+/* Define to 1 if you want 64-bit shared memory support.
+   (--enable-64bit-shared-memory) */
+#undef USE_64BIT_SHARED
+
 /* Number of bits in a file offset, on hosts where this is settable. */
 #undef _FILE_OFFSET_BITS

Index: src/include/postmaster/bgwriter.h
===================================================================
--- src/include/postmaster/bgwriter.h    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/postmaster/bgwriter.h    (.../trunk/postgres-src)    (revision 55)
@@ -30,7 +30,7 @@
 extern bool ForwardFsyncRequest(RelFileNode rnode, BlockNumber segno);
 extern void AbsorbFsyncRequests(void);

-extern int    BgWriterShmemSize(void);
+extern Size    BgWriterShmemSize(void);
 extern void BgWriterShmemInit(void);

 #endif   /* _BGWRITER_H */
Index: src/include/storage/lwlock.h
===================================================================
--- src/include/storage/lwlock.h    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/storage/lwlock.h    (.../trunk/postgres-src)    (revision 55)
@@ -68,7 +68,7 @@
 extern bool LWLockHeldByMe(LWLockId lockid);

 extern int    NumLWLocks(void);
-extern int    LWLockShmemSize(void);
+extern Size    LWLockShmemSize(void);
 extern void CreateLWLocks(void);

 #endif   /* LWLOCK_H */
Index: src/include/storage/buf_internals.h
===================================================================
--- src/include/storage/buf_internals.h    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/storage/buf_internals.h    (.../trunk/postgres-src)    (revision 55)
@@ -138,11 +138,11 @@
 extern void StrategyHintVacuum(bool vacuum_active);
 extern int StrategyDirtyBufferList(BufferDesc **buffers, BufferTag *buftags,
                         int max_buffers);
-extern int    StrategyShmemSize(void);
+extern Size    StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);

 /* buf_table.c */
-extern int    BufTableShmemSize(int size);
+extern Size    BufTableShmemSize(int size);
 extern void InitBufTable(int size);
 extern int    BufTableLookup(BufferTag *tagPtr);
 extern void BufTableInsert(BufferTag *tagPtr, int buf_id);
Index: src/include/storage/pg_shmem.h
===================================================================
--- src/include/storage/pg_shmem.h    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/storage/pg_shmem.h    (.../trunk/postgres-src)    (revision 55)
@@ -29,8 +29,8 @@
     int32        magic;            /* magic # to identify Postgres segments */
 #define PGShmemMagic  679834893
     pid_t        creatorPID;        /* PID of creating process */
-    uint32        totalsize;        /* total size of segment */
-    uint32        freeoffset;        /* offset to first free space */
+    Size        totalsize;        /* total size of segment */
+    Size        freeoffset;        /* offset to first free space */
 #ifndef WIN32                    /* Windows doesn't have useful inode#s */
     dev_t        device;            /* device data directory is on */
     ino_t        inode;            /* inode number of data directory */
@@ -45,7 +45,7 @@
 extern void PGSharedMemoryReAttach(void);
 #endif

-extern PGShmemHeader *PGSharedMemoryCreate(uint32 size, bool makePrivate,
+extern PGShmemHeader *PGSharedMemoryCreate(Size size, bool makePrivate,
                      int port);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
Index: src/include/storage/bufmgr.h
===================================================================
--- src/include/storage/bufmgr.h    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/include/storage/bufmgr.h    (.../trunk/postgres-src)    (revision 55)
@@ -137,7 +137,7 @@
 #ifdef NOT_USED
 extern void PrintPinnedBufs(void);
 #endif
-extern int    BufferShmemSize(void);
+extern Size    BufferShmemSize(void);
 extern RelFileNode BufferGetFileNode(Buffer buffer);

 extern void SetBufferCommitInfoNeedsSave(Buffer buffer);
Index: src/backend/port/sysv_shmem.c
===================================================================
--- src/backend/port/sysv_shmem.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/port/sysv_shmem.c    (.../trunk/postgres-src)    (revision 55)
@@ -51,7 +51,7 @@
 unsigned long UsedShmemSegID = 0;
 void       *UsedShmemSegAddr = NULL;

-static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, uint32 size);
+static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
 static void IpcMemoryDetach(int status, Datum shmaddr);
 static void IpcMemoryDelete(int status, Datum shmId);
 static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
@@ -71,7 +71,7 @@
  * print out an error and abort.  Other types of errors are not recoverable.
  */
 static void *
-InternalIpcMemoryCreate(IpcMemoryKey memKey, uint32 size)
+InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size)
 {
     IpcMemoryId shmid;
     void       *memAddress;
@@ -99,14 +99,14 @@
          */
         ereport(FATAL,
                 (errmsg("could not create shared memory segment: %m"),
-        errdetail("Failed system call was shmget(key=%lu, size=%u, 0%o).",
-                  (unsigned long) memKey, size,
+        errdetail("Failed system call was shmget(key=%lu, size=%lu, 0%o).",
+                  (unsigned long) memKey, (unsigned long) size,
                   IPC_CREAT | IPC_EXCL | IPCProtection),
                  (errno == EINVAL) ?
                  errhint("This error usually means that PostgreSQL's request for a shared memory "
                          "segment exceeded your kernel's SHMMAX parameter.  You can either "
                          "reduce the request size or reconfigure the kernel with larger SHMMAX.  "
-               "To reduce the request size (currently %u bytes), reduce "
+               "To reduce the request size (currently %lu bytes), reduce "
            "PostgreSQL's shared_buffers parameter (currently %d) and/or "
                          "its max_connections parameter (currently %d).\n"
                          "If the request size is already small, it's possible that it is less than "
@@ -114,28 +114,28 @@
                          "reconfiguring SHMMIN is called for.\n"
                          "The PostgreSQL documentation contains more information about shared "
                          "memory configuration.",
-                         size, NBuffers, MaxBackends) : 0,
+                         (unsigned long) size, NBuffers, MaxBackends) : 0,
                  (errno == ENOMEM) ?
                  errhint("This error usually means that PostgreSQL's request for a shared "
                "memory segment exceeded available memory or swap space. "
-               "To reduce the request size (currently %u bytes), reduce "
+               "To reduce the request size (currently %lu bytes), reduce "
            "PostgreSQL's shared_buffers parameter (currently %d) and/or "
                          "its max_connections parameter (currently %d).\n"
                          "The PostgreSQL documentation contains more information about shared "
                          "memory configuration.",
-                         size, NBuffers, MaxBackends) : 0,
+                         (unsigned long) size, NBuffers, MaxBackends) : 0,
                  (errno == ENOSPC) ?
                  errhint("This error does *not* mean that you have run out of disk space. "
                          "It occurs either if all available shared memory IDs have been taken, "
                          "in which case you need to raise the SHMMNI parameter in your kernel, "
                          "or because the system's overall limit for shared memory has been "
              "reached.  If you cannot increase the shared memory limit, "
-        "reduce PostgreSQL's shared memory request (currently %u bytes), "
+        "reduce PostgreSQL's shared memory request (currently %lu bytes), "
         "by reducing its shared_buffers parameter (currently %d) and/or "
                          "its max_connections parameter (currently %d).\n"
                          "The PostgreSQL documentation contains more information about shared "
                          "memory configuration.",
-                         size, NBuffers, MaxBackends) : 0));
+                         (unsigned long) size, NBuffers, MaxBackends) : 0));
     }

     /* Register on-exit routine to delete the new segment */
@@ -289,7 +289,7 @@
  * zero will be passed.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(uint32 size, bool makePrivate, int port)
+PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 {
     IpcMemoryKey NextShmemSegID;
     void       *memAddress;
Index: src/backend/utils/misc/guc.c
===================================================================
--- src/backend/utils/misc/guc.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/utils/misc/guc.c    (.../trunk/postgres-src)    (revision 55)
@@ -964,7 +964,11 @@
             NULL
         },
         &NBuffers,
+#ifndef USE_64BIT_SHARED
         1000, 16, INT_MAX / BLCKSZ, NULL, NULL
+#else
+        1000, 16, INT_MAX / 2, NULL, NULL
+#endif    /* USE_64BIT_SHARED */
     },

     {
Index: src/backend/postmaster/bgwriter.c
===================================================================
--- src/backend/postmaster/bgwriter.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/postmaster/bgwriter.c    (.../trunk/postgres-src)    (revision 55)
@@ -458,7 +458,7 @@
  * BgWriterShmemSize
  *        Compute space needed for bgwriter-related shared memory
  */
-int
+Size
 BgWriterShmemSize(void)
 {
     /*
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
--- src/backend/storage/lmgr/lwlock.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/lmgr/lwlock.c    (.../trunk/postgres-src)    (revision 55)
@@ -123,17 +123,17 @@
 /*
  * Compute shmem space needed for LWLocks.
  */
-int
+Size
 LWLockShmemSize(void)
 {
     int            numLocks = NumLWLocks();
-    uint32        spaceLocks;
+    Size        spaceLocks;

     /* Allocate the LWLocks plus space for shared allocation counter. */
     spaceLocks = numLocks * sizeof(LWLock) + 2 * sizeof(int);
     spaceLocks = MAXALIGN(spaceLocks);

-    return (int) spaceLocks;
+    return (Size) spaceLocks;
 }


@@ -144,7 +144,7 @@
 CreateLWLocks(void)
 {
     int            numLocks = NumLWLocks();
-    uint32        spaceLocks = LWLockShmemSize();
+    Size        spaceLocks = LWLockShmemSize();
     LWLock       *lock;
     int            id;

Index: src/backend/storage/buffer/buf_init.c
===================================================================
--- src/backend/storage/buffer/buf_init.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/buffer/buf_init.c    (.../trunk/postgres-src)    (revision 55)
@@ -96,7 +96,7 @@

     BufferBlocks = (char *)
         ShmemInitStruct("Buffer Blocks",
-                        NBuffers * BLCKSZ, &foundBufs);
+                        (Size)NBuffers * BLCKSZ, &foundBufs);

     if (foundDescs || foundBufs)
     {
@@ -193,16 +193,16 @@
  * compute the size of shared memory for the buffer pool including
  * data pages, buffer descriptors, hash tables, etc.
  */
-int
+Size
 BufferShmemSize(void)
 {
-    int            size = 0;
+    Size        size = 0;

     /* size of buffer descriptors */
     size += MAXALIGN(NBuffers * sizeof(BufferDesc));

     /* size of data pages */
-    size += NBuffers * MAXALIGN(BLCKSZ);
+    size += (Size)NBuffers * MAXALIGN(BLCKSZ);

     /* size of stuff controlled by freelist.c */
     size += StrategyShmemSize();
Index: src/backend/storage/buffer/buf_table.c
===================================================================
--- src/backend/storage/buffer/buf_table.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/buffer/buf_table.c    (.../trunk/postgres-src)    (revision 55)
@@ -40,7 +40,7 @@
  * Estimate space needed for mapping hashtable
  *        size is the desired hash table size (possibly more than NBuffers)
  */
-int
+Size
 BufTableShmemSize(int size)
 {
     return hash_estimate_size(size, sizeof(BufferLookupEnt));
Index: src/backend/storage/buffer/freelist.c
===================================================================
--- src/backend/storage/buffer/freelist.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/buffer/freelist.c    (.../trunk/postgres-src)    (revision 55)
@@ -798,12 +798,12 @@
  *
  * estimate the size of shared memory used by the freelist-related structures.
  */
-int
+Size
 StrategyShmemSize(void)
 {
     /* A1out list can hold 50% of NBuffers, per Johnson and Shasha */
     int            nCDBs = NBuffers + NBuffers / 2;
-    int            size = 0;
+    Size        size = 0;

     /* size of CDB lookup hash table */
     size += BufTableShmemSize(nCDBs);
Index: src/backend/storage/ipc/ipci.c
===================================================================
--- src/backend/storage/ipc/ipci.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/ipc/ipci.c    (.../trunk/postgres-src)    (revision 55)
@@ -60,7 +60,7 @@

     if (!IsUnderPostmaster)
     {
-        int            size;
+        Size            size;
         int            numSemas;

         /*
@@ -86,7 +86,7 @@
         /* might as well round it off to a multiple of a typical page size */
         size += 8192 - (size % 8192);

-        elog(DEBUG3, "invoking IpcMemoryCreate(size=%d)", size);
+        elog(DEBUG3, "invoking IpcMemoryCreate(size=%lu)", (unsigned long) size);

         /*
          * Create the shmem segment
Index: src/backend/storage/ipc/shmem.c
===================================================================
--- src/backend/storage/ipc/shmem.c    (.../tags/postgres-src/8.0.3/postgres-src)    (revision 55)
+++ src/backend/storage/ipc/shmem.c    (.../trunk/postgres-src)    (revision 55)
@@ -146,8 +146,8 @@
 void *
 ShmemAlloc(Size size)
 {
-    uint32        newStart;
-    uint32        newFree;
+    Size        newStart;
+    Size        newFree;
     void       *newSpace;

     /* use volatile pointer to prevent code rearrangement */

Attachment

A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
Hi, all,

I have posted a couple of patches with regard to 64bit environment
support to PATCHES ml.   It expands size of shared memory to 64bit space
and extends XID to 64bit.   Please take a look at it.

--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Re: A couple of patches for PostgreSQL 64bit support

From
Tom Lane
Date:
Koichi Suzuki <koichi@intellilink.co.jp> writes:
> Here're a couple of patches for PostgreSQL 64bit support.  There're just
> two extension to 64bit, size of shared memory and transaction ID.

I asked originally for some experimental evidence showing any value
in having more than 2Gb of shared buffers.  In the absence of any
convincing demonstration, I'm not very inclined to worry about whether
we can handle wider-than-int shared memory size.

As for the XID change, I don't think this patch accurately reflects the
size of the impact.  There are a lot of things that in practice need to
be the same size as XID (CID, most obviously, but I suspect also OID).
And again, some demonstration of the performance impact would be
appropriate.  Here, not only do you have to prove that widening XID
isn't a big performance hit in itself, but you also have to convince
us that it's a win compared to the existing approach of vacuuming at
least every billion transactions.

            regards, tom lane

Re: A couple of patches for PostgreSQL 64bit support

From
Hans-Jürgen Schönig
Date:
Tom Lane wrote:
> Koichi Suzuki <koichi@intellilink.co.jp> writes:
>
>>Here're a couple of patches for PostgreSQL 64bit support.  There're just
>>two extension to 64bit, size of shared memory and transaction ID.
>
>
> I asked originally for some experimental evidence showing any value
> in having more than 2Gb of shared buffers.  In the absence of any
> convincing demonstration, I'm not very inclined to worry about whether
> we can handle wider-than-int shared memory size.

There is some practical evidence. Recently the number of large boxes in
the field is almost growing exponentially. Today I have heard somebody
say "this box has 'just 4 gig of ram' ".
On large installations we have already seen problems with too small
caches (= 2gb).
Surprisingly this has turned out to be a quite important issue in the
field. Tests have shown that the cache provided by the OS is a lot worse
for the database.

64-bit XIDs seem to be an overkill - the only practical impact I can see
is an even larger tuple header (this can be an issue on large boxes too
- at least compared to Oracle).

    Best regards,

        Hans

Re: A couple of patches for PostgreSQL 64bit support

From
Tom Lane
Date:
=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:
> There is some practical evidence. Recently the number of large boxes in
> the field is almost growing exponentially. Today I have heard somebody
> say "this box has 'just 4 gig of ram' ".
> On large installations we have already seen problems with too small
> caches (= 2gb).
> Surprisingly this has turned out to be a quite important issue in the
> field. Tests have shown that the cache provided by the OS is a lot worse
> for the database.

*What* tests?  This is all handwaving :-(

What I would find credible is a set of, say, OSDL test runs, showing a
continuing increase of performance with shared_buffers right up to the
2Gb limit.  Everything done to date says that you hit the point of
diminishing returns well before that.

            regards, tom lane

Re: A couple of patches for PostgreSQL 64bit support

From
Hans-Jürgen Schönig
Date:
Tom Lane wrote:
> =?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:
>
>>There is some practical evidence. Recently the number of large boxes in
>>the field is almost growing exponentially. Today I have heard somebody
>>say "this box has 'just 4 gig of ram' ".
>>On large installations we have already seen problems with too small
>>caches (= 2gb).
>>Surprisingly this has turned out to be a quite important issue in the
>>field. Tests have shown that the cache provided by the OS is a lot worse
>>for the database.
>
>
> *What* tests?  This is all handwaving :-(
>
> What I would find credible is a set of, say, OSDL test runs, showing a
> continuing increase of performance with shared_buffers right up to the
> 2Gb limit.  Everything done to date says that you hit the point of
> diminishing returns well before that.
>
>             regards, tom lane


well, you can easily try it on a big machine with gigs of ram and
nothing but the database running. using a very low number of shared
buffers will lead to worse performance than many shared buffers - even
if the operating system caches some disk i/O (which is done by linux if
nobody else want to have some ram). i have no public hard figures i
could post here but customers have told me that 2Q cache vs. kernel
cache is around 5-10 times faster (it varies of course).

the 2gb thing is especially important for data crunchers - not
necessarily for 'normal' OLTP databases. just assume somebody getting 5
gig of data and doing some repeated computation with this data. you
definitely don't want to go to disk in this case. people will assume
that postgresql can work with large caches ("it is a good database - why
do i get errors on startup" - this is the usual story). people rather
tend to rely on PostgreSQL than on some operating system thing ;).

i might have some time to provide some real hard facts to prove this but
i am a bit busy at the moment.

    best regards,

        hans



Re: A couple of p.tches for PostgreSQL 64bit support

From
Alvaro Herrera
Date:
On Thu, Jul 07, 2005 at 06:23:40PM +0200, Hans-Jürgen Schönig wrote:
> Tom Lane wrote:
> >Koichi Suzuki <koichi@intellilink.co.jp> writes:
> >
> >>Here're a couple of patches for PostgreSQL 64bit support.  There're just
> >>two extension to 64bit, size of shared memory and transaction ID.
> >
> >
> >I asked originally for some experimental evidence showing any value
> >in having more than 2Gb of shared buffers.  In the absence of any
> >convincing demonstration, I'm not very inclined to worry about whether
> >we can handle wider-than-int shared memory size.
>
> There is some practical evidence. Recently the number of large boxes in
> the field is almost growing exponentially. Today I have heard somebody
> say "this box has 'just 4 gig of ram' ".

Yes, but the point is if it's a good idea to have that many shared
buffers.  Is there a measurable difference between that, and leaving the
extra memory for the kernel to manage cache?  Remember that there have
been measurements that showed, for previous releases, that having shared
buffers set too high was detrimental to performance.  So the first thing
to do is present results showing that this is no longer true.

This could very well be the case, because of the rewriting of the buffer
manager.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"El destino baraja y nosotros jugamos" (A. Schopenhauer)

Re: A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
I have some experimeltal data about this extension.   I will gather it
and post hopefully this weekend.

Tom Lane wrote:
> Koichi Suzuki <koichi@intellilink.co.jp> writes:
>
>>Here're a couple of patches for PostgreSQL 64bit support.  There're just
>>two extension to 64bit, size of shared memory and transaction ID.
>
>
> I asked originally for some experimental evidence showing any value
> in having more than 2Gb of shared buffers.  In the absence of any
> convincing demonstration, I'm not very inclined to worry about whether
> we can handle wider-than-int shared memory size.
>
> As for the XID change, I don't think this patch accurately reflects the
> size of the impact.  There are a lot of things that in practice need to
> be the same size as XID (CID, most obviously, but I suspect also OID).
> And again, some demonstration of the performance impact would be
> appropriate.  Here, not only do you have to prove that widening XID
> isn't a big performance hit in itself, but you also have to convince
> us that it's a win compared to the existing approach of vacuuming at
> least every billion transactions.
>
>             regards, tom lane
>


--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Re: A couple of patches for PostgreSQL 64bit support

From
ITAGAKI Takahiro
Date:
Hans-J|rgen Schvnig <postgres@cybertec.at> wrote:

> 64-bit XIDs seem to be an overkill - the only practical impact I can see
> is an even larger tuple header (this can be an issue on large boxes too
> - at least compared to Oracle).

I agreed, too. The changes of XIDs cannot be ignored because the overhead
will be 32bytes per tuple.

Avoiding overheads, can XIDs/CIDs be different bit length? For example,
can XIDs/CIDs be changed to 48/16-bit or 40/24-bit?

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


Re: A couple of p.tches for PostgreSQL 64bit support

From
Alvaro Herrera
Date:
On Fri, Jul 08, 2005 at 09:38:07AM +0900, ITAGAKI Takahiro wrote:
> Hans-J|rgen Schvnig <postgres@cybertec.at> wrote:
>
> > 64-bit XIDs seem to be an overkill - the only practical impact I can see
> > is an even larger tuple header (this can be an issue on large boxes too
> > - at least compared to Oracle).
>
> I agreed, too. The changes of XIDs cannot be ignored because the overhead
> will be 32bytes per tuple.
>
> Avoiding overheads, can XIDs/CIDs be different bit length? For example,
> can XIDs/CIDs be changed to 48/16-bit or 40/24-bit?

Not unless you change the definition of HeapTupleFields
(src/include/access/htup.h).  Alignment concerns would probably bite you
if you changed it, anyway.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)

Re: A couple of p.tches for PostgreSQL 64bit support

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Fri, Jul 08, 2005 at 09:38:07AM +0900, ITAGAKI Takahiro wrote:
>> Avoiding overheads, can XIDs/CIDs be different bit length? For example,
>> can XIDs/CIDs be changed to 48/16-bit or 40/24-bit?

> Not unless you change the definition of HeapTupleFields
> (src/include/access/htup.h).  Alignment concerns would probably bite you
> if you changed it, anyway.

I don't think we could feasibly reduce the width of CID anyway; we've
already seen a few complaints of people overrunning 2^32 commands per
transaction, and surely this is a bigger rather than a lesser concern
if you are thinking of large databases.

It probably would be possible to keep CID at 32 bits and lay out the
HeapTupleHeader so that you only pay for three, not four, 64-bit
fields ... but that's still twelve bytes added per tuple.

            regards, tom lane

Re: A couple of p.tches for PostgreSQL 64bit support

From
Satoshi Nagayasu
Date:
Hi guys,

BTW, I found the work_mem is still limited to 2GB.

If we support 64bit shared memory, we also need to support
64bit work_mem.

Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>




Re: A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
Hi,

Attached is a result of pgbench with 64bit patch PostgreSQL (base is
8.0.1).  Benchmark machine is dual opteron (1.4GHz, 1MB cache each) with
8GB of memory and 120GB of IDE hard disk.

Koichi Suzuki wrote:
> I have some experimeltal data about this extension.   I will gather it
> and post hopefully this weekend.


--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Attachment

Re: A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
> I asked originally for some experimental evidence showing any value
> in having more than 2Gb of shared buffers.  In the absence of any
> convincing demonstration, I'm not very inclined to worry about whether
> we can handle wider-than-int shared memory size.

Hi,

Attached is a result of pgbench with 64bit patch PostgreSQL (base is
8.0.1).  Benchmark machine is dual opteron (1.4GHz, 1MB cache each) with
8GB of memory and 120GB of IDE hard disk.

Koichi Suzuki wrote:

>> I have some experimeltal data about this extension.   I will gather it
>> and post hopefully this weekend.

-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Attachment

Re: A couple of patches for PostgreSQL 64bit support

From
Mark Wong
Date:
Hi,

I grabbed the patches to try, but I was wondering if it would be more
interesting to try them against CVS rather than 8.0.3 (and if it would
be easy to port :)?

Mark

Re: A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
Mark,

I've not seen CVS in detail.   I begain this work against 8.0.1 and
continued thru 8.0.2 to 8.0.3.  It was not a great work.   The patch is
rather straightforward and I appreciate if you try to port against CVS.

Mark Wong wrote:
> Hi,
>
> I grabbed the patches to try, but I was wondering if it would be more
> interesting to try them against CVS rather than 8.0.3 (and if it would
> be easy to port :)?
>
> Mark
>


--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Re: A couple of patches for PostgreSQL 64bit support

From
Tom Lane
Date:
Koichi Suzuki <koichi@intellilink.co.jp> writes:
> Here're a couple of patches for PostgreSQL 64bit support.  There're just
> two extension to 64bit, size of shared memory and transaction ID.

I've applied the part of this that seemed reasonably noncontroversial,
namely the fixes to do shared memory size calculation in size_t
arithmetic instead of int arithmetic.  (While at it, I extended the Size
convention to all the shared memory sizing routines, not just buffers,
and added code to detect overflows in the calculations.  That way we
don't need a "64 bit" configure switch.)  While I still remain
unconvinced that there's any real-world need for more than 2Gb of
shared_buffers, this change certainly makes the code more robust against
configuration errors, and it has essentially zero cost (except maybe a
few more milliseconds during postmaster start).

On the other hand, I think the 64-bit XID idea needs considerably more
work before being proposed again.  That would incur serious costs due
to the expansion of tuple headers, and there's no evidence that the
distributed cost would be bought back by avoiding one vacuum pass every
billion transactions.  (Your description of the patch claimed that
vacuuming requires shutting down the database, which is simply wrong.)
Also, as previously noted, you can't just whack the size of XID around
without considering side-effects on CID, OID, Datum, etc.

            regards, tom lane

Re: A couple of patches for PostgreSQL 64bit support

From
Koichi Suzuki
Date:
Thanks a lot for the port to CVS.

I agree that we need more benckmark efforts to clarify real outcome of
"more than 2GB" memory.   Please let me spend some more for this.  I
will post benchmark results.   As long as I see from pgbench, it looks
more memory gets more throuput.   Maybe big SQL against big dataset is
another example to show the effect.

I also agree that we need much more study to show the effect of 64bit
TID (and perhaps CID).   Based on the patch I posted, I'll continue my
effort and also post the results for discussion.

Best Regards;

Tom Lane wrote:
> Koichi Suzuki <koichi@intellilink.co.jp> writes:
>
>>Here're a couple of patches for PostgreSQL 64bit support.  There're just
>>two extension to 64bit, size of shared memory and transaction ID.
>
>
> I've applied the part of this that seemed reasonably noncontroversial,
> namely the fixes to do shared memory size calculation in size_t
> arithmetic instead of int arithmetic.  (While at it, I extended the Size
> convention to all the shared memory sizing routines, not just buffers,
> and added code to detect overflows in the calculations.  That way we
> don't need a "64 bit" configure switch.)  While I still remain
> unconvinced that there's any real-world need for more than 2Gb of
> shared_buffers, this change certainly makes the code more robust against
> configuration errors, and it has essentially zero cost (except maybe a
> few more milliseconds during postmaster start).
>
> On the other hand, I think the 64-bit XID idea needs considerably more
> work before being proposed again.  That would incur serious costs due
> to the expansion of tuple headers, and there's no evidence that the
> distributed cost would be bought back by avoiding one vacuum pass every
> billion transactions.  (Your description of the patch claimed that
> vacuuming requires shutting down the database, which is simply wrong.)
> Also, as previously noted, you can't just whack the size of XID around
> without considering side-effects on CID, OID, Datum, etc.
>
>             regards, tom lane
>


--
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------