Re: A Windows x64 port of PostgreSQL - Mailing list pgsql-hackers

From Ken Camann
Subject Re: A Windows x64 port of PostgreSQL
Date
Msg-id 63c05a820807022113k4e4e1f45r2d0abdd33b1631c7@mail.gmail.com
In response to Re: A Windows x64 port of PostgreSQL  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A Windows x64 port of PostgreSQL  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Jul 2, 2008 at 8:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ken Camann" <kjcamann@gmail.com> writes:
>> Oh I see.  Between this and looking again at the warning list, I see
>> that it will probably take a lot more work than I thought.  There are
>> about 450 occurrences of the assumption that sizeof(size_t) ==
>> sizeof(int).
>
> [ blink... ]  There are *zero* occurrences of the assumption that
> sizeof(size_t) == sizeof(int), unless maybe in some of that grotty
> #ifdef WIN32 code.  Postgres has run on 64-bit platforms for many
> years now.

Hi Tom.

I knew about the previous 64 bit platform support, which is why I was
so surprised to see the problem.  Unless I am missing an important
#define that somehow makes this stuff go away (but I don't think so,
given how much of it there is) it does happen to be in there.  If I
haven't done anything wrong, I would assume no one noticed because
those architectures define sizeof(long) to be >= sizeof(size_t).

Well actually, let me be as strict as possible because I don't know
the latest C standards very well (I am a C++ programmer).  Am I
correct that the standard says that sizeof(size_t) must be
sizeof(void*), and that no compiler has ever said otherwise?  I think
so, given what size_t is supposed to mean.  So I tend to use sizeof(void*)
and sizeof(size_t) interchangeably.  Sorry for the confusion if that
is less clear.  A comment in postgres.h (not conditionally defined by
anything) states that all the code assumes:

sizeof(Datum) == sizeof(long) >= sizeof(void *) >= 4

where the first equality is trivially true because Datum is a typedef
for long.  EM64T/AMD64 is new compared to the older architectures; I
would guess the older ones predate the time when it became a somewhat
de facto standard to leave "long int" at 4 bytes and make "long long"
the new 64-bit type.  In fact this definition is so common that it
will soon be the de jure C++ standard definition.  I assume ISO C
still will not fix byte lengths to the declarators since they've
fought it for so long.  In any case, if sizeof(long) = 4 while
sizeof(void *) = 8, this fails to be true.
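
The relation quoted from postgres.h can be checked against a particular compiler with a small sketch like the one below. DemoDatum is a hypothetical stand-in, assumed here to be a plain long typedef as described above; it is not the real header. Both function names are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for Datum, assumed to be a typedef for long. */
typedef long DemoDatum;

/* Returns 1 if a pointer fits in DemoDatum, i.e. the
 * sizeof(Datum) >= sizeof(void *) part of the relation holds. */
int datum_can_hold_pointer(void)
{
    return sizeof(DemoDatum) >= sizeof(void *);
}

/* Prints the sizes that appear in the relation for this platform. */
void report_sizes(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("pointer fits in DemoDatum: %s\n",
           datum_can_hold_pointer() ? "yes" : "no");
}
```

On a platform where long is 4 bytes and pointers are 8 bytes, datum_can_hold_pointer() would return 0. On platforms where long is as wide as a pointer it returns 1.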

This is more interesting still (in c.h)

/*
 * Size
 *        Size of any memory resident object, as returned by sizeof.
 */
typedef size_t Size;

/*
 * Index
 *        Index into any memory resident array.
 *
 * Note:
 *        Indices are non negative.
 */
typedef unsigned int Index;

/*
 * Offset
 *        Offset into any memory resident array.
 *
 * Note:
 *        This differs from an Index in that an Index is always
 *        non negative, whereas Offset may be negative.
 */
typedef signed int Offset;
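
To illustrate why mixing these typedefs matters, here is a minimal sketch of what happens when a size_t value is narrowed to an unsigned int index. DemoIndex and both helper functions are hypothetical names introduced for this example only. The conversion to an unsigned type is well defined in C (the value is reduced modulo UINT_MAX + 1), so the sketch does not rely on implementation-defined behavior.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring the Index typedef quoted above. */
typedef unsigned int DemoIndex;

/* Returns 1 if n can be stored in a DemoIndex without losing bits. */
int size_fits_in_index(size_t n)
{
    return n <= (size_t) UINT_MAX;
}

/* Narrows n to DemoIndex; high bits are silently dropped
 * (the result is n modulo UINT_MAX + 1). */
DemoIndex narrow_to_index(size_t n)
{
    return (DemoIndex) n;
}
```

Where size_t is wider than unsigned int, size_fits_in_index((size_t) UINT_MAX + 1) returns 0 and narrow_to_index of the same value yields 0, which is the kind of silent wraparound the compiler warnings point at.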

There seems to be an interesting mix of size_t, long, and int in use
for memory.  No one has noticed possibly because the shared buffers
per single user have never been bigger than 2GB for anyone.  Postgres
documentation recommends "big" numbers like 20 or 30 MB, and the
default is much smaller.  In order to have had problems with this,
you'd probably need all the following to happen at once:

1.) A huge enterprise (with lots of money to buy memory but using
postgres and not Oracle) doing data warehousing on enormous tables
2.) A platform where sizeof(int) = sizeof(long) = 4 but sizeof(void*) = 8
3.) A DBA who wanted the shared buffers > 2 GB
4.) An operating system supporting > 2 GB of memory
5.) An operating system willing to allocate contiguous blocks > 2 GB
6.) A C standard library implementation of malloc willing to allocate
contiguous blocks > 2 GB
7.) Exactly the right query to make it explode.

I think it happens to work out that not all of those have happened
simultaneously yet.

Anyway, there are a lot of other sizeof(int) == sizeof(size_t)
assumptions in totally unimportant places.  Here's one in bootstrap.c:

int len;

len = strlen(str); //possible loss of data

That kind is very common.
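
One possible way to make that kind of narrowing explicit is sketched below. checked_strlen_int is a hypothetical helper invented for this example, not something present in the PostgreSQL source. It keeps the length in a size_t and converts to int only after a range check.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns strlen(str) as an int,
 * or -1 if the length does not fit in an int. */
int checked_strlen_int(const char *str)
{
    size_t len = strlen(str);

    if (len > (size_t) INT_MAX)
        return -1;          /* length would be truncated */
    return (int) len;       /* safe: value is within int range */
}
```

For short strings such as the ones handled in bootstrap code the check never fires, which is consistent with the point that these places are unimportant in practice. The alternative is simply to declare len as size_t.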

-Ken

