Thread: [HACKERS] Should logtape.c blocks be of type long?

[HACKERS] Should logtape.c blocks be of type long?

From
Peter Geoghegan
Date:
logtape.c stores block numbers on disk. These block numbers are
represented in memory as being of type long. This is why BufFile
clients that use the block-based interface (currently limited to
logtape.c and gistbuildbuffers.c) must live with a specific
limitation, as noted within buffile.c:

/** BufFileSeekBlock --- block-oriented seek** Performs absolute seek to the start of the n'th BLCKSZ-sized block of*
thefile.  Note that users of this interface will fail if their files* exceed BLCKSZ * LONG_MAX bytes, but that is quite
alot; we don't work* with tables bigger than that, either...** Result is 0 if OK, EOF if not.  Logical position is not
movedif an* impossible seek is attempted.*/
 
int
BufFileSeekBlock(BufFile *file, long blknum)
{   ...
}

That restriction is fine on 64-bit Linux, where it is actually true
that "we don't work with tables that big either", but on Win64 long is
still only 4 bytes. 32-bit systems are similarly affected. Of course,
MaxBlockNumber is 0xFFFFFFFE, whereas LONG_MAX is only 0x7FFFFFFF on
affected systems.

Is it worth changing this interface to use int64, which is already
defined to be a "long int" on most real-world installations anyway?
Though it hardly matters, this would have some cost: logtape.c temp
files would have a higher storage footprint on win64 (i.e., the same
overhead as elsewhere).

I tend to be suspicious of use of the type "long" in general, because
in general one should assume that it is no wider than "int". This
calls into question why any code that uses "long" didn't just use
"int", at least in my mind.

-- 
Peter Geoghegan



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Robert Haas
Date:
On Sun, Feb 26, 2017 at 2:44 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> I tend to be suspicious of use of the type "long" in general, because
> in general one should assume that it is no wider than "int". This
> calls into question why any code that uses "long" didn't just use
> "int", at least in my mind.

Yeah.  Using things that are guaranteed to be the size we want them to
be (and the same size on all platforms) seems like a good plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> logtape.c stores block numbers on disk. These block numbers are
> represented in memory as being of type long.

Yeah.  This code is far older than our willingness to assume that every
platform can support int64, and I'm pretty sure that use of "long" was
just a compromise to get the widest values we could use portably and
without a lot of notational hassle.  (There are some similar choices in
the area of memory usage, particularly calculations related to work_mem.)

Having said that, I'm not sure it's worth the trouble of changing.
The platforms where there's a difference are probably not muscular
enough that anyone would ever get past 16TB in a temp file anyhow.
        regards, tom lane



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Peter Geoghegan
Date:
On Sun, Feb 26, 2017 at 9:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  This code is far older than our willingness to assume that every
> platform can support int64, and I'm pretty sure that use of "long" was
> just a compromise to get the widest values we could use portably and
> without a lot of notational hassle.  (There are some similar choices in
> the area of memory usage, particularly calculations related to work_mem.)

I'm glad that you pointed out the history with work_mem calculations
specifically, since I have found this confusing in the past. I was
about to ask "what about 64-bit Windows?", but then remembered that we
don't actually support large allocations on that platform (this is why
MAX_KILOBYTES exists).

> Having said that, I'm not sure it's worth the trouble of changing.
> The platforms where there's a difference are probably not muscular
> enough that anyone would ever get past 16TB in a temp file anyhow.

As things stand, a 64-bit windows installation would have any CLUSTER
of a table that exceeds 16TiB fail, possibly pretty horribly (I
haven't thought through the consequences much). This is made more
likely by the fact that we've made tuplesort faster in the past few
releases (gains which the MAX_KILOBYTES restriction won't impinge on
too much, particularly in Postgres 10). I find that unacceptable, at
least for Postgres 10.

-- 
Peter Geoghegan



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Robert Haas
Date:
On Sun, Feb 26, 2017 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Having said that, I'm not sure it's worth the trouble of changing.
> The platforms where there's a difference are probably not muscular
> enough that anyone would ever get past 16TB in a temp file anyhow.

Is this comment a reflection of your generally-low opinion about
Windows, or some specific limitation that I don't know about?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, Feb 26, 2017 at 9:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Having said that, I'm not sure it's worth the trouble of changing.
>> The platforms where there's a difference are probably not muscular
>> enough that anyone would ever get past 16TB in a temp file anyhow.

> As things stand, a 64-bit windows installation would have any CLUSTER
> of a table that exceeds 16TiB fail, possibly pretty horribly (I
> haven't thought through the consequences much). This is made more
> likely by the fact that we've made tuplesort faster in the past few
> releases (gains which the MAX_KILOBYTES restriction won't impinge on
> too much, particularly in Postgres 10). I find that unacceptable, at
> least for Postgres 10.

[ shrug... ]  If you're excited enough about it to do the work, I won't
stand in your way.  But I don't find it to be a stop-ship issue.
        regards, tom lane



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Peter Geoghegan
Date:
On Sun, Feb 26, 2017 at 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ shrug... ]  If you're excited enough about it to do the work, I won't
> stand in your way.  But I don't find it to be a stop-ship issue.

I'll add it to my todo list for Postgres 10.

I think it's worth being consistent about a restriction like this, as
Robert said. Given that fixing this issue will not affect the machine
code generated by compilers for the majority of platforms we support,
doing so seems entirely worthwhile to me.

-- 
Peter Geoghegan



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Michael Paquier
Date:
On Sun, Feb 26, 2017 at 10:04:20AM -0800, Peter Geoghegan wrote:
> I think it's worth being consistent about a restriction like this, as
> Robert said. Given that fixing this issue will not affect the machine
> code generated by compilers for the majority of platforms we support,
> doing so seems entirely worthwhile to me.

(Reviving an old thread, fives years later..)

As far as I can see, no patches have been sent to do that, and the
changes required to replace long by int64 in the logtape and tuplesort
code are rather minimal as long as some care is taken with all the
internals of logtape (correct me of course if I'm wrong).  I really
don't see why we shouldn't do the switch based on the argument of
Windows not being able to handle more than 16TB worth of blocks as
things stand because of long in these code paths.

So, attached is a patch to change these longs to int64.  Any thoughts?
--
Michael

Attachment

Re: [HACKERS] Should logtape.c blocks be of type long?

From
Peter Geoghegan
Date:
On Thu, Sep 21, 2023 at 9:46 PM Michael Paquier <michael@paquier.xyz> wrote:
> So, attached is a patch to change these longs to int64.  Any thoughts?

No new thoughts. I'm still all in favor of this. Thanks for picking it up.

At some point we should completely ban the use of "long".

--
Peter Geoghegan



Re: [HACKERS] Should logtape.c blocks be of type long?

From
Michael Paquier
Date:
On Thu, Sep 21, 2023 at 09:53:02PM -0700, Peter Geoghegan wrote:
> No new thoughts. I'm still all in favor of this. Thanks for picking it up.

Okay, thanks.  I guess that nobody would complain if I were to apply
that..

> At some point we should completely ban the use of "long".

Indeed, or Windows decides that making long 8-byte is wiser, but I
doubt that's ever going to happen on backward-compatibility ground.
--
Michael

Attachment

Re: [HACKERS] Should logtape.c blocks be of type long?

From
Michael Paquier
Date:
On Sun, Sep 24, 2023 at 10:42:49AM +0900, Michael Paquier wrote:
> Indeed, or Windows decides that making long 8-byte is wiser, but I
> doubt that's ever going to happen on backward-compatibility ground.

While looking more at that, I've noticed that I missed BufFileAppend()
and BufFileSeekBlock(), that themselves rely on long.  The other code
paths calling these two routines rely on BlockNumber (aka uint32), so
that seems to be the bottom of it.

For now, I have registered this patch to the next CF:
https://commitfest.postgresql.org/45/4589/

Comments are welcome.
--
Michael

Attachment

Re: [HACKERS] Should logtape.c blocks be of type long?

From
Heikki Linnakangas
Date:
On 26/09/2023 07:15, Michael Paquier wrote:
> On Sun, Sep 24, 2023 at 10:42:49AM +0900, Michael Paquier wrote:
>> Indeed, or Windows decides that making long 8-byte is wiser, but I
>> doubt that's ever going to happen on backward-compatibility ground.
> 
> While looking more at that, I've noticed that I missed BufFileAppend()
> and BufFileSeekBlock(), that themselves rely on long.  The other code
> paths calling these two routines rely on BlockNumber (aka uint32), so
> that seems to be the bottom of it.

BufFileTellBlock should be adjusted too. Or removed altogether; it's 
been commented out since year 2000. Other than that, looks good to me.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: [HACKERS] Should logtape.c blocks be of type long?

From
Michael Paquier
Date:
On Thu, Nov 16, 2023 at 03:03:46PM +0100, Heikki Linnakangas wrote:
> BufFileTellBlock should be adjusted too. Or removed altogether; it's been
> commented out since year 2000. Other than that, looks good to me.

Yes, I recall wondering about what to do on this one so I just let it
be when updating the last version of the patch as we have many
NOT_USED stuff in the core code.  After 23 years being around for no
purpose, I have just applied a patch to remove it for the sake of this
change, then applied the main change.

Thanks for double-checking, Heikki!
--
Michael

Attachment