Thread: Use of "long" in incremental sort code

Use of "long" in incremental sort code

From

David Rowley

Date:

30 June 2020, 04:13:01

Hi,

I noticed the incremental sort code makes use of the long datatype a
few times, e.g in TuplesortInstrumentation and
IncrementalSortGroupInfo.  (64-bit windows machines have sizeof(long)
== 4).  I understand that the values are in kilobytes and it would
take 2TB to cause them to wrap. Never-the-less, I think it would be
better to choose a better-suited type. work_mem is still limited to
2GB on 64-bit Windows machines, so perhaps there's some argument that
it does not matter about fields that related to in-memory stuff, but
the on-disk fields are wrong.  The in-memory fields likely raise the
bar further for fixing the 2GB work_mem limit on Windows.

Maybe Size would be better for the in-memory fields and uint64 for the
on-disk fields?

David

Re: Use of "long" in incremental sort code

From

Tom Lane

Date:

30 June 2020, 04:20:00

David Rowley <dgrowleyml@gmail.com> writes:
> I noticed the incremental sort code makes use of the long datatype a
> few times, e.g in TuplesortInstrumentation and
> IncrementalSortGroupInfo.  (64-bit windows machines have sizeof(long)
> == 4).  I understand that the values are in kilobytes and it would
> take 2TB to cause them to wrap. Never-the-less, I think it would be
> better to choose a better-suited type. work_mem is still limited to
> 2GB on 64-bit Windows machines, so perhaps there's some argument that
> it does not matter about fields that related to in-memory stuff, but
> the on-disk fields are wrong.  The in-memory fields likely raise the
> bar further for fixing the 2GB work_mem limit on Windows.

> Maybe Size would be better for the in-memory fields and uint64 for the
> on-disk fields?

There is a fairly widespread issue that memory-size-related GUCs and
suchlike variables are limited to represent sizes that fit in a "long".
Although Win64 is the *only* platform where that's an issue, maybe
it's worth doing something about.  But we shouldn't just fix the sort
code, if we do do something.

(IOW, I don't agree with doing a fix that doesn't also fix work_mem.)

            regards, tom lane

Re: Use of "long" in incremental sort code

From

David Rowley

Date:

30 June 2020, 04:24:00

On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There is a fairly widespread issue that memory-size-related GUCs and
> suchlike variables are limited to represent sizes that fit in a "long".
> Although Win64 is the *only* platform where that's an issue, maybe
> it's worth doing something about.  But we shouldn't just fix the sort
> code, if we do do something.
>
> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)

I raised it mostly because this new-to-PG13-code is making the problem worse.

If we're not going to change the in-memory fields, then shouldn't we
at least change the ones for disk space tracking?

David

Re: Use of "long" in incremental sort code

From

Peter Eisentraut

Date:

30 June 2020, 11:21:37

On 2020-06-30 06:24, David Rowley wrote:
> On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There is a fairly widespread issue that memory-size-related GUCs and
>> suchlike variables are limited to represent sizes that fit in a "long".
>> Although Win64 is the *only* platform where that's an issue, maybe
>> it's worth doing something about.  But we shouldn't just fix the sort
>> code, if we do do something.
>>
>> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)
> 
> I raised it mostly because this new-to-PG13-code is making the problem worse.

Yeah, we recently got rid of a bunch of inappropriate use of long, so it 
seems reasonable to make this new code follow that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Use of "long" in incremental sort code

From

James Coleman

Date:

02 July 2020, 16:01:21

On Tue, Jun 30, 2020 at 7:21 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-06-30 06:24, David Rowley wrote:
> > On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> There is a fairly widespread issue that memory-size-related GUCs and
> >> suchlike variables are limited to represent sizes that fit in a "long".
> >> Although Win64 is the *only* platform where that's an issue, maybe
> >> it's worth doing something about.  But we shouldn't just fix the sort
> >> code, if we do do something.
> >>
> >> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)
> >
> > I raised it mostly because this new-to-PG13-code is making the problem worse.
>
> Yeah, we recently got rid of a bunch of inappropriate use of long, so it
> seems reasonable to make this new code follow that.

I've attached a patch to make this change but with one tweak: I
decided to use unint64 for both memory and disk (rather than Size in
some cases) since we aggregated across multiple runs and have shared
code that deals with both values.

James

Attachment

v1-0001-Use-unint64-instead-of-long-for-space-used-variab.patch

Re: Use of "long" in incremental sort code

From

Peter Geoghegan

Date:

02 July 2020, 17:36:29

On Mon, Jun 29, 2020 at 9:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I noticed the incremental sort code makes use of the long datatype a
> few times, e.g in TuplesortInstrumentation and
> IncrementalSortGroupInfo.

I agree that long is terrible, and should generally be avoided.

> Maybe Size would be better for the in-memory fields and uint64 for the
> on-disk fields?

FWIW we have to use int64 for the in-memory tuplesort.c fields. This
is because it must be possible for the fields to have negative values
in the context of tuplesort. If there is going to be a general rule
for in-memory fields, then ISTM that it'll have to be "use int64".

logtape.c uses long for on-disk fields. It also relies on negative
values, albeit to a fairly limited degree (it uses -1 as a magic
value).

-- 
Peter Geoghegan

Re: Use of "long" in incremental sort code

From

James Coleman

Date:

02 July 2020, 17:53:32

On Thu, Jul 2, 2020 at 1:36 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Jun 29, 2020 at 9:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > I noticed the incremental sort code makes use of the long datatype a
> > few times, e.g in TuplesortInstrumentation and
> > IncrementalSortGroupInfo.
>
> I agree that long is terrible, and should generally be avoided.
>
> > Maybe Size would be better for the in-memory fields and uint64 for the
> > on-disk fields?
>
> FWIW we have to use int64 for the in-memory tuplesort.c fields. This
> is because it must be possible for the fields to have negative values
> in the context of tuplesort. If there is going to be a general rule
> for in-memory fields, then ISTM that it'll have to be "use int64".
>
> logtape.c uses long for on-disk fields. It also relies on negative
> values, albeit to a fairly limited degree (it uses -1 as a magic
> value).

Do you think it's reasonable to use int64 across the board for memory
and disk space numbers then? If so, I can update the patch.

James

Re: Use of "long" in incremental sort code

From

Peter Geoghegan

Date:

02 July 2020, 18:07:03

On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
> Do you think it's reasonable to use int64 across the board for memory
> and disk space numbers then? If so, I can update the patch.

Using int64 as a replacement for long is the safest general strategy,
and so ISTM that it might be worth doing that even in cases where it
isn't clearly necessary. After all, any code that uses long must have
been written with the assumption that that was the same thing as
int64, at least on most platforms.

There is nothing wrong with using Size/size_t, and doing so is often
slightly clearer. But it's no drop-in replacement for long.

-- 
Peter Geoghegan

Re: Use of "long" in incremental sort code

From

Tom Lane

Date:

02 July 2020, 19:39:42

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
>> Do you think it's reasonable to use int64 across the board for memory
>> and disk space numbers then? If so, I can update the patch.

> Using int64 as a replacement for long is the safest general strategy,

mumble ssize_t mumble

            regards, tom lane

Re: Use of "long" in incremental sort code

From

Peter Geoghegan

Date:

02 July 2020, 19:42:48

On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> mumble ssize_t mumble

That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

-- 
Peter Geoghegan

Re: Use of "long" in incremental sort code

From

Tom Lane

Date:

02 July 2020, 19:44:23

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> mumble ssize_t mumble

> That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

We've got quite a few uses of it already, so apparently it's fine.

            regards, tom lane

Re: Use of "long" in incremental sort code

From

James Coleman

Date:

02 July 2020, 19:47:46

On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
> >> Do you think it's reasonable to use int64 across the board for memory
> >> and disk space numbers then? If so, I can update the patch.
>
> > Using int64 as a replacement for long is the safest general strategy,
>
> mumble ssize_t mumble

But wouldn't that mean we'd get int on 32-bit systems, and since we're
accumulating data we could go over that value in both memory and disk?

My assumption is that it's preferable to have the "this run value" and
the "total used across multiple runs" and both of those for disk and
memory to be the same. In that case it seems we want to guarantee
64-bits.

Patch using int64 attached.

James

Attachment

v2-0001-Use-int64-instead-of-long-for-space-used-variable.patch

Re: Use of "long" in incremental sort code

From

Peter Geoghegan

Date:

02 July 2020, 19:49:48

On Thu, Jul 2, 2020 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That's from POSIX, though. I imagine MSVC won't be happy (surprise!).
>
> We've got quite a few uses of it already, so apparently it's fine.

Oh, looks like we have a compatibility hack for MSVC within
win32_port.h, where ssize_t is typedef'd to __int64. I didn't realize
that it was okay to use ssize_t.

-- 
Peter Geoghegan

Re: Use of "long" in incremental sort code

From

Peter Geoghegan

Date:

02 July 2020, 19:55:02

On Thu, Jul 2, 2020 at 12:47 PM James Coleman <jtc331@gmail.com> wrote:
> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?
>
> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

I agree. There seems to be little reason to accommodate platform level
conventions, beyond making sure that everything works on less popular
or obsolete platforms.

I suppose that it's a little idiosyncratic to use int64 like this. But
it makes sense, and isn't nearly as ugly as the long thing, so I don't
think that it should really matter.

-- 
Peter Geoghegan

Re: Use of "long" in incremental sort code

From

Tom Lane

Date:

02 July 2020, 19:55:31

James Coleman <jtc331@gmail.com> writes:
> On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> mumble ssize_t mumble

> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?

Certainly, a number that's meant to represent the amount of data *on disk*
shouldn't use ssize_t.  But I think it's appropriate if you want to
represent in-memory quantities while also allowing negative values.

I guess if you're expecting in-memory sizes exceeding 2GB, you might worry
that ssize_t could overflow.  I'm dubious that a 32-bit machine could get
to that, though, seeing that it's going to have other demands on its
address space.

> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

If you're not going to distinguish in-memory from not-in-memory, agreed.

            regards, tom lane

Re: Use of "long" in incremental sort code

From

David Rowley

Date:

31 July 2020, 02:11:46

On Fri, 3 Jul 2020 at 07:47, James Coleman <jtc331@gmail.com> wrote:
> Patch using int64 attached.

I added this to the open items list for PG13.

David

Re: Use of "long" in incremental sort code

From

James Coleman

Date:

31 July 2020, 14:02:16

On Thu, Jul 30, 2020 at 10:12 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 3 Jul 2020 at 07:47, James Coleman <jtc331@gmail.com> wrote:
> > Patch using int64 attached.
>
> I added this to the open items list for PG13.
>
> David

I'd previously attached a patch [1], and there seemed to be agreement
it was reasonable (lightly so, but I also didn't see any
disagreement); would someone be able to either commit the change or
provide some additional feedback?

Thanks,
James

[1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com

Re: Use of "long" in incremental sort code

From

David Rowley

Date:

02 August 2020, 02:26:25

On Sat, 1 Aug 2020 at 02:02, James Coleman <jtc331@gmail.com> wrote:
> I'd previously attached a patch [1], and there seemed to be agreement
> it was reasonable (lightly so, but I also didn't see any
> disagreement); would someone be able to either commit the change or
> provide some additional feedback?

It looks fine to me. Pushed.

David

> [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com

RE: Use of "long" in incremental sort code

From

"Tang, Haiying"

Date:

19 October 2020, 03:57:00

Hi

Found one more place needed to be changed(long -> int64).

Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG )

And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
Obviously, the ">=" is meaningless, right?

-        SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples);
+        SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples);

Please take a check at the attached patch file.

Previous disscution:
https://www.postgresql.org/message-id/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com

Best regards
Tang

Attachment

0001-Use-int64-instead-of-long-for-space-used-variables-a.patch

RE: Use of "long" in incremental sort code

From

"Tang, Haiying"

Date:

21 October 2020, 06:06:52

Hi

>Found one more place needed to be changed(long -> int64).
>
>Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG )
>
>And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
>Obviously, the ">=" is meaningless, right?
>
>And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
>Obviously, the ">=" is meaningless, right?
>
>-        SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples);
>+        SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples);
>
>Please take a check at the attached patch file.

I have added it to commit fest.
https://commitfest.postgresql.org/30/2772/

Best regards
Tang

-----Original Message-----
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com> 
Sent: Monday, October 19, 2020 12:57 PM
To: David Rowley <dgrowleyml@gmail.com>; James Coleman <jtc331@gmail.com>
Cc: pgsql-hackers@postgresql.org
Subject: RE: Use of "long" in incremental sort code

Hi

Found one more place needed to be changed(long -> int64).

Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG )

And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
Obviously, the ">=" is meaningless, right?

-        SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples);
+        SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples);

Please take a check at the attached patch file.

Previous disscution:
https://www.postgresql.org/message-id/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com

Best regards
Tang

Re: Use of "long" in incremental sort code

From

Tomas Vondra

Date:

21 October 2020, 21:00:05

On Wed, Oct 21, 2020 at 06:06:52AM +0000, Tang, Haiying wrote:
>Hi
>
>>Found one more place needed to be changed(long -> int64).
>>
>>Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG )
>>
>>And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
>>Obviously, the ">=" is meaningless, right?
>>
>>And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below.
>>Obviously, the ">=" is meaningless, right?
>>
>>-        SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples);
>>+        SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples);
>>
>>Please take a check at the attached patch file.
>
>I have added it to commit fest.
>https://commitfest.postgresql.org/30/2772/
>

Thanks, the changes seem fine to me. I'll do a bit more review and get
it pushed.


regards
Tomas

Re: Use of "long" in incremental sort code

From

Tomas Vondra

Date:

03 November 2020, 02:53:53

Hi,

I took another look at this, and 99% of the patch (the fixes to sort
debug messages) seems fine to me.  Attached is the part I plan to get
committed, including commit message etc.


The one change I decided to remove is this change in tuplesort_free:

-       long            spaceUsed;
+       int64           spaceUsed;

The reason why I think this variable should be 'long' is that we're
using it for this:

     spaceUsed = LogicalTapeSetBlocks(state->tapeset);

and LogicalTapeSetBlocks is defined like this:

     extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);

FWIW the "long" is not introduced by incremental sort - it used to be in
tuplesort_end, the incremental sort patch just moved it to a different
function. It's a bit confusing that tuplesort_updatemax has this:

     int64           spaceUsed;

But I'd argue this is actually wrong, and should be "long" instead. (And
this actually comes from the incremental sort patch, by me.)


FWIW while looking at what the other places calling LogicalTapeSetBlocks
do, and I noticed this:

     uint64          disk_used = LogicalTapeSetBlocks(...);

in the disk-based hashagg patch. So that's a third data type ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v2-0001-Use-INT64_FORMAT-to-print-int64-variables-in-sort-de.patch

Re: Use of "long" in incremental sort code

From

Tomas Vondra

Date:

03 November 2020, 21:42:46

On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote:
>Hi,
>
>I took another look at this, and 99% of the patch (the fixes to sort
>debug messages) seems fine to me.  Attached is the part I plan to get
>committed, including commit message etc.
>

I've pushed this part. Thanks for the patch, Haiying Tang.

>
>The one change I decided to remove is this change in tuplesort_free:
>
>-       long            spaceUsed;
>+       int64           spaceUsed;
>
>The reason why I think this variable should be 'long' is that we're
>using it for this:
>
>    spaceUsed = LogicalTapeSetBlocks(state->tapeset);
>
>and LogicalTapeSetBlocks is defined like this:
>
>    extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
>
>FWIW the "long" is not introduced by incremental sort - it used to be in
>tuplesort_end, the incremental sort patch just moved it to a different
>function. It's a bit confusing that tuplesort_updatemax has this:
>
>    int64           spaceUsed;
>
>But I'd argue this is actually wrong, and should be "long" instead. (And
>this actually comes from the incremental sort patch, by me.)
>
>
>FWIW while looking at what the other places calling LogicalTapeSetBlocks
>do, and I noticed this:
>
>    uint64          disk_used = LogicalTapeSetBlocks(...);
>
>in the disk-based hashagg patch. So that's a third data type ...
>

IMHO this should simply switch the current int64 variable to long, as it
was before. Not sure about about the hashagg uint64 variable.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Use of "long" in incremental sort code

From

David Rowley

Date:

04 November 2020, 21:58:44

On Wed, 4 Nov 2020 at 10:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> IMHO this should simply switch the current int64 variable to long, as it
> was before. Not sure about about the hashagg uint64 variable.

IMO, we should just get rid of the use of "long" here.   As far as I'm
concerned, using long in the core code at all is just unnecessary and
just increases the chances of having bugs.

How often do people forget that we support a 64-bit platform that has
sizeof(long) == 4?

Can't we use size_t and ssize_t if we really need a processor
word-sized type? And use int64/uint64 when we really want a 64-bit
type.

David

Re: Use of "long" in incremental sort code

From

Tomas Vondra

Date:

04 November 2020, 22:13:41

On 11/4/20 10:58 PM, David Rowley wrote:
> On Wed, 4 Nov 2020 at 10:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> IMHO this should simply switch the current int64 variable to long, as it
>> was before. Not sure about about the hashagg uint64 variable.
> 
> IMO, we should just get rid of the use of "long" here.   As far as I'm
> concerned, using long in the core code at all is just unnecessary and
> just increases the chances of having bugs.
> 
> How often do people forget that we support a 64-bit platform that has
> sizeof(long) == 4?
> 
> Can't we use size_t and ssize_t if we really need a processor
> word-sized type? And use int64/uint64 when we really want a 64-bit
> type.
> 

Perhaps. But I guess it's a bit strange to have function declared as 
returning long, but store the result in int64 everywhere. That was the 
point I was trying to make - it's not just a matter of changing all the 
variables to int64, IMHO.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Use of "long" in incremental sort code

From

James Coleman

Date:

04 November 2020, 23:53:32

On Tue, Nov 3, 2020 at 4:42 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote:
> >Hi,
> >
> >I took another look at this, and 99% of the patch (the fixes to sort
> >debug messages) seems fine to me.  Attached is the part I plan to get
> >committed, including commit message etc.
> >
>
> I've pushed this part. Thanks for the patch, Haiying Tang.
>
> >
> >The one change I decided to remove is this change in tuplesort_free:
> >
> >-       long            spaceUsed;
> >+       int64           spaceUsed;
> >
> >The reason why I think this variable should be 'long' is that we're
> >using it for this:
> >
> >    spaceUsed = LogicalTapeSetBlocks(state->tapeset);
> >
> >and LogicalTapeSetBlocks is defined like this:
> >
> >    extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
> >
> >FWIW the "long" is not introduced by incremental sort - it used to be in
> >tuplesort_end, the incremental sort patch just moved it to a different
> >function. It's a bit confusing that tuplesort_updatemax has this:
> >
> >    int64           spaceUsed;
> >
> >But I'd argue this is actually wrong, and should be "long" instead. (And
> >this actually comes from the incremental sort patch, by me.)
> >
> >
> >FWIW while looking at what the other places calling LogicalTapeSetBlocks
> >do, and I noticed this:
> >
> >    uint64          disk_used = LogicalTapeSetBlocks(...);
> >
> >in the disk-based hashagg patch. So that's a third data type ...
> >
>
> IMHO this should simply switch the current int64 variable to long, as it
> was before. Not sure about about the hashagg uint64 variable.

Is there anything that actually limits tape code to using at most 4GB
on 32-bit systems?

James

Re: Use of "long" in incremental sort code

From

Anastasia Lubennikova

Date:

23 November 2020, 11:11:04

On 05.11.2020 02:53, James Coleman wrote:
> On Tue, Nov 3, 2020 at 4:42 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote:
>>> Hi,
>>>
>>> I took another look at this, and 99% of the patch (the fixes to sort
>>> debug messages) seems fine to me.  Attached is the part I plan to get
>>> committed, including commit message etc.
>>>
>> I've pushed this part. Thanks for the patch, Haiying Tang.
>>
>>> The one change I decided to remove is this change in tuplesort_free:
>>>
>>> -       long            spaceUsed;
>>> +       int64           spaceUsed;
>>>
>>> The reason why I think this variable should be 'long' is that we're
>>> using it for this:
>>>
>>>     spaceUsed = LogicalTapeSetBlocks(state->tapeset);
>>>
>>> and LogicalTapeSetBlocks is defined like this:
>>>
>>>     extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
>>>
>>> FWIW the "long" is not introduced by incremental sort - it used to be in
>>> tuplesort_end, the incremental sort patch just moved it to a different
>>> function. It's a bit confusing that tuplesort_updatemax has this:
>>>
>>>     int64           spaceUsed;
>>>
>>> But I'd argue this is actually wrong, and should be "long" instead. (And
>>> this actually comes from the incremental sort patch, by me.)
>>>
>>>
>>> FWIW while looking at what the other places calling LogicalTapeSetBlocks
>>> do, and I noticed this:
>>>
>>>     uint64          disk_used = LogicalTapeSetBlocks(...);
>>>
>>> in the disk-based hashagg patch. So that's a third data type ...
>>>
>> IMHO this should simply switch the current int64 variable to long, as it
>> was before. Not sure about about the hashagg uint64 variable.
> Is there anything that actually limits tape code to using at most 4GB
> on 32-bit systems?

At first glance, I haven't found anything that could limit tape code. It 
uses BufFile, which is not limited by the OS file size limit.
Still, If we want to change 'long' in LogicalTapeSetBlocks, we should 
probably also update nBlocksWritten and other variables.

As far as I see, the major part of the patch was committed, so l update 
the status of the CF entry to "Committed". Feel free to create a new 
entry, if you're going to continue working on the remaining issue.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company