Thread: Use of "long" in incremental sort code
Hi, I noticed the incremental sort code makes use of the long datatype a few times, e.g. in TuplesortInstrumentation and IncrementalSortGroupInfo. (64-bit Windows machines have sizeof(long) == 4.) I understand that the values are in kilobytes and it would take 2TB to cause them to wrap. Nevertheless, I think it would be better to choose a better-suited type. work_mem is still limited to 2GB on 64-bit Windows machines, so perhaps there's some argument that it does not matter for the fields that relate to in-memory stuff, but the on-disk fields are wrong. The in-memory fields likely raise the bar further for fixing the 2GB work_mem limit on Windows. Maybe Size would be better for the in-memory fields and uint64 for the on-disk fields? David
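The 2TB figure in the message above follows from a 4-byte signed long holding at most 2^31 - 1 kilobytes. A minimal sketch of that arithmetic, assuming int32_t as a stand-in for a 4-byte long; the helper names are hypothetical and do not come from the PostgreSQL source:

```c
#include <stdint.h>

/* Hypothetical helper: would this kilobyte count fit in a 4-byte signed
 * long, as found on 64-bit Windows? int32_t stands in for that long. */
static int
fits_in_32bit_long(int64_t kilobytes)
{
    return kilobytes >= INT32_MIN && kilobytes <= INT32_MAX;
}

/* Convert kilobytes to whole terabytes, taking 1 TB = 2^30 KB. */
static int64_t
kb_to_tb(int64_t kilobytes)
{
    return kilobytes / ((int64_t) 1 << 30);
}
```

The first kilobyte count that no longer fits is 2^31, which kb_to_tb() reports as exactly 2 TB.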
David Rowley <dgrowleyml@gmail.com> writes: > I noticed the incremental sort code makes use of the long datatype a > few times, e.g in TuplesortInstrumentation and > IncrementalSortGroupInfo. (64-bit windows machines have sizeof(long) > == 4). I understand that the values are in kilobytes and it would > take 2TB to cause them to wrap. Never-the-less, I think it would be > better to choose a better-suited type. work_mem is still limited to > 2GB on 64-bit Windows machines, so perhaps there's some argument that > it does not matter about fields that related to in-memory stuff, but > the on-disk fields are wrong. The in-memory fields likely raise the > bar further for fixing the 2GB work_mem limit on Windows. > Maybe Size would be better for the in-memory fields and uint64 for the > on-disk fields? There is a fairly widespread issue that memory-size-related GUCs and suchlike variables are limited to represent sizes that fit in a "long". Although Win64 is the *only* platform where that's an issue, maybe it's worth doing something about. But we shouldn't just fix the sort code, if we do do something. (IOW, I don't agree with doing a fix that doesn't also fix work_mem.) regards, tom lane
On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > There is a fairly widespread issue that memory-size-related GUCs and > suchlike variables are limited to represent sizes that fit in a "long". > Although Win64 is the *only* platform where that's an issue, maybe > it's worth doing something about. But we shouldn't just fix the sort > code, if we do do something. > > (IOW, I don't agree with doing a fix that doesn't also fix work_mem.) I raised it mostly because this new-to-PG13-code is making the problem worse. If we're not going to change the in-memory fields, then shouldn't we at least change the ones for disk space tracking? David
On 2020-06-30 06:24, David Rowley wrote: > On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There is a fairly widespread issue that memory-size-related GUCs and >> suchlike variables are limited to represent sizes that fit in a "long". >> Although Win64 is the *only* platform where that's an issue, maybe >> it's worth doing something about. But we shouldn't just fix the sort >> code, if we do do something. >> >> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.) > > I raised it mostly because this new-to-PG13-code is making the problem worse. Yeah, we recently got rid of a bunch of inappropriate use of long, so it seems reasonable to make this new code follow that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 30, 2020 at 7:21 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-06-30 06:24, David Rowley wrote: > > On Tue, 30 Jun 2020 at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> There is a fairly widespread issue that memory-size-related GUCs and > >> suchlike variables are limited to represent sizes that fit in a "long". > >> Although Win64 is the *only* platform where that's an issue, maybe > >> it's worth doing something about. But we shouldn't just fix the sort > >> code, if we do do something. > >> > >> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.) > > > > I raised it mostly because this new-to-PG13-code is making the problem worse. > > Yeah, we recently got rid of a bunch of inappropriate use of long, so it > seems reasonable to make this new code follow that. I've attached a patch to make this change but with one tweak: I decided to use uint64 for both memory and disk (rather than Size in some cases) since we aggregate across multiple runs and have shared code that deals with both values. James
Attachment
On Mon, Jun 29, 2020 at 9:13 PM David Rowley <dgrowleyml@gmail.com> wrote: > I noticed the incremental sort code makes use of the long datatype a > few times, e.g in TuplesortInstrumentation and > IncrementalSortGroupInfo. I agree that long is terrible, and should generally be avoided. > Maybe Size would be better for the in-memory fields and uint64 for the > on-disk fields? FWIW we have to use int64 for the in-memory tuplesort.c fields. This is because it must be possible for the fields to have negative values in the context of tuplesort. If there is going to be a general rule for in-memory fields, then ISTM that it'll have to be "use int64". logtape.c uses long for on-disk fields. It also relies on negative values, albeit to a fairly limited degree (it uses -1 as a magic value). -- Peter Geoghegan
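To illustrate why a signed type matters when a counter may dip below zero, here is a minimal sketch. It is not the tuplesort.c code; the function names and the "budget minus used" model are assumptions made for illustration only:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed tracker: the result may legitimately go below zero
 * when more memory was used than budgeted. */
static int64_t
remaining_signed(int64_t budget, int64_t used)
{
    return budget - used;
}

/* The same arithmetic with an unsigned type: instead of going negative,
 * the result wraps around to a very large positive value. */
static size_t
remaining_unsigned(size_t budget, size_t used)
{
    return budget - used;
}

/* An "are we over budget?" test only works with the signed variant. */
static int
over_budget(int64_t budget, int64_t used)
{
    return remaining_signed(budget, used) < 0;
}
```

With a budget of 100 and 150 used, the signed version yields -50, while the unsigned version yields a huge positive number that looks like plenty of free memory.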
On Thu, Jul 2, 2020 at 1:36 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Jun 29, 2020 at 9:13 PM David Rowley <dgrowleyml@gmail.com> wrote: > > I noticed the incremental sort code makes use of the long datatype a > > few times, e.g in TuplesortInstrumentation and > > IncrementalSortGroupInfo. > > I agree that long is terrible, and should generally be avoided. > > > Maybe Size would be better for the in-memory fields and uint64 for the > > on-disk fields? > > FWIW we have to use int64 for the in-memory tuplesort.c fields. This > is because it must be possible for the fields to have negative values > in the context of tuplesort. If there is going to be a general rule > for in-memory fields, then ISTM that it'll have to be "use int64". > > logtape.c uses long for on-disk fields. It also relies on negative > values, albeit to a fairly limited degree (it uses -1 as a magic > value). Do you think it's reasonable to use int64 across the board for memory and disk space numbers then? If so, I can update the patch. James
On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote: > Do you think it's reasonable to use int64 across the board for memory > and disk space numbers then? If so, I can update the patch. Using int64 as a replacement for long is the safest general strategy, and so ISTM that it might be worth doing that even in cases where it isn't clearly necessary. After all, any code that uses long must have been written with the assumption that that was the same thing as int64, at least on most platforms. There is nothing wrong with using Size/size_t, and doing so is often slightly clearer. But it's no drop-in replacement for long. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote: >> Do you think it's reasonable to use int64 across the board for memory >> and disk space numbers then? If so, I can update the patch. > Using int64 as a replacement for long is the safest general strategy, mumble ssize_t mumble regards, tom lane
On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > mumble ssize_t mumble That's from POSIX, though. I imagine MSVC won't be happy (surprise!). -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> mumble ssize_t mumble > That's from POSIX, though. I imagine MSVC won't be happy (surprise!). We've got quite a few uses of it already, so apparently it's fine. regards, tom lane
On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Peter Geoghegan <pg@bowt.ie> writes: > > On Thu, Jul 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote: > >> Do you think it's reasonable to use int64 across the board for memory > >> and disk space numbers then? If so, I can update the patch. > > > Using int64 as a replacement for long is the safest general strategy, > > mumble ssize_t mumble But wouldn't that mean we'd get int on 32-bit systems, and since we're accumulating data we could go over that value in both memory and disk? My assumption is that it's preferable to have the "this run value" and the "total used across multiple runs" and both of those for disk and memory to be the same. In that case it seems we want to guarantee 64-bits. Patch using int64 attached. James
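The accumulation concern can be sketched as follows: each run may fit comfortably in 32 bits while the total across runs does not. The helper names and the byte-based units are assumptions for illustration; int32_t stands in for ssize_t on a 32-bit build:

```c
#include <stdint.h>

/* Hypothetical helper: sum per-run space usage into a 64-bit total. */
static int64_t
total_space_used(const int64_t *per_run, int nruns)
{
    int64_t total = 0;

    for (int i = 0; i < nruns; i++)
        total += per_run[i];
    return total;
}

/* Would the value fit in a 32-bit signed type, such as ssize_t on a
 * 32-bit platform? */
static int
fits_in_int32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

Three runs of 1 GiB each fit individually, but their 3 GiB total needs a 64-bit type.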
Attachment
On Thu, Jul 2, 2020 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > That's from POSIX, though. I imagine MSVC won't be happy (surprise!). > > We've got quite a few uses of it already, so apparently it's fine. Oh, looks like we have a compatibility hack for MSVC within win32_port.h, where ssize_t is typedef'd to __int64. I didn't realize that it was okay to use ssize_t. -- Peter Geoghegan
On Thu, Jul 2, 2020 at 12:47 PM James Coleman <jtc331@gmail.com> wrote: > But wouldn't that mean we'd get int on 32-bit systems, and since we're > accumulating data we could go over that value in both memory and disk? > > My assumption is that it's preferable to have the "this run value" and > the "total used across multiple runs" and both of those for disk and > memory to be the same. In that case it seems we want to guarantee > 64-bits. I agree. There seems to be little reason to accommodate platform level conventions, beyond making sure that everything works on less popular or obsolete platforms. I suppose that it's a little idiosyncratic to use int64 like this. But it makes sense, and isn't nearly as ugly as the long thing, so I don't think that it should really matter. -- Peter Geoghegan
James Coleman <jtc331@gmail.com> writes: > On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> mumble ssize_t mumble > But wouldn't that mean we'd get int on 32-bit systems, and since we're > accumulating data we could go over that value in both memory and disk? Certainly, a number that's meant to represent the amount of data *on disk* shouldn't use ssize_t. But I think it's appropriate if you want to represent in-memory quantities while also allowing negative values. I guess if you're expecting in-memory sizes exceeding 2GB, you might worry that ssize_t could overflow. I'm dubious that a 32-bit machine could get to that, though, seeing that it's going to have other demands on its address space. > My assumption is that it's preferable to have the "this run value" and > the "total used across multiple runs" and both of those for disk and > memory to be the same. In that case it seems we want to guarantee > 64-bits. If you're not going to distinguish in-memory from not-in-memory, agreed. regards, tom lane
On Fri, 3 Jul 2020 at 07:47, James Coleman <jtc331@gmail.com> wrote: > Patch using int64 attached. I added this to the open items list for PG13. David
On Thu, Jul 30, 2020 at 10:12 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 3 Jul 2020 at 07:47, James Coleman <jtc331@gmail.com> wrote: > > Patch using int64 attached. > > I added this to the open items list for PG13. > > David I'd previously attached a patch [1], and there seemed to be agreement it was reasonable (lightly so, but I also didn't see any disagreement); would someone be able to either commit the change or provide some additional feedback? Thanks, James [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com
On Sat, 1 Aug 2020 at 02:02, James Coleman <jtc331@gmail.com> wrote: > I'd previously attached a patch [1], and there seemed to be agreement > it was reasonable (lightly so, but I also didn't see any > disagreement); would someone be able to either commit the change or > provide some additional feedback? It looks fine to me. Pushed. David > [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com
Hi, I found one more place that needs to be changed (long -> int64). I also changed the output for int64 data (debug mode on and EXEC_SORTDEBUG defined). And maybe there's a typo in "src\backend\executor\nodeIncrementalSort.c", as below. Obviously, the ">=" is meaningless, right? - SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples); + SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples); Please take a look at the attached patch file. Previous discussion: https://www.postgresql.org/message-id/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com Best regards Tang
Attachment
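Once a counter such as nTuples becomes an int64, a "%ld" format no longer matches it on platforms where long is 4 bytes. A minimal standalone sketch of a portable format, using PRId64 from <inttypes.h>; PostgreSQL has its own INT64_FORMAT macro for this purpose, and the helper name below is hypothetical:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit tuple count into buf.
 * Returns whatever snprintf returns. */
static int
format_tuple_count(char *buf, size_t buflen, int64_t ntuples)
{
    return snprintf(buf, buflen,
                    "Sorting presorted prefix tuplesort with %" PRId64 " tuples",
                    ntuples);
}
```

Passing a value above 2^31, such as 5000000000, prints correctly regardless of sizeof(long).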
Hi >Found one more place needed to be changed(long -> int64). > >Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG ) > >And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below. >Obviously, the ">=" is meaningless, right? > >- SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples); >+ SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples); > >Please take a check at the attached patch file. I have added it to the commitfest. https://commitfest.postgresql.org/30/2772/ Best regards Tang -----Original Message----- From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com> Sent: Monday, October 19, 2020 12:57 PM To: David Rowley <dgrowleyml@gmail.com>; James Coleman <jtc331@gmail.com> Cc: pgsql-hackers@postgresql.org Subject: RE: Use of "long" in incremental sort code Hi Found one more place needed to be changed(long -> int64). Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG ) And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below. Obviously, the ">=" is meaningless, right? - SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples); + SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples); Please take a check at the attached patch file. Previous discussion: https://www.postgresql.org/message-id/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com Best regards Tang
On Wed, Oct 21, 2020 at 06:06:52AM +0000, Tang, Haiying wrote: >Hi > >>Found one more place needed to be changed(long -> int64). >> >>Also changed the output for int64 data(Debug mode on & define EXEC_SORTDEBUG ) >> >>And, maybe there's a typo in " src\backend\executor\nodeIncrementalSort.c" as below. >>Obviously, the ">=" is meaningless, right? >> >>- SO1_printf("Sorting presorted prefix tuplesort with >= %ld tuples\n", nTuples); >>+ SO1_printf("Sorting presorted prefix tuplesort with %ld tuples\n", nTuples); >> >>Please take a check at the attached patch file. > >I have added it to commit fest. >https://commitfest.postgresql.org/30/2772/ > Thanks, the changes seem fine to me. I'll do a bit more review and get it pushed. regards Tomas
Hi, I took another look at this, and 99% of the patch (the fixes to sort debug messages) seems fine to me. Attached is the part I plan to get committed, including commit message etc. The one change I decided to remove is this change in tuplesort_free: - long spaceUsed; + int64 spaceUsed; The reason why I think this variable should be 'long' is that we're using it for this: spaceUsed = LogicalTapeSetBlocks(state->tapeset); and LogicalTapeSetBlocks is defined like this: extern long LogicalTapeSetBlocks(LogicalTapeSet *lts); FWIW the "long" is not introduced by incremental sort - it used to be in tuplesort_end, the incremental sort patch just moved it to a different function. It's a bit confusing that tuplesort_updatemax has this: int64 spaceUsed; But I'd argue this is actually wrong, and should be "long" instead. (And this actually comes from the incremental sort patch, by me.) FWIW, while looking at what the other places calling LogicalTapeSetBlocks do, I noticed this: uint64 disk_used = LogicalTapeSetBlocks(...); in the disk-based hashagg patch. So that's a third data type ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote: >Hi, > >I took another look at this, and 99% of the patch (the fixes to sort >debug messages) seems fine to me. Attached is the part I plan to get >committed, including commit message etc. > I've pushed this part. Thanks for the patch, Haiying Tang. > >The one change I decided to remove is this change in tuplesort_free: > >- long spaceUsed; >+ int64 spaceUsed; > >The reason why I think this variable should be 'long' is that we're >using it for this: > > spaceUsed = LogicalTapeSetBlocks(state->tapeset); > >and LogicalTapeSetBlocks is defined like this: > > extern long LogicalTapeSetBlocks(LogicalTapeSet *lts); > >FWIW the "long" is not introduced by incremental sort - it used to be in >tuplesort_end, the incremental sort patch just moved it to a different >function. It's a bit confusing that tuplesort_updatemax has this: > > int64 spaceUsed; > >But I'd argue this is actually wrong, and should be "long" instead. (And >this actually comes from the incremental sort patch, by me.) > > >FWIW while looking at what the other places calling LogicalTapeSetBlocks >do, and I noticed this: > > uint64 disk_used = LogicalTapeSetBlocks(...); > >in the disk-based hashagg patch. So that's a third data type ... > IMHO this should simply switch the current int64 variable to long, as it was before. Not sure about the hashagg uint64 variable. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 4 Nov 2020 at 10:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > IMHO this should simply switch the current int64 variable to long, as it > was before. Not sure about about the hashagg uint64 variable. IMO, we should just get rid of the use of "long" here. As far as I'm concerned, using long in the core code at all is just unnecessary and just increases the chances of having bugs. How often do people forget that we support a 64-bit platform that has sizeof(long) == 4? Can't we use size_t and ssize_t if we really need a processor word-sized type? And use int64/uint64 when we really want a 64-bit type. David
On 11/4/20 10:58 PM, David Rowley wrote: > On Wed, 4 Nov 2020 at 10:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> IMHO this should simply switch the current int64 variable to long, as it >> was before. Not sure about about the hashagg uint64 variable. > > IMO, we should just get rid of the use of "long" here. As far as I'm > concerned, using long in the core code at all is just unnecessary and > just increases the chances of having bugs. > > How often do people forget that we support a 64-bit platform that has > sizeof(long) == 4? > > Can't we use size_t and ssize_t if we really need a processor > word-sized type? And use int64/uint64 when we really want a 64-bit > type. > Perhaps. But I guess it's a bit strange to have function declared as returning long, but store the result in int64 everywhere. That was the point I was trying to make - it's not just a matter of changing all the variables to int64, IMHO. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
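One practical consequence of a long-returning block count is that any conversion to bytes has to widen the value before multiplying. A minimal sketch, assuming int32_t as a stand-in for a 4-byte long and 8192 as the block size (PostgreSQL's default); the macro and function names are hypothetical:

```c
#include <stdint.h>

#define SKETCH_BLCKSZ 8192      /* assumption: default block size */

/* Widen the block count to 64 bits *before* multiplying, so the product
 * cannot overflow a 4-byte type. int32_t stands in for a 4-byte long. */
static int64_t
blocks_to_bytes(int32_t nblocks)
{
    return (int64_t) nblocks * SKETCH_BLCKSZ;
}
```

Under these assumptions, a 4-byte block counter can still describe roughly 16 TB of tape space; it is the byte or kilobyte totals derived from it that need a 64-bit type.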
On Tue, Nov 3, 2020 at 4:42 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote: > >Hi, > > > >I took another look at this, and 99% of the patch (the fixes to sort > >debug messages) seems fine to me. Attached is the part I plan to get > >committed, including commit message etc. > > > > I've pushed this part. Thanks for the patch, Haiying Tang. > > > > >The one change I decided to remove is this change in tuplesort_free: > > > >- long spaceUsed; > >+ int64 spaceUsed; > > > >The reason why I think this variable should be 'long' is that we're > >using it for this: > > > > spaceUsed = LogicalTapeSetBlocks(state->tapeset); > > > >and LogicalTapeSetBlocks is defined like this: > > > > extern long LogicalTapeSetBlocks(LogicalTapeSet *lts); > > > >FWIW the "long" is not introduced by incremental sort - it used to be in > >tuplesort_end, the incremental sort patch just moved it to a different > >function. It's a bit confusing that tuplesort_updatemax has this: > > > > int64 spaceUsed; > > > >But I'd argue this is actually wrong, and should be "long" instead. (And > >this actually comes from the incremental sort patch, by me.) > > > > > >FWIW while looking at what the other places calling LogicalTapeSetBlocks > >do, and I noticed this: > > > > uint64 disk_used = LogicalTapeSetBlocks(...); > > > >in the disk-based hashagg patch. So that's a third data type ... > > > > IMHO this should simply switch the current int64 variable to long, as it > was before. Not sure about about the hashagg uint64 variable. Is there anything that actually limits tape code to using at most 4GB on 32-bit systems? James
On 05.11.2020 02:53, James Coleman wrote: > On Tue, Nov 3, 2020 at 4:42 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote: >>> Hi, >>> >>> I took another look at this, and 99% of the patch (the fixes to sort >>> debug messages) seems fine to me. Attached is the part I plan to get >>> committed, including commit message etc. >>> >> I've pushed this part. Thanks for the patch, Haiying Tang. >> >>> The one change I decided to remove is this change in tuplesort_free: >>> >>> - long spaceUsed; >>> + int64 spaceUsed; >>> >>> The reason why I think this variable should be 'long' is that we're >>> using it for this: >>> >>> spaceUsed = LogicalTapeSetBlocks(state->tapeset); >>> >>> and LogicalTapeSetBlocks is defined like this: >>> >>> extern long LogicalTapeSetBlocks(LogicalTapeSet *lts); >>> >>> FWIW the "long" is not introduced by incremental sort - it used to be in >>> tuplesort_end, the incremental sort patch just moved it to a different >>> function. It's a bit confusing that tuplesort_updatemax has this: >>> >>> int64 spaceUsed; >>> >>> But I'd argue this is actually wrong, and should be "long" instead. (And >>> this actually comes from the incremental sort patch, by me.) >>> >>> >>> FWIW while looking at what the other places calling LogicalTapeSetBlocks >>> do, and I noticed this: >>> >>> uint64 disk_used = LogicalTapeSetBlocks(...); >>> >>> in the disk-based hashagg patch. So that's a third data type ... >>> >> IMHO this should simply switch the current int64 variable to long, as it >> was before. Not sure about about the hashagg uint64 variable. > Is there anything that actually limits tape code to using at most 4GB > on 32-bit systems? At first glance, I haven't found anything that could limit tape code. It uses BufFile, which is not limited by the OS file size limit. Still, if we want to change 'long' in LogicalTapeSetBlocks, we should probably also update nBlocksWritten and other variables.
As far as I can see, the major part of the patch was committed, so I'll update the status of the CF entry to "Committed". Feel free to create a new entry if you're going to continue working on the remaining issue. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company