Thread: Should we represent temp files as unsigned long int instead of signed long int type?
Should we represent temp files as unsigned long int instead of signed long int type?
From
Ashutosh Sharma
Date:
Hi All,

At present, we represent temp files as a signed long int number. Depending on the system architecture (32 bit or 64 bit), the range of signed long int varies; for example, on a 32-bit system it ranges from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a session to create more than 2 billion temporary files. That is not a small number at all, but still, what if we make it an unsigned long int, which would allow a session to create 4 billion temporary files if needed?

I might be sounding a little stupid here, because 2 billion temporary files is about 2000 petabytes (2 billion * 1GB), assuming each temp file is 1GB in size, which is a huge amount of data storage. However, since the variable we use to name temporary files is a static long int (static long tempFileCounter = 0;), there is a possibility that this number will get exhausted soon if the same session is trying to create too many temp files via multiple queries.

Just adding a few lines of code related to this from postmaster.c:

    /*
     * Number of temporary files opened during the current session;
     * this is used in generation of tempfile names.
     */
    static long tempFileCounter = 0;

    /*
     * Generate a tempfile name that should be unique within the current
     * database instance.
     */
    snprintf(tempfilepath, sizeof(tempfilepath), "%s/%s%d.%ld",
             tempdirpath, PG_TEMP_FILE_PREFIX, MyProcPid, tempFileCounter++);

--
With Regards,
Ashutosh Sharma.
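To make the naming scheme concrete, here is a minimal standalone sketch of the quoted snprintf pattern. The helper name format_temp_name and the TEMP_FILE_PREFIX value are assumptions made for illustration; only the format string "%s/%s%d.%ld" comes from the message above.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for PG_TEMP_FILE_PREFIX; the value here is an assumption. */
#define TEMP_FILE_PREFIX "pgsql_tmp"

/*
 * Hypothetical helper: builds "<dir>/<prefix><pid>.<counter>" with the same
 * format string as the quoted code. Returns what snprintf returns, i.e. the
 * length of the full name excluding the terminating NUL.
 */
int
format_temp_name(char *buf, size_t buflen, const char *dir, int pid, long counter)
{
    return snprintf(buf, buflen, "%s/%s%d.%ld",
                    dir, TEMP_FILE_PREFIX, pid, counter);
}
```

Called with dir "base/pgsql_tmp", pid 4242 and counter 7, this yields base/pgsql_tmp/pgsql_tmp4242.7. The process ID is part of the name, so the counter only has to be unique within one session.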
Re: Should we represent temp files as unsigned long int instead of signed long int type?
From
Tom Lane
Date:
Ashutosh Sharma <ashu.coek88@gmail.com> writes:
> At present, we represent temp files as a signed long int number. And
> depending on the system architecture (32 bit or 64 bit), the range of
> signed long int varies, for example on a 32-bit system it will range
> from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a
> session to create more than 2 billion temporary files and that is not
> a small number at all, but still what if we make it an unsigned long
> int which will allow a session to create 4 billion temporary files if
> needed.

AFAIK, nothing particularly awful will happen if that counter wraps around. Perhaps if you gamed the system really hard, you could cause a collision with a still-extant temp file from the previous cycle, but I seriously doubt that could happen by accident. So I don't think there's anything to worry about here. Maybe we could make that filename pattern %lu not %ld, but minus sign is a perfectly acceptable filename character, so such a change would be cosmetic.

regards, tom lane
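The difference between the two patterns can be seen in a small sketch. The function names are hypothetical, and the sketch passes in an already-negative value rather than overflowing a counter, because signed overflow is formally undefined in standard C.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats the counter with %ld, as the current filename pattern does. */
int
format_suffix_signed(char *buf, size_t buflen, long counter)
{
    return snprintf(buf, buflen, "%ld", counter);
}

/* Formats the same value reinterpreted as unsigned, i.e. the %lu variant. */
int
format_suffix_unsigned(char *buf, size_t buflen, long counter)
{
    return snprintf(buf, buflen, "%lu", (unsigned long) counter);
}
```

For a counter of -1, the signed variant produces "-1" while the unsigned variant produces the decimal form of ULONG_MAX, whose exact digits depend on the width of long. Both are usable filename suffixes, which is why the change would be cosmetic.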
Re: Should we represent temp files as unsigned long int instead of signed long int type?
From
Robert Haas
Date:
On Wed, Oct 25, 2023 at 1:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> At present, we represent temp files as a signed long int number. And
> depending on the system architecture (32 bit or 64 bit), the range of
> signed long int varies, for example on a 32-bit system it will range
> from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a
> session to create more than 2 billion temporary files and that is not
> a small number at all, but still what if we make it an unsigned long
> int which will allow a session to create 4 billion temporary files if
> needed. I might be sounding a little stupid here because 2 billion
> temporary files is like 2000 peta bytes (2 billion * 1GB), considering
> each temp file is 1GB in size which is not a small data size at all,
> it is a huge amount of data storage. However, since the variable we
> use to name temporary files is a static long int (static long
> tempFileCounter = 0;), there is a possibility that this number will
> get exhausted soon if the same session is trying to create too many
> temp files via multiple queries.

I think we use signed integer types in a bunch of places where an unsigned integer type would be straight-up better, and this is one of them. I don't know whether it really matters, though.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Should we represent temp files as unsigned long int instead of signed long int type?
From
Michael Paquier
Date:
On Wed, Oct 25, 2023 at 03:07:39PM -0400, Tom Lane wrote:
> AFAIK, nothing particularly awful will happen if that counter wraps
> around. Perhaps if you gamed the system really hard, you could cause
> a collision with a still-extant temp file from the previous cycle,
> but I seriously doubt that could happen by accident. So I don't
> think there's anything to worry about here. Maybe we could make
> that filename pattern %lu not %ld, but minus sign is a perfectly
> acceptable filename character, so such a change would be cosmetic.

In the mood of removing long because it may be 4 bytes or 8 bytes depending on the environment, I'd suggest to change it to either int64 or uint64. Not that it matters much for this specific case, but that makes the code more portable.

--
Michael
Re: Should we represent temp files as unsigned long int instead of signed long int type?
From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> In the mood of removing long because it may be 4 bytes or 8 bytes
> depending on the environment, I'd suggest to change it to either int64
> or uint64. Not that it matters much for this specific case, but that
> makes the code more portable.

Then you're going to need a not-so-portable conversion spec in the snprintf call. Not sure it's any improvement.

regards, tom lane
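The conversion-spec concern can be illustrated with a short sketch of two common ways to print a fixed-width 64-bit counter in plain C99. Neither is taken from the thread; the function names are hypothetical, and the code uses the standard uint64_t and PRIu64 rather than any project-specific typedefs or macros.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Variant 1: splice the C99 PRIu64 macro into the format string. */
int
format_u64_macro(char *buf, size_t buflen, uint64_t counter)
{
    return snprintf(buf, buflen, "%" PRIu64, counter);
}

/* Variant 2: cast to unsigned long long and use a fixed %llu spec. */
int
format_u64_cast(char *buf, size_t buflen, uint64_t counter)
{
    return snprintf(buf, buflen, "%llu", (unsigned long long) counter);
}
```

Both variants produce the same digits. The trade-off is readability: with long, the "%ld" spec matches the type directly, while a fixed-width type needs either a macro inside the format string or a cast at every call site.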