Thread: General Bug Report: Files greater than 1 GB are created while sorting
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================

Your name            : Doug Mitchell
Your email address   : doug@mitchcraft.com

Category             : runtime: back-end
Severity             : serious

Summary: Files greater than 1 GB are created while sorting

System Configuration
--------------------
  Operating System   : Linux 2.2.5 ELF (RedHat 6.0)
  PostgreSQL version : 6.5
  Compiler used      : cc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

Hardware:
---------
PII-300 with 384 MB RAM

Versions of other tools:
------------------------
$ rpm -q make flex
make-3.77-6
flex-2.5.4a-6

--------------------------------------------------------------------------

Problem Description:
--------------------
The backend is creating files bigger than 1 GB when sorting,
and it will break when a file reaches 2 GB.

Here are the biggest files:

1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:10 bigtable_pkey
1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:36 pg_temp.2446.0
1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:55 bigtable
1122136 -rw-------   1 postgres postgres 1147937412 Jun 30 21:39 pg_temp2769.3
1148484 -rw-------   1 postgres postgres 1174890288 Jun 30 21:26 pg_temp2769.4

I also have some smaller ".1" files that are the rest of the above
files, along with everything else you might expect to find in a
PG database directory.  It's those last two big ones that are
troublesome.

Tables and indices are segmenting just fine at 1 GB, but
some sort files just keep growing.  I did actually get a
back-end error one time when one exceeded 2 GB.

Thanks,
Doug

--------------------------------------------------------------------------

Test Case:
----------
Just do:

mydb=> select * into bigtable2 from bigtable order by custno;

You might want to decrease RELSEGSZ to see it faster.
Mail me back if you can't reproduce it.
(And please make the bug report form boxes bigger!)

--------------------------------------------------------------------------

Solution:
---------
Something is not using the Magnetic Disk Storage Manager,
but is writing a temp file out on its own during the sort.

--------------------------------------------------------------------------
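The "segmenting" Doug refers to is the magnetic disk storage manager splitting
a relation into 1 GB segment files (bigtable, bigtable.1, ...).  Below is a
minimal standalone sketch of that segment arithmetic; the constants mirror a
typical build of the era (8 kB blocks, 131072 blocks per segment), which is an
assumption here, not something read from Doug's installation.

    #include <stdio.h>

    /* Assumed defaults: 8 kB blocks, 131072 blocks per segment file,
     * i.e. 1 GB segments.  Adjust if your config.h differs. */
    #define BLCKSZ      8192
    #define RELSEG_SIZE 131072

    int
    main(void)
    {
        long long   filesize = 1174890288LL;    /* size of pg_temp2769.4 above */
        long long   blocks = filesize / BLCKSZ;

        /* A segmented relation maps block N to segment N / RELSEG_SIZE,
         * so no single file ever exceeds RELSEG_SIZE * BLCKSZ bytes. */
        printf("%lld blocks -> would span segments 0 .. %lld\n",
               blocks, blocks / RELSEG_SIZE);
        printf("largest segment file: %lld bytes\n",
               (long long) RELSEG_SIZE * BLCKSZ);
        return 0;
    }

Note that RELSEG_SIZE * BLCKSZ with these assumed values is 1073741824 bytes,
which is exactly the size at which bigtable and bigtable_pkey stopped growing
in the listing above; the sort temp files sail right past it.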
Re: [BUGS] General Bug Report: Files greater than 1 GB are created while sorting
From: Bruce Momjian
Nice analysis of a problem.  Probably the psort code is not using those
routines because we never expected the sorts to get over 2 gigs.

> The backend is creating files bigger than 1 GB when sorting,
> and it will break when a file reaches 2 GB.
>
> Here are the biggest files:
>
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:10 bigtable_pkey
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:36 pg_temp.2446.0
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:55 bigtable
> 1122136 -rw-------   1 postgres postgres 1147937412 Jun 30 21:39 pg_temp2769.3
> 1148484 -rw-------   1 postgres postgres 1174890288 Jun 30 21:26 pg_temp2769.4
>
> I also have some smaller ".1" files that are the rest of the above
> files, along with everything else you might expect to find in a
> PG database directory.  It's those last two big ones that are
> troublesome.
>
> Tables and indices are segmenting just fine at 1 GB, but
> some sort files just keep growing.  I did actually get a
> back-end error one time when one exceeded 2 GB.
>
> Thanks,
> Doug
>
> --------------------------------------------------------------------------
>
> Test Case:
> ----------
> Just do:
>
> mydb=> select * into bigtable2 from bigtable order by custno;
>
> You might want to decrease RELSEGSZ to see it faster.
> Mail me back if you can't reproduce it.
> (And please make the bug report form boxes bigger!)
>
> --------------------------------------------------------------------------
>
> Solution:
> ---------
> Something is not using the Magnetic Disk Storage Manager,
> but is writing a temp file out on its own during the sort.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
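The back-end error Doug saw at 2 GB is consistent with the 31-bit file offset
limit on a Linux 2.2 system without large-file support: once a write would push
the offset past 2^31 - 1 bytes, the kernel refuses it.  A hypothetical
standalone illustration of that limit (it seeks rather than writes 2 GB of
data, and the exact errno depends on the kernel and libc):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* Seek just short of the 2 GB signed-offset limit and try to write
         * across it, roughly what a sort temp file does as it keeps growing. */
        int     fd = open("bigwrite.tmp", O_CREAT | O_WRONLY, 0600);
        char    buf[8192];
        ssize_t n;

        if (fd < 0)
        {
            perror("open");
            return 1;
        }
        memset(buf, 0, sizeof(buf));
        if (lseek(fd, 0x7FFFF000L, SEEK_SET) < 0)
            perror("lseek");
        n = write(fd, buf, sizeof(buf));
        if (n < 0)
            perror("write");    /* typically EFBIG without large-file support */
        else
            printf("wrote %ld bytes before hitting the limit\n", (long) n);
        close(fd);
        unlink("bigwrite.tmp");
        return 0;
    }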
Re: [BUGS] General Bug Report: Files greater than 1 GB are created while sorting
From: Bruce Momjian
I have added to the TODO list:

	* Allow creation of sort tables > 1 Gig

> Problem Description:
> --------------------
>
> The backend is creating files bigger than 1 GB when sorting,
> and it will break when a file reaches 2 GB.
>
> Here are the biggest files:
>
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:10 bigtable_pkey
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:36 pg_temp.2446.0
> 1049604 -rw-------   1 postgres postgres 1073741824 Jun 30 19:55 bigtable
> 1122136 -rw-------   1 postgres postgres 1147937412 Jun 30 21:39 pg_temp2769.3
> 1148484 -rw-------   1 postgres postgres 1174890288 Jun 30 21:26 pg_temp2769.4

I have renamed these sort temp tables to pg_sorttemp so they will not be
confused with actual temp tables.  Because people installing 6.5.1 will
have stopped their postmasters, that will not cause any problems.

You are safe up to 2 gigs, and at that point the OS can cause a problem.
The new naming should make the cause clearer.  I don't know if we can get
this done in 6.5.1, because the change to segment these files requires
some work.  It looks like the psort code goes right to fd/*, bypassing
the storage manager.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
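For readers wondering why the backend PID shows up in these names (pg_temp2769.3
and so on): a name built from the PID plus a per-backend counter is enough to
keep concurrent backends from colliding.  Here is a minimal sketch of that kind
of naming scheme under the new prefix; the exact format string is an assumption
for illustration, not the committed one.

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper: build a per-backend temp file name in the style
     * of the new pg_sorttemp prefix.  The format is illustrative only. */
    static long tempFileCounter = 0;

    static void
    make_sorttemp_name(char *buf, size_t buflen)
    {
        snprintf(buf, buflen, "pg_sorttemp%d.%ld",
                 (int) getpid(), tempFileCounter++);
    }

    int
    main(void)
    {
        char    name[64];

        make_sorttemp_name(name, sizeof(name));
        printf("%s\n", name);       /* e.g. pg_sorttemp2769.0 */
        return 0;
    }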
Re: [BUGS] General Bug Report: Files greater than 1 GB are created while sorting
From: Tom Lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> I have renamed these sort temp tables to pg_sorttemp so they will not be
> confused with actual temp tables.

I didn't realize that the names generated for temp tables were so close
to those generated for temp files.  Changing one or the other does seem
like a good idea.  But I do not like "pg_sorttemp", because fd.c's
temporary-file mechanism is used for more things than just sorting.
Hash joins, for example.  Can we think of a better name?

Alternatively, how about including the user-given name for a temp table
in its real name?  That would be helpful for debugging, I'm sure.
I'm thinking of something like

	snprintf(newrelname, NAMEDATALEN, "pg_temp.%d.%u.%s",
		 (int) MyProcPid, uniqueId++, userrelname);

(relying on snprintf here to truncate the user name if it is too long).

> You are safe up to 2 gigs, and at that point the OS can cause a problem.
> The new naming should make the cause clearer.  I don't know if we can get
> this done in 6.5.1, because the change to segment these files requires
> some work.  It looks like the psort code goes right to fd/*, bypassing
> the storage manager.

Yes, it will take some thought to figure out how to handle multi-segment
temp files without cluttering the code too badly.  I think it can be
handled inside fd.c, though.

Note that under ordinary circumstances, the data being processed by a
sort or hash join will be written into several temp files that each get
just a fraction of the data, so you would not actually see a problem
until you got to several-times-2-gig total data volume.

			regards, tom lane
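To make the "handle it inside fd.c" idea concrete, here is a rough standalone
sketch of how a temp-file write path could roll over to a new physical file
(name.0, name.1, ...) whenever the current segment reaches 1 GB, so callers
like psort only ever see one logical file.  All names, types, and the fixed
segment size here are assumptions for illustration, not actual fd.c code.

    #include <stdio.h>
    #include <string.h>

    #define TEMP_SEG_SIZE (1024L * 1024 * 1024)     /* roll over at 1 GB */

    typedef struct
    {
        char    basename[64];
        int     segno;          /* current segment number          */
        long    segbytes;       /* bytes written to current segment */
        FILE   *fp;             /* current segment's stdio handle   */
    } LogicalTempFile;

    /* Open (or advance to) the segment file basename.segno. */
    static int
    open_segment(LogicalTempFile *tf)
    {
        char    path[80];

        if (tf->fp)
            fclose(tf->fp);
        snprintf(path, sizeof(path), "%s.%d", tf->basename, tf->segno);
        tf->fp = fopen(path, "wb");
        tf->segbytes = 0;
        return tf->fp ? 0 : -1;
    }

    /* Append nbytes, switching to a fresh segment before the 1 GB mark. */
    static int
    temp_write(LogicalTempFile *tf, const void *buf, size_t nbytes)
    {
        if (tf->segbytes + (long) nbytes > TEMP_SEG_SIZE)
        {
            tf->segno++;
            if (open_segment(tf) < 0)
                return -1;
        }
        if (fwrite(buf, 1, nbytes, tf->fp) != nbytes)
            return -1;
        tf->segbytes += (long) nbytes;
        return 0;
    }

    int
    main(void)
    {
        LogicalTempFile tf = {"pg_sorttemp_demo", 0, 0, NULL};
        char            block[8192] = {0};

        if (open_segment(&tf) < 0 || temp_write(&tf, block, sizeof(block)) < 0)
            perror("temp_write");
        fclose(tf.fp);
        return 0;
    }

A real implementation would also need segment-aware reads, seeks, and cleanup,
but the rollover logic itself stays localized to the write path.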
Re: [BUGS] General Bug Report: Files greater than 1 GB are created while sorting
From: Bruce Momjian
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > I have renamed these sort temp tables to pg_sorttemp so they will not be
> > confused with actual temp tables.
>
> I didn't realize that the names generated for temp tables were so close
> to those generated for temp files.  Changing one or the other does seem
> like a good idea.  But I do not like "pg_sorttemp", because fd.c's
> temporary-file mechanism is used for more things than just sorting.
> Hash joins, for example.  Can we think of a better name?

I am open to other names.  It is really for joining and sorting.
Suggestions?

> Alternatively, how about including the user-given name for a temp table
> in its real name?  That would be helpful for debugging, I'm sure.
> I'm thinking of something like
>
> 	snprintf(newrelname, NAMEDATALEN, "pg_temp.%d.%u.%s",
> 		 (int) MyProcPid, uniqueId++, userrelname);
>
> (relying on snprintf here to truncate the user name if it is too long).

You can only do the truncation on the user-supplied part, not on the
actual numbers.  I guess we could do that.  I wanted the names to be
uniform, I guess.

> Note that under ordinary circumstances, the data being processed by a
> sort or hash join will be written into several temp files that each get
> just a fraction of the data, so you would not actually see a problem
> until you got to several-times-2-gig total data volume.

He already has 1.4 gig sort files.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
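On the truncation point: with a plain %s the whole string is cut off at
NAMEDATALEN, numbers included, so a safer variant emits the fixed numeric
prefix first and clips only the user-given relname to whatever room is left.
A hedged sketch of that idea (the NAMEDATALEN value and the helper name are
assumptions for illustration):

    #include <stdio.h>
    #include <unistd.h>

    #define NAMEDATALEN 32      /* assumed; the default at the time */

    /* Hypothetical illustration: write the "pg_temp.pid.uniq." prefix first,
     * then truncate only the user-supplied relation name to the remaining
     * space, so the numeric parts are never cut short. */
    static void
    make_temp_rel_name(char *buf, int pid, unsigned uniq, const char *userrelname)
    {
        int     len;

        len = snprintf(buf, NAMEDATALEN, "pg_temp.%d.%u.", pid, uniq);
        if (len < NAMEDATALEN)
            snprintf(buf + len, NAMEDATALEN - len, "%s", userrelname);
    }

    int
    main(void)
    {
        char    name[NAMEDATALEN];

        make_temp_rel_name(name, (int) getpid(), 7,
                           "a_rather_long_user_supplied_table_name");
        printf("%s\n", name);
        return 0;
    }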