Thread: BUG #15460: Error while creating index or constraint
The following bug has been logged on the website:

Bug reference:      15460
Logged by:          doskabouter
Email address:      paul.vanderlinden@mapcreator.eu
PostgreSQL version: 11.0
Operating system:   Windows Server 2008 R2
Description:

On large tables I get an error on:

CREATE INDEX nx_tablename ON tablename USING btree (col1);

(col1 being an integer) and

ALTER TABLE ONLY tablename ADD CONSTRAINT pk_tablename PRIMARY KEY (col1, col2);

(col1 and col2 both integer)

First noticed that with pg_restore (from a pg_dump from PG10):

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 14005; 1259 546886907 INDEX nx_tablename postgres
pg_restore: [archiver (db)] could not execute query: ERROR: could not determine size of temporary file "0"
    Command was: CREATE INDEX nx_tablename ON tablename USING btree (col1);

After the restore finished (with that error) I tried running the same query in pgAdmin. There I got:

********** Error **********
ERROR: could not determine size of temporary file "0"
SQL state: 58P01

Table definition:

CREATE TABLE tablename (
    col1 bigint NOT NULL,
    col2 bigint NOT NULL,
    col3 bigint NOT NULL,
    col4 bigint NOT NULL,
    col5 bigint,
    col6 integer NOT NULL
);

I have several comparable tables (same columns). Errors occur for the tables with 386726424 and 556071229 records; for the smaller tables (the largest has 163183465 records) there is no error.

PostgreSQL/PostGIS versions:
"POSTGIS="2.5.0 r16836" [EXTENSION] PGSQL="110" GEOS="3.7.0-CAPI-1.11.0 3.7.0" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.4, released 2018/03/19" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER"
"PostgreSQL 11.0, compiled by Visual C++ build 1914, 64-bit"

Running on a 28-core, 75GB Windows Server 2008 R2 64-bit machine.
postgresql.conf: all parallel-related options are at their defaults.
On Fri, Oct 26, 2018 at 10:32 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> On large tables I get an error on:
>
> CREATE INDEX nx_tablename ON tablename USING btree (col1); col1 being an integer
> and
> ALTER TABLE ONLY tablename ADD CONSTRAINT pk_tablename PRIMARY KEY (col1, col2); col1 and col2 both integer
>
> First noticed that with pg_restore (from a pg_dump from PG10)
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 14005; 1259 546886907 INDEX nx_tablename postgres
> pg_restore: [archiver (db)] could not execute query: ERROR: could not determine size of temporary file "0"
> Command was: CREATE INDEX nx_tablename ON tablename USING btree (col1);

I initially suspected that this was caused by commit 445e31bdc74. This commit made the parallel CREATE INDEX leader call BufFileSize(), rather than having each worker call it themselves; it is where Paul's error message actually originates. The commit also changed BufFileSize() so that it would lseek(), rather than just passing filesize metadata through shared memory.

I'm not sure what the exact problem might be, because I can't see why the BufFileSize() thing would fail, even on Windows. I think that using lseek() + SEEK_END to get the size of a file may only work on POSIX, and yet I can see the same pattern in much older code (e.g., _mdnblocks()). I don't have a Windows machine, and the spec seems rather hard to read here.

I have a strong suspicion that going back to passing the size through shared memory (i.e. partially reverting 445e31bdc74) would make the problem go away, but I won't do that until I actually understand what's going on.

--
Peter Geoghegan
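For reference, the lseek() + SEEK_END idiom under discussion looks roughly like this. This is a POSIX-flavoured illustrative sketch, not PostgreSQL source, and the path is hypothetical; the shared-memory alternative Peter mentions is sketched further down the thread.

/* Illustrative sketch only -- not PostgreSQL code. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static off_t
file_size_by_seek(const char *path)
{
    int   fd = open(path, O_RDONLY);
    off_t size;

    if (fd < 0)
        return -1;                      /* open failed, e.g. ENOENT */

    size = lseek(fd, 0, SEEK_END);      /* offset of end-of-file == size */
    close(fd);
    return size;
}

int
main(void)
{
    printf("size: %lld\n", (long long) file_size_by_seek("pgsql_tmp0.0"));
    return 0;
}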
On Mon, Oct 29, 2018 at 11:41 AM Peter Geoghegan <pg@bowt.ie> wrote: > I have a strong suspicion that going back to passing the size through > shared memory (i.e. partially reverting 445e31bdc74) would make the > problem go away, but I won't do that until I actually understand > what's going on. BufFileOpenShared()/SharedFileSetOpen() is allowed to fail in the event of no file (an ENOENT) within PathNameOpenTemporaryFile(). OTOH, BufFileSize() can throw an error due to the same condition following 445e31bdc74, which is what we see here. If it's okay in the first function, why shouldn't it be okay in the second function, which is called shortly afterwards? That's where Paul's ERRCODE_UNDEFINED_FILE actually comes from. ISTM that only one of those assumptions can be correct. I thought that parallel CREATE INDEX didn't need to rely on this ENOENT-is-okay business, and that that was just there for parallel hash join, which is why I didn't complain when 445e31bdc74 went in. It now seems like logtape.c could correctly rely on ENOENT-is-okay, though. I think that temporary file "0" is almost always the leader-as-worker fileset, which may be relevant. Paul: Does varying whether an index is unique or not change the outcome of any of your failing test cases? Does anyone see a way in which the logtape.c + shared buffile stuff could come to rely on assuming that an ENOENT case relates to an empty worker BufFile? I suppose it's also possible that there is a bug that partially reverting 445e31bdc74 would only mask, though that seems less likely. -- Peter Geoghegan
It happens with both unique and non-unique indexes. When I disable the virus scanner I still get the error, but setting max_parallel_workers to 0 does let me create the index.
On Mon, Oct 29, 2018 at 12:45 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote:
> It happens with both unique and non-unique indexes.
> When I disable the virus scanner I still get the error, but setting max_parallel_workers to 0 does let me create the index.

Thanks for trying that out. This confirms that parallel CREATE INDEX is involved.

Is it possible for you to provide me with a dump of the data, or a minimal test case? You can provide it to me off-list if that's easier, possibly using Google Drive or something. I realize that the amount of data involved here isn't trivial, so this may be difficult.

I'm a little suspicious of the fact that your table is probably aligned to certain boundaries, since it consists only of integers/bigints. Maybe that's just a coincidence, though.

Thanks
--
Peter Geoghegan
Well, I also saw it on uuids and possibly an enum. One of the tables is a materialized view with 550161114 records consisting of 2 uuid columns and 2 enum columns. Total size is 34 GB...
On Mon, Oct 29, 2018 at 12:53 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Is it possible for you to provide me with a dump of the data, or a
> minimal test case? You can provide it to me off-list if that's
> easier, possibly using Google Drive or something. I realize that the
> amount of data involved here isn't trivial, so this may be difficult.

It might also be helpful to see if amcheck shows any problems with the indexes you can create without error. You could run:

CREATE EXTENSION amcheck;

SELECT bt_index_check(index => c.oid, heapallindexed => true),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;

Hopefully this won't find any problems/raise any errors, but if it does then they'll likely be relevant. It might take several minutes to run the query, or maybe even several hours if there is lots of data. You may have to use your own judgement when it comes to determining what's worth checking, and what you don't have time to check (e.g., add something extra to the WHERE clause, so that only some indexes get checked).

I'm suggesting this because it isn't very much work, and may shed some light on the problem. Though the likelihood is that it won't find anything wrong.

--
Peter Geoghegan
On Mon, Oct 29, 2018 at 1:01 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote: > > Well I also saw it on uuids and possibly enum. > One of the tables is a materialized view with 550161114 records consisting of 2 uuid-colums and 2 enum columns. > Total size is 34 GB... Another thing that could be helpful is if you "set trace_sort = on" within your session/pgAdmin, and showed us log output from the start of the failing CREATE INDEX to the point that it fails. You may or may not want to "set client_min_messages = LOG" while you're at it, since that will send LOG output to the client, which may be more convenient. Thanks -- Peter Geoghegan
Amcheck is still running. Trace_sort is attached.
On Mon, Oct 29, 2018 at 1:44 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote:
>
> Amcheck is still running.
> Trace_sort is attached.

This first line looks like it might be interesting:

LOG: could not rmdir directory "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty
ERROR: could not determine size of temporary file "0"

I suppose that this could actually just be a result of the ERROR; the exact order isn't a reliable indicator of the sequence of events across processes (a useful log_line_prefix setting might clear this up if you collect the trace_sort output again).

Can you show me the contents of the directory that we fail to rmdir here? Their names and their sizes. The directory should remain around until you restart the server, since the usual error cleanup path doesn't successfully finish here.

Thanks
--
Peter Geoghegan
On Mon, Oct 29, 2018 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote: > I suppose that this could actually just be a result of the ERROR; the > exact order isn't a reliable indicator of the sequence of events > across processes (A useful log_line_prefix setting might clear this up > if you collect the trace_sort output again). For example, this log_line_prefix will show process PID, and a timestamp: log_line_prefix='%p/%t ' -- Peter Geoghegan
The directory is empty when I check after the error occurred; no idea yet (will test) whether there's something there during index creation. And for some reason I'm not able to set log_line_prefix in my pgAdmin session... ERROR: parameter "log_line_prefix" cannot be changed now
On Mon, Oct 29, 2018 at 2:19 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote: > Directory is empty (when I check after the error occurred, no idea yet (will test) if there's something there during indexcreation. There'll definitely be something during index creation, though the exact directory name will vary per operation. Do you use temp_tablespaces (easy way to find out is to "show temp_tablespaces")? There's one directory per temp tablespace, so maybe you looked in the wrong one? > And for some reason I'm not able to set the log_line_prefix in my pgAdmin session... ERROR: parameter "log_line_prefix"cannot be changed now That needs to be set in postgresql.conf. -- Peter Geoghegan
I do see some files now (0.0, 1.0 and 2.0) in that directory. Not able to restart postgres at the moment because it's also running some other stuff...
Oh, and the bt_index_check just finished without any errors.
On Mon, Oct 29, 2018 at 2:28 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote: > > I do see some files now (0.0, 1.0 and 2.0) in that directory. > Not able to restart postgres at the moment because it's also running some other stuff... No need to restart. Just call pg_reload_conf() as a superuser, once postgresql.conf has the change. That won't be disruptive. -- Peter Geoghegan
On Mon, Oct 29, 2018 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote: > This first line looks like it might be interesting: > > LOG: could not rmdir directory > "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty > ERROR: could not determine size of temporary file "0" (Thomas Munro is CC'd here.) > I suppose that this could actually just be a result of the ERROR; the > exact order isn't a reliable indicator of the sequence of events > across processes (A useful log_line_prefix setting might clear this up > if you collect the trace_sort output again). Hmm. So apparently Windows has a habit of setting an ENOTEMPTY errcode when rmdir'ing a directory that somebody merely has a handle to. It could just be that somebody has a Windows Explorer window open -- you still get ENOTEMPTY [1]! Not sure if this is truly relevant to the problem at hand, but it seems worth being aware of. [1] https://github.com/gruntjs/grunt-contrib-clean/issues/66 -- Peter Geoghegan
Hm, the log_line_prefix didn't change a lot in the logs... And what's worse (but completely unrelated): the performance benefits of upgrading to pg11 don't work for me (a big number of heavy geo calculations unioned all into one select still doesn't use more than one worker/core...)
On Mon, Oct 29, 2018 at 3:18 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote: > > Hm, the log_line_prefix didn't change a lot in the logs... That's because log_line_prefix only appears in the server log. It'll be necessary to go into the server log to get those extra details. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Oct 26, 2018 at 10:32 AM PG Bug reporting form > <noreply@postgresql.org> wrote: >> pg_restore: [archiver (db)] could not execute query: ERROR: could not >> determine size of temporary file "0" >> Command was: CREATE INDEX nx_tablename ON tablename USING btree (col1); > I'm not sure what the exact problem might be, because I can't see why > the BufFileSize() thing would fail, even on Windows. So there are a couple of things to complain about here with respect to the error message, regardless of the underlying bug: * "0" is not a file name. Reporting it is just about useless, and identifying it as a "file" is a lie. Why aren't we reporting the actual name of the underlying file? (I realize that we might not have easy access to that right here. But now that this has been shown to be reachable in the field, it seems worth doing some extra work to make the error message useful.) * We aren't reporting the errno (no %m in the message). By luck, the errcode_for_file_access() call is enough to let us infer that it was ENOENT in this case, but in other cases that might not be enough. > I think that > using lseek() + SEEK_END to get the size of a file may only work on > POSIX, and yet I can see the same pattern in much older code (e.g., > _mdnblocks()). Yeah, that works fine on Windows AFAIK. I also note that ENOENT isn't an error code that lseek() can deliver, anyway, since it works on an already-open FD. The failure here must be coming from opening the file. I'm a little inclined to suspect that the true cause here is workers not correctly computing the name of this temp file, which is what led me to complain about the error message. Although a weak spot in this theory is that it's not clear why they'd not fail later anyway, unless maybe this particular file never got touched by workers before. > I have a strong suspicion that going back to passing the size through > shared memory (i.e. partially reverting 445e31bdc74) would make the > problem go away, but I won't do that until I actually understand > what's going on. Sounds like papering over the bug ... regards, tom lane
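To make those two complaints concrete, a report that names the underlying file and includes the errno would look something like the sketch below. This is a hypothetical backend-context fragment, not the fix that was eventually committed; "path" stands in for the real temporary file name.

#include "postgres.h"

static void
report_temp_file_size_failure(const char *path)
{
	ereport(ERROR,
			(errcode_for_file_access(),
			 /* %m appends the strerror() text for the saved errno */
			 errmsg("could not determine size of temporary file \"%s\": %m",
					path)));
}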
Paul van der Linden <paul.vanderlinden@mapcreator.eu> writes:
> Trace_sort is attached
> ...
> LOG: performsort of 0 done: CPU: user: 250.03 s, system: 42.66 s, elapsed: 294.35 s
> LOG: parallel external sort of 0 ended, 462886 disk blocks used: CPU: user: 250.03 s, system: 42.66 s, elapsed: 298.95 s
> LOG: begin index sort: unique = f, workMem = 2096128, randomAccess = f
> LOG: performsort of -1 starting: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> LOG: could not rmdir directory "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty
> ERROR: could not determine size of temporary file "0"

"performsort of -1"? Seems a bit suspicious.

(And my complaint about inadequate messaging holds here as well. It's really unclear in this trace which digit strings are to be read as numbers and which ones are names of files/filesets. At the very least, these messages violate our message style guidelines: "When citing the name of an object, state what kind of object it is.")

			regards, tom lane
Ok, here's the correct one.
On Mon, Oct 29, 2018 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So there are a couple of things to complain about here with respect > to the error message, regardless of the underlying bug: I agree with you fully. > Yeah, that works fine on Windows AFAIK. I also note that ENOENT > isn't an error code that lseek() can deliver, anyway, since it works > on an already-open FD. The failure here must be coming from opening > the file. Good point. > I'm a little inclined to suspect that the true cause here is workers > not correctly computing the name of this temp file, which is what > led me to complain about the error message. Although a weak spot in > this theory is that it's not clear why they'd not fail later anyway, > unless maybe this particular file never got touched by workers before. There just isn't that much to get right there, though. Another weak spot in that theory is that it seems unlikely that the first complaint we'd hear would happen to be from a Windows user. I think that they're very much in the minority, especially among early adopters. > > I have a strong suspicion that going back to passing the size through > > shared memory (i.e. partially reverting 445e31bdc74) would make the > > problem go away, but I won't do that until I actually understand > > what's going on. > > Sounds like papering over the bug ... I may have been unclear. It would be papering over the bug if I went ahead and did that now. The advantage of getting the file size from shared memory is that it doesn't leave it up to code like BufFileOpenShared() to find everything through readdir() iteration, an approach that might not be totally portable. We'll reliably fail if all BufFile segments cannot be accounted for with the size-in-shared-memory approach, which seems more robust. I wouldn't be surprised if that actually was the correct fix in the end. -- Peter Geoghegan
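For illustration, the "pass the size through shared memory" approach looks roughly like this. It is a self-contained sketch with made-up names, not the actual tuplesort/logtape structures, and the locking, barriers and DSM plumbing a real implementation needs are omitted.

#include <stdint.h>
#include <stdio.h>

#define MAX_WORKERS 8

/* Hypothetical per-sort state living in dynamic shared memory. */
typedef struct SharedSortSizes
{
    int     nworkers;
    int64_t worker_bytes[MAX_WORKERS];  /* written by each worker */
} SharedSortSizes;

/* Worker side: after flushing its tapes, publish the byte count. */
static void
worker_publish_size(SharedSortSizes *shared, int worker, int64_t bytes)
{
    shared->worker_bytes[worker] = bytes;
}

/*
 * Leader side: no lseek()/readdir() on the temporary files is needed,
 * so a missing or short segment shows up as a hard inconsistency when
 * the data is read, rather than as a guess about what ENOENT means.
 */
static int64_t
leader_total_bytes(const SharedSortSizes *shared)
{
    int64_t total = 0;

    for (int i = 0; i < shared->nworkers; i++)
        total += shared->worker_bytes[i];
    return total;
}

int
main(void)
{
    SharedSortSizes s = { .nworkers = 2 };

    worker_publish_size(&s, 0, (int64_t) 3 * 1024 * 1024 * 1024);
    worker_publish_size(&s, 1, (int64_t) 2 * 1024 * 1024 * 1024);
    printf("total: %lld bytes\n", (long long) leader_total_bytes(&s));
    return 0;
}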
Yes, I adopted early, but I more or less regret it because the performance (more parallel queries) that I was hoping for didn't work out for me. But on the plus side: I'm more than happy with finding bugs 😊
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Oct 29, 2018 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Sounds like papering over the bug ... > I may have been unclear. It would be papering over the bug if I went > ahead and did that now. > The advantage of getting the file size from shared memory is that it > doesn't leave it up to code like BufFileOpenShared() to find > everything through readdir() iteration, an approach that might not be > totally portable. We'll reliably fail if all BufFile segments cannot > be accounted for with the size-in-shared-memory approach, which seems > more robust. I wouldn't be surprised if that actually was the correct > fix in the end. Perhaps. Anyway, I think we agree that we should identify why it's failing before we decide on a fix. regards, tom lane
On Mon, Oct 29, 2018 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > LOG: performsort of 0 done: CPU: user: 250.03 s, system: 42.66 s, elapsed: 294.35 s > > LOG: parallel external sort of 0 ended, 462886 disk blocks used: CPU: user: 250.03 s, system: 42.66 s, elapsed: 298.95s > > LOG: begin index sort: unique = f, workMem = 2096128, randomAccess = f > > LOG: performsort of -1 starting: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > > LOG: could not rmdir directory "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty > > ERROR: could not determine size of temporary file "0" > > "performsort of -1"? Seems a bit suspicious. This just refers to the leader Tuplesortstate. It isn't suspicious. We follow the convention that worker -1 is the leader within tuplesort.c. trace_sort is a developer option, so this seems fine to me. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Oct 29, 2018 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> LOG: performsort of 0 done: CPU: user: 250.03 s, system: 42.66 s, elapsed: 294.35 s
>>> LOG: parallel external sort of 0 ended, 462886 disk blocks used: CPU: user: 250.03 s, system: 42.66 s, elapsed: 298.95 s
>>> LOG: begin index sort: unique = f, workMem = 2096128, randomAccess = f
>>> LOG: performsort of -1 starting: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>>> LOG: could not rmdir directory "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty
>>> ERROR: could not determine size of temporary file "0"

>> "performsort of -1"? Seems a bit suspicious.

> This just refers to the leader Tuplesortstate. It isn't suspicious.
> We follow the convention that worker -1 is the leader within
> tuplesort.c.

Hmm. But the sort of "0" has already completed, according to the first couple of log lines I quoted. Why is something still trying to access it? Why is a worker trying to do anything at all with the leader's Tuplesortstate?

> trace_sort is a developer option, so this seems fine to me.

That's a poor excuse for ignoring the message style guidelines. There are many reasons why even experts could get confused trying to follow badly-written messages.

			regards, tom lane
On Mon, Oct 29, 2018 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> "performsort of -1"? Seems a bit suspicious. > > > This just refers to the leader Tuplesortstate. It isn't suspicious. > > We follow the convention that worker -1 is the leader within > > tuplesort.c. > > Hmm. But the sort of "0" has already completed, according to the first > couple of log lines I quoted. Why is something still trying to access > it? Why is a worker trying to do anything at all with the leader's > Tuplesortstate? This is almost certainly because the parallel infrastructure generally doesn't guarantee that log output will be in order. Even if something bizarre took place with the temporary files, there is no way that the worker number in trace_sort would change within a worker. It's an immutable field within Tuplesortstate, initialized once. > > trace_sort is a developer option, so this seems fine to me. > > That's a poor excuse for ignoring the message style guidelines. > There are many reasons why even experts could get confused trying > to follow badly-written messages. I'll change the trace_sort messages to refer to "worker n" -- that should bring the trace_sort output in line with our style guidelines. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Oct 29, 2018 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That's a poor excuse for ignoring the message style guidelines. >> There are many reasons why even experts could get confused trying >> to follow badly-written messages. > I'll change the trace_sort messages to refer to "worker n" -- that > should bring the trace_sort output in line with our style guidelines. WFM. regards, tom lane
On Tue, Oct 30, 2018 at 4:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Oct 29, 2018 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > This first line looks like it might be interesting:
> >
> > LOG: could not rmdir directory
> > "base/pgsql_tmp/pgsql_tmp5088.0.sharedfileset": Directory not empty
> > ERROR: could not determine size of temporary file "0"
>
> (Thomas Munro is CC'd here.)
>
> > I suppose that this could actually just be a result of the ERROR; the
> > exact order isn't a reliable indicator of the sequence of events
> > across processes (a useful log_line_prefix setting might clear this up
> > if you collect the trace_sort output again).
>
> Hmm. So apparently Windows has a habit of setting an ENOTEMPTY errcode
> when rmdir'ing a directory that somebody merely has a handle to. It
> could just be that somebody has a Windows Explorer window open -- you
> still get ENOTEMPTY [1]! Not sure if this is truly relevant to the
> problem at hand, but it seems worth being aware of.

We only try to remove the directory after removing everything in it, so that does seem to require a pretty strange explanation like that. I also wondered if a worker having a handle to an unlinked file (as permitted by the FILE_SHARE_DELETE flag we use) inside that directory could produce that effect. Perhaps RemoveDirectoryA[1] would be better than rmdir, since it "... marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed." The documentation for rmdir[2] just says deprecated, and _rmdir[3] mentions ENOTEMPTY, but it says you get EACCES if someone has an open handle to the directory.

[1] https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-removedirectorya
[2] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rmdir?view=vs-2017
[3] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rmdir-wrmdir?view=vs-2017

--
Thomas Munro
http://www.enterprisedb.com
On Wed, Oct 31, 2018 at 2:37 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> We only try to remove the directory after removing everything in it,
> so that does seem to require a pretty strange explanation like that.
> I also wondered if a worker having a handle to an unlinked file (as
> permitted by the FILE_SHARE_DELETE flag we use) inside that directory
> could produce that effect. Perhaps RemoveDirectoryA[1] would be
> better than rmdir, since it "... marks a directory for deletion on
> close. Therefore, the directory is not removed until the last handle
> to the directory is closed." The documentation for rmdir[2] just says
> deprecated, and _rmdir[3] mentions ENOTEMPTY, but it says you get
> EACCES if someone has an open handle to the directory.

Is there a Windows developer who would like to experiment with this stuff -- perhaps with a small standalone program that tries to unlink a directory while it has another handle to the directory open -- and make a recommendation? Otherwise I'll try to do this with AppVeyor.

While having the directory open in Explorer.exe might seem a bit unusual for a production database, I bet there are other things that periodically go around opening and reading our directories: backup tools and SearchIndexer.exe presumably traverse the whole filesystem from time to time. By my reading, RemoveDirectoryA() is probably the right thing to defend against this, here and any other place we remove directories.

--
Thomas Munro
http://www.enterprisedb.com
Well, I can test. If you'll provide me with the call (incl. flags) that is made at that moment to remove the directory, I can see what is needed to make that fail, and possibly how to circumvent that.
On Fri, Nov 2, 2018 at 9:02 PM Paul van der Linden <paul.vanderlinden@mapcreator.eu> wrote:
> Well, I can test.
> If you'll provide me with the call (incl. flags) that is made at that moment to remove the directory, I can see what is needed to make that fail, and possibly how to circumvent that.

Thanks. Actually I think I now see what's going on with the "Directory not empty" error. I experimented a bit on AppVeyor's wonderful free Windows build bot, and finished up with a simple program[1] that spits out:

Creating directory my_dir...
Creating file my_dir\foo.txt...
Unlinking file my_dir\foo.txt...
Opening directory my_dir...
Removing directory my_dir with rmdir()...
Failed, error = 145, errno = 41 (Directory not empty).

Although you can unlink files and directories for which there are open handles floating around as long as all handles were opened with FILE_SHARE_DELETE, this shows that you can't unlink directories if they contain[ed] files that have now been unlinked, if someone still has a handle open. In other words, the file isn't really unlinked at all, it's just hidden until all referencing handles are closed, or something like that.

Back to PostgreSQL: in the non-error happy path there is no problem, because well-behaved clients close handles before detaching from the reference-counted SharedFileSet, so when the last detacher unlinks there can be no handles left. In the error path, however, we rely on resowner.c to close handles, and it does that *after* releasing the DSM segment that triggers unlinking. We discussed this exact topic while working on this stuff[2] and I concluded incorrectly that you get sufficiently Unix-like behaviour if all openers use FILE_SHARE_DELETE. I was apparently right about unlinking the files themselves, but not for unlinking directories that hold them.

So, it looks like we may need to reorder the cleanup code in resowner.c, after all. I already had a patch for that, but I'm certainly not comfortable making such a change for the minor release that's about to wrap, since it could have unforeseen consequences and needs more study.

Of course none of this explains the root problem that led to the error ('could not determine size of temporary file "0"'), it's just a noisy, temporary directory leaking error path that we should fix.

[1] https://github.com/macdice/hello-windows/blob/remove-directory/test.c
[2] https://www.postgresql.org/message-id/CAEepm=1Ugp7mNFX-YpSfWr0F_6QA4jzjtavauBcoAAZGd7_tPA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
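The mechanism can be demonstrated with a few Win32 calls. The sketch below is along the same lines as the program described above but is not its exact code (see [1] for that); it relies on the documented semantics of FILE_SHARE_DELETE and _rmdir() rather than having been verified here.

/*
 * Windows-only sketch: keep a FILE_SHARE_DELETE handle open, unlink the
 * file, and the parent directory cannot be removed until the handle is
 * closed.
 */
#include <windows.h>
#include <direct.h>
#include <errno.h>
#include <stdio.h>

int
main(void)
{
    HANDLE h;

    CreateDirectoryA("my_dir", NULL);

    h = CreateFileA("my_dir\\foo.txt", GENERIC_WRITE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* Unlink the file while the handle above is still open. */
    DeleteFileA("my_dir\\foo.txt");

    /* The file is only hidden, not gone, so this should fail. */
    if (_rmdir("my_dir") != 0)
        printf("rmdir failed, errno = %d (ENOTEMPTY = %d)\n", errno, ENOTEMPTY);

    CloseHandle(h);             /* the pending delete completes here ... */

    if (_rmdir("my_dir") == 0)  /* ... so now the directory can go away */
        printf("rmdir succeeded after closing the handle\n");
    return 0;
}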
On Tue, Oct 30, 2018 at 4:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > ERROR: could not determine size of temporary file "0"

    filesize = BufFileSize(file);
    if (filesize < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not determine size of temporary file \"%s\"",
                        filename)));

Ahh. I think we may be overflowing a 32 bit off_t on this platform.

--
Thomas Munro
http://www.enterprisedb.com
On Tue, Nov 13, 2018 at 10:13 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Tue, Oct 30, 2018 at 4:15 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > ERROR: could not determine size of temporary file "0" > > filesize = BufFileSize(file); > if (filesize < 0) > ereport(ERROR, > (errcode_for_file_access(), > errmsg("could not determine > size of temporary file \"%s\"", filename))); > > Ahh. I think we may be overflowing a 32 bit off_t on this platform. I think using pgoff_t would fix the problem on Windows, but on other systems where it maps directly to off_t it might also be 32 bits, so I'm not entirely sure what pgoff_t is for. Perhaps we should just use int64 directly for this? Like in the attached draft patch. I don't have a Windows system to test it on. A separate bug report that came in today[1] has repro steps that could be used to validate it. [1] https://www.postgresql.org/message-id/flat/CAHDGBJP_GsESbTt4P3FZA8kMUKuYxjg57XHF7NRBoKnR%3DCAR-g%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com
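The arithmetic failure mode is easy to see in isolation. The sketch below is simplified and is not the buffile.c code; the 1GB per-segment size is used only for illustration. A 32-bit total wraps negative for a multi-gigabyte temporary file, which is exactly what the filesize < 0 check quoted above would catch.

#include <stdint.h>
#include <stdio.h>

#define SEGMENT_SIZE (1024 * 1024 * 1024)       /* illustrative 1GB segments */

int
main(void)
{
    int     nsegments = 3;              /* a roughly 2-3GB temporary file */
    int32_t last_segment = 123456789;   /* bytes used in the final segment */

    /* The proposed computation, done in 64 bits throughout. */
    int64_t wide = (int64_t) (nsegments - 1) * SEGMENT_SIZE + last_segment;

    /* What survives in a 32-bit off_t: it wraps negative. */
    int32_t narrow = (int32_t) wide;

    printf("64-bit total: %lld\n", (long long) wide);
    printf("32-bit total: %d  (filesize < 0, hence the error)\n", narrow);
    return 0;
}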
On Tue, Nov 13, 2018 at 1:50 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I think using pgoff_t would fix the problem on Windows, but on other > systems where it maps directly to off_t it might also be 32 bits, so > I'm not entirely sure what pgoff_t is for. Perhaps we should just use > int64 directly for this? Like in the attached draft patch. I don't > have a Windows system to test it on. A separate bug report that came > in today[1] has repro steps that could be used to validate it. I think that int64 is the way to go. Should I fix up the diagnostic message to address Tom's complaints about that in a separate patch? -- Peter Geoghegan
On Wed, Nov 14, 2018 at 6:50 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Nov 13, 2018 at 1:50 AM Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > I think using pgoff_t would fix the problem on Windows, but on other > > systems where it maps directly to off_t it might also be 32 bits, so > > I'm not entirely sure what pgoff_t is for. Perhaps we should just use > > int64 directly for this? Like in the attached draft patch. I don't > > have a Windows system to test it on. A separate bug report that came > > in today[1] has repro steps that could be used to validate it. > > I think that int64 is the way to go. Thanks. So, a practical matter: The change of return type as proposed is an ABI break for 32 bit off_t systems. BufFileSize() is brand new and seems pretty unlikely to be used by extensions so soon. Could we change that? A conservative alternative would be to define a second function BufFileSize64() and use that for parallel CREATE INDEX in REL_11_STABLE, leaving the off_t version as it is, but then fold them back into a single version in master. > Should I fix up the diagnostic message to address Tom's complaints > about that in a separate patch? +1 -- Thomas Munro http://www.enterprisedb.com
On Wed, Nov 14, 2018 at 9:12 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Nov 14, 2018 at 6:50 AM Peter Geoghegan <pg@bowt.ie> wrote: > > I think that int64 is the way to go. Thanks. Pushed to master and 11. If anyone with a Windows system would like to verify that the problem is fixed, I found the following test case by changing BufFileSize() and ltsConcatWorkerTapes() to use int32 instead of off_t so I see the error too: create table t as select generate_series(1, 100000000) i, 'hello world, this is a test' v; create index on t (i, v); I expect that test case to fail on Windows before today's commit. -- Thomas Munro http://www.enterprisedb.com
On Mon, Oct 29, 2018 at 8:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So there are a couple of things to complain about here with respect
> to the error message, regardless of the underlying bug:

Attached patch fixes the problems with BufFileSize(), while also making similar adjustments to a closely related "ereport() + errcode_for_file_access()" call in buffile.c in passing -- the ereport() from BufFileOpenShared(). Note that both BufFileOpenShared() and BufFileSize() are used in a closely coordinated way already -- we literally call one immediately after the other from logtape.c. I don't like the current inconsistency in error handling.

My approach implies a minor API break in BufFileSize(), but I think that that's okay -- see the draft commit message for an explanation. Beyond what I say there, I think that any third party client code ought to be following the example of tuplestore_rescan(), and seeking to the end of the BufFile directly. BufFileSize() should just be a shared BufFile thing, in case it needs to be changed in the future.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Oct 29, 2018 at 8:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So there are a couple of things to complain about here with respect >> to the error message, regardless of the underlying bug: > Attached patch fixes the problems with BufFileSize(), while also > making similar adjustments to a closely related buffile.c "ereport() + > errcode_for_file_access()" in passing -- the ereport() from > BufFileOpenShared(). Note that both BufFileOpenShared() and > BufFileSize() are used in a closely coordinated way already -- we > literally call one immediately after the other from logtape.c. I don't > like the current inconsistency in error handling. LGTM. regards, tom lane
On Wed, Nov 28, 2018 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > LGTM. Pushed. Thanks. -- Peter Geoghegan
On Thu, Nov 29, 2018 at 11:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Nov 28, 2018 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > LGTM.
>
> Pushed. Thanks.

Thanks. So, of the three issues that came up in this thread, we have:

1. off_t overflow: fixed in aa551830.
2. Error reporting could be better: fixed in 1a990b20.
3. Windows can't unlink directories containing files that have been unlinked but are still open. This generates a WARNING and a left-behind empty directory when an error is raised.

Here's a draft patch for #3. I can't test it on Windows myself; we'd need to confirm it by hacking code somewhere to raise errors while either running a parallel CREATE INDEX or a batched parallel hash join, perhaps something involving a low probability divide-by-zero, or maybe someone could try reverting commit aa551830 and using the short repro I gave in my previous message upthread.

Assuming this can be confirmed to solve the problem, the real question is: is it a bad idea to back-patch code that changes the order of resowner release in this way? Specifically, could someone be upset if we start calling their DSM detach hook after we've released everything else, where previously it was before most other stuff? (Do any extensions use DSM segments, and specifically dsm_detach() hooks?)

--
Thomas Munro
http://www.enterprisedb.com
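For readers unfamiliar with the term in that last question, a DSM detach hook is a callback an extension registers against a dynamic shared memory segment; whether it fires before or after other resource cleanup is exactly the ordering being asked about. The sketch below is a minimal backend-context fragment with hypothetical names, not code from any particular extension.

#include "postgres.h"
#include "storage/dsm.h"

/* Hypothetical callback: runs when the segment is detached. */
static void
my_on_detach(dsm_segment *seg, Datum arg)
{
	elog(LOG, "my segment was detached");
}

static void
register_my_detach_hook(dsm_segment *seg)
{
	on_dsm_detach(seg, my_on_detach, (Datum) 0);
}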