Thread: Fail to create PK or index for large table in Windows
PostgreSQL 11.0, 11.1 OS: Windows 7 x64 RAM: 16GB Database location was created with initdb. Scheme sc. Table contains 600 000 000 rows with structure: table sc.address ( id_address integer not null, base varchar not null, raw bytea not null, key bytea, id_key integer ) Server start command: pg_ctl -D g:\PostgreSQL11\data -l g:\PostgreSQL11\log\log.txt -o "-p 5426" start Bug: >psql -d dbname -p 5426 psql (11.1) dbname =# \set VERBOSITY verbose dbname =# alter table sc.address add primary key (id_address); ERROR: 58P01: could not determine size of temporary file "0" LOCATION: ltsConcatWorkerTapes, d:\pginstaller.auto\postgres.windows-x64\src\backend\utils\sort\logtape.c:439 dbname=# Log file (command "alter..." requested at 2018-11-12 20:21:37): 2018-11-12 20:21:37.640 MSK [6848] ERROR: canceling autovacuum task 2018-11-12 20:21:37.640 MSK [6848] CONTEXT: automatic vacuum of table "dbname.sc.address" 2018-11-12 21:31:05.182 MSK [6672] ERROR: could not determine size of temporary file "0" 2018-11-12 21:31:05.182 MSK [6672] STATEMENT: alter table sc.address add primary key (id_address); 2018-11-12 21:31:05.889 MSK [7008] LOG: could not rmdir directory "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset": Directory not empty After error the directory "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset" is empty. In the PostgreSQL 10 with the same database (copied via pg_dump and psql) there is no problems - index/key created successfully. It looks like this bug is disallow to index any large tables under Windows. Best regards, Pavel Oskin
On Tue, Nov 13, 2018 at 10:22 AM Pavel <ospavelmail@gmail.com> wrote: > PostgreSQL 11.0, 11.1 > OS: Windows 7 x64 > RAM: 16GB Thanks for the report. This is the same issue as reported in bug #15460: https://www.postgresql.org/message-id/flat/15460-b6db80de822fa0ad%40postgresql.org We are still trying to figure out what causes this, and there have been no similar reports on Unix. For a workaround, you can disable parallel index building with SET max_parallel_maintenance_workers = 0. > 2018-11-12 21:31:05.182 MSK [6672] ERROR: could not determine size of > temporary file "0" > 2018-11-12 21:31:05.182 MSK [6672] STATEMENT: alter table sc.address > add primary key (id_address); This is the thing we haven't understood yet. > 2018-11-12 21:31:05.889 MSK [7008] LOG: could not rmdir directory > "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset": Directory not empty > > After error the directory > "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset" is empty. For the later "could not rmdir directory" error, I am fairly sure I understand what's happening there (see the other bug thread) but that's only a LOG message and an empty directory left behind after an error is raised, it's not the root cause. -- Thomas Munro http://www.enterprisedb.com
Thank you for your work! You can log contents of directory before rmdir. If it empty then the reason should be in multiworkers: looks like Windows disallow to delete directory if it (or files in it) used by somebody else (like other workers) and maybe some workers didn't release it before rmdir. Linux allows as I know. Best regards, Pavel Oskin вт, 13 нояб. 2018 г. в 12:01, Thomas Munro: > > On Tue, Nov 13, 2018 at 10:22 AM Pavel wrote: > > PostgreSQL 11.0, 11.1 > > OS: Windows 7 x64 > > RAM: 16GB > > Thanks for the report. This is the same issue as reported in bug #15460: > > https://www.postgresql.org/message-id/flat/15460-b6db80de822fa0ad%40postgresql.org > > We are still trying to figure out what causes this, and there have > been no similar reports on Unix. For a workaround, you can disable > parallel index building with SET max_parallel_maintenance_workers = 0. > > > 2018-11-12 21:31:05.182 MSK [6672] ERROR: could not determine size of > > temporary file "0" > > 2018-11-12 21:31:05.182 MSK [6672] STATEMENT: alter table sc.address > > add primary key (id_address); > > This is the thing we haven't understood yet. > > > 2018-11-12 21:31:05.889 MSK [7008] LOG: could not rmdir directory > > "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset": Directory not empty > > > > After error the directory > > "base/pgsql_tmp/pgsql_tmp6672.0.sharedfileset" is empty. > > For the later "could not rmdir directory" error, I am fairly sure I > understand what's happening there (see the other bug thread) but > that's only a LOG message and an empty directory left behind after an > error is raised, it's not the root cause. > > -- > Thomas Munro > http://www.enterprisedb.com
On Tue, Nov 13, 2018 at 10:22 PM Pavel <ospavelmail@gmail.com> wrote: > You can log contents of directory before rmdir. > If it empty then the reason should be in multiworkers: looks like > Windows disallow to delete directory if it (or files in it) used by > somebody else (like other workers) and maybe some workers didn't > release it before rmdir. Linux allows as I know. Yeah, its "unlink" operation seems to be an incomplete illusion. The fix for that is simply to reorder some cleanup operations in our error path. Oh, I think I just understood the root bug. It looks like a stupid "large file" problem, 32 bit off_t being overflowed. I'll keep the discussion off that over on the other thread. I posted a patch on that thread; if you're able to recompile PostgreSQL, it'd be great if you could test it. Otherwise, no worries. -- Thomas Munro http://www.enterprisedb.com
On Tue, Nov 13, 2018 at 10:57 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Oh, I think I just understood the root bug. It looks like a stupid > "large file" problem, 32 bit off_t being overflowed. I'll keep the > discussion off that over on the other thread. I posted a patch on > that thread; if you're able to recompile PostgreSQL, it'd be great if > you could test it. Otherwise, no worries. Pavel offered privately to test the change if I could supply a build. I tried asking AppVeyor to build the current master branch as of a few minutes ago and consider the "Release" directory to be an "artifact", and amazingly it spat out a .zip file available for 6 months: https://ci.appveyor.com/project/macdice/postgres/builds/20341645/artifacts I'm not sure how easy this will be to work with if you're used to one of those installers. Hopefully all the libraries needed are present and you can just unzip it and run "initdb.exe -D pgdata", "postgres.exe -D pgdata" -- but I have no personal experience of PostgreSQL on Windows so I'm guessing here. Pavel, if you have time to try testing the bug fix that was committed, that'd be great. If it goes well, perhaps we could consider making bleeding edge builds available all the time. -- Thomas Munro http://www.enterprisedb.com
On Fri, Nov 16, 2018 at 2:10 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > <thomas.munro@enterprisedb.com> wrote: > https://ci.appveyor.com/project/macdice/postgres/builds/20341645/artifacts > > I'm not sure how easy this will be to work with if you're used to one > of those installers. Hopefully all the libraries needed are present > and you can just unzip it and run "initdb.exe -D pgdata", > "postgres.exe -D pgdata" -- but I have no personal experience of > PostgreSQL on Windows so I'm guessing here. Pavel, if you have time > to try testing the bug fix that was committed, that'd be great. If it > goes well, perhaps we could consider making bleeding edge builds > available all the time. A couple of people wrote to me off-list to say they'd like to test the 11 stable branch, not the master branch, and would like a build. Thanks for testing! Presumably they might want to test against their existing pgdata where they've seen the problem. Fair enough -- but please do use a *copy* of your pgdata directory, not your production system, if it's a database you care about! These builds are for testing only, though the the stable 11 branch offers a preview of what will probably ship in 11.2. Here's the REL_11_STABLE branch built for Windows as of a couple minutes ago: https://ci.appveyor.com/project/macdice/postgres/builds/20412803/artifacts -- Thomas Munro http://www.enterprisedb.com
On Tue, Nov 20, 2018 at 8:51 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Here's the REL_11_STABLE branch built for Windows as of a couple minutes ago: > > https://ci.appveyor.com/project/macdice/postgres/builds/20412803/artifacts I received an off-list confirmation that a large CREATE INDEX could complete normally on a database that previously showed the error, using that snapshot of 11 stable. I'm really pleased with this experiment: it seems there are plenty of people willing to test on Windows if we can supply builds, in particular people who've run into a bug. This makes me think that I should figure out how to get cfbot to spit out a builds for all commitfest entries so that proposed fixes and features are automatically available in a format usable by anyone with Windows and no special developer tools. The only problem is the space requirement; if I just turn that on naively it'll eat 2-3TB of appveyor's archive space and they might hate me, so I'll need to come up with a way to push the zip files to external storage and keep just the latest one, and make sure it's clear which master commit + patches went into the build. -- Thomas Munro http://www.enterprisedb.com