Thread: Strange Windows problem, lock_timeout test request
Hi,

using the MinGW cross-compiled PostgreSQL binaries from Fedora 18, I get the following error for both the 32- and 64-bit compiled executables. listen_addresses = '*' and "trust" authentication were set for both. The firewall was disabled for the tests and the server logs "incomplete startup packet". The 64-bit version was compiled with my lock_timeout patch, the 32-bit one was without.

64-bit, connect to localhost:

C:\Users\Ákos\Desktop\PG93>bin\psql
psql: could not connect to server: Operation would block (0x00002733/10035)
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Operation would block (0x00002733/10035)
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

64-bit, connect to own IP:

C:\Users\Ákos\Desktop\PG93>bin\psql -h 192.168.1.4
psql: could not connect to server: Operation would block (0x00002733/10035)
        Is the server running on host "192.168.1.4" and accepting
        TCP/IP connections on port 5432?

32-bit, connect to own IP:

C:\Users\Ákos\Desktop\PG93-32>bin\psql -h 192.168.1.4
psql: could not connect to server: Operation would block (0x00002733/10035)
        Is the server running on host "192.168.1.4" and accepting
        TCP/IP connections on port 5432?

Does it ring a bell for someone?

Unfortunately, I won't have time to do anything with my lock_timeout patch for about 3 weeks. Does anyone have a little spare time to test it on Windows? The patch is here, it still applies to HEAD without rejects or fuzz:
http://www.postgresql.org/message-id/506C0854.7090008@cybertec.at

Thanks in advance,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 01/18/2013 05:43 PM, Boszormenyi Zoltan wrote:
> Hi,
>
> using the MinGW cross-compiled PostgreSQL binaries from Fedora 18,
> I get the following error for both the 32- and 64-bit compiled executables.
> [...]
> Does it ring a bell for someone?
>
> Unfortunately, I won't have time to do anything with my lock_timeout patch
> for about 3 weeks. Does anyone have a little spare time to test it on Windows?
> The patch is here, it still applies to HEAD without rejects or fuzz:
> http://www.postgresql.org/message-id/506C0854.7090008@cybertec.at

Yes it rings a bell. See
<http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html>

Cross-compiling is not really a supported platform. Why don't you just build natively? This is known to work, as shown by the buildfarm animals doing it successfully.

cheers

andrew
On Saturday, January 19, 2013 4:13 AM Boszormenyi Zoltan wrote:
> Hi,
>
> Unfortunately, I won't have time to do anything with my lock_timeout patch
> for about 3 weeks. Does anyone have a little spare time to test it on Windows?

I shall try to do it, probably next week.
Others are also welcome to test the patch.

With Regards,
Amit Kapila.
2013-01-19 01:13 keltezéssel, Andrew Dunstan írta:
>
> On 01/18/2013 05:43 PM, Boszormenyi Zoltan wrote:
>> Hi,
>>
>> using the MinGW cross-compiled PostgreSQL binaries from Fedora 18,
>> I get the following error for both the 32- and 64-bit compiled executables.
>> [...]
>> Does it ring a bell for someone?
>
> Yes it rings a bell. See
> <http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html>

The strange thing is that I used a cross-compiled build during Fedora 16 prime time (using the mingw-w64 test repository) and early Fedora 17, using the mingw32-* and mingw64-* GCC packages that were pulled from the above mentioned repository. That was the last time I tested PG on Windows, and it worked. I could connect to it both locally and from my Linux PC.

> Cross-compiling is not really a supported platform. Why don't you just build natively?
> This is known to work, as shown by the buildfarm animals doing it successfully.

Because I don't have a mingw setup on Windows. (Sorry.)

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
2013-01-19 06:52 keltezéssel, Amit kapila írta: > On Saturday, January 19, 2013 4:13 AM Boszormenyi Zoltan wrote: >> Hi, > > >> Unfortunately, I won't have time to do anything with my lock_timeout patch >> for about 3 weeks. Does anyone have a little spare time to test it on Windows? > I shall try to do it, probably next week. > Others are also welcome to test the patch. Thanks very much in advance. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2013-01-19 01:13 keltezéssel, Andrew Dunstan írta:
>
> On 01/18/2013 05:43 PM, Boszormenyi Zoltan wrote:
>> [...]
>> Does it ring a bell for someone?
>
> Yes it rings a bell. See
> <http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html>

I wanted to add a comment to this blog entry but it wasn't accepted. Here it is:

It doesn't work under Wine, see:
http://www.winehq.org/pipermail/wine-users/2013-January/107008.html

But pg_config works, so other PostgreSQL clients can also be built using the cross compiler.

> Cross-compiling is not really a supported platform. Why don't you just build natively?
> This is known to work, as shown by the buildfarm animals doing it successfully.
>
> cheers
>
> andrew

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:
>
>> Cross-compiling is not really a supported platform. Why don't you
>> just build natively? This is known to work, as shown by the buildfarm
>> animals doing it successfully.
>
> Because I don't have a mingw setup on Windows. (Sorry.)

A long time ago I had a lot of sympathy with this answer, but these days not so much. Getting a working mingw/msys environment sufficient to build a bare bones PostgreSQL from scratch is both cheap and fairly easy. The improvements that mingw has made in its install process, and the presence of cheap or free windows instances in the cloud, combine to make this pretty simple. But since it's still slightly involved, here is how I constructed one such this morning:

* Create an Amazon t1.micro spot instance of Windows_Server-2008-SP2-English-64Bit-Base-2012.12.12 (ami-554ac83c) (current price $0.006 / hour)
* get the credentials and log in using:
  o rdesktop -g 80% -u Administrator -p 'password' amazon-hostname
* turn off annoying IE security enhancements, and fire up IE
* go to <http://sourceforge.net/projects/mingw/files/Installer/mingw-get-inst/> and download the latest mingw-get-inst
* run this - make sure to select the Msys and the developer toolkit in addition to the C compiler
* navigate in explorer or a command window to \Mingw\Msys\1.0 and run msys.bat
* run this command to install some useful packages:
  o mingw-get install msys-wget msys-rxvt msys-unzip
* close that window
* open a normal command window and run the following to create an unprivileged user and open an msys window as that user:
  o net user pgrunner SomePw1234 /add
  o runas /user:pgrunner "cmd /c \mingw\msys\1.0\msys.bat --rxvt"
* in the rxvt window run:
  o wget http://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz
  o tar -z -xf postgresql-snapshot.tar.gz
  o wget "http://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Automated%20Builds/mingw-w64-bin_i686-mingw_20111220.zip/download"
  o mkdir /mingw64
  o cd /mingw64
  o unzip ~/mingw-w64-bin_i686-mingw_20111220.zip
  o cd ~/postgresql-9.3devel
  o export PATH=/mingw64/bin:$PATH
  o ./configure --host=x86_64-w64-mingw32 --without-zlib && make && make check
    + (make some coffee and do the crossword or read War and Peace - this can take a while)

Of course, for something faster you would pick a rather more expensive instance than t1.micro (say, m1.large, which usually has a spot price of $0.03 to $0.06 per hour), and if you're going to do this a lot you'll stash your stuff on an EBS volume that you can remount as needed, or zip it up and put it on S3, and then just pull it and unpack it in one hit from there. And there's barely enough room left on the root file system to do what's above anyway. But you can get the idea from this.

Note that we no longer need to look elsewhere for extra tools like flex and bison as we once did - the ones that come with modern Msys should work just fine.

If you want more build features (openssl, libxml, libxslt, perl, python etc) then there is extra work to do in getting hold of those. But then cross-compiling for those things might not be easy either.

Somebody more adept at automating Amazon than I am should be able to automate most or all of this.

Anyway, that should be just about enough info for just about any competent developer to get going, even if they have never touched Windows.

(No doubt one or two people will want to quibble with what I've done. That's fine - this is a description, not a prescription.)

cheers

andrew
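A quick sanity check before the configure step can confirm that the unpacked cross toolchain is actually on the PATH and runnable (a minimal sketch, not part of Andrew's recipe; the compiler name follows from the --host triplet used above):

    export PATH=/mingw64/bin:$PATH
    # the configure run above expects this driver to exist and execute
    x86_64-w64-mingw32-gcc --version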
On 01/19/2013 02:51 AM, Boszormenyi Zoltan wrote: >> Yes it rings a bell. See >> <http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html> > > > I wanted to add a comment to this blog entry but it wasn't accepted. The blog is closed for comments. I have moved to a new blog, and this is just there for archive purposes. > Here it is: > > It doesn't work under Wine, see: > http://www.winehq.org/pipermail/wine-users/2013-January/107008.html > > But pg_config works so other PostgreSQL clients can also be built > using the cross compiler. > If you want to target Wine I think you're totally on your own. cheers andrew
2013-01-20 00:15 keltezéssel, Andrew Dunstan írta:
>
> On 01/19/2013 02:51 AM, Boszormenyi Zoltan wrote:
>>> Yes it rings a bell. See
>>> <http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html>
>>
>> I wanted to add a comment to this blog entry but it wasn't accepted.
>
> The blog is closed for comments. I have moved to a new blog, and this
> is just there for archive purposes.
>
>> Here it is:
>>
>> It doesn't work under Wine, see:
>> http://www.winehq.org/pipermail/wine-users/2013-January/107008.html
>>
>> But pg_config works, so other PostgreSQL clients can also be built
>> using the cross compiler.
>
> If you want to target Wine I think you're totally on your own.

Yes, I know, it was only an attempt. The user having administrator privileges under Wine is a showstopper.

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
2013-01-19 21:15 keltezéssel, Andrew Dunstan írta: > > On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote: >> >>> >>> Cross-compiling is not really a supported platform. Why don't you >>> just build natively? This is know to work as shown by the buildfarm >>> animals doing it successfully. >> >> Because I don't have a mingw setup on Windows. (Sorry.) >> > > A long time ago I had a lot of sympathy with this answer, but these > days not so much. I didn't ask for sympathy. :-) I am just on a totally different project until 9th February and I am also far away from my desk. So I can't even attempt to install Windows+mingw inside Qemu/KVM. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote: >> > > A long time ago I had a lot of sympathy with this answer, but these days not > so much. Getting a working mingw/msys environment sufficient to build a bare > bones PostgreSQL from scratch is both cheap and fairly easy. The > improvements that mingw has made in its install process, and the presence of > cheap or free windows instances in the cloud combine to make this pretty > simple. But since it's still slightly involved here is how I constructed > one such this morning: I've used this description, skipping the Amazon part and putting it directly on my Windows computer, and it worked. Except bin/pg_ctl does not work. It just silently exits without doing anything, so I have to use bin/postgres to start the database (which is what "make check" uses anyway, so not a problem if you just want make check). Is that just me or is that a known problem? I've seen some discussion from 2004, but didn't find a conclusion. What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!) Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so? Thanks for the primer, Jeff
On 01/24/2013 01:44 PM, Jeff Janes wrote:
> On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:
>> [...]
> I've used this description, skipping the Amazon part and putting it
> directly on my Windows computer, and it worked.
>
> Except bin/pg_ctl does not work. It just silently exits without doing
> anything, so I have to use bin/postgres to start the database (which
> is what "make check" uses anyway, so not a problem if you just want
> make check). Is that just me or is that a known problem? I've seen
> some discussion from 2004, but didn't find a conclusion.

Did you copy libpq.dll from the lib directory to the bin directory? If not, try that and see if it fixes the problem.

>
> What advantages does mingw have over MSVC? Is it just that it is
> cost-free, or is it easier to use mingw than MSVC for someone used to
> building on Linux? (mingw certainly does not seem to have the
> advantage of being fast!)

See Craig's email today about problems with SDKs and availability of

>
> Would you like to put this somewhere on wiki.postgresql.org, or would
> you mind if I do so?
>
> Thanks for the primer,
>
> Jeff
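A minimal sketch of Andrew's suggested workaround from an msys shell (the /c/pgsql install prefix is a hypothetical example, not a path from this thread):

    # pg_ctl links against libpq (it uses PQping to probe the server);
    # if Windows can't find libpq.dll next to the executable, pg_ctl
    # can exit silently at startup.
    cp /c/pgsql/lib/libpq.dll /c/pgsql/bin/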
On 01/24/2013 02:41 PM, Andrew Dunstan wrote:
>
>> What advantages does mingw have over MSVC? Is it just that it is
>> cost-free, or is it easier to use mingw than MSVC for someone used to
>> building on Linux? (mingw certainly does not seem to have the
>> advantage of being fast!)
>
> See Craig's email today about problems with SDKs and availability of

Not sure what happened there. ... availability of free 64 bit MSVC compilers.

Also, some third party libraries are built with the Mingw compilers, and it's often best not to switch if you can help it.

>
>> Would you like to put this somewhere on wiki.postgresql.org, or would
>> you mind if I do so?

Please go for it.

cheers

andrew
On 01/25/2013 02:44 AM, Jeff Janes wrote:
>
> What advantages does mingw have over MSVC? Is it just that it is
> cost-free, or is it easier to use mingw than MSVC for someone used to
> building on Linux? (mingw certainly does not seem to have the
> advantage of being fast!)

As far as I'm concerned, the only advantages of MinGW are:

- Mostly compatible with the autotools/gmake toolchain via msys; and
- Not subject to Microsoft's whims regarding zero-cost access to toolchains

I don't particularly like autotools and think that using it on Windows is pretty ugly. We need msys anyway because our build process uses bison and flex, and at the present time there are no quality native Windows ports of these tools, but there's no fundamental reason they couldn't run without msys.

The greater appeal is that MinGW means that we're not totally dependent on Microsoft's whims when it comes to free access to the toolchain. They have a spotty history there. There was no free access to Visual Studio compilers until 2004, with the release of vctoolkit2003 (which has now vanished). They then added compilers to the Windows SDK starting in (IIRC) 6.0a, through to 7.1 (Windows 7 and .NET 4.5). Those compilers have been removed from the SDK as of the Windows 8 SDK.

The other way to get Microsoft compilers is from Visual Studio. This usually expensive product was released as a free-to-use Express version with VC Express 2005. It was a dog of a thing that required manual registry hacking and/or environment setup and the manual addition of an SDK to make it work. VC Express 2008 provided an integrated installer, but also removed some debugging facilities. VC Express 2010 removed more debugging features. Then with VC 2012 Microsoft announced that there would be no 2012 Express edition and that everybody should convert their apps to the new Metro system, for which a free toolchain would be available. They went back on that under community pressure and released VC Express 2012, which has many of the debugging features restored and works well - but who knows if it'll see any further updates.

All the Visual Studio Express editions require activation to use them, though this activation is free. Once you have an activation code it's valid forever, but there's no guarantee the servers will remain available to generate new codes, so you'd better keep them written down.

Microsoft don't maintain old SDKs and VC Express editions much, if at all, so installing and using older SDKs gets more complicated over time. Take VC Express 2010, until recently the newest edition available:

- Uninstall any Visual Studio 2010 redists that're installed, taking note of the versions
- Install VC Express 2010
- Install VC Express 2010 SP1
- Install VC Express 2010 SP1 Compiler Pack
- Download and reinstall any newer redists that you uninstalled, so your apps keep working.

... and it looks like having VC Express 2010 installed might partly break VC 2012; I'm still unsure of exactly what caused that.

We cannot fix any of these problems, nor can we update the toolsets to run on newer Windows versions. So if we support only Visual Studio, we're extremely dependent on the uncertain future of the free Microsoft toolchain. For that reason, even though it's not a great build environment and on Windows not a great compiler, MinGW support is important. If you notice issues with the MinGW builds, please report them so regression test coverage can be improved.
Hardly anyone uses them, but like the fire department, one day if you stop supporting them you really regret it.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/25/2013 04:08 AM, Andrew Dunstan wrote:
>
> On 01/24/2013 02:41 PM, Andrew Dunstan wrote:
>>
>>> What advantages does mingw have over MSVC? Is it just that it is
>>> cost-free, or is it easier to use mingw than MSVC for someone used to
>>> building on Linux? (mingw certainly does not seem to have the
>>> advantage of being fast!)
>>
>> See Craig's email today about problems with SDKs and availability of
>> free 64 bit MSVC compilers.
>
> Also, some third party libraries are built with the Mingw compilers,
> and it's often best not to switch if you can help it.

That's a trade-off; they're often not available in 64-bit, so you usually land up needing to rebuild them for 64-bit anyway, and the mingw 64 bit toolchain seems to be a bit broken.

As for 64-bit MS compilers: at the moment the MSVC 64-bit compilers are available for free as cross-compilers. The rather bizarre situation is that you have a 64-bit host OS running 32-bit compilers under WoW64, producing 64-bit binaries as output. This seems to be quite transparent to the build process when on a 64-bit host, as the host can execute the cross-compiled binaries directly, so it isn't like a normal cross-compile where you have to differentiate between host- and target-executables.

So, while no native 64-bit compilers are available for free as part of Visual Studio Express 2012 (11), it doesn't matter much. The issue is more that it's very much Microsoft's whim whether they release compilers at all and if so, which ones, when and how.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> So, while no native 64-bit compilers are available for free as part of
> Visual Studio Express 2012 (11), it doesn't matter much. The issue is
> more that it's very much Microsoft's whim whether they release compilers
> at all and if so, which ones, when and how.

I think I have a pretty vanilla Visual Studio Express 2012 for Desktop, and:

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>.\cl
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.51106.1 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>

Am I misunderstanding the discussion here? Isn't that the 64-bit tool suite? Does anyone care if the compiler is a 64 bit binary itself?
On 01/27/2013 12:04 AM, james wrote:
>> So, while no native 64-bit compilers are available for free as part of
>> Visual Studio Express 2012 (11), it doesn't matter much. The issue is
>> more that it's very much Microsoft's whim whether they release compilers
>> at all and if so, which ones, when and how.
>
> I think I have a pretty vanilla Visual Studio Express 2012 for Desktop, and:
>
> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>.\cl
> Microsoft (R) C/C++ Optimizing Compiler Version 17.00.51106.1 for x64
> Copyright (C) Microsoft Corporation. All rights reserved.
>
> usage: cl [ option... ] filename... [ /link linkoption... ]
>
> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>
>
> Am I misunderstanding the discussion here?

Yep, that's a cross-compiler, hence "x86_amd64". As I said, it doesn't actually matter very much on an x64 host, as the target arch binaries run fine on the host arch. The only reason you'd need the native 64-bit compiler is if you're compiling huge programs that cause the 32-bit to 64-bit cross-compiler to run out of memory. Not an issue for Pg.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Saturday, January 19, 2013 11:23 AM Amit kapila wrote:
> On Saturday, January 19, 2013 4:13 AM Boszormenyi Zoltan wrote:
>> Hi,
>>
>> Unfortunately, I won't have time to do anything with my lock_timeout patch
>> for about 3 weeks. Does anyone have a little spare time to test it on Windows?
>
> I shall try to do it, probably next week.
> Others are also welcome to test the patch.

I have carried out some Windows testing of the lock timeout patch. The extra tests which were carried out on the patch are attached to the mail.

Some observations on the patch:

1. The patch needs a rebase, as it is causing some rejections.
2. The regress check failed because the expected ".out" file is not updated properly.

Regards,
Hari babu.
2013-01-28 15:20 keltezéssel, Hari Babu írta:
> On Saturday, January 19, 2013 11:23 AM Amit kapila wrote:
>> On Saturday, January 19, 2013 4:13 AM Boszormenyi Zoltan wrote:
>>> Hi,
>>>
>>> Unfortunately, I won't have time to do anything with my lock_timeout patch
>>> for about 3 weeks. Does anyone have a little spare time to test it on Windows?
>> I shall try to do it, probably next week.
>> Others are also welcome to test the patch.
>
> I have carried out some Windows testing of the lock timeout patch.

Thanks very much. It means it didn't crash for you, and it waited the expected amount of time as well.

> The extra tests which were carried out on the patch are attached to the mail.

The patch itself contained regression tests to run by itself and compete with statement_timeout as well, although it waits only 2 seconds instead of 60 as in your test.

> Some observations on the patch:
>
> 1. The patch needs a rebase, as it is causing some rejections.

On a fresh GIT pull, it only caused a reject for the documentation parts of the patch. No other rejects and no fuzz; only line shift warnings were indicated by patch. Anyway, a refreshed one is attached.

> 2. The regress check failed because the expected ".out" file is not
> updated properly.

Which regress check failed? The .out file was updated in the patch for prepared_xacts.sql, where the regression tests for lock_timeout were added. Or do you mean the one for the sql file you sent?

Thanks,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
2013-01-30 15:29 keltezéssel, Zoltán Böszörményi írta:
> 2013-01-28 15:20 keltezéssel, Hari Babu írta:
>> [...]
>> 2. The regress check failed because the expected ".out" file is not
>> updated properly.
>
> Which regress check failed? The .out file was updated in the patch
> for prepared_xacts.sql, where the regression tests for lock_timeout
> were added. Or do you mean the one for the sql file you sent?

If the failed regression test is indeed the prepared_xacts.sql that is in my patch, can you attach the regression.diffs file after the failed "make check"? Thanks.

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On Wednesday, January 30, 2013 7:59 PM Zoltán Böszörményi wrote:
>> 2013-01-28 15:20 keltezéssel, Hari Babu írta:
>> 2. The regress check failed because the expected ".out" file is not
>> updated properly.
>
> Which regress check failed? The .out file was updated in the patch for
> prepared_xacts.sql, where the regression tests for lock_timeout were added.
> Or do you mean the one for the sql file you sent?

During the regress test, the expected file "prepared_xacts_1.out" is used for comparison with the result file. It is not updated by the patch, and because of this the regress check is failing.

Regards,
Hari babu.
2013-01-30 16:06 keltezéssel, Hari Babu írta:
> On Wednesday, January 30, 2013 7:59 PM Zoltán Böszörményi wrote:
>>> 2013-01-28 15:20 keltezéssel, Hari Babu írta:
>>> 2. The regress check failed because the expected ".out" file is not
>>> updated properly.
>> Which regress check failed? [...]
>
> During the regress test, the expected file "prepared_xacts_1.out" is used
> for comparison with the result file. It is not updated by the patch, and
> because of this the regress check is failing.

I see, so this is a Windows-only difference that needs a different expected file. Can you send the resulting prepared_xacts_1.out file so I can integrate its changes into my patch? That way it would be complete. Thanks in advance.

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
2013-01-30 17:45 keltezéssel, Zoltán Böszörményi írta:
> 2013-01-30 16:06 keltezéssel, Hari Babu írta:
>> [...]
>> During the regress test, the expected file "prepared_xacts_1.out" is used
>> for comparison with the result file. It is not updated by the patch, and
>> because of this the regress check is failing.
>
> I see, so this is a Windows-only difference that needs a different
> expected file. Can you send the resulting prepared_xacts_1.out file
> so I can integrate its changes into my patch? That way it would be
> complete. Thanks in advance.

I have found a little time to look into this problem and found a way to make pg_regress use prepared_xacts_1.out. I had to change line 2193 in pg_regress.c from

    fputs("max_prepared_transactions = 2\n", pg_conf);

to

    fputs("max_prepared_transactions = 0\n", pg_conf);

The patch now passed "make check" in both cases.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Zoltán Böszörményi wrote:

> I have found a little time to look into this problem and
> found a way to make pg_regress use prepared_xacts_1.out.
> I had to change line 2193 in pg_regress.c from
>
>     fputs("max_prepared_transactions = 2\n", pg_conf);
>
> to
>
>     fputs("max_prepared_transactions = 0\n", pg_conf);
>
> The patch now passed "make check" in both cases.

That sounds a lot more difficult than just using "make installcheck" and configuring the running server with zero prepared xacts ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
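A minimal sketch of the approach Alvaro suggests (the data directory path is a placeholder):

    # in the running server's postgresql.conf, set:
    #   max_prepared_transactions = 0
    pg_ctl -D /path/to/datadir restart
    make installcheck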
2013-01-31 15:22 keltezéssel, Alvaro Herrera írta:
> Zoltán Böszörményi wrote:
>
>> I have found a little time to look into this problem and
>> found a way to make pg_regress use prepared_xacts_1.out.
>> [...]
>> The patch now passed "make check" in both cases.
>
> That sounds a lot more difficult than just using "make installcheck" and
> configuring the running server with zero prepared xacts ...

It didn't occur to me to use "make installcheck" for this one.

What is strange though is why prepared_xacts_1.out exists at all, since pg_regress.c / make check seems to set max_prepared_transactions on Windows, too.

Is there a way to specify such settings in REGRESS_OPTS?

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 01/31/2013 09:55 AM, Zoltán Böszörményi wrote:
> 2013-01-31 15:22 keltezéssel, Alvaro Herrera írta:
>> [...]
>> That sounds a lot more difficult than just using "make installcheck" and
>> configuring the running server with zero prepared xacts ...
>
> It didn't occur to me to use "make installcheck" for this one.
>
> What is strange though is why prepared_xacts_1.out exists
> at all, since pg_regress.c / make check seems to set
> max_prepared_transactions on Windows, too.
>
> Is there a way to specify such settings in REGRESS_OPTS?

You can use the temp_config option to specify a file with non-standard settings that will be appended to the installed postgresql.conf.

cheers

andrew
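For example, a sketch along these lines (it assumes pg_regress's --temp-config switch and a Makefile that passes EXTRA_REGRESS_OPTS through to pg_regress; both may vary by branch):

    echo "max_prepared_transactions = 0" > /tmp/zero_prep.conf
    make check EXTRA_REGRESS_OPTS="--temp-config=/tmp/zero_prep.conf"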
Zoltán Böszörményi <zb@cybertec.at> writes:
> 2013-01-31 15:22 keltezéssel, Alvaro Herrera írta:
>> That sounds a lot more difficult than just using "make installcheck" and
>> configuring the running server with zero prepared xacts ...

> It didn't occur to me to use "make installcheck" for this one.
> What is strange though is why prepared_xacts_1.out exists
> at all, since pg_regress.c / make check seems to set
> max_prepared_transactions on Windows, too.

Alvaro told you why: so that the tests wouldn't report failure in "make installcheck" against a stock-configuration server.

BTW, 99% of the time you can update alternative expected files by applying the same patch to them as you did to the tested version. At least that usually works for me, and it can be a lot easier than arranging to duplicate the environment the alternative file is meant for.

			regards, tom lane
2013-01-31 16:39 keltezéssel, Tom Lane írta:
> Zoltán Böszörményi <zb@cybertec.at> writes:
>> 2013-01-31 15:22 keltezéssel, Alvaro Herrera írta:
>>> That sounds a lot more difficult than just using "make installcheck" and
>>> configuring the running server with zero prepared xacts ...
>> It didn't occur to me to use "make installcheck" for this one.
>> What is strange though is why prepared_xacts_1.out exists
>> at all, since pg_regress.c / make check seems to set
>> max_prepared_transactions on Windows, too.
>
> Alvaro told you why: so that the tests wouldn't report failure in
> "make installcheck" against a stock-configuration server.
>
> BTW, 99% of the time you can update alternative expected files by
> applying the same patch to them as you did to the tested version.
> At least that usually works for me, and it can be a lot easier than
> arranging to duplicate the environment the alternative file is
> meant for.
>
> regards, tom lane

Thanks. A question though: how does "make check" or "make installcheck" choose between the *.out file and its different *_N.out incarnations? I couldn't find traces of prepared_xacts_1.out in any file saying "this is the one to be used in this-and-this configuration". Does the procedure check against all versions, with the least different one reported?

Thanks,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Zoltán Böszörményi <zb@cybertec.at> writes:
> Thanks. A question though: how does "make check" or "make installcheck"
> choose between the *.out file and its different *_N.out incarnations?
> I couldn't find traces of prepared_xacts_1.out in any file saying "this
> is the one to be used in this-and-this configuration". Does the procedure
> check against all versions, with the least different one reported?

Exactly. This is documented, see the "regression tests" chapter of the SGML manual.

			regards, tom lane
2013-01-31 19:38 keltezéssel, Tom Lane írta:
> Zoltán Böszörményi <zb@cybertec.at> writes:
>> Thanks. A question though: how does "make check" or "make installcheck"
>> choose between the *.out file and its different *_N.out incarnations?
>> I couldn't find traces of prepared_xacts_1.out in any file saying "this
>> is the one to be used in this-and-this configuration". Does the procedure
>> check against all versions, with the least different one reported?
>
> Exactly. This is documented, see the "regression tests" chapter of the
> SGML manual.
>
> regards, tom lane

Thanks.

I tested my patch with installcheck and installcheck-parallel using max_prepared_transactions=0 in the server, and it passed that way too.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
2013-01-31 22:12 keltezéssel, Zoltán Böszörményi írta:
> 2013-01-31 19:38 keltezéssel, Tom Lane írta:
>> Zoltán Böszörményi <zb@cybertec.at> writes:
>>> Thanks. A question though: how does "make check" or "make installcheck"
>>> choose between the *.out file and its different *_N.out incarnations?
>>> [...]
>> Exactly. This is documented, see the "regression tests" chapter of the
>> SGML manual.
>>
>> regards, tom lane
>
> Thanks.
>
> I tested my patch with installcheck and installcheck-parallel
> using max_prepared_transactions=0 in the server, and
> it passed that way too.

# ping -c 3 craig.ringer
PING craig.ringer (192.168.1.80) 56(84) bytes of data.
From craig.ringer (192.168.1.80) icmp_seq=1 Destination Host Unreachable
From craig.ringer (192.168.1.80) icmp_seq=2 Destination Host Unreachable
From craig.ringer (192.168.1.80) icmp_seq=3 Destination Host Unreachable

--- craig.ringer ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms

All details were fixed:
- rebased to recent GIT
- updated regression test for max_prepared_transactions=0
- tested on Windows (by Hari Babu, thanks!)

Why is this patch still in "Waiting on Author" state in the CF app?

Thanks in advance,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > Why is this patch still in "Waiting on Author" state in the CF app? If you're the author, you should change it back to "Needs Review" when you've responded to the review. regards, tom lane
2013-02-12 19:35 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> Why is this patch still in "Waiting on Author" state in the CF app? > If you're the author, you should change it back to "Needs Review" when > you've responded to the review. > > regards, tom lane Maybe it's just me, but I can't see any facility (link or button) to do that. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes:
> 2013-02-12 19:35 keltezéssel, Tom Lane írta:
>> If you're the author, you should change it back to "Needs Review" when
>> you've responded to the review.

> Maybe it's just me, but I can't see any facility (link or button) to do that.

Go to the patch page, click "edit patch", adjust the popdown list for "patch status" ...

			regards, tom lane
2013-02-12 19:45 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>> 2013-02-12 19:35 keltezéssel, Tom Lane írta:
>>> If you're the author, you should change it back to "Needs Review" when
>>> you've responded to the review.
>> Maybe it's just me, but I can't see any facility (link or button) to do that.
> Go to the patch page, click "edit patch", adjust the popdown list for
> "patch status" ...
>
> regards, tom lane

This is what I am saying: I am logged in, and there is no popdown list for the "Patch Status", only text...

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 2013-02-12 19:51:45 +0100, Boszormenyi Zoltan wrote:
> 2013-02-12 19:45 keltezéssel, Tom Lane írta:
>> Boszormenyi Zoltan <zb@cybertec.at> writes:
>>> 2013-02-12 19:35 keltezéssel, Tom Lane írta:
>>>> If you're the author, you should change it back to "Needs Review" when
>>>> you've responded to the review.
>>> Maybe it's just me, but I can't see any facility (link or button) to do that.
>> Go to the patch page, click "edit patch", adjust the popdown list for
>> "patch status" ...
> This is what I am saying: I am logged in, and there is no popdown list
> for the "Patch Status", only text...

That's strange. I changed it for now just to see if there's any problem; it worked fine for me.

Greetings,

Andres Freund

--
Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Boszormenyi Zoltan <zb@cybertec.at> writes:
> 2013-02-12 19:45 keltezéssel, Tom Lane írta:
>> Go to the patch page, click "edit patch", adjust the popdown list for
>> "patch status" ...

> This is what I am saying: I am logged in, and there is no popdown list
> for the "Patch Status", only text...

That sure sounds like the
https://commitfest.postgresql.org/action/patch_view?id=896
page, not the edit-patch page, which is
https://commitfest.postgresql.org/action/patch_form?id=896
There should be popdown lists for both topic and status ...

			regards, tom lane
On 02/12/2013 01:51 PM, Boszormenyi Zoltan wrote: > 2013-02-12 19:45 keltezéssel, Tom Lane írta: >> Boszormenyi Zoltan <zb@cybertec.at> writes: >>> 2013-02-12 19:35 keltezéssel, Tom Lane írta: >>>> If you're the author, you should change it back to "Needs Review" when >>>> you've responded to the review. >>> Maybe it's just me, but I can't see any facility (link or button) to >>> do that. >> Go to the patch page, click "edit patch", adjust the popdown list for >> "patch status" ... >> >> regards, tom lane > > This is what I am saying, I am logged in and there is no popdown list > for the "Patch Status", only text... > > I bet you didn't go to "edit patch" like Tom said to. cheers andrew
2013-02-12 19:58 keltezéssel, Andrew Dunstan írta: > > On 02/12/2013 01:51 PM, Boszormenyi Zoltan wrote: >> 2013-02-12 19:45 keltezéssel, Tom Lane írta: >>> Boszormenyi Zoltan <zb@cybertec.at> writes: >>>> 2013-02-12 19:35 keltezéssel, Tom Lane írta: >>>>> If you're the author, you should change it back to "Needs Review" when >>>>> you've responded to the review. >>>> Maybe it's just me, but I can't see any facility (link or button) to do that. >>> Go to the patch page, click "edit patch", adjust the popdown list for >>> "patch status" ... >>> >>> regards, tom lane >> >> This is what I am saying, I am logged in and there is no popdown list >> for the "Patch Status", only text... >> >> > > I bet you didn't go to "edit patch" like Tom said to. Yes, that was the problem. I missed that link. Thanks. Where can I get brown paper bags in bulk? :-D -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Zoltán,

* Zoltán Böszörményi (zb@cybertec.at) wrote:
> The patch now passed "make check" in both cases.

Is v29 the latest version of this patch..?

Looking through the patch, I've noticed a couple of things:

First off, it's not in context diff format, which is the PG standard for patch submission.

Next, the documentation has a few issues:

- "Heavy-weight" should really be defined in terms of specific lock types or modes. We don't use the 'heavyweight' term anywhere else in the documentation that I've found.

- I'd reword this:

    Abort any statement that tries to acquire a heavy-weight lock on rows,
    pages, tables, indices or other objects and the lock(s) has to wait
    more than the specified number of milliseconds.

  as:

    Abort any statement which waits longer than the specified number of
    milliseconds while attempting to acquire a lock on ...

- I don't particularly like "lock_timeout_option", for a couple of reasons. First, simply, the name is terrible, but beyond that, it strikes me that wanting to set both a 'per-lock timeout' and an 'overall waiting-for-locks timeout' at the same time would be a reasonable use-case. If we're going to have 2 GUCs and we're going to support each of those options, why not just let the user specify values for each?

- This is a bit disingenuous:

    If <literal>NOWAIT</> option is not specified and
    <varname>lock_timeout</varname> is set and the lock or statement
    (depending on <varname>lock_timeout_option</varname>) needs to wait
    more than the specified value in milliseconds, the command reports
    an error after timing out, rather than waiting indefinitely.

  The SELECT would simply continue to wait until the lock is available. That's a bit more specific than 'indefinitely'. Also, we might add a sentence about statement_timeout as well, if we're going to document what can happen if you don't use NOWAIT with your SELECT-FOR-UPDATE. Should we add documentation to the other commands that wait for locks?

- Looks like this was ended mid-thought...:

    + * Lock a semaphore (decrement count), blocking if count would be < 0
    + * until a timeout triggers. Returns true if

- Not a big fan of this:

    + * See notes in PGSemaphoreLock.

- I'm not thrilled with the new API for defining the timeouts. Particularly, I believe the more common convention for passing around arrays of structures is to have an empty entry at the end, which avoids having to remember to update the # of entries every time it gets changed. Of course, another option would be to use our existing linked list implementation and its helper macros, such as our foreach() construct.

- As I've mentioned in other places/times, comments should be about why we're doing something, not what we're doing - the code tells you that. As such, comments like these really aren't great:

    /* Assert request is sane */
    /* Now re-enable the timer, if necessary. */

- Do we really need TimestampTzPlusMicroseconds..?

In general, I like this feature, and a number of the things above are pretty small issues. The main questions, imv, are if we really need both 'options', and, if so, how they should work, and the API for defining timers.

Thanks,

Stephen
2013-02-23 02:55 keltezéssel, Stephen Frost írta:
> Zoltán,
>
> * Zoltán Böszörményi (zb@cybertec.at) wrote:
>> The patch now passed "make check" in both cases.
> Is v29 the latest version of this patch..?

Yes, it was until you came up with your review... ;-)

> Looking through the patch, I've noticed a couple of things:
>
> First off, it's not in context diff format, which is the PG standard for
> patch submission.

Since moving to GIT, this expectation is obsolete. All PG hackers became comfortable with the unified diff format, see references from the list. A lot of hackers post "git diff" patches, and git diff cannot produce a context diff, either.

> Next, the documentation has a few issues:
>
> - "Heavy-weight" should really be defined in terms of specific lock
>   types or modes. We don't use the 'heavyweight' term anywhere else in
>   the documentation that I've found.

That's not strictly true, but it's not widely used either:

$ find . -type f | xargs grep weight | grep heavy
./monitoring.sgml:       <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
./monitoring.sgml:       <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)

> - I'd reword this:
>
>     Abort any statement that tries to acquire a heavy-weight lock on rows,
>     pages, tables, indices or other objects and the lock(s) has to wait
>     more than the specified number of milliseconds.
>
>   as:
>
>     Abort any statement which waits longer than the specified number of
>     milliseconds while attempting to acquire a lock on ...

OK.

> - I don't particularly like "lock_timeout_option", for a couple of
>   reasons. [...] If we're going to have 2 GUCs and we're going to
>   support each of those options, why not just let the user specify
>   values for each?

OK, suggest names for the 2 GUCs, please.

> - Looks like this was ended mid-thought...:
>
>     + * Lock a semaphore (decrement count), blocking if count would be < 0
>     + * until a timeout triggers. Returns true if

Right.

> - Not a big fan of this:
>
>     + * See notes in PGSemaphoreLock.

Why? Is copy&pasting the *long* comment (a different one in each *_sema.c) or omitting the comments better? Suggest a better comment, and it will be included.

> - I'm not thrilled with the new API for defining the timeouts.
>   Particularly, I believe the more common convention for passing around
>   arrays of structures is to have an empty entry at the end, which
>   avoids having to remember to update the # of entries every time it
>   gets changed. Of course, another option would be to use our existing
>   linked list implementation and its helper macros, such as our
>   foreach() construct.
A linked list might be better; actually, I like it.

> - As I've mentioned in other places/times, comments should be about why
>   we're doing something, not what we're doing - the code tells you that.
>   As such, comments like these really aren't great:
>     /* Assert request is sane */
>     /* Now re-enable the timer, if necessary. */
>
> - Do we really need TimestampTzPlusMicroseconds..?

Well, my code is the only user of this macro, but it's cleaner than explicitly doing

#ifdef HAVE_INT64_TIMESTAMP
    fin_time = timeout_start + delay_ms * (int64) 1000;
#else
    fin_time = timeout_start + delay_ms / 1000.0;
#endif

> In general, I like this feature, and a number of the things above are pretty
> small issues. The main questions, imv, are if we really need both
> 'options', and, if so, how they should work, and the API for defining
> timers.
>
> Thanks,
>
> Stephen

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Zoltán,

* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> On 2013-02-23 02:55, Stephen Frost wrote:
>> First off, it's not in context diff format, which is the PG standard for patch submission.
>
> Since moving to GIT, this expectation is obsolete.

No, it isn't. Patches posted to the list should be in context diff format (and uncompressed unless it's too large) for easier reading. That avoids having to download it, apply it to a git tree and then create the diff ourselves. Indeed, the move to git has had no impact on this at all.

> All PG hackers became comfortable with the unified diff format, see references from the list. A lot of hackers post "git diff" patches which cannot produce context diff, either.

It's quite possible to create a context diff from git- indeed, it's documented on the http://wiki.postgresql.org/wiki/Working_with_Git portion of the wiki, specifically to help people understand how to take the changes they've made in their git forks and create context diffs to post to the mailing lists. The preference for context diffs is also still documented at http://wiki.postgresql.org/wiki/Submitting_a_Patch.

And, to be honest, I'm not sure how familiar you are with the history of PG in this area, but I've been through all of it from the pre-git CVS days, through the migration to git, to the exclusive use of git. I feel quite confident that the preference for context diffs hasn't changed.

> That's not strictly true, but it's not widely used either:
>
> $ find . -type f | xargs grep weight | grep heavy
> ./monitoring.sgml: <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
> ./monitoring.sgml: <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)

Interesting. That didn't show up in the searches that I was doing through the web interface, though it does now. If we're going to use that term, we should define it in the lock documentation. If not, then we should avoid it everywhere. I'm fine with either.

>> - I don't particularly like "lock_timeout_option", for a couple of reasons. First is simply the name is terrible, but beyond that, it strikes me that wanting to set both a 'per-lock timeout' and an 'overall waiting-for-locks timeout' at the same time would be a reasonable use-case. If we're going to have 2 GUCs and we're going to support each of those options, why not just let the user specify values for each?
>
> OK, suggest names for the 2 GUCs, please.

lock_timeout_wait and lock_timeout_stmt_wait ?

Though I've been really wondering to myself as to if we need both of these options as well as statement_timeout. Perhaps you can explain the use case for each option and how they're distinct from each other? Indeed, the use-case that I'm envisioning is focused around "don't wait more than 'X' for the relation-level locks" which would allow you to distinguish queries that are, most likely anyway, making progress, from those which have been caught behind a lock. That would match your 'per-statement' lock timeout concept, iiuc, though I think it might be more simply implemented.

>> - Not a big fan of this:
>>
>> + * See notes in PGSemaphoreLock.
>
> Why? Is copy&pasting the *long* comment (a different one in each *_sema.c) better, or omitting the comments? Suggest a better comment, and it will be included.

How about, for starters, there's more than one PGSemaphoreLock? Second, as you'll note in posix_sema.c, it's useful to say more than just "look over there".
I'm not suggesting a mass copy/paste, but more than 4 words would be valuable.

>> - Do we really need TimestampTzPlusMicroseconds..?
>
> Well, my code is the only user of this macro, but it's cleaner than explicitly doing

To be honest, I was really wondering why TimestampTzPlusMilliseconds isn't sufficient, not suggesting that we litter the code with #ifdef's.

Thanks,

Stephen
On 02/23/2013 01:15 PM, Boszormenyi Zoltan wrote:
>> First off, it's not in context diff format, which is the PG standard for patch submission.
>
> Since moving to GIT, this expectation is obsolete. All PG hackers became comfortable with the unified diff format, see references from the list. A lot of hackers post "git diff" patches which cannot produce context diff, either.

I am not aware that project policy has changed in the slightest in this regard.

Every unified diff can be turned into a context diff by passing it through "filterdiff --format=context".

cheers

andrew
* Andrew Dunstan (andrew@dunslane.net) wrote:
> Every unified diff can be turned into a context diff by passing it through "filterdiff --format=context".

On that point, annoyingly, it's not accurate for every diff. In particular, when I tried it with the diff for 'v29' of this lock_timeout patch, it only returns the context diff for the first two files:

zcat 2-lock_timeout-v29.patch.gz | grep -- '--- postgresql' | wc -l
22

zcat 2-lock_timeout-v29.patch.gz | \
    filterdiff --format=context | \
    grep '\*\*\* postgresql' | wc -l
2

In the end, I did create a local git branch, commit the patch, then diff it back against master using my context-diff git helper to get something easier to read through. Rather annoying.

Thanks,

Stephen
On 2013-02-24 03:23, Stephen Frost wrote:
> Zoltán,
>
> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>> On 2013-02-23 02:55, Stephen Frost wrote:
>>> First off, it's not in context diff format, which is the PG standard for patch submission.
>> Since moving to GIT, this expectation is obsolete.
> No, it isn't. Patches posted to the list should be in context diff format (and uncompressed unless it's too large) for easier reading. That avoids having to download it, apply it to a git tree and then create the diff ourselves. Indeed, the move to git has had no impact on this at all.

I remembered this mail from The Master(tm):
http://www.postgresql.org/message-id/21555.1339866293@sss.pgh.pa.us

>> All PG hackers became comfortable with the unified diff format, see references from the list. A lot of hackers post "git diff" patches which cannot produce context diff, either.
> It's quite possible to create a context diff from git- indeed, it's documented on the http://wiki.postgresql.org/wiki/Working_with_Git portion of the wiki, specifically to help people understand how to take the changes they've made in their git forks and create context diffs to post to the mailing lists. The preference for context diffs is also still documented at http://wiki.postgresql.org/wiki/Submitting_a_Patch.
>
> And, to be honest, I'm not sure how familiar you are with the history of PG in this area, but I've been through all of it from the pre-git CVS days, through the migration to git, to the exclusive use of git.

I started with Postgres95, though only using the source tarballs. I started following CVS at around 2003 or so.

> I feel quite confident that the preference for context diffs hasn't changed.

With stats for February 2013:

Name                   ctx  ctx/git  unif/git  unif
Dimitry Fontaine         4        0         1     0
Ian Lawrence Barwick     0        1         0     0
Alvaro Herrera          10        0         5     0
Pavel Stehule            9        0         0     0
Heikki Linnakangas       0        0         7     0
Michael Paquier          0        0         6     0
Andres Freund            0        0         5     0
Alexander Law            0        0         1     0
Etsuro Fujita            0        0         4     0
Amit Kapila              4        0         0     0
Kevin Grittner           2        0         3     0
Gurjeet Singh            0        0         2     0
Erik Rijkers             0        0         0     1
Zoltan Boszormenyi       0        0         2     0
Tomas Vondra             0        0         2     0
Bruce Momjian            0        4         0     0
Fujii Masao              1        0         0     0
David Fetter             5        0         0     0
Jonathan Rogers          0        0         1     0
Andrew Dunstan           2        0         0     0
Manlio Perillo           0        0         1     0
Dean Rasheed             0        1         2     0
Jeff Janes               0        2         1     0
Phil Sorber              0        3         2     0
Mark Wong                0        1         0     0
Ivan Lezhnjov IV         0        0         1     0
Kohei Gagai              0        0         2     0
MauMau                   1        0         0     0
Tom Lane                 0        1         0     0
Sean Chittenden          0        0         2     0
Albe Laurenz             0        1         0     0
Daniel Farina            1        0         0     0
Simon Riggs              0        0         3     0
Peter Eisentraut         0        0         4     0

Plain context diffs: 39
Context diffs generated from git: 14
"git diff" patches: 57
Plain unified diff: 1

Some mails contained more than one patch; these were counted as-is. One patch is one patch.

So, more than half of the recently posted patches come directly from "git diff". The preference has changed. But indeed, plain "diff -u" is clearly in the minority. I can repost my patch from "git diff", too, to be in the majority. :-P

>> That's not strictly true, but it's not widely used either:
>>
>> $ find . -type f | xargs grep weight | grep heavy
>> ./monitoring.sgml: <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
>> ./monitoring.sgml: <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
> Interesting. That didn't show up in the searches that I was doing through the web interface, though it does now. If we're going to use that term, we should define it in the lock documentation.
> If not, then we should avoid it everywhere. I'm fine with either.

Me too; the documentation should be consistent. I will remove the "heavyweight" term.

>>> - I don't particularly like "lock_timeout_option", for a couple of reasons. First is simply the name is terrible, but beyond that, it strikes me that wanting to set both a 'per-lock timeout' and an 'overall waiting-for-locks timeout' at the same time would be a reasonable use-case. If we're going to have 2 GUCs and we're going to support each of those options, why not just let the user specify values for each?
>> OK, suggest names for the 2 GUCs, please.
> lock_timeout_wait and lock_timeout_stmt_wait ?

Hm. How about without the "_wait" suffix? Or lock_timeout vs statement_lock_timeout?

> Though I've been really wondering to myself as to if we need both of these options as well as statement_timeout. Perhaps you can explain the use case for each option and how they're distinct from each other?

statement_timeout is the legacy behaviour, and it should be kept. Its behaviour is well understood: the statement finishes under the specified time or it throws an error. The problem from the application point of view is that the error can happen while the tuples are being transferred to the client, so the recordset can be cut in half.

"lock_timeout_stmt" (or lock_timeout_option = per_statement) somewhat extends statement_timeout in that only the time spent waiting on locks is counted, so SELECT FOR UPDATE/etc. may throw an error, but if all rows are collected already, or a plain SELECT is run, the application gets them all. This seems to follow the Microsoft SQL Server semantics:
http://msdn.microsoft.com/en-us/library/ms189470.aspx

The per-lock lock_timeout was the result of a big Informix project ported to PostgreSQL; this follows Informix semantics:
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls788.htm

> Indeed, the use-case that I'm envisioning is focused around "don't wait more than 'X' for the relation-level locks" which would allow you to distinguish queries that are, most likely anyway, making progress, from those which have been caught behind a lock. That would match your 'per-statement' lock timeout concept, iiuc, though I think it might be more simply implemented.
>
>>> - Not a big fan of this:
>>>
>>> + * See notes in PGSemaphoreLock.
>> Why? Is copy&pasting the *long* comment (a different one in each *_sema.c) better, or omitting the comments? Suggest a better comment, and it will be included.
> How about, for starters, there's more than one PGSemaphoreLock? Second, as you'll note in posix_sema.c, it's useful to say more than just "look over there". I'm not suggesting a mass copy/paste, but more than 4 words would be valuable.

OK.

>>> - Do we really need TimestampTzPlusMicroseconds..?
>> Well, my code is the only user of this macro, but it's cleaner than explicitly doing
> To be honest, I was really wondering why TimestampTzPlusMilliseconds isn't sufficient, not suggesting that we litter the code with #ifdef's.

Because Timestamp[Tz] has microsecond resolution, and the timeout GUCs are in milliseconds. Adding up time differences (and rounding or truncating them to milliseconds in the process) would make the per-statement lock timeout lose precision...
> Thanks,
>
> Stephen

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Heikki Linnakangas
On 24.02.2013 05:07, Andrew Dunstan wrote:
> On 02/23/2013 01:15 PM, Boszormenyi Zoltan wrote:
>>> First off, it's not in context diff format, which is the PG standard for patch submission.
>>
>> Since moving to GIT, this expectation is obsolete. All PG hackers became comfortable with the unified diff format, see references from the list. A lot of hackers post "git diff" patches which cannot produce context diff, either.
>
> I am not aware that project policy has changed in the slightest in this regard.

I can't speak for others, but I personally don't care whether a patch is posted in unified or context diff format. Not as a general rule, anyway; patches that modify a few lines here and there are generally more readable in unified format, as the old and new lines are lined up right next to each other:

@@ -329,7 +346,7 @@ foreign_expr_walker(Node *node, foreign_expr_cxt *context)
 static bool
 is_builtin(Oid oid)
 {
-	return (oid < FirstNormalObjectId);
+	return (oid < FirstBootstrapObjectId);
 }

Then again, more complicated patches that modify large chunks of code are often unreadable when the - and + lines are mixed up in the same chunk; those are more readable in context format.

So if you want to be kind to readers, look at the patch and choose the format depending on which one makes it look better. But there's no need to make a point of it when someone posts in the "wrong" format.

> Every unified diff can be turned into a context diff by passing it through "filterdiff --format=context".

Yep. And in emacs, there's the "diff-unified->context" command. I bet most editors have similar functionality these days.

- Heikki
Stephen,

On 2013-02-23 02:55, Stephen Frost wrote:
> Zoltán,
>
> * Zoltán Böszörményi (zb@cybertec.at) wrote:
>> The patch now passed "make check" in both cases.
> Is v29 the latest version of this patch..?

Attached is v30, I hope with everything fixed.

- List based enable/disable_multiple_timeouts()
- separate per-lock and per-statement lock_timeout variants
- modified comments and documentation

The patch is in "git diff" format.

Please review.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> On 2013-02-24 03:23, Stephen Frost wrote:
>> No, it isn't. Patches posted to the list should be in context diff format (and uncompressed unless it's too large) for easier reading. That avoids having to download it, apply it to a git tree and then create the diff ourselves. Indeed, the move to git has had no impact on this at all.
>
> I remembered this mail from The Master(tm):
> http://www.postgresql.org/message-id/21555.1339866293@sss.pgh.pa.us

Which matches exactly what I was saying- context diff provides easier reading for review purposes, which is presumably what you were looking to have happen when you posted this patch to the mailing list. And it's still the documented expectation and project policy, regardless of any individual's email. If we're going to change it, then the documentation, et al, should be updated as well.

For my 2c, I still prefer context diffs posted to the mailing lists, but would also like to see more people posting pull requests. That doesn't make it project policy though.

> So, more than half of the recently posted patches come directly from "git diff". The preference has changed.

No, more people have ignored the project policy than not- that doesn't say anything about the preference or what the policy is.

>> lock_timeout_wait and lock_timeout_stmt_wait ?
>
> Hm. How about without the "_wait" suffix? Or lock_timeout vs statement_lock_timeout?

I could live without the _wait suffix, but it occurs to me that none of these really indicate that statement_lock_timeout is an accumulated timeout. Perhaps it should say 'total lock wait timeout' or similar?

> statement_timeout is the legacy behaviour, and it should be kept. Its behaviour is well understood: the statement finishes under the specified time or it throws an error. The problem from the application point of view is that the error can happen while the tuples are being transferred to the client, so the recordset can be cut in half.
>
> "lock_timeout_stmt" (or lock_timeout_option = per_statement) somewhat extends statement_timeout in that only the time spent waiting on locks is counted, so SELECT FOR UPDATE/etc. may throw an error, but if all rows are collected already, or a plain SELECT is run, the application gets them all. This seems to follow the Microsoft SQL Server semantics:
> http://msdn.microsoft.com/en-us/library/ms189470.aspx

The documentation at that link appears to match what 'lock_timeout' would do. Note that it says 'a lock' here: "When a wait for a lock exceeds the time-out value, an error is returned." Or have you tested the actual behavior and seen it treat this value as an accumulated time across all lock waits for an entire statement?

> The per-lock lock_timeout was the result of a big Informix project ported to PostgreSQL; this follows Informix semantics:
> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls788.htm

I agree that the Informix command looks to be per-lock, but based on my simple reading of both documentation links, I think they're actually the same behavior and neither is about an accumulated time.

> Because Timestamp[Tz] has microsecond resolution, and the timeout GUCs are in milliseconds. Adding up time differences (and rounding or truncating them to milliseconds in the process) would make the per-statement lock timeout lose precision...

Removing the accumulation-based option would fix that.. :D

Thanks,

Stephen
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Stephen Frost
* Heikki Linnakangas (hlinnakangas@vmware.com) wrote:
> So if you want to be kind to readers, look at the patch and choose the format depending on which one makes it look better. But there's no need to make a point of it when someone posts in the "wrong" format.

To be more precise- my main complaint about this is that this patch is making changes to multi-line comments and to documentation, both of which get very annoying to try and read in unified diff format. Patches that don't do one or the other of those are likely incomplete anyway.

As another point, it's also the very first thing that we document in http://wiki.postgresql.org/wiki/Reviewing_a_Patch to check for.

>> Every unified diff can be turned into a context diff by passing it through "filterdiff --format=context".
>
> Yep. And in emacs, there's the "diff-unified->context" command. I bet most editors have similar functionality these days.

And it probably doesn't work for every patch either, just as filterdiff doesn't (see my other email).

Thanks,

Stephen
On 2013-02-24 15:03, Stephen Frost wrote:
> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>> On 2013-02-24 03:23, Stephen Frost wrote:
>>> No, it isn't. Patches posted to the list should be in context diff format (and uncompressed unless it's too large) for easier reading. That avoids having to download it, apply it to a git tree and then create the diff ourselves. Indeed, the move to git has had no impact on this at all.
>> I remembered this mail from The Master(tm):
>> http://www.postgresql.org/message-id/21555.1339866293@sss.pgh.pa.us
> Which matches exactly what I was saying- context diff provides easier reading for review purposes, which is presumably what you were looking to have happen when you posted this patch to the mailing list. And it's still the documented expectation and project policy, regardless of any individual's email. If we're going to change it, then the documentation, et al, should be updated as well.
>
> For my 2c, I still prefer context diffs posted to the mailing lists, but would also like to see more people posting pull requests. That doesn't make it project policy though.
>
>> So, more than half of the recently posted patches come directly from "git diff". The preference has changed.
> No, more people have ignored the project policy than not- that doesn't say anything about the preference or what the policy is.
>
>>> lock_timeout_wait and lock_timeout_stmt_wait ?
>> Hm. How about without the "_wait" suffix? Or lock_timeout vs statement_lock_timeout?
> I could live without the _wait suffix, but it occurs to me that none of these really indicate that statement_lock_timeout is an accumulated timeout. Perhaps it should say 'total lock wait timeout' or similar?
>
>> statement_timeout is the legacy behaviour, and it should be kept. Its behaviour is well understood: the statement finishes under the specified time or it throws an error. The problem from the application point of view is that the error can happen while the tuples are being transferred to the client, so the recordset can be cut in half.
>>
>> "lock_timeout_stmt" (or lock_timeout_option = per_statement) somewhat extends statement_timeout in that only the time spent waiting on locks is counted, so SELECT FOR UPDATE/etc. may throw an error, but if all rows are collected already, or a plain SELECT is run, the application gets them all. This seems to follow the Microsoft SQL Server semantics:
>> http://msdn.microsoft.com/en-us/library/ms189470.aspx
> The documentation at that link appears to match what 'lock_timeout' would do. Note that it says 'a lock' here: "When a wait for a lock exceeds the time-out value, an error is returned." Or have you tested the actual behavior and seen it treat this value as an accumulated time across all lock waits for an entire statement?

(Note to myself: don't read with a headache.)
I may have misread the MSDN link.

>> The per-lock lock_timeout was the result of a big Informix project ported to PostgreSQL; this follows Informix semantics:
>> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls788.htm
> I agree that the Informix command looks to be per-lock, but based on my simple reading of both documentation links, I think they're actually the same behavior and neither is about an accumulated time.

You seem to be right.

>> Because Timestamp[Tz] has microsecond resolution, and the timeout GUCs are in milliseconds.
>> Adding up time differences (and rounding or truncating them to milliseconds in the process) would make the per-statement lock timeout lose precision...
> Removing the accumulation-based option would fix that.. :D

To call out the wrath of Tom? No, thanks. :-D
IIRC, he was the one who didn't like the per-lock behaviour the first time he looked at this patch in an earlier form back in 2010/2011, and he proposed this use case instead. If not him, then someone else. I got the idea from this list...

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Claudio Freire
On Sun, Feb 24, 2013 at 11:08 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Heikki Linnakangas (hlinnakangas@vmware.com) wrote:
>> So if you want to be kind to readers, look at the patch and choose the format depending on which one makes it look better. But there's no need to make a point of it when someone posts in the "wrong" format.
>
> To be more precise- my main complaint about this is that this patch is making changes to multi-line comments and to documentation, both of which get very annoying to try and read in unified diff format. Patches that don't do one or the other of those are likely incomplete anyway.
>
> As another point, it's also the very first thing that we document in http://wiki.postgresql.org/wiki/Reviewing_a_Patch to check for.

TBH, that wiki link seems to suggest that *having context* is the point of the requirement (to be able to merge with fuzz).

Both unified and context formats have context.
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Andrew Dunstan
On 02/24/2013 12:39 PM, Claudio Freire wrote:
> On Sun, Feb 24, 2013 at 11:08 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Heikki Linnakangas (hlinnakangas@vmware.com) wrote:
>>> So if you want to be kind to readers, look at the patch and choose the format depending on which one makes it look better. But there's no need to make a point of it when someone posts in the "wrong" format.
>> To be more precise- my main complaint about this is that this patch is making changes to multi-line comments and to documentation, both of which get very annoying to try and read in unified diff format. Patches that don't do one or the other of those are likely incomplete anyway.
>>
>> As another point, it's also the very first thing that we document in http://wiki.postgresql.org/wiki/Reviewing_a_Patch to check for.
>
> TBH, that wiki link seems to suggest that *having context* is the point of the requirement (to be able to merge with fuzz).
>
> Both unified and context formats have context.

No, you're missing the point. Some people find reading context diffs much easier than reading unified diffs. That's why context diffs are the project's stated preference.

cheers

andrew
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Peter Geoghegan
On 24 February 2013 08:44, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> I can't speak for others, but I personally don't care whether a patch is posted in unified or context diff format. Not as a general rule, anyway; patches that modify a few lines here and there are generally more readable in unified format, as the old and new lines are lined up right next to each other:

I don't care either. My personal preference is context diff format, but then that's what I usually see anyway. I don't use filterdiff or anything like that. I just have a strong habit of using feature branches extensively, even for patches that I'm reviewing, and my setup makes it easy to create one from a patch file. It's quite a rare occurrence for me to care enough about a patch to want to eyeball the code (and not just read the author's summary), and yet not care about it enough to make a feature branch for it.

I can see how other people's habits might differ from my own here, and that they might reasonably state a preference for unified, which is fine. I developed a preference for unified over time, having originally just used the format on the advice of the wiki.

--
Regards,
Peter Geoghegan
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Stephen Frost
* Claudio Freire (klaussfreire@gmail.com) wrote:
>> As another point, it's also the very first thing that we document in http://wiki.postgresql.org/wiki/Reviewing_a_Patch to check for.
>
> TBH, that wiki link seems to suggest that *having context* is the point of the requirement (to be able to merge with fuzz).

The PG wiki link states "Is the patch in context diff format?" and provides a link to the wikipedia article about *that specific format*. There's absolutely zero confusion over what "context diff format" means.

Thanks,

Stephen
On 2013-02-24 16:14, Boszormenyi Zoltan wrote:
> On 2013-02-24 15:03, Stephen Frost wrote:
>> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>>> On 2013-02-24 03:23, Stephen Frost wrote:
>>>> No, it isn't. Patches posted to the list should be in context diff format (and uncompressed unless it's too large) for easier reading. That avoids having to download it, apply it to a git tree and then create the diff ourselves. Indeed, the move to git has had no impact on this at all.
>>> I remembered this mail from The Master(tm):
>>> http://www.postgresql.org/message-id/21555.1339866293@sss.pgh.pa.us
>> Which matches exactly what I was saying- context diff provides easier reading for review purposes, which is presumably what you were looking to have happen when you posted this patch to the mailing list. And it's still the documented expectation and project policy, regardless of any individual's email. If we're going to change it, then the documentation, et al, should be updated as well.
>>
>> For my 2c, I still prefer context diffs posted to the mailing lists, but would also like to see more people posting pull requests. That doesn't make it project policy though.
>>
>>> So, more than half of the recently posted patches come directly from "git diff". The preference has changed.
>> No, more people have ignored the project policy than not- that doesn't say anything about the preference or what the policy is.
>>
>>>> lock_timeout_wait and lock_timeout_stmt_wait ?
>>> Hm. How about without the "_wait" suffix? Or lock_timeout vs statement_lock_timeout?
>> I could live without the _wait suffix, but it occurs to me that none of these really indicate that statement_lock_timeout is an accumulated timeout. Perhaps it should say 'total lock wait timeout' or similar?
>>
>>> statement_timeout is the legacy behaviour, and it should be kept. Its behaviour is well understood: the statement finishes under the specified time or it throws an error. The problem from the application point of view is that the error can happen while the tuples are being transferred to the client, so the recordset can be cut in half.
>>>
>>> "lock_timeout_stmt" (or lock_timeout_option = per_statement) somewhat extends statement_timeout in that only the time spent waiting on locks is counted, so SELECT FOR UPDATE/etc. may throw an error, but if all rows are collected already, or a plain SELECT is run, the application gets them all. This seems to follow the Microsoft SQL Server semantics:
>>> http://msdn.microsoft.com/en-us/library/ms189470.aspx
>> The documentation at that link appears to match what 'lock_timeout' would do. Note that it says 'a lock' here: "When a wait for a lock exceeds the time-out value, an error is returned." Or have you tested the actual behavior and seen it treat this value as an accumulated time across all lock waits for an entire statement?
>
> (Note to myself: don't read with a headache.)
> I may have misread the MSDN link.
>
>>> The per-lock lock_timeout was the result of a big Informix project ported to PostgreSQL; this follows Informix semantics:
>>> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls788.htm
>> I agree that the Informix command looks to be per-lock, but based on my simple reading of both documentation links, I think they're actually the same behavior and neither is about an accumulated time.
>
> You seem to be right.
>>> Because Timestamp[Tz] has microsecond resolution, and the timeout GUCs are in milliseconds. Adding up time differences (and rounding or truncating them to milliseconds in the process) would make the per-statement lock timeout lose precision...
>> Removing the accumulation-based option would fix that.. :D
>
> To call out the wrath of Tom? No, thanks. :-D
> IIRC, he was the one who didn't like the per-lock behaviour the first time he looked at this patch in an earlier form back in 2010/2011, and he proposed this use case instead. If not him, then someone else. I got the idea from this list...

Another question just popped up. Now that

bool enable_multiple_timeouts(List *timeouts);

exists, do we really need the singular versions?

Since the "timeout after N msec" case has per-lock and per-stmt versions, enable_timeout_after() gained a new argument to distinguish between the two, and every existing caller of this function happens to just pass "0" for it. The per-stmt variant is only ever used via enable_multiple_timeouts().

Wouldn't it be better to have a single

bool enable_timeouts(List *timeouts);

instead?

> Best regards,
> Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
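[For readers following along: a minimal sketch of what such a unified, list-based call could look like, using the standard foreach() machinery from nodes/pg_list.h. The TimeoutParams layout shown here is an assumption pieced together from this thread, not the actual patch; only enable_timeout_after(TimeoutId, int) matches the committed timeout.c API.]

#include "nodes/pg_list.h"
#include "utils/timeout.h"

/* assumed shape of one list entry, per the discussion above */
typedef struct TimeoutParams
{
	TimeoutId	id;			/* e.g. STATEMENT_TIMEOUT, or the patch's LOCK_TIMEOUT */
	int			delay_ms;	/* relative timeout in milliseconds */
} TimeoutParams;

/* sketch: arm every timeout in the list from one common starting point */
static bool
enable_timeouts(List *timeouts)
{
	ListCell   *lc;

	foreach(lc, timeouts)
	{
		TimeoutParams *t = (TimeoutParams *) lfirst(lc);

		enable_timeout_after(t->id, t->delay_ms);
	}
	return true;
}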
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Robert Haas
On Sun, Feb 24, 2013 at 4:31 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>>> As another point, it's also the very first thing that we document in http://wiki.postgresql.org/wiki/Reviewing_a_Patch to check for.
>>
>> TBH, that wiki link seems to suggest that *having context* is the point of the requirement (to be able to merge with fuzz).
>
> The PG wiki link states "Is the patch in context diff format?" and provides a link to the wikipedia article about *that specific format*. There's absolutely zero confusion over what "context diff format" means.

True, but I'm with Heikki: it's a pedantic and unhelpful guideline. Everyone here who reviews patches regularly knows how to, and probably does, convert between those formats with regularity. Making patch submitters feel badly because they've used the "wrong" format does not advance the goals of the project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Zoltan,

* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> Another question just popped up. Now that
> bool enable_multiple_timeouts(List *timeouts);
> exists, do we really need the singular versions?
>
> Since the "timeout after N msec" case has per-lock and per-stmt versions, enable_timeout_after() gained a new argument to distinguish between the two, and every existing caller of this function happens to just pass "0" for it. The per-stmt variant is only ever used via enable_multiple_timeouts().

For my 2c, I didn't particularly care for changing enable_timeout_after() by adding an extra parameter that ended up being passed as ',0'.. Perhaps make it a wrapper instead of changing the definition, leaving the existing invocations alone?

> Wouldn't it be better to have a single
> bool enable_timeouts(List *timeouts);
> instead?

This might also work though, if everything is updated to use it and it's relatively clean. I realize for the aggregate case, you have to have it, but I really don't like the changes to have to reset the counter either.

Tom, can you comment on your thoughts around this notion of an aggregate time constraint for waiting on locks? As mentioned upthread, I like the idea of having an upper limit on waiting for relation-level locks, but once you're past that, I'm not sure that an aggregate waiting-on-locks timeout is any better than the overall statement-level timeout, and it seems somewhat messy, to me anyway.

Thanks,

Stephen
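[To illustrate the wrapper idea: keep the public signature untouched and route it through an extended variant. The _flags function name and body are hypothetical; only enable_timeout_after(TimeoutId, int) matches the existing timeout.c API.]

/* hypothetical extended form carrying the patch's new flags argument */
void
enable_timeout_after_flags(TimeoutId id, int delay_ms, int flags)
{
	/* body as in the patch */
}

/* existing callers keep calling the unchanged public API */
void
enable_timeout_after(TimeoutId id, int delay_ms)
{
	enable_timeout_after_flags(id, delay_ms, 0);
}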
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Stephen Frost
* Robert Haas (robertmhaas@gmail.com) wrote:
> True, but I'm with Heikki: it's a pedantic and unhelpful guideline.

Then let's change it, drop the preference, and update the documentation. I'd certainly prefer that to getting shot for pointing out to patch submitters that they're not following our documented guidelines.

> Everyone here who reviews patches regularly knows how to, and probably does, convert between those formats with regularity. Making patch submitters feel badly because they've used the "wrong" format does not advance the goals of the project.

For my part, I'd rather put the onus on the submitter to submit a readable patch in the first place than ask the reviewer and anyone else interested in looking at the patch to fix it. That's even more true when you consider the archives and reading patches through the web interface (though downloading the original mail message has gotten better with the new archive code).

Thanks,

Stephen
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Andres Freund
On 2013-02-25 09:11:27 -0500, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> True, but I'm with Heikki: it's a pedantic and unhelpful guideline.
>
> Then let's change it, drop the preference, and update the documentation.

+1

>> Everyone here who reviews patches regularly knows how to, and probably does, convert between those formats with regularity. Making patch submitters feel badly because they've used the "wrong" format does not advance the goals of the project.
>
> For my part, I'd rather put the onus on the submitter to submit a readable patch in the first place than ask the reviewer and anyone else interested in looking at the patch to fix it. That's even more true when you consider the archives and reading patches through the web interface (though downloading the original mail message has gotten better with the new archive code).

Well, the point is that you cannot satisfy enough people with either choice anyway. I for one feel much more comfortable sending patches in a format that I can actually read without thinking too much, which is the case for unified but definitely not for context. But it's different for others. I gave in and made my mailreader convert all patches to unified for reading; that way I don't care about other people's preferences, for one.

Greetings,

Andres Freund

--
Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> True, but I'm with Heikki: it's a pedantic and unhelpful guideline.
>
> Then let's change it, drop the preference, and update the documentation.

I think we should drop the hard requirement for context format, and instead say that it must not be plain (context-free) diff, since that clearly *is* a hard requirement.

However, I liked the upthread suggestion (I think it was from Heikki) that we recommend that submitters actually take a moment to think about which format is more readable for their particular patch. Readability is important not only to help people who just give the patch a quick eyeball, but also to help the inevitable situations where hunks have to be applied by hand because the underlying code has changed. The less readable the patch, the more likely an error in doing that. (And I trust we've all learned by now that git isn't so good at merging that this isn't a problem.)

regards, tom lane
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Robert Haas (robertmhaas@gmail.com) wrote:
>>> True, but I'm with Heikki: it's a pedantic and unhelpful guideline.
>>
>> Then let's change it, drop the preference, and update the documentation.
>
> I think we should drop the hard requirement for context format, and instead say that it must not be plain (context-free) diff, since that clearly *is* a hard requirement.

Alright, I'll start making those changes.

> However, I liked the upthread suggestion (I think it was from Heikki) that we recommend that submitters actually take a moment to think about which format is more readable for their particular patch. Readability is important not only to help people who just give the patch a quick eyeball, but also to help the inevitable situations where hunks have to be applied by hand because the underlying code has changed. The less readable the patch, the more likely an error in doing that. (And I trust we've all learned by now that git isn't so good at merging that this isn't a problem.)

I'll include that point in my changes, but I consider the chances of that actually happening with any regularity to be essentially zero. Reducing our requirement to a level where the default passes means that nearly everyone who hasn't already changed things to use a non-default automatically is going to just use the default.

Thanks,

Stephen
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Boszormenyi Zoltan
On 2013-02-25 15:25, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Robert Haas (robertmhaas@gmail.com) wrote:
>>> True, but I'm with Heikki: it's a pedantic and unhelpful guideline.
>> Then let's change it, drop the preference, and update the documentation.
> I think we should drop the hard requirement for context format, and instead say that it must not be plain (context-free) diff, since that clearly *is* a hard requirement.
>
> However, I liked the upthread suggestion (I think it was from Heikki) that we recommend that submitters actually take a moment to think about which format is more readable for their particular patch. Readability is important not only to help people who just give the patch a quick eyeball, but also to help the inevitable situations where hunks have to be applied by hand because the underlying code has changed. The less readable the patch, the more likely an error in doing that.

+1

I think readability is mostly the de-facto threshold anyway. Quite a few plain "git diff" patches were committed during this CF, and those were readable in that format, even biggies like "teach receivexlog to switch timelines", which (I just browsed it) had parts that rewrote code in a way that left the diff with pieces of different indentation intermixed. That can amount to being unreadable at times, but it also depends on the reader.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Re: unified vs context diffs (was Re: Strange Windows problem, lock_timeout test request)
From: Stephen Frost
* Stephen Frost (sfrost@snowman.net) wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think we should drop the hard requirement for context format, and instead say that it must not be plain (context-free) diff, since that clearly *is* a hard requirement.
>
> Alright, I'll start making those changes.

I've updated the wiki pages accordingly, though there's now a number of translated pages which also need updating to reflect this change.

Thanks,

Stephen
Zoltan,

* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> attached is v30, I hope with everything fixed.

Making progress, certainly.

Given the hack to the API of enable_timeout_after() and the need for timeout_reset_base_time(), I'm just going to voice my objection to the "accumulated wait time on locks" portion again. I still like the idea of a timeout for waiting on relation-level locks, as we acquire those all up-front and we'd be able to just set a timeout at the appropriate point and then release it when we're past acquiring the relation-level locks. Seems like that would be much cleaner.

On the other hand, if we're going to go down this route, I'm really starting to wonder if it should be the timeout system's responsibility to keep track of such accumulated time.

Other than that..

> - List based enable/disable_multiple_timeouts()

That looks good, like the use of foreach(), etc, but I don't like how you've set up delay_ms as a pointer..? Looks to be to allow you to initialize the TimeoutParams structs early in proc.c..? Is there another reason it needs to be a pointer that I'm missing? Why not build the TimeoutParams structures in the if() blocks that check if the GUCs are set..?

> - separate per-lock and per-statement lock_timeout variants
> - modified comments and documentation

Thanks.

Stephen
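[Building the entries in place, as suggested, might look roughly like this. LockTimeout is the patch's GUC and the TimeoutParams fields are assumed as in the earlier sketch, so treat this as a shape, not the patch itself.]

List	   *timeouts = NIL;

if (LockTimeout > 0)
{
	TimeoutParams *t = (TimeoutParams *) palloc(sizeof(TimeoutParams));

	t->id = LOCK_TIMEOUT;		/* TimeoutId added by the patch */
	t->delay_ms = LockTimeout;	/* plain value, no pointer indirection */
	timeouts = lappend(timeouts, t);
}

if (timeouts != NIL)
	enable_timeouts(timeouts);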
On 2013-02-27 04:06, Stephen Frost wrote:
> Zoltan,
>
> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>> attached is v30, I hope with everything fixed.
> Making progress, certainly.
>
> Given the hack to the API of enable_timeout_after() and the need for timeout_reset_base_time(), I'm just going to voice my objection to the "accumulated wait time on locks" portion again. I still like the idea of a timeout for waiting on relation-level locks, as we acquire those all up-front and we'd be able to just set a timeout at the appropriate point and then release it when we're past acquiring the relation-level locks. Seems like that would be much cleaner.

Yes, it would be cleaner. But this way, the total time the statement would wait for locks is N * lock_timeout - <some delta> in the worst case, when all the locks can be acquired just under the set timeout interval, N being the number of locks the statement wants to acquire. For some use cases, it's better to have a known limit on the total amount of wait time. But unlike statement_timeout, with lock_timeout_stmt the statement can still finish after this limit, as it does useful work besides waiting for locks.

> On the other hand, if we're going to go down this route, I'm really starting to wonder if it should be the timeout system's responsibility to keep track of such accumulated time.

Thinking about it a bit more, I start to agree. It's not likely that any new timeout sources will get added to proc.c that have anything to do with waiting across multiple locks. From this POV, the accumulated time can be tracked by proc.c itself. But this makes it necessary to reschedule the timer from the ProcSleep() loop, so it increases the number of setitimer() calls. With clever coding, though, the "it_interval" part of struct itimerval can be used to decrease the number of setitimer() calls, so it may balance out.

Another thought is that there is no need for an extra function to set the start time for the timeouts. It can be folded into enable_timeout_after(), enable_timeout_at() and enable_multiple_timeouts(), and that simplifies the API, too.

Since setitimer() has microsecond resolution, I wonder whether the timeout.c code shouldn't accept that, too, especially if we want to keep the per-statement accumulated version of the lock timeout. Then the time to wait that can be represented using int32 would be about 35.8 minutes at most; we would need to use int64 if the maximum number of milliseconds is to stay at INT_MAX, which I guess it should.

Comments?

> Other than that..
>
>> - List based enable/disable_multiple_timeouts()
> That looks good, like the use of foreach(), etc, but I don't like how you've set up delay_ms as a pointer..? Looks to be to allow you to initialize the TimeoutParams structs early in proc.c..? Is there another reason it needs to be a pointer that I'm missing? Why not build the TimeoutParams structures in the if() blocks that check if the GUCs are set..?

It's fixed in my tree, and I also added an Assert to enable_multiple_timeouts() to make sure all timeout sources use either TIMEOUT_AT or TIMEOUT_AFTER. Some more comments were also fixed. But I would like to send out the new patch after the above questions are settled.

I also want to thank you for your comments. It's good to have a fresh look at this code.

>> - separate per-lock and per-statement lock_timeout variants
>> - modified comments and documentation
> Thanks.
> Stephen

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
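[For what it's worth, the it_interval trick mentioned above is plain POSIX setitimer() usage: after the initial expiry the kernel re-arms the timer by itself, so the SIGALRM handler can re-check its deadline without another system call. A minimal sketch; the function and variable names are invented for illustration.]

#include <string.h>
#include <sys/time.h>

static void
arm_repeating_timer(int delay_ms)
{
	struct itimerval t;

	memset(&t, 0, sizeof(t));
	t.it_value.tv_sec = delay_ms / 1000;
	t.it_value.tv_usec = (delay_ms % 1000) * 1000;
	/* fire again with the same period; no further setitimer() needed */
	t.it_interval = t.it_value;
	setitimer(ITIMER_REAL, &t, NULL);
}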
* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> But unlike statement_timeout, with lock_timeout_stmt the statement can still finish after this limit, as it does useful work besides waiting for locks.

It's still entirely possible to get 99% done and then hit that last tuple that you need a lock on and just tip over the lock_timeout_stmt limit due to prior waiting, ending up wasting a bunch of work, hence why I'm not entirely sure that this is that much better than statement_timeout.

> Thinking about it a bit more, I start to agree. It's not likely that any new timeout sources will get added to proc.c that have anything to do with waiting across multiple locks. From this POV, the accumulated time can be tracked by proc.c itself. But this makes it necessary to reschedule the timer from the ProcSleep() loop, so it increases the number of setitimer() calls. With clever coding, though, the "it_interval" part of struct itimerval can be used to decrease the number of setitimer() calls, so it may balance out.

We're not even going down this code path until we're already waiting on a lock from someone, right? Not sure that we need to stress out too much about calling setitimer().

> Another thought is that there is no need for an extra function to set the start time for the timeouts. It can be folded into enable_timeout_after(), enable_timeout_at() and enable_multiple_timeouts(), and that simplifies the API, too.

Right, back to how the API was originally, for the most part, no? :)

> Since setitimer() has microsecond resolution, I wonder whether the timeout.c code shouldn't accept that, too, especially if we want to keep the per-statement accumulated version of the lock timeout. Then the time to wait that can be represented using int32 would be about 35.8 minutes at most; we would need to use int64 if the maximum number of milliseconds is to stay at INT_MAX, which I guess it should.
>
> Comments?

Wouldn't that impact statement_timeout also..? I can definitely see someone wanting to set that at, say, an hour. If anything, I continue to feel that going the *other* way makes more sense- keep everything at milliseconds and just floor() down to the ms when we're doing accounting based on microsecond information. Sure, we might end up waiting a bit (a very small bit..) longer than we were supposed to, but I hardly see that as being a major complaint.

Thanks,

Stephen
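[Concretely, the flooring described here is just integer truncation from the microsecond clock back to millisecond bookkeeping; a trivial sketch with invented names (int64 is PostgreSQL's 64-bit typedef from c.h):]

static int
elapsed_ms_floor(int64 elapsed_us)
{
	/* truncate: we may end up waiting a hair longer than requested */
	return (int) (elapsed_us / 1000);
}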
Stephen Frost <sfrost@snowman.net> writes:
> Tom, can you comment on your thoughts around this notion of an aggregate time constraint for waiting on locks? As mentioned upthread, I like the idea of having an upper limit on waiting for relation-level locks, but once you're past that, I'm not sure that an aggregate waiting-on-locks timeout is any better than the overall statement-level timeout, and it seems somewhat messy, to me anyway.

I think anything that makes this patch simpler is a good idea. I don't like any of the accum_time stuff: it complicates the timeout API unreasonably and slows down existing use cases.

Some other thoughts:

* timeout_reset_base_time() seems like an ugly and unnecessary API wart. I don't like the timeout_start state variable at all; if you need several timeouts to be scheduled relative to the exact same starting point, can't you just do that in a single enable_multiple_timeouts() call? And I think the optional TMPARAM_ACCUM action is completely unacceptable, because it supposes that every user of a timeout, now and in the future, is okay with having their accumulated time reset at the same point. The whole point of having invented this timeout API is to serve timeout uses we don't currently foresee, so actions that affect every timeout seem pretty undesirable.

* I don't care for changing the API of enable_timeout_after when there is in fact not a single caller using the flags argument (probably because the only defined flag is too baroque to be useful). If there were a use case for the "accum" action it'd be better to have a separate API function for it, probably.

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think anything that makes this patch simpler is a good idea. I don't like any of the accum_time stuff: it complicates the timeout API unreasonably and slows down existing use cases.

Right, I think we're on the same page there- I had just commented to Zoltan that tracking the accumulated time shouldn't be the timeout system's responsibility and that the timeout API really shouldn't need to be changed. I'm not convinced that the lock-time-accumulation-timeout capability is really all that valuable in the first place though, but perhaps I'm in the minority on that.

Thanks,

Stephen
On 2013-02-27 19:07, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> Tom, can you comment on your thoughts around this notion of an aggregate time constraint for waiting on locks? As mentioned upthread, I like the idea of having an upper limit on waiting for relation-level locks, but once you're past that, I'm not sure that an aggregate waiting-on-locks timeout is any better than the overall statement-level timeout, and it seems somewhat messy, to me anyway.
> I think anything that makes this patch simpler is a good idea. I don't like any of the accum_time stuff: it complicates the timeout API unreasonably and slows down existing use cases.

Perfect. :-)

> Some other thoughts:
>
> * timeout_reset_base_time() seems like an ugly and unnecessary API wart. I don't like the timeout_start state variable at all; if you need several timeouts to be scheduled relative to the exact same starting point, can't you just do that in a single enable_multiple_timeouts() call? And I think the optional TMPARAM_ACCUM action is completely unacceptable,

If we get rid of the per-statement variant, there is no need for that either.

> because it supposes that every user of a timeout, now and in the future, is okay with having their accumulated time reset at the same point. The whole point of having invented this timeout API is to serve timeout uses we don't currently foresee, so actions that affect every timeout seem pretty undesirable.
>
> * I don't care for changing the API of enable_timeout_after when there is in fact not a single caller using the flags argument (probably because the only defined flag is too baroque to be useful). If there were a use case for the "accum" action it'd be better to have a separate API function for it, probably.

This way, enable_timeout_after() wouldn't have this extra argument either.

Thanks for your kind words.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Zoltan,

* Boszormenyi Zoltan (zb@cybertec.at) wrote:
> If we get rid of the per-statement variant, there is no need for that either.

For my 2c, I didn't see Tom's comments as saying that we shouldn't have that capability, just that the implementation was ugly. :)

That said, perhaps we should just drop it for now, get the lock_timeout piece solid, and then come back to the question about lock_timeout_stmt.

Thanks,

Stephen
On 2013-02-27 20:06, Stephen Frost wrote:
> Zoltan,
>
> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>> If we get rid of the per-statement variant, there is no need for that either.
> For my 2c, I didn't see Tom's comments as saying that we shouldn't have that capability, just that the implementation was ugly. :)

But I am happy to drop it. ;-)

> That said, perhaps we should just drop it for now, get the lock_timeout piece solid, and then come back to the question about lock_timeout_stmt.

OK, let's do it this way.

> Thanks,
>
> Stephen

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 2013-02-27 20:38, Boszormenyi Zoltan wrote:
> On 2013-02-27 20:06, Stephen Frost wrote:
>> Zoltan,
>>
>> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>>> If we get rid of the per-statement variant, there is no need for that either.
>> For my 2c, I didn't see Tom's comments as saying that we shouldn't have that capability, just that the implementation was ugly. :)
>
> But I am happy to drop it. ;-)
>
>> That said, perhaps we should just drop it for now, get the lock_timeout piece solid, and then come back to the question about lock_timeout_stmt.
>
> OK, let's do it this way.

Dropped the per-statement lock timeout for now. The patch is now obviously simpler and shorter. I renamed enable/disable_multiple_timeouts() to simply enable/disable_timeouts(), since the List * argument already implies more than one of them and you need to type less.

The comments and the documentation need another review, to make sure I left no traces of the per-statement variant. I can't see any, but I have stared at this patch for so long that I can't be sure anymore.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
On 02/27/2013 09:58 PM, Stephen Frost wrote:
> * Boszormenyi Zoltan (zb@cybertec.at) wrote:
>> But unlike statement_timeout,
>> with lock_timeout_stmt the statement can still finish after this limit
>> as it does useful work besides waiting for locks.
>
> It's still entirely possible to get 99% done and then hit that last
> tuple that you need a lock on and just tip over the lock_timeout_stmt
> limit due to prior waiting and ending up wasting a bunch of work, hence
> why I'm not entirely sure that this is that much better than
> statement_timeout.

There are questions about whether this is a good idea, and there's still discussion ongoing. It doesn't look like it's in a state where we can confidently say "let's include this for 9.3" to me, but I'd like other viewpoints.

Should we bump this to the next CF? It's clearly still a viable idea, just possibly not ready yet.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig, * Craig Ringer (craig@2ndquadrant.com) wrote: > There are questions about whether this is a good idea, and there's still > discussion ongoing. It doesn't look like it's in a state where we can > confidently say "let's include this for 9.3" to me, but I'd like other > viewpoints. For my part, I think the straight-up 'lock_timeout' piece, which is in the latest patch, is in pretty good shape. It's much less invasive and provides a capability which other RDBMSs have and is reasonably straightforward. > Should we bump this to the next CF? It's clearly still a viable idea, > just possibly not ready yet. Unless a committer steps up to take on the statement-level lock-wait timeout, it's not going to get into 9.3, imv. Waiting to add that (whatever it ends up being) until post-9.3 makes sense to me, but I'd hate to miss getting the simpler lock_timeout capability into 9.3. Thanks, Stephen
On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Boszormenyi Zoltan (zb@cybertec.at) wrote: >> But unlike statement_timeout, >> with lock_timeout_stmt the statement can still finish after this limit >> as it does useful work besides waiting for locks. > > It's still entirely possible to get 99% done and then hit that last > tuple that you need a lock on and just tip over the lock_timeout_stmt > limit due to prior waiting and ending up wasting a bunch of work, hence > why I'm not entirely sure that this is that much better than > statement_timeout. I tend to agree that this should be based on the length of any individual lock wait, not the cumulative duration of lock waits. Otherwise, it seems like it'll be very hard to set this to a meaningful value. For example, if you set this to 1 minute, and that means the length of any single wait, then you basically know that it'll only kick in if there is some other, long-running transaction that's holding the lock. But if it means the cumulative length of all waits, it's not so clear, because now you might also have this kick in if you wait for 100ms on 600 different occasions. In other words, complex queries that lock or update many tuples may get killed even if they never wait very long at all for any single lock. That seems like it will be almost indistinguishable from random, unprincipled query cancellations. Now, you could try to work around that by varying the setting based on the complexity of the query you're about to run, but that seems like a pain in the neck, to put it mildly. And it might still not give you the behavior that you want. Personally, I'd think a big part of the appeal of this is to make sure that you don't hang waiting for a RELATION level lock for too long before giving up. And for that, scaling with the complexity of the query would be exactly the wrong thing to do, even if you could figure out some system for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost <sfrost@snowman.net> wrote: >> It's still entirely possible to get 99% done and then hit that last >> tuple that you need a lock on and just tip over the lock_timeout_stmt >> limit due to prior waiting and ending up wasting a bunch of work, hence >> why I'm not entirely sure that this is that much better than >> statement_timeout. > I tend to agree that this should be based on the length of any > individual lock wait, not the cumulative duration of lock waits. > Otherwise, it seems like it'll be very hard to set this to a > meaningful value. For example, if you set this to 1 minute, and that > means the length of any single wait, then you basically know that > it'll only kick in if there is some other, long-running transaction > that's holding the lock. But if it means the cumulative length of all > waits, it's not so clear, because now you might also have this kick in > if you wait for 100ms on 600 different occasions. In other words, > complex queries that lock or update many tuples may get killed even if > they never wait very long at all for any single lock. That seems like > it will be almost indistinguishable from random, unprincipled query > cancellations. Yeah. I'm also unconvinced that there's really much use-case territory here that statement_timeout doesn't cover well enough. To have a case that statement-level lock timeout covers and statement_timeout doesn't, you need to suppose that you know how long the query can realistically wait for all locks together, but *not* how long it's going to run in the absence of lock delays. That seems a bit far-fetched, particularly when thinking of row-level locks, whose cumulative timeout would presumably need to scale with the number of rows the query will visit. If statement-level lock timeouts were cheap to add, that would be one thing; but given that they're complicating the code materially, I think we need a more convincing argument for them. regards, tom lane
2013-03-06 19:53 keltezéssel, Tom Lane írta: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost <sfrost@snowman.net> wrote: >>> It's still entirely possible to get 99% done and then hit that last >>> tuple that you need a lock on and just tip over the lock_timeout_stmt >>> limit due to prior waiting and ending up wasting a bunch of work, hence >>> why I'm not entirely sure that this is that much better than >>> statement_timeout. >> I tend to agree that this should be based on the length of any >> individual lock wait, not the cumulative duration of lock waits. >> Otherwise, it seems like it'll be very hard to set this to a >> meaningful value. For example, if you set this to 1 minute, and that >> means the length of any single wait, then you basically know that >> it'll only kick in if there is some other, long-running transaction >> that's holding the lock. But if it means the cumulative length of all >> waits, it's not so clear, because now you might also have this kick in >> if you wait for 100ms on 600 different occasions. In other words, >> complex queries that lock or update many tuples may get killed even if >> they never wait very long at all for any single lock. That seems like >> it will be almost indistinguishable from random, unprincipled query >> cancellations. > Yeah. I'm also unconvinced that there's really much use-case territory > here that statement_timeout doesn't cover well enough. To have a case > that statement-level lock timeout covers and statement_timeout doesn't, > you need to suppose that you know how long the query can realistically > wait for all locks together, but *not* how long it's going to run in the > absence of lock delays. That seems a bit far-fetched, particularly when > thinking of row-level locks, whose cumulative timeout would presumably > need to scale with the number of rows the query will visit. > > If statement-level lock timeouts were cheap to add, that would be one > thing; but given that they're complicating the code materially, I think > we need a more convincing argument for them. OK, so it's not wanted. Surprise, surprise, it was already dropped from the patch. Can you _please_ review the last patch and comment on it instead of its past state? Thanks and best regards, Zoltán Böszörményi > > regards, tom lane > > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes:
> [ 2-lock_timeout-v33.patch ]

I looked at this patch a bit. I don't understand why you've chosen to alter the API of the enable_timeout variants to have a bool result that says "I didn't bother to process the timeout because it would have fired immediately". That is not an advantage for any call site that I can see: it just means that the caller needs an extra, very difficult to test code path to handle what would normally be handled by a timeout interrupt. Even if it were a good API choice, you've broken a number of existing call sites that the patch fails to touch (eg in postmaster.c and postgres.c). It's not acceptable to define a failure return from enable_timeout_after and then let callers assume that the failure can't happen. Please undo that.

Also, I'm not really enamored of the choice to use List* infrastructure for enable_timeouts(). That doesn't appear to be especially convenient for any caller, and what it does do is create memory-leak risks for most of them (if they forget to free the list cells, perhaps as a result of an error exit). I think a local-variable array of TimeoutParams[] structs would serve better for most use-cases.

Another thought is that the PGSemaphoreTimedLock implementations all share the same bug, which is that if the "condition" callback returns true immediately after we acquire the semaphore, they will all wrongly return false indicating that the semaphore wasn't acquired. (BTW, I do not like either the variable name "condition" or the typedef name "TimeoutCondition" for something that's actually a callback function pointer.)

In the same vein, this comment change:

     * NOTE: Think not to put any shared-state cleanup after the call to
     * ProcSleep, in either the normal or failure path.  The lock state must
!    * be fully set by the lock grantor, or by CheckDeadLock if we give up
!    * waiting for the lock.  This is necessary because of the possibility
!    * that a cancel/die interrupt will interrupt ProcSleep after someone else
!    * grants us the lock, but before we've noticed it.  Hence, after granting,
!    * the locktable state must fully reflect the fact that we own the lock;
!    * we can't do additional work on return.
     *
     * We can and do use a PG_TRY block to try to clean up after failure, but
     * this still has a major limitation: elog(FATAL) can occur while waiting
--- 1594,1606 ----
    /*
     * NOTE: Think not to put any shared-state cleanup after the call to
     * ProcSleep, in either the normal or failure path.  The lock state must
!    * be fully set by the lock grantor, or by CheckDeadLock or by ProcSleep
!    * itself in case a timeout is detected or if we give up waiting for the lock.
!    * This is necessary because of the possibility that a cancel/die interrupt
!    * will interrupt ProcSleep after someone else grants us the lock, but
!    * before we've noticed it.  Hence, after granting, the locktable state must
!    * fully reflect the fact that we own the lock; we can't do additional work
!    * on return.
     *

suggests that you've failed to grasp the fundamental race-condition problem here. ProcSleep can't do cleanup after the fact any more than its callers can, because otherwise it has a race condition with some other process deciding to grant it the lock at about the same time its timeout goes off. I think the changes in ProcSleep that alter the state-cleanup behavior are just plain wrong, and what you need to do is make that look more like the existing mechanisms that clean up when statement timeout occurs.

regards, tom lane
2013-03-15 18:53 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>> [ 2-lock_timeout-v33.patch ]
> I looked at this patch a bit. I don't understand why you've chosen to
> alter the API of the enable_timeout variants to have a bool result that
> says "I didn't bother to process the timeout because it would have fired
> immediately".

I don't know either...

> That is not an advantage for any call site that I can
> see: it just means that the caller needs an extra, very difficult to
> test code path to handle what would normally be handled by a timeout
> interrupt. Even if it were a good API choice, you've broken a number of
> existing call sites that the patch fails to touch (eg in postmaster.c
> and postgres.c). It's not acceptable to define a failure return from
> enable_timeout_after and then let callers assume that the failure can't
> happen.
>
> Please undo that.

Done.

> Also, I'm not really enamored of the choice to use List* infrastructure
> for enable_timeouts(). That doesn't appear to be especially convenient
> for any caller, and what it does do is create memory-leak risks for most
> of them (if they forget to free the list cells, perhaps as a result of
> an error exit). I think a local-variable array of TimeoutParams[]
> structs would serve better for most use-cases.

Changed. However, the first member of the structure is "TimeoutId id" and a sensible end-of-array value can be -1. Some versions of GCC (maybe other compilers, too) complain if a constant is assigned to an enum which is outside of the declared values of the enum. So I added a "NO_TIMEOUT = -1" to enum TimeoutId. Comments?

> Another thought is that the PGSemaphoreTimedLock implementations all
> share the same bug, which is that if the "condition" callback returns
> true immediately after we acquire the semaphore, they will all wrongly
> return false indicating that the semaphore wasn't acquired.

You are right. Fixed.

> (BTW, I do not like either the variable name "condition" or the typedef name
> "TimeoutCondition" for something that's actually a callback function
> pointer.)

How about "TimeoutCallback" and "callback_fn"?

> In the same vein, this comment change:
> [ the ProcSleep comment diff quoted above ]
> suggests that you've failed to grasp the fundamental race-condition
> problem here. ProcSleep can't do cleanup after the fact any more than
> its callers can, because otherwise it has a race condition with some
> other process deciding to grant it the lock at about the same time its
> timeout goes off. I think the changes in ProcSleep that alter the
> state-cleanup behavior are just plain wrong, and what you need to do
> is make that look more like the existing mechanisms that clean up when
> statement timeout occurs.

This was the most enlightening comment up to now. It seems no one else understood the timer code but you... Thanks very much. I hope the way I did it is right. I factored out the core of StatementCancelHandler() into a common function that can be used by the lock_timeout interrupt as its timeout callback function. Now the code doesn't need PGSemaphoreTimedLock().

While I was thinking about how this thing works, I recognized that I also need to set the timeout indicator to false after checking it in ProcessInterrupts(). The reason it's needed is this scenario:

1. lock_timeout is set and the transaction throws its error
2. lock_timeout is unset before the next transaction
3. the user presses Ctrl-C during the next transaction

In this case, the second transaction would report the lock timeout error, since the indicator was still set. The other way would be to call disable_all_timeouts(false) unconditionally in ProcessInterrupts(), but setting only the indicator is faster and it doesn't interfere with unknown registered timeouts. Also, the reason it's disable_timeout_indicator() instead of a generic set_timeout_indicator() is that the generic one may be abused to produce false positives.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
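A sketch of the check-then-clear step in ProcessInterrupts() described above, assuming a get_timeout_indicator() to read the flag alongside the disable_timeout_indicator() mentioned in the mail; the committed API may differ:

if (get_timeout_indicator(LOCK_TIMEOUT))
{
    /* Reset now, so a later Ctrl-C is not misreported as a lock timeout. */
    disable_timeout_indicator(LOCK_TIMEOUT);
    ereport(ERROR,
            (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
             errmsg("canceling statement due to lock timeout")));
}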
Boszormenyi Zoltan <zb@cybertec.at> writes: > 2013-03-15 18:53 keltezéssel, Tom Lane írta: >> Also, I'm not really enamored of the choice to use List* infrastructure >> for enable_timeouts(). > Changed. However, the first member of the structure is > "TimeoutId id" and a sensible end-of-array value can be -1. > Some versions of GCC (maybe other compilers, too) complain > if a constant is assigned to an enum which is outside of the > declared values of the enum. So I added a "NO_TIMEOUT = -1" > to enum TimeoutId. Comments? I was thinking more of having array pointer and count parameters, ie enable_timeouts(TimeoutParams *timeouts, int n). I guess we could do it with sentinels instead but not sure that's better. The sentinel approach might be all right if there was another reason to have an "invalid" value in the enum, but I'm not seeing one ATM. > I hope the way I did it is right. I factored out the core of > StatementCancelHandler() into a common function that can > be used by the lock_timeout interrupt as its timeout callback > function. Now the code doesn't need PGSemaphoreTimedLock(). Hm, not needing PGSemaphoreTimedLock at all is an improvement for sure. Don't have time right now to look closer though. regards, tom lane
2013-03-16 17:42 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> 2013-03-15 18:53 keltezéssel, Tom Lane írta: >>> Also, I'm not really enamored of the choice to use List* infrastructure >>> for enable_timeouts(). >> Changed. However, the first member of the structure is >> "TimeoutId id" and a sensible end-of-array value can be -1. >> Some versions of GCC (maybe other compilers, too) complain >> if a constant is assigned to an enum which is outside of the >> declared values of the enum. So I added a "NO_TIMEOUT = -1" >> to enum TimeoutId. Comments? > I was thinking more of having array pointer and count parameters, ie > enable_timeouts(TimeoutParams *timeouts, int n) > I guess we could do it with sentinels instead but not sure that's > better. > > The sentinel approach might be all right if there was another reason > to have an "invalid" value in the enum, but I'm not seeing one ATM. Stephen Frost was against the array pointer/count variant, it was done that way earlier. Let me redo it again. :-) > >> I hope the way I did it is right. I factored out the core of >> StatementCancelHandler() into a common function that can >> be used by the lock_timeout interrupt as its timeout callback >> function. Now the code doesn't need PGSemaphoreTimedLock(). > Hm, not needing PGSemaphoreTimedLock at all is an improvement for > sure. Don't have time right now to look closer though. > > regards, tom lane > > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
* Boszormenyi Zoltan (zb@cybertec.at) wrote: > Stephen Frost was against the array pointer/count variant, > it was done that way earlier. Let me redo it again. :-) I still don't particularly like the array approach, and see the array+count approach as worse (seems like a higher chance that the count will end up being wrong at some point than having an array termination identifier). I still like the List approach, as that builds on a structure we've already got and can take advantage of the existing infrastructure, but Tom's got a good point regarding the potential for memory leaks with that solution. I haven't had a chance to look, but I would have expected the Lists for these to be allocated in a per-statement context, which would address the memory leak issue. Perhaps that isn't possible though. I agree that the List construct doesn't particularly help the callers, though I do think it makes the enable_timeouts() function cleaner. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Boszormenyi Zoltan (zb@cybertec.at) wrote: >> Stephen Frost was against the array pointer/count variant, >> it was done that way earlier. Let me redo it again. :-) > I still don't particularly like the array approach, and see the > array+count approach as worse (seems like a higher chance that the count > will end up being wrong at some point than having an array termination > identifier). I still like the List approach, as that builds on a > structure we've already got and can take advantage of the existing > infrastructure, but Tom's got a good point regarding the potential for > memory leaks with that solution. > I haven't had a chance to look, but I would have expected the Lists for > these to be allocated in a per-statement context, which would address > the memory leak issue. Perhaps that isn't possible though. I agree > that the List construct doesn't particularly help the callers, though I > do think it makes the enable_timeouts() function cleaner. I'm not sure about that. I think locks will typically get taken in query-lifespan or transaction-lifespan contexts, so that repeated leaks would be significant. Possibly my fear of leaks during error exits is unfounded, but I think forgetting to pfree the list cells in normal execution would be a problem. Anyway, the bottom line for me was that the List way didn't seem to be either convenient for the callers or especially efficient, because of the need to palloc list cells and then remember to pfree them again. Another way that we perhaps should consider is to follow the example of XLogInsert and use internally-threaded lists that are typically stored in local arrays in the callers. I've never thought that way was especially beautiful, but it does have the advantage of being an idiom that's already in use in other low-level code. On the whole though, I don't see anything wrong with pointer-and-count. I don't really believe that there's ever going to be a need to enable more than a couple of timeouts simultaneously, so I don't want an overly complicated data structure for it. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > On the whole though, I don't see anything wrong with pointer-and-count. > I don't really believe that there's ever going to be a need to enable > more than a couple of timeouts simultaneously, so I don't want an overly > complicated data structure for it. Alright, fair enough. Zoltan, sorry for the back-and-forth and thanks for being persistent; I'd really like to see this capability added. Thanks again, Stephen
Tom Lane wrote: > Another way that we perhaps should consider is to follow the example of > XLogInsert and use internally-threaded lists that are typically stored > in local arrays in the callers. I've never thought that way was > especially beautiful, but it does have the advantage of being an idiom > that's already in use in other low-level code. FWIW you could use an slist from ilist.c. It means each node would need a "next" pointer, but there's no separately allocated list cell. There are many places that could use slist/dlist. For instance while reading the SET PERSISTENT patch I noticed it has head and tail pointers being passed all over the place, which looks rather ugly. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
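A sketch of what carrying the timeout requests on an slist could look like. ilist.h and its slist_* macros are real infrastructure; the TimeoutRequest struct and the function around it are hypothetical:

#include "lib/ilist.h"
#include "utils/timeout.h"

/* Hypothetical request node: the link is embedded, no separate list cell. */
typedef struct TimeoutRequest
{
    slist_node  node;
    TimeoutId   id;
    int         delay_ms;
} TimeoutRequest;

static void
enable_timeout_list(slist_head *requests)
{
    slist_iter  iter;

    /* Walk the internally-threaded list and arm each timeout. */
    slist_foreach(iter, requests)
    {
        TimeoutRequest *req = slist_container(TimeoutRequest, node, iter.cur);

        enable_timeout_after(req->id, req->delay_ms);
    }
}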
Boszormenyi Zoltan <zb@cybertec.at> writes: > [ 2-lock_timeout-v37.patch ] Applied after a fair amount of additional hacking. I was disappointed to find that the patch introduced a new race condition into timeout.c, or at least broke a safety factor that had been there. The argument why enable_timeout() could skip disabling the timer interrupt on entry, if num_active_timeouts is zero, doesn't work for enable_timeouts(): as soon as you've incremented num_active_timeouts for the first new timeout, the signal handler would mess around with the data structure if it were to fire. What I did about that was to modify disable_alarm() to forcibly disable the interrupt if we're adding multiple timeouts in enable_timeouts(), even if we think no interrupt is pending. This might be overly paranoid, but since all of this is new code for 9.3 and hasn't been through any beta testing, I felt it best to preserve that safety feature. We can revisit it later if it proves to be an issue. (It's conceivable for instance that we could avoid incrementing the stored value of num_active_timeouts until we're done adding all the new timeouts; but that seemed pretty messy.) For the current usage pattern I'm not too sure that it matters anyway: a savings is only possible if you have enabled lock_timeout but not statement_timeout, and I'm doubtful that that will be a common use-case. I also rearranged the handling of the LOCK_TIMEOUT interrupt some more, since I didn't see a need for it to be different from STATEMENT_TIMEOUT, and got rid of some non-C89 coding practices that didn't seem to me to be improving readability anyway. regards, tom lane
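A minimal sketch of the extra paranoia described above: forcibly cancelling any pending ITIMER_REAL interrupt before rearranging the timeout queue, even when none is believed pending. Illustrative only; the committed timeout.c may differ in detail:

#include <sys/time.h>

static void
disable_alarm(void)
{
    struct itimerval timeval;

    /* A zeroed itimerval cancels any pending ITIMER_REAL interrupt. */
    MemSet(&timeval, 0, sizeof(struct itimerval));
    if (setitimer(ITIMER_REAL, &timeval, NULL) != 0)
        elog(FATAL, "could not disable SIGALRM timer: %m");
}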
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Another way that we perhaps should consider is to follow the example of
>> XLogInsert and use internally-threaded lists that are typically stored
>> in local arrays in the callers. I've never thought that way was
>> especially beautiful, but it does have the advantage of being an idiom
>> that's already in use in other low-level code.
> FWIW you could use an slist from ilist.c. It means each node would need
> a "next" pointer, but there's no separately allocated list cell.

Yeah, if the usage patterns were more complicated it'd be worth thinking about that. Right now there's nothing more complex than this:

*************** ResolveRecoveryConflictWithBufferPin(void)
*** 428,435 ****
       * Wake up at ltime, and check for deadlocks as well if we will be
       * waiting longer than deadlock_timeout
       */
!     enable_timeout_after(STANDBY_DEADLOCK_TIMEOUT, DeadlockTimeout);
!     enable_timeout_at(STANDBY_TIMEOUT, ltime);
  }

  /* Wait to be signaled by UnpinBuffer() */
--- 428,442 ----
       * Wake up at ltime, and check for deadlocks as well if we will be
       * waiting longer than deadlock_timeout
       */
!     EnableTimeoutParams timeouts[2];
!
!     timeouts[0].id = STANDBY_TIMEOUT;
!     timeouts[0].type = TMPARAM_AT;
!     timeouts[0].fin_time = ltime;
!     timeouts[1].id = STANDBY_DEADLOCK_TIMEOUT;
!     timeouts[1].type = TMPARAM_AFTER;
!     timeouts[1].delay_ms = DeadlockTimeout;
!     enable_timeouts(timeouts, 2);
  }

and you really can't improve that by complicating the data structure.

regards, tom lane
2013-03-17 04:48 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> [ 2-lock_timeout-v37.patch ] > Applied after a fair amount of additional hacking. Thank you, thank you, thank you! :-) > I was disappointed to find that the patch introduced a new race > condition into timeout.c, or at least broke a safety factor that had > been there. The argument why enable_timeout() could skip disabling > the timer interrupt on entry, if num_active_timeouts is zero, doesn't > work for enable_timeouts(): as soon as you've incremented > num_active_timeouts for the first new timeout, the signal handler would > mess around with the data structure if it were to fire. What I did > about that was to modify disable_alarm() to forcibly disable the > interrupt if we're adding multiple timeouts in enable_timeouts(), even > if we think no interrupt is pending. This might be overly paranoid, > but since all of this is new code for 9.3 and hasn't been through any > beta testing, I felt it best to preserve that safety feature. We can > revisit it later if it proves to be an issue. (It's conceivable for > instance that we could avoid incrementing the stored value of > num_active_timeouts until we're done adding all the new timeouts; > but that seemed pretty messy.) Your reasoning seems to be correct. However, if we take it to the extreme, enable_timeout_at/enable_timeout_after/enable_timeouts should also disable the interrupt handler at the beginning of the function and re-enable it at the end with pqsignal(). An evil soul may send SIGALRM externally to the backend processes at the proper moment and create a race condition while enable_timeout*() is in progress and the itimer is about to trigger at the same time (but doesn't since it was disabled). > For the current usage pattern I'm not > too sure that it matters anyway: a savings is only possible if you > have enabled lock_timeout but not statement_timeout, and I'm doubtful > that that will be a common use-case. I am not too sure about it. With lock_timeout in place, code migrated from Informix, MSSQL, etc. can have the same semantic behaviour. > > I also rearranged the handling of the LOCK_TIMEOUT interrupt some more, > since I didn't see a need for it to be different from STATEMENT_TIMEOUT, > and got rid of some non-C89 coding practices that didn't seem to me to > be improving readability anyway. Thanks for that, too. I was too blind to see that choice, even after thinking about why the statement_timeout cannot be done from SIGALRM context and why the code needs to also send SIGINT to the process group. (To kill children, too, like scripts executed via system()...) Thanks again, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > 2013-03-17 04:48 keltezéssel, Tom Lane írta: >> [ worries about stray SIGALRM events ] > Your reasoning seems to be correct. However, if we take it to the > extreme, enable_timeout_at/enable_timeout_after/enable_timeouts > should also disable the interrupt handler at the beginning of the > function and re-enable it at the end with pqsignal(). An evil soul may > send SIGALRM externally to the backend processes at the proper > moment and create a race condition while enable_timeout*() is in > progress and the itimer is about to trigger at the same time (but > doesn't since it was disabled). Well, a malefactor with the ability to send signals to a backend process could also send SIGKILL, or any number of other signals that would mess things up. I'm not too concerned about that scenario. I *am* concerned about leftover timer events, both because this code isn't very well tested, and because third-party code loaded into the backend (think pl/perl for instance) could easily set up timer events we weren't expecting. It suddenly occurs to me though that there's more than one way to skin this cat. We could easily add another static flag variable called "sigalrm_allowed" or some such, and have the signal handler test that and immediately return without doing anything if it's off. Clearing and setting such a variable would be a lot cheaper than an extra setitimer call, as well as more robust since it would protect us from all sources of SIGALRM not just ITIMER_REAL. regards, tom lane
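A sketch of that cheap-flag idea, with illustrative names; arming and disarming the handler costs one store instead of a setitimer() system call, and it also swallows SIGALRMs raised by sources other than ITIMER_REAL:

#include <signal.h>

static volatile sig_atomic_t sigalrm_allowed = false;

static void
handle_sig_alarm(SIGNAL_ARGS)
{
    int         save_errno = errno;

    /* Ignore SIGALRM from any source while timeout.c has not armed us. */
    if (sigalrm_allowed)
    {
        /* ... service the nearest timeout and reschedule the timer ... */
    }

    errno = save_errno;
}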
2013-03-17 16:07 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> 2013-03-17 04:48 keltezéssel, Tom Lane írta: >>> [ worries about stray SIGALRM events ] >> Your reasoning seems to be correct. However, if we take it to the >> extreme, enable_timeout_at/enable_timeout_after/enable_timeouts >> should also disable the interrupt handler at the beginning of the >> function and enabled at the end with pqsignal(). An evil soul may >> send SIGALRM externally to the backend processes at the proper >> moment and create a race condition while enable_timeout*() is in >> progress and the itimer is about to trigger at the same time (but >> doesn't since it was disabled). > Well, a malefactor with the ability to send signals to a backend > process could also send SIGKILL, or any number of other signals that > would mess things up. I'm not too concerned about that scenario. > I *am* concerned about leftover timer events, both because this code > isn't very well tested, and because third-party code loaded into the > backend (think pl/perl for instance) could easily set up timer events > we weren't expecting. I see. > It suddenly occurs to me though that there's more than one way to skin > this cat. We could easily add another static flag variable called > "sigalrm_allowed" or some such, and have the signal handler test that > and immediately return without doing anything if it's off. Clearing > and setting such a variable would be a lot cheaper than an extra > setitimer call, as well as more robust since it would protect us from > all sources of SIGALRM not just ITIMER_REAL. Something like the attached patch? Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > 2013-03-17 16:07 keltezéssel, Tom Lane írta: >> It suddenly occurs to me though that there's more than one way to skin >> this cat. We could easily add another static flag variable called >> "sigalrm_allowed" or some such, and have the signal handler test that >> and immediately return without doing anything if it's off. Clearing >> and setting such a variable would be a lot cheaper than an extra >> setitimer call, as well as more robust since it would protect us from >> all sources of SIGALRM not just ITIMER_REAL. > Something like the attached patch? Well, something like that, but it still needed some improvements. In particular I thought it best to leave the signal handler still releasing the procLatch unconditionally --- that behavior is independent of what this module is doing. Also you seem to have some odd ideas about what do-while will accomplish. AFAIK, in this usage it's purely a syntactic trick without much semantic content. It's the marking of the variable as "volatile" that counts for telling the compiler not to re-order operations. Updated and committed. regards, tom lane
2013-03-18 03:52 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> 2013-03-17 16:07 keltezéssel, Tom Lane írta: >>> It suddenly occurs to me though that there's more than one way to skin >>> this cat. We could easily add another static flag variable called >>> "sigalrm_allowed" or some such, and have the signal handler test that >>> and immediately return without doing anything if it's off. Clearing >>> and setting such a variable would be a lot cheaper than an extra >>> setitimer call, as well as more robust since it would protect us from >>> all sources of SIGALRM not just ITIMER_REAL. >> Something like the attached patch? > Well, something like that, but it still needed some improvements. In > particular I thought it best to leave the signal handler still releasing > the procLatch unconditionally --- that behavior is independent of what > this module is doing. Also you seem to have some odd ideas about what > do-while will accomplish. AFAIK, in this usage it's purely a syntactic > trick without much semantic content. It's the marking of the variable > as "volatile" that counts for telling the compiler not to re-order > operations. The volatile marking shouldn't even be necessary there. The signal handler doesn't write to it, only the main code. I just put it there saying "why not?" to myself. IIRC, volatile is needed if both the signal handler and the main code change the same variable. I got the reordering idea from here: http://yarchive.net/comp/linux/compiler_barriers.html Thanks for committing, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2013-03-18 06:22 keltezéssel, Boszormenyi Zoltan írta: > 2013-03-18 03:52 keltezéssel, Tom Lane írta: >> Boszormenyi Zoltan <zb@cybertec.at> writes: >>> 2013-03-17 16:07 keltezéssel, Tom Lane írta: >>>> It suddenly occurs to me though that there's more than one way to skin >>>> this cat. We could easily add another static flag variable called >>>> "sigalrm_allowed" or some such, and have the signal handler test that >>>> and immediately return without doing anything if it's off. Clearing >>>> and setting such a variable would be a lot cheaper than an extra >>>> setitimer call, as well as more robust since it would protect us from >>>> all sources of SIGALRM not just ITIMER_REAL. >>> Something like the attached patch? >> Well, something like that, but it still needed some improvements. In >> particular I thought it best to leave the signal handler still releasing >> the procLatch unconditionally --- that behavior is independent of what >> this module is doing. Also you seem to have some odd ideas about what >> do-while will accomplish. AFAIK, in this usage it's purely a syntactic >> trick without much semantic content. It's the marking of the variable >> as "volatile" that counts for telling the compiler not to re-order >> operations. > > The volatile marking shouldn't even be necessary there. > The signal handler doesn't write to it, only the main code. > I just put it there saying "why not?" to myself. > IIRC, volatile is needed if both the signal handler and the > main code change the same variable. Also, since the variable assignment doesn't depend on other code in the same function (or vice versa), the compiler is still free to reorder it. Volatile is about not caching the variable in a CPU register since it's "volatile"... > > I got the reordering idea from here: > http://yarchive.net/comp/linux/compiler_barriers.html > > Thanks for committing, > Zoltán Böszörményi > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: >> The volatile marking shouldn't even be necessary there. >> The signal handler doesn't write to it, only the main code. Well, (a) that's not the case in the patch as committed, and (b) even if it were true, the volatile marking is still *necessary*, because that's what tells the compiler it can't optimize away changes to the variable, say on the grounds of there being another store with no intervening fetches in the main-line code. Without that, a compiler that had aggressively inlined the smaller functions could well deduce that the disable_alarm() assignment was useless. > Also, since the variable assignment doesn't depend on other code > in the same function (or vice versa), the compiler is still free to > reorder it. > Volatile is about not caching the variable in a CPU register since > it's "volatile"... I don't believe you understand what volatile is about at all. Go read the C standard: it's about requiring objects' values to actually match the nominal program-specified values at sequence points. A more accurate heuristic is that volatile tells the compiler there may be other forces reading or writing the variable besides the ones it can see in the current function's source code, and so it can't drop or reorder accesses to the variable. regards, tom lane
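A compact example of the point being made, as a sketch rather than code from the thread. With a plain int the compiler may delete the first store as dead; with volatile both stores must be emitted, in order:

#include <signal.h>

static int plain_flag;                  /* non-volatile */
static volatile sig_atomic_t vol_flag;  /* volatile */

void
demo(void)
{
    plain_flag = 1;     /* may be optimized away: no intervening fetch */
    plain_flag = 2;

    vol_flag = 1;       /* must be emitted */
    vol_flag = 2;       /* must be emitted, after the store of 1, so a
                         * signal handler can observe either value */
}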
2013-03-18 06:47 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >>> The volatile marking shouldn't even be necessary there. >>> The signal handler doesn't write to it, only the main code. > Well, (a) that's not the case in the patch as committed, and (b) even if > it were true, the volatile marking is still *necessary*, because that's > what tells the compiler it can't optimize away changes to the variable, > say on the grounds of there being another store with no intervening > fetches in the main-line code. Without that, a compiler that had > aggressively inlined the smaller functions could well deduce that the > disable_alarm() assignment was useless. > >> Also, since the variable assignment doesn't depend on other code >> in the same function (or vice versa), the compiler is still free to >> reorder it. >> Volatile is about not caching the variable in a CPU register since >> it's "volatile"... > I don't believe you understand what volatile is about at all. Go read > the C standard: it's about requiring objects' values to actually match > the nominal program-specified values at sequence points. A more > accurate heuristic is that volatile tells the compiler there may be > other forces reading or writing the variable besides the ones it can see > in the current function's source code, and so it can't drop or reorder > accesses to the variable. > > regards, tom lane

After reading up on a lot of volatile and memory barriers, I am still not totally convinced. For the record, sig_atomic_t is a plain int without any special treatment from the compiler. It's atomic by nature on every 32-bit and 64-bit CPU. How about the attached patch over current GIT? In other words, why am I wrong with this idea? The problem that may arise if it's wrong is that transactions are left waiting for the lock when the interrupt triggers and the variable is still seen as false from the interrupt handler. But this doesn't happen. FWIW, this small patch seems to give a 0.7 percent performance boost for my settings:

shared_buffers = 256MB
work_mem = 8MB
effective_io_concurrency = 8
wal_level = hot_standby
checkpoint_segments = 64
random_page_cost = 1.8

Everything else is the default. I tested the patch on an 8-core AMD FX-8120 CPU with this pgbench script below. Basically, it's the default transaction prepended with "SET lock_timeout = 1;" I have used the attached quick-and-dirty patch to pgbench to ignore SQL errors coming from statements. "-s 100" was used to initialize the test database.
\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000
BEGIN;
SET lock_timeout = 1;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;

Results of "pgbench -c 32 -j 32 -t 10000 -e -f script.sql" are for the GIT version:

tps = 3366.844023 (including connections establishing)
tps = 3367.645454 (excluding connections establishing)
tps = 3379.784707 (including connections establishing)
tps = 3380.622317 (excluding connections establishing)
tps = 3385.198238 (including connections establishing)
tps = 3386.132433 (excluding connections establishing)

and with the barrier patch:

tps = 3412.799044 (including connections establishing)
tps = 3413.670832 (excluding connections establishing)
tps = 3389.796287 (including connections establishing)
tps = 3390.602187 (excluding connections establishing)
tps = 3405.924548 (including connections establishing)
tps = 3406.726997 (excluding connections establishing)

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > How about the attached patch over current GIT? In other words, > why I am wrong with this idea? Because it's wrong. Removing "volatile" means that the compiler is permitted to optimize away stores (and fetches!) on the basis of their being unnecessary according to straight-line analysis of the code. Write barriers don't fix that, they only say that stores that the compiler chooses to issue at all have to be ordered a certain way. (There are also pretty serious questions as to whether pg_write_barrier can be trusted yet, but it doesn't really matter here. Removing volatile would break the code.) regards, tom lane
On Mon, Mar 18, 2013 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> How about the attached patch over current GIT? In other words, >> why I am wrong with this idea? > > Because it's wrong. Removing "volatile" means that the compiler is > permitted to optimize away stores (and fetches!) on the basis of their > being unnecessary according to straight-line analysis of the code. > Write barriers don't fix that, they only say that stores that the > compiler chooses to issue at all have to be ordered a certain way. I don't think this is correct. The read and write barriers as implemented are designed to function as compiler barriers also, just as they do in the Linux kernel and every other piece of software I've found that implements anything remotely like this, with the lone exception of PostgreSQL. In PostgreSQL, spinlock acquisition and release are defined as CPU barriers but not a compiler barrier, and this necessitates extensive use of volatile all over the code base which would be unnecessary if we did this the way it's done in Linux and elsewhere. However, Zoltan's patch probably isn't right either. First, a write barrier isn't ever the correct solution when there's only one process involved - which is the case here, because the relevant variables are in backend-private memory. A compiler barrier - which is generally far cheaper - might be the right thing, though. However, the position of the barriers in his proposed patch is suspect. As implemented, he's proposing to force each change to alarm_enabled to be scheduled by the compiler before any other writes to memory are completed. But there's no guard against the compiler sliding the change backward, only forward. Now maybe that doesn't matter, if we're only concerned about the timing of updates of alarm_enabled relative to other updates of that same variable. But if there are multiple variables involved, then it matters. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 18, 2013 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Because it's wrong. Removing "volatile" means that the compiler is >> permitted to optimize away stores (and fetches!) on the basis of their >> being unnecessary according to straight-line analysis of the code. >> Write barriers don't fix that, they only say that stores that the >> compiler chooses to issue at all have to be ordered a certain way. > I don't think this is correct. The read and write barriers as > implemented are designed to function as compiler barriers also, just > as they do in the Linux kernel and every other piece of software I've > found that implements anything remotely like this, with the lone > exception of PostgreSQL. In PostgreSQL, spinlock acquisition and > release are defined as CPU barriers but not a compiler barrier, and > this necessitates extensive use of volatile all over the code base > which would be unnecessary if we did this the way it's done in Linux > and elsewhere. I think you're just as mistaken as Zoltan. Barriers enforce ordering of operations, not whether an operation occurs at all. regards, tom lane
On Thu, Mar 21, 2013 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Mar 18, 2013 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Because it's wrong. Removing "volatile" means that the compiler is
>>> permitted to optimize away stores (and fetches!) on the basis of their
>>> being unnecessary according to straight-line analysis of the code.
>>> Write barriers don't fix that, they only say that stores that the
>>> compiler chooses to issue at all have to be ordered a certain way.
>
>> I don't think this is correct. The read and write barriers as
>> implemented are designed to function as compiler barriers also, just
>> as they do in the Linux kernel and every other piece of software I've
>> found that implements anything remotely like this, with the lone
>> exception of PostgreSQL. In PostgreSQL, spinlock acquisition and
>> release are defined as CPU barriers but not a compiler barrier, and
>> this necessitates extensive use of volatile all over the code base
>> which would be unnecessary if we did this the way it's done in Linux
>> and elsewhere.
>
> I think you're just as mistaken as Zoltan. Barriers enforce ordering
> of operations, not whether an operation occurs at all.

Surely not. Suppose the user does this:

    some_global_var = 1;
    some_function();
    some_global_var = 2;

I hope we can both agree that any compiler which thinks it doesn't need to store 1 in some_global_var is completely nuts, because some_function() might perform any arbitrary computation, including one that depends on some_global_var being 1 rather than whatever value it had before that. Of course, if a global optimizer can prove that some_function() can't do anything that can possibly care about some_global_var, then the store could be omitted, but not otherwise. Should the compiler omit the store and should there then be a bug in the program, we wouldn't say "oh, some_global_var ought to be declared volatile". We would say "that compiler is buggy".

Now, conversely, in this situation, it seems to me (and I think you'll agree) that the compiler could be forgiven for omitting one of the stores:

    some_global_var = 1;
    local_var = 42;
    some_global_var = 2;

If we declare some_global_var as volatile, it will force the compiler to perform both stores regardless of what the optimizer thinks, and moreover, the second store is required to happen after the first one, because the definition of volatile is that volatile references are globally sequenced with respect TO EACH OTHER. However, the store to local_var could be moved around with respect to the other two or omitted altogether unless local_var is also declared volatile.

Now, consider this:

    some_global_var = 1;
    pg_compiler_barrier();   /* or in Linux, barrier() */
    some_global_var = 2;

The compiler barrier is exactly equivalent to the unknown function in the first example, except that no function is actually called. The semantics are precisely that the compiler must assume that, at the point the barrier intervenes, an unknown operation will occur which may depend on the contents of any word in memory and which may modify any word in memory. Thus, the compiler may not postpone stores requested before the barrier until after the barrier on the grounds that the values will be overwritten after the barrier; and if any global variables have been loaded into registers before the barrier it must be assumed that, after the barrier, the registers may no longer match those global variables.

It is true that any barrier, including a compiler barrier, serves only to separate instructions. But I don't believe it follows in any way that the barrier therefore prohibits reordering operations but not omitting them altogether. Such a definition would have no practical utility. The compiler is not free to willy-nilly leave out things that the user asks it to do any more than it is free to willy-nilly reorder them. If it were, programming would be chaos, and everything would have to be volatile. What the compiler is free to do is to reorder and omit instructions where the programmer won't, in a single-thread execution context, be able to notice the difference. And a compiler barrier is an explicit notification that, in essence, the programmer will notice if things are not just the way he wrote them when that point in the code is reached.

To see the difference between this and volatile, consider the following:

    a = 1;
    b = 1;
    c = 1;
    a = 2;
    b = 2;
    c = 2;

Absent any special precautions, the compiler may well optimize the first set of stores away completely and perform the second set in any order it likes. If we make all the variables volatile, it will do all 6 stores in precisely the order specified, omitting nothing and reordering nothing. If we instead stick a compiler barrier just after the assignment "c = 1", then the compiler must store 1 in all three variables (in any order that it likes), and then store 2 in all three variables (in any order that it likes). The "barrier" essentially divides up the code into chunks and requires that those chunks be optimized independently by the compiler without knowledge of what earlier or later chunks are doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
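For reference, one common GCC-style way to spell a pure compiler barrier is an empty asm with a "memory" clobber. It emits no instruction; it only prevents the optimizer from caching or reordering memory accesses across it. This is illustrative, not necessarily how PostgreSQL defines its barriers:

/* Empty asm with a "memory" clobber: a compiler-only fence. */
#define compiler_barrier()  __asm__ __volatile__("" : : : "memory")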
On Fri, Mar 22, 2013 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The "barrier" essentially > divides up the code into chunks and requires that those chunks be > optimized independently by the compiler without knowledge of what > earlier or later chunks are doing While all this sounds sensible, I would love to see a gcc programmer or an llvm programmer actually comment on what they think volatile does and what they want to implement in the compiler. I'm a bit worried that we're making assumptions like "things happen in a specific order" that aren't really justified. In these days of superscalar execution and multi-level caches things may be weirder than we're imagining. -- greg