Thread: Remaining 'needs review' patches in July commitfest
21 patches remain in Needs Review state in the July commitfest. Some of them have a reviewer signed up. I have highlighted below the ones that worry me the most. What are we going to do about these? For each of them, I'd like the authors to have some idea of what they need to do to get the patch into committable state (or whether the whole approach is going to be rejected), but I don't know what that advice should be.

> pgbench - allow backslash-continuations in custom scripts

Everyone wants the feature, using multi-line SELECTs in pgbench scripts, but we don't seem to be reaching a consensus on how it should work. I think we'll need to integrate the lexer, but it would be nice to still support multi-statements as well, with some syntax.

> multivariate statistics

This has been a long discussion. Are we getting close to a committable state?

> COPY RAW

No consensus on whether to add this to the server's COPY command, or as a new psql backslash option.

> Unique Joins

Kyotaro HORIGUCHI commented on this back in March, but no-one's reviewed the latest version. It's a fairly big patch. I guess I'll review it if no-one else does, but it's going to take me some time to really dig into it...

> checkpoint continuous flushing

This does a big memory allocation at checkpoint, which Tom vehemently objects to. I don't much like it either, although I would be OK with a more moderately-sized allocation. It's not clear on what criteria this should be accepted or rejected. What workloads need to be tested?

> plpgsql raise statement with context

Impasse. Everyone wants this feature in some form, but no consensus on whether to do this client-side or server-side.

> Configurable location for extension .control files

Do we want this? In its current form? I feel that this isn't ready as it is, but I'm not sure what to suggest instead.

> dblink: add polymorphic result functions

Seems pretty ugly to me to add a dummy argument to functions just so that you can specify the result type.
The problem it's trying to solve is real, though. Should we take it as it is, or wait for some cleaner approach?

> Improving test coverage of extensions with pg_dump

Do we want to have this in src/test/modules or src/bin/pg_dump/t?

> Asynchronous execution on postgres-fdw

If we're going to have some sort of general-purpose Parallel node support soon, should we just forget about this? Or is it still useful? This adds a fair amount of new infrastructure, for a fairly niche feature.

- Heikki
On 07/28/2015 12:51 PM, Heikki Linnakangas wrote:
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts,
> but we don't seem to be reaching a consensus on how it should work. I
> think we'll need to integrate the lexer, but it would be nice to still
> support multi-statements as well, with some syntax.

Seems like a perfect candidate for "returned with feedback". We'll straighten it out by the next commitfest. Also note that this has actually spiralled out into 2 or 3 separate features.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2015-07-28 22:51:55 +0300, Heikki Linnakangas wrote:
>> checkpoint continuous flushing
>
> This does a big memory allocation at checkpoint, which Tom vehemently
> objects to.

Uh. Didn't he just object to failing in that case? IIRC he even indicated tentative assent, a year or so back, with my idea of just pre-allocating all the required memory in shared memory, so that we don't need to allocate anything later on. I've not read the last version of the patch, but in my old version the allocation wasn't actually that large in comparison to the size of shared buffers itself.

> I don't much like it either, although I would be OK with a more
> moderately-sized allocation.

That'll disallow some mighty fine optimizations, like e.g. being able to fsync files early, directly after we have written out all their buffers, thereby reducing the amount of dirty data (written out by backends) that needs to be flushed to disk.

Greetings,

Andres Freund
Heikki Linnakangas wrote:

> 21 patches remain in Needs Review state, in the July commitfest. Some of
> them have a reviewer signed up. I have highlighted some of them below that
> worry me the most. What are we going to do about these? For each of them,
> I'd like the authors to have some idea on what they need to do to get the
> patch into committable state (or if the whole approach is going to be
> rejected), but I don't know what that advice should be.
>
> > pgbench - allow backslash-continuations in custom scripts
>
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts, but
> we don't seem to be reaching a consensus on how it should work. I think
> we'll need to integrate the lexer, but it would be nice to still support
> multi-statements as well, with some syntax.

Excuse me -- what's a multi-statement? I thought this effort was all about multi-line single statements only.

I think the proposed patch to integrate psql's lexer in pgbench is a reasonable step forward, but we need it to use our standard technique of symlinking the .l file in place via some additional lines in pgbench's Makefile, rather than copying it.

> > dblink: add polymorphic result functions
>
> Seems pretty ugly to me to add a dummy argument to functions, just so that
> you can specify the result type. The problem it's trying to solve is real,
> though. Should we take it as it is, or wait for some cleaner approach?

Put like that, it does sound quite ugly. I take it we have no better alternative proposed?

> > Improving test coverage of extensions with pg_dump
>
> Do we want to have this in src/test/modules or src/bin/pg_dump/t?

Are we testing pg_dump here, or are we testing extensions? If the former, src/bin/pg_dump/t seems best.

> > Asynchronous execution on postgres-fdw
>
> If we're going to have some sort of general-purpose Parallel node support
> soon, should we just forget about this? Or is it still useful?
> This adds a
> fair amount of new infrastructure, for a fairly niche feature..

-0.1 on this one.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/28/2015 11:01 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>>> pgbench - allow backslash-continuations in custom scripts
>>
>> Everyone wants the feature, using multi-line SELECTs in pgbench scripts, but
>> we don't seem to be reaching a consensus on how it should work. I think
>> we'll need to integrate the lexer, but it would be nice to still support
>> multi-statements as well, with some syntax.
>
> Excuse me -- what's a multi-statement?

Sending "SELECT 1; SELECT 2;" to the server in one Query message, i.e. PQexec(conn, "SELECT 1; SELECT 2;"). If you put that on a single line in a pgbench script today, it will be sent as one message to the server. If we start using a lexer and split that into two, that's a change in behaviour. Not the end of the world, but it would still be nice to be able to use multi-statements in pgbench, too.

>>> dblink: add polymorphic result functions
>>
>> Seems pretty ugly to me to add a dummy argument to functions, just so that
>> you can specify the result type. The problem it's trying to solve is real,
>> though. Should we take it as it is, or wait for some cleaner approach?
>
> Put like that, it does sound quite ugly. I take it we have no better
> alternative proposed?

Joe Conway suggested a more generic approach here: http://www.postgresql.org/message-id/559A9643.9070409@joeconway.com. I'm not sure why that was not pursued. It certainly seems better to me.

>>> Improving test coverage of extensions with pg_dump
>>
>> Do we want to have this in src/test/modules or src/bin/pg_dump/t?
>
> Are we testing pg_dump here, or are we testing extensions? If the
> former, src/bin/pg_dump/t seems best.

I think that's the crux of the disagreement :-).

- Heikki
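[Editor's note: the behavioural difference discussed above -- one Query message versus per-statement splitting -- can be sketched with a toy splitter. This is an illustrative sketch only, not pgbench code; a real lexer such as psql's must also handle dollar quoting, comments, escapes, and so on.]

```python
def split_statements(script):
    """Naively split a SQL string on semicolons, respecting single quotes.

    A toy stand-in for real lexing: it shows why "SELECT 1; SELECT 2;"
    would turn into two separate protocol messages once a lexer splits it,
    instead of one multi-statement Query message as pgbench sends today.
    """
    stmts, cur, in_quote = [], [], False
    for ch in script:
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            stmt = "".join(cur).strip()
            if stmt:
                stmts.append(stmt)
            cur = []
        else:
            cur.append(ch)
    tail = "".join(cur).strip()
    if tail:
        stmts.append(tail)
    return stmts

print(split_statements("SELECT 1; SELECT 2;"))  # ['SELECT 1', 'SELECT 2']
print(split_statements("SELECT 'a;b';"))        # ["SELECT 'a;b'"]
```

With lexer-based splitting each element would be sent as its own message; today the whole one-line string goes to the server unsplit.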
On 28 July 2015 at 20:51, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> multivariate statistics
>
> This has been a long discussion. Are we getting close to a committable state?

This is important, but big.

> COPY RAW
>
> No consensus on whether to add this to the server's COPY command, or as a new psql backslash option.

I think there is something salvageable there.
I've added my name as committer to a few things, but won't be able to work on them until at least next week when I've finished 9.5 stuff. Happy to step back if anyone else wants to claim those.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/28/2015 11:30 PM, Simon Riggs wrote:
> I've added my name as committer to a few things, but won't be able to work
> on them until at least next week when I've finished 9.5 stuff. Happy to
> step back if anyone else wants to claim those.

Thanks, every little helps!

- Heikki
Andres Freund <andres@anarazel.de> writes:
> On 2015-07-28 22:51:55 +0300, Heikki Linnakangas wrote:
>>> checkpoint continuous flushing
>> This does a big memory allocation at checkpoint, which Tom vehemently
>> objects to.

> Uh. Didn't he just object to failing in that case?

Right. If it can fall back to "stupid" mode when there's not spare memory, I'd be OK with that.

			regards, tom lane
On 07/28/2015 01:22 PM, Heikki Linnakangas wrote:
>>>> dblink: add polymorphic result functions
>>>
>>> Seems pretty ugly to me to add a dummy argument to functions,
>>> just so that you can specify the result type. The problem it's
>>> trying to solve is real, though. Should we take it as it is, or
>>> wait for some cleaner approach?
>>
>> Put like that, it does sound quite ugly. I take it we have no
>> better alternative proposed?
>
> Joe Conway suggested a more generic approach here:
> http://www.postgresql.org/message-id/559A9643.9070409@joeconway.com.
> I'm not sure why that was not pursued. It certainly seems better to
> me.

I really think the second approach in that email is the right way to go. Is there any benefit we get from the dblink-specific hack that we don't get via the cast?

I guess it is just a matter of someone getting around to making it happen. I can put it on my list of things to do, but probably will not get to looking at it for at least a few more weeks.

--
Joe Conway
Hi,

On 07/28/2015 09:51 PM, Heikki Linnakangas wrote:
>> multivariate statistics
>
> This has been a long discussion. Are we getting close to a committable
> state?

Certainly not - the discussion may seem long, but it only deals with some aspects of the patch so far. There has been very little discussion of low-level implementation details (e.g. how exactly the histograms are built, data structures etc.) - we haven't got that far yet.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 29, 2015 at 5:22 AM, Heikki Linnakangas wrote:
> On 07/28/2015 11:01 PM, Alvaro Herrera wrote:
>>>> Improving test coverage of extensions with pg_dump
>>>
>>> Do we want to have this in src/test/modules or src/bin/pg_dump/t?
>>
>> Are we testing pg_dump here, or are we testing extensions? If the
>> former, src/bin/pg_dump/t seems best.

All the tests use pg_dump, but what is being tested is dumpable tables in an extension. At this point I am honestly not sure which one is better X/. Putting it in pg_dump/t would require two extra lines in the make target prove_check, so that modules in this path are installed as well. The argument against having it in src/test/modules is that it would bloat that directory in the future if we do the same for all binaries, particularly if we end up with multiple modules for each one.

> I think that's the crux of the disagreement :-).

Yep.

--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Jul 29, 2015 at 5:22 AM, Heikki Linnakangas wrote:
>> On 07/28/2015 11:01 PM, Alvaro Herrera wrote:
>>> Do we want to have this in src/test/modules or src/bin/pg_dump/t?

>> Are we testing pg_dump here, or are we testing extensions? If the
>> former, src/bin/pg_dump/t seems best.

> All the tests are using pg_dump, but it is testing dumpable tables in
> an extension. At this point I am not sure which one is better honestly
> X/.

ISTM we're testing pg_dump, but I don't especially want to clutter the pg_dump directory with the code for a dummy extension, so I'd vote for putting the extension code under src/test/modules. (Or you could hide it inside src/bin/pg_dump/t/, but that seems pretty weird too.)

In the end though, this is the sort of thing we generally leave to the discretion of the responsible committer. I won't whinge too hard about whatever the decision is ...

			regards, tom lane
2015-07-28 21:51 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
> 21 patches remain in Needs Review state, in the July commitfest. Some of
> them have a reviewer signed up. I have highlighted some of them below that
> worry me the most. What are we going to do about these? For each of them,
> I'd like the authors to have some idea on what they need to do to get the
> patch into committable state (or if the whole approach is going to be
> rejected), but I don't know what that advice should be.
>
> > pgbench - allow backslash-continuations in custom scripts
>
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts, but
> we don't seem to be reaching a consensus on how it should work. I think
> we'll need to integrate the lexer, but it would be nice to still support
> multi-statements as well, with some syntax.
>
> > multivariate statistics
>
> This has been a long discussion. Are we getting close to a committable state?
>
> > COPY RAW
>
> No consensus on whether to add this to the server's COPY command, or as a
> new psql backslash option.

The psql backslash option was proposed in years-old patches, and it was rejected years ago. There is no advantage to a psql-based solution. The COPY-based solution is simple, uses the current infrastructure, and works for both input and output. For a psql-based solution we would have to create the import infrastructure from zero. Some years ago I proposed teaching psql parametrized queries - that proposal was rejected.

> > Unique Joins
>
> Kyotaro HORIGUCHI commented on this back in March, but no-one's reviewed
> the latest version. It's a fairly big patch. I guess I'll review it if
> no-one else does, but it's going to take me some time to really dig into it...
>
> > checkpoint continuous flushing
>
> This does a big memory allocation at checkpoint, which Tom vehemently
> objects to. I don't much like it either, although I would be OK with a more
> moderately-sized allocation. It's not clear on what criteria this should be
> accepted or rejected. What workloads need to be tested?
>
> > plpgsql raise statement with context
>
> Impasse. Everyone wants this feature in some form, but no consensus on
> whether to do this client-side or server-side.

I sent two patches as examples of the two possible ways. The unfortunate thing is that the disagreement is over a relatively unimportant question. There are precedents for both implementations: for the first way we have client_min_messages and log_min_messages, and for the second way we have the VERBOSE levels in psql. In the end, I don't think there is any big difference for users between one implementation and the other.

> > Configurable location for extension .control files
>
> Do we want this? In its current form? I feel that this isn't ready as it
> is, but I'm not sure what to suggest instead.
>
> > dblink: add polymorphic result functions
>
> Seems pretty ugly to me to add a dummy argument to functions, just so that
> you can specify the result type. The problem it's trying to solve is real,
> though. Should we take it as it is, or wait for some cleaner approach?
>
> > Improving test coverage of extensions with pg_dump
>
> Do we want to have this in src/test/modules or src/bin/pg_dump/t?
>
> > Asynchronous execution on postgres-fdw
>
> If we're going to have some sort of general-purpose Parallel node support
> soon, should we just forget about this? Or is it still useful? This adds a
> fair amount of new infrastructure, for a fairly niche feature.
>
> - Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello, Heikki.

You wrote:

HL> 21 patches remain in Needs Review state, in the July commitfest. Some of
HL> them have a reviewer signed up. I have highlighted some of them below
HL> that worry me the most. What are we going to do about these? For each of
HL> them, I'd like the authors to have some idea on what they need to do to
HL> get the patch into committable state (or if the whole approach is going
HL> to be rejected), but I don't know what that advice should be.

>> COPY RAW

HL> No consensus on whether to add this to the server's COPY command, or as
HL> a new psql backslash option.

I did a quick review of this, but I need some more time for tests. Maybe I will do it next week. There is a consensus; only a detailed review is needed.

HL> --
HL> - Heikki

--
With best wishes,
Pavel
mailto:pavel@gf.microolap.com
Hello Heikki,

About two patches I submitted:

>> pgbench - allow backslash-continuations in custom scripts
>
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts,
> but we don't seem to be reaching a consensus on how it should work. I
> think we'll need to integrate the lexer, but it would be nice to still
> support multi-statements as well, with some syntax.

I was willing to do a small job of that one, but it drifted to solving the meta problem of pgbench lexing to do it "the right way". Kyotaro HORIGUCHI has taken up the challenge and submitted a 3-part patch to modify the psql lexer and then reuse it "as is" in pgbench. I'm impressed:-)

I think that the approach is a little overkill for the simple targeted feature, and I'm not sure that it allows for multi-statements, but I have not checked. Anyway I may try to review it, although not in the short term. I'm not too optimistic, because if it passes it means that the psql lexer would be used in 2 contexts, so changes in either of them would require ensuring that nothing breaks in the other one, and I'm not sure that such a constraint is desirable. Duplicating the lexer is not a realistic option either; I do not think that pgbench is worth it.

Anyway, in the meantime, the patch may be switched to "returned with feedback" or "rejected".

>> checkpoint continuous flushing
>
> This does a big memory allocation at checkpoint, which Tom vehemently
> objects to. I don't much like it either, although I would be OK with a
> more moderately-sized allocation.

AFAICS Tom has not expressed any view on the patch in its current form (there is no message from Tom on the thread). ISTM that Tom complained about a year ago about OOM risks in a 2007 version of the sorting part by Takahiro ITAGAKI, which was dynamically allocating and freeing about 24 bytes per buffer on each checkpoint.

The current v5 version uses 4 bytes per buffer, allocated on the first run and then reused, so there is no risk of returning the memory and failing to get it back, and thus no significant OOM risk. Maybe the allocation should be moved earlier, to when the checkpointer process starts, though.

A second objection a year ago from Tom was about proof of performance gains. I've spent quite some time collecting a lot of data to measure the benefits under different loads, representing X00 hours of pgbench runs, as can be seen in the thread. ISTM that the performance (tps & latency) figures, for instance:

http://www.postgresql.org/message-id/raw/alpine.DEB.2.10.1506170803210.9794@sto

are overwhelmingly in favor of the patch.

If the memory requirement of 4 bytes per buffer is still too much, it is possible to reduce it with a GUC to specify the allowed amount of memory, plus some shuffling in the code to do things by chunks. My opinion is that 4 bytes per buffer is reasonable enough given the measured benefits. Also, there is more benefit if the whole checkpoint is sorted instead of just part of it. I'm really willing to improve the write stall issues which freeze postgresql when checkpointing on HDD, so if it has to be chunked because this is a blocker I'll make the effort, but I do not think that it is useful as such.

> It's not clear on what criteria this should be accepted or rejected.

Given the overall performance gains and reduction in latency, where a lot of write stalls are avoided or at least greatly reduced, I would be sad to see it rejected. That does not mean that it cannot be improved.

> What workloads need to be tested?

If you tell me, and provide the matching dedicated host for testing, I can run tests...

--
Fabien.
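[Editor's note: a back-of-the-envelope check of the 4-bytes-per-buffer claim, assuming the default 8 kB PostgreSQL block size. The sort array is a fixed, tiny fraction (4/8192, about 0.05%) of shared_buffers itself.]

```python
# Rough overhead of a once-allocated checkpoint sort array at
# 4 bytes per shared buffer, with the default 8 kB block size.
BLCKSZ = 8192       # bytes per shared buffer (default build)
PER_BUFFER = 4      # bytes of sort bookkeeping per buffer (patch v5)

overhead = {}
for shared_buffers_gb in (1, 8, 64):
    nbuffers = shared_buffers_gb * 1024**3 // BLCKSZ
    overhead[shared_buffers_gb] = nbuffers * PER_BUFFER
    print(f"shared_buffers = {shared_buffers_gb:2d} GB -> "
          f"{overhead[shared_buffers_gb] / 1024**2:.1f} MB sort array")
# 1 GB -> 0.5 MB, 8 GB -> 4.0 MB, 64 GB -> 32.0 MB
```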
On Tue, Jul 28, 2015 at 3:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> plpgsql raise statement with context
>
> Impasse. Everyone wants this feature in some form, but no consensus on
> whether to do this client-side or server-side.

+1 for server-side. Does anyone other than you even think that the client side is a reasonable way to go?

>> dblink: add polymorphic result functions
>
> Seems pretty ugly to me to add a dummy argument to functions, just so that
> you can specify the result type. The problem it's trying to solve is real,
> though. Should we take it as it is, or wait for some cleaner approach?

+1 for taking it. If we wait for something better, we may be waiting a long time.

>> Asynchronous execution on postgres-fdw
>
> If we're going to have some sort of general-purpose Parallel node support
> soon, should we just forget about this? Or is it still useful? This adds a
> fair amount of new infrastructure, for a fairly niche feature..

IMHO, that's an extremely important feature. I believe that it will not be obsoleted by other parallelism work. Actually, I think that some of the infrastructure it introduces may be quite helpful for parallelism as well, but I haven't looked at it in detail yet.

The core design concern that I have is what to do about this:

Append
  -> Scan
  -> Scan
  -> Scan

Right now, we start each scan when the previous scan finishes. But what if - either through parallelism or through this patch - we could launch scans 2, 3, etc. before scan 1 completes? If there's a Limit node above this somewhere, we may regret our decision bitterly. But if there isn't, we may do an enormous amount of good by starting those scans early, on the speculation that we will in fact need the results eventually. I haven't applied a whole lot of brainpower to this problem yet; I'm sort of hoping someone else will have a good idea.

Kyotaro Horiguchi's approach seems to be to say that we don't need to solve this problem at this stage, and that we should just not worry about the possible waste of resources on the remote machine for now; fix it later, as part of a future patch. I'm not terribly confident in that approach, but I don't currently have anything better to propose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
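[Editor's note: the Limit-node hazard described above can be illustrated with a toy asyncio model. This has nothing to do with the actual executor code; `remote_scan` and `append_with_limit` are invented names, and the point is only that eagerly started child scans may fetch rows the query never consumes.]

```python
import asyncio

FETCHED = 0  # rows pulled from the simulated remote servers

async def remote_scan(name, nrows, out):
    """Simulate one asynchronous foreign scan streaming rows into a queue."""
    global FETCHED
    for i in range(nrows):
        await asyncio.sleep(0)   # stand-in for a network round trip
        FETCHED += 1
        await out.put((name, i))
    await out.put(None)          # end-of-scan marker

async def append_with_limit(scans, limit):
    """Eagerly start every child scan, then stop after `limit` rows."""
    out = asyncio.Queue()
    tasks = [asyncio.create_task(remote_scan(n, r, out)) for n, r in scans]
    rows, finished = [], 0
    while finished < len(scans) and len(rows) < limit:
        item = await out.get()
        if item is None:
            finished += 1
        else:
            rows.append(item)
    for t in tasks:              # the query is satisfied; cancel leftovers
        t.cancel()
    return rows

rows = asyncio.run(append_with_limit([("s1", 100), ("s2", 100), ("s3", 100)], 5))
print(f"consumed={len(rows)} fetched={FETCHED}")
```

Only 5 rows are consumed, but the remotes have already done work for every row fetched before the cancel: resources wasted if the Limit is small, latency saved if it is not.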
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 28, 2015 at 3:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> plpgsql raise statement with context
>> Impasse. Everyone wants this feature in some form, but no consensus on
>> whether to do this client-side or server-side.

> +1 for server-side. Does anyone other than you even think that the
> client side is a reasonable way to go?

Yes. This is presupposing on the server side what the client will want to display.

>>> dblink: add polymorphic result functions
>> Seems pretty ugly to me to add a dummy argument to functions, just so that
>> you can specify the result type. The problem it's trying to solve is real,
>> though. Should we take it as it is, or wait for some cleaner approach?

> +1 for take it. If we wait for something better, we may be waiting a long time.

-1. We have a clear design spec for a solution that fixes this for much more than just dblink. I don't feel a need to paste on an ugly, single-use band-aid in such a hurry. If we get close to 9.6 and there is no better fix, we could reconsider; but now is not the time to give up on pursuing the better solution.

>>> Asynchronous execution on postgres-fdw
>> If we're going to have some sort of general-purpose Parallel node support
>> soon, should we just forget about this? Or is it still useful? This adds a
>> fair amount of new infrastructure, for a fairly niche feature..

> IMHO, that's an extremely important feature. I believe that it will
> not be obsoleted by other parallelism work.

Again, this seems like rushing through a single-use band-aid while work is still active on more general solutions. I think we should put this off and see whether it makes sense *after* the general parallel query stuff is (more nearly) functional. Even if it still makes sense then, it may well need a rewrite.

> Actually, I think that
> some of the infrastructure that it introduces may be quite helpful for
> parallelism as well, but I haven't looked at in detail yet.

Then steal that stuff as and when you need it.

			regards, tom lane
On Wed, Jul 29, 2015 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 28, 2015 at 3:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>> plpgsql raise statement with context
>>> Impasse. Everyone wants this feature in some form, but no consensus on
>>> whether to do this client-side or server-side.
>
>> +1 for server-side. Does anyone other than you even think that the
>> client side is a reasonable way to go?
>
> Yes. This is presupposing on the server side what the client will want
> to display.

Fair enough. I'm still not convinced we're doing anything other than complicating what ought to be a simple matter. It is just a fact that logging tracing messages in PL/pgsql functions is a pain in the butt right now in some situations, because you get a huge number of CONTEXT lines that you don't want. Can we agree on some solution to that problem without over-engineering this to infinity?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company