Thread: pgbench - allow backslash-continuations in custom scripts
Add backslash continuations to pgbench custom scripts. The benefit of this approach is that it is upward compatible, and it is also pretty simple to implement. The downside is that backslash continuation is not the best syntax ever invented, but then you do not have to use it if you do not like it. The alternative would be to make semi-colon a mandatory end-of-line marker, which would introduce an incompatibility and requires more efforts to implement, including some kind of SQL-compatible lexer. IMHO this approach is the best compromise. -- Fabien.
On 05/14/2015 12:10 PM, Fabien COELHO wrote: > > Add backslash continuations to pgbench custom scripts. > > The benefit of this approach is that it is upward compatible, and it is > also pretty simple to implement. The downside is that backslash > continuation is not the best syntax ever invented, but then you do not > have to use it if you do not like it. > > The alternative would be to make semi-colon a mandatory end-of-line > marker, which would introduce an incompatibility and requires more > efforts to implement, including some kind of SQL-compatible lexer. > > IMHO this approach is the best compromise. I don't personally agree. I believe that it it worth breaking backwards compatibility to support line breaks in pgbench statements, and that if we're not going to do that, supporting \ continuations is of little value. As someone who actively uses pgbench to write custom benchmarks, I need to write queries which I can test. \ continuation does NOT work on the psql command line, so that's useless for testing my queries; I still have to reformat and troubleshoot. If we added \ continuation, I wouldn't use it. I think we should support line breaks, and require semicolons for end-of-statement. Backwards-compatability in custom pgbench scripts is not critical; pgbench scripts are neither used in produciton, nor used in automated systems much that I know of. I'm not clear on why we'd need a full SQL lexer. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > On 05/14/2015 12:10 PM, Fabien COELHO wrote: >> Add backslash continuations to pgbench custom scripts. > I don't personally agree. I believe that it it worth breaking backwards > compatibility to support line breaks in pgbench statements, and that if > we're not going to do that, supporting \ continuations is of little value. I tend to agree on that bottom line; having this be inconsistent with psql does not seem like a win. > I'm not clear on why we'd need a full SQL lexer. So you don't get fooled by semicolons embedded in string literals or comments. regards, tom lane
On 06/19/2015 02:51 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 05/14/2015 12:10 PM, Fabien COELHO wrote: >>> Add backslash continuations to pgbench custom scripts. > >> I don't personally agree. I believe that it it worth breaking backwards >> compatibility to support line breaks in pgbench statements, and that if >> we're not going to do that, supporting \ continuations is of little value. > > I tend to agree on that bottom line; having this be inconsistent with psql > does not seem like a win. > >> I'm not clear on why we'd need a full SQL lexer. > > So you don't get fooled by semicolons embedded in string literals or > comments. I take it we ignore those now? I mean, personally, it wouldn't break anything for me but since some other benhcmarks involve random text generators .... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
>> I tend to agree on that bottom line; having this be inconsistent with psql >> does not seem like a win. >> >>> I'm not clear on why we'd need a full SQL lexer. >> >> So you don't get fooled by semicolons embedded in string literals or >> comments. > > I take it we ignore those now? I mean, personally, it wouldn't break > anything for me but since some other benhcmarks involve random text > generators .... If backward compatibility is not an issue (I'm surprised:-), and failure is acceptable in contrived cases, a simple implementation would be to accumulate lines till one ends with ";\s*$", Otherwise maybe the "states" management or the lexer are enough (in simple quotes, in double quotes, in comment, in stuff), so this can implemented without actually requiring another lexer in pgbench and be robust. -- Fabien.
Hello Josh, >> Add backslash continuations to pgbench custom scripts. >> [...] >> IMHO this approach is the best compromise. > > I don't personally agree. I believe that it it worth breaking backwards > compatibility to support line breaks in pgbench statements, and that if > we're not going to do that, supporting \ continuations is of little value. > > As someone who actively uses pgbench to write custom benchmarks, I need > to write queries which I can test. \ continuation does NOT work on the > psql command line, so that's useless for testing my queries; I still > have to reformat and troubleshoot. If we added \ continuation, I > wouldn't use it. > > I think we should support line breaks, and require semicolons for > end-of-statement. Backwards-compatability in custom pgbench scripts is > not critical; pgbench scripts are neither used in produciton, nor used > in automated systems much that I know of. > > I'm not clear on why we'd need a full SQL lexer. Attached is a "without lexer" version which does ;-terminated SQL commands and \-continuated meta commands (may be useful for \shell and long \set expressions). Also attached is a small pgbench script to test the feature. Without a lexer it is possible to fool pgbench with somehow contrived examples, say with: SELECT 'hello; world'; The ";" within the string will be considered as end-of-line. Also, comments intermixed with sql on the same line would generate errors. SELECT 1 -- one + 3; Would fail, but comments on lines of their own are ok. It may be argued that these are not a likely scripts and that this behavior could be declared as a "feature" for keeping the code simple. ISTM that it would be an overall improvement, but also the ;-termination requirement breaks backward compatibility. -- Fabien.
Fabien, > Without a lexer it is possible to fool pgbench with somehow contrived > examples, say with: > > SELECT 'hello; > world'; > > The ";" within the string will be considered as end-of-line. > > Also, comments intermixed with sql on the same line would generate errors. > > SELECT 1 -- one > + 3; > > Would fail, but comments on lines of their own are ok. > > It may be argued that these are not a likely scripts and that this > behavior could be declared as a "feature" for keeping the code simple. Yeah, these seem pretty contrived. I would personally be OK with breaking them. > ISTM that it would be an overall improvement, but also the ;-termination > requirement breaks backward compatibility. Look, how many people in the world develop their own pgbench scripts? And how many of those aren't on this list right now, reading this thread? I expect I could count them on my fingers and toes. Backwards-compatability for pgdump, pg_basebackup, initdb, etc. matters.The worst case with pgbench is that we break twopeople's test scripts, they read the release notes, and fix them. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
> The worst case with pgbench is that we break two people's test scripts, > they read the release notes, and fix them. Sure, I agree that breaking pgbench custom scripts compatibility is no big deal, and having pgbench consistent with psql is useful. -- Fabien.
On 06/21/2015 01:37 PM, Fabien COELHO wrote: > >> The worst case with pgbench is that we break two people's test scripts, >> they read the release notes, and fix them. > > Sure, I agree that breaking pgbench custom scripts compatibility is no > big deal, and having pgbench consistent with psql is useful. ... apparently nobody disagrees ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
> Look, how many people in the world develop their own pgbench scripts? > And how many of those aren't on this list right now, reading this > thread? I expect I could count them on my fingers and toes. I'm not against you break the backward compatibility of pgbench custom scripts. However I just want to let you know that PostgreSQL Enterprise Consortium has been published couple of scripts along with reports on evaluating PostgreSQL, which have been downloaded considerable numbers. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Wed, Jun 24, 2015 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 06/21/2015 01:37 PM, Fabien COELHO wrote:
>
>> The worst case with pgbench is that we break two people's test scripts,
>> they read the release notes, and fix them.
>
> Sure, I agree that breaking pgbench custom scripts compatibility is no
> big deal, and having pgbench consistent with psql is useful.
... apparently nobody disagrees ...
I'm fine re-punctuating my current scripts. And since pgbench doesn't have to be version-matched to the server it connects to, people can just keep using the older version if they want to against a new server.
Are there other breaking changes we have been wanting to make? If so, should we try to get them all in during the same release?
Cheers,
Jeff
On Thu, Jun 25, 2015 at 10:51 PM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Look, how many people in the world develop their own pgbench scripts? >> And how many of those aren't on this list right now, reading this >> thread? I expect I could count them on my fingers and toes. > > I'm not against you break the backward compatibility of pgbench custom > scripts. > > However I just want to let you know that PostgreSQL Enterprise > Consortium has been published couple of scripts along with reports on > evaluating PostgreSQL, which have been downloaded considerable > numbers. pgbench is a tool for dedicated to developers. Hence people who should be able to fix scripts properly as long as we inform them by documenting it in the release notes so I am not sure that this is actually a problem. -- Michael
>> I'm not against you break the backward compatibility of pgbench custom >> scripts. >> >> However I just want to let you know that PostgreSQL Enterprise >> Consortium has been published couple of scripts along with reports on >> evaluating PostgreSQL, which have been downloaded considerable >> numbers. > > pgbench is a tool for dedicated to developers. Hence people who should > be able to fix scripts properly as long as we inform them by > documenting it in the release notes so I am not sure that this is > actually a problem. I'm not sure what you mean by "developers" here but if that means people who are developing PostgreSQL itself, I am strongly against "pgbench is a tool for dedicated to developers" concept. Pgbench has been heavily used by users who want to measure PostgreSQL's performance. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, Jun 26, 2015 at 9:01 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>> I'm not against you break the backward compatibility of pgbench custom >>> scripts. >>> >>> However I just want to let you know that PostgreSQL Enterprise >>> Consortium has been published couple of scripts along with reports on >>> evaluating PostgreSQL, which have been downloaded considerable >>> numbers. >> >> pgbench is a tool for dedicated to developers. Hence people who should >> be able to fix scripts properly as long as we inform them by >> documenting it in the release notes so I am not sure that this is >> actually a problem. > > I'm not sure what you mean by "developers" here but if that means > people who are developing PostgreSQL itself, I am strongly against > "pgbench is a tool for dedicated to developers" concept. Pgbench has > been heavily used by users who want to measure PostgreSQL's > performance. I meant "people who can write SQL". Sorry for the misunderstanding. Please do not take any offense ;) -- Michael
On 06/21/2015 11:12 AM, Fabien COELHO wrote: > > Hello Josh, > >>> Add backslash continuations to pgbench custom scripts. >>> [...] >>> IMHO this approach is the best compromise. >> >> I don't personally agree. I believe that it it worth breaking backwards >> compatibility to support line breaks in pgbench statements, and that if >> we're not going to do that, supporting \ continuations is of little value. >> >> As someone who actively uses pgbench to write custom benchmarks, I need >> to write queries which I can test. \ continuation does NOT work on the >> psql command line, so that's useless for testing my queries; I still >> have to reformat and troubleshoot. If we added \ continuation, I >> wouldn't use it. >> >> I think we should support line breaks, and require semicolons for >> end-of-statement. Backwards-compatability in custom pgbench scripts is >> not critical; pgbench scripts are neither used in produciton, nor used >> in automated systems much that I know of. >> >> I'm not clear on why we'd need a full SQL lexer. > > Attached is a "without lexer" version which does ;-terminated SQL commands > and \-continuated meta commands (may be useful for \shell and long \set > expressions). As Tom pointed out, you need the full lexer to do this correctly. You can argue that something that handles the most common cases is enough, but realistically, by the time you've handled all the common cases correctly, you've just re-invented the lexer. The home-grown lexer is missing e.g. dollar-quoting support, so this is not be parsed correctly: do $$ begin ... end; $$; That would be very nice to handle correctly, I've used DO-blocks in pgbench scripts many times, and it's a pain to have to write them in a single line. Also worth noting that you can currently test so-called multi-statements with pgbench, by putting multiple statements on a single line. Your patch seems to still do that, but if we went with a full-blown SQL lexer, they would probably be split into two statements. I think we should either bite the bullet and include the full SQL lexer in pgbench, or come up with some new syntax for marking the beginning and end of a statement. We could do something like bash here-documents or Postgres dollar-quoting, for example: \set ... select 1234; -- A statement on a single line, no change here -- Begin a multi-line statement \multi-line-statement END_TOKEN select * from complicated; END_TOKEN - Heikki
On 2015-07-03 13:50:02 +0300, Heikki Linnakangas wrote: > As Tom pointed out, you need the full lexer to do this correctly. You can > argue that something that handles the most common cases is enough, but > realistically, by the time you've handled all the common cases correctly, > you've just re-invented the lexer. Yes. > I think we should either bite the bullet and include the full SQL lexer in > pgbench, or come up with some new syntax for marking the beginning and end > of a statement. I'm pretty clearly in favor of doing correct lexing. I think we should generalize that and make it reusable. psql has it's own hacked up version already, there seems little point in having variedly good copies around. > We could do something like bash here-documents or Postgres > dollar-quoting, for example: > > \set ... > select 1234; -- A statement on a single line, no change here > > -- Begin a multi-line statement > \multi-line-statement END_TOKEN > select * > from complicated; > END_TOKEN Not pretty imo. I could see including something esimpler, in addition to the lexer, to allow sending multiple statements in one go.
Hello Heikki, >>> I'm not clear on why we'd need a full SQL lexer. >> >> Attached is a "without lexer" version which does ;-terminated SQL commands >> and \-continuated meta commands (may be useful for \shell and long \set >> expressions). > > As Tom pointed out, you need the full lexer to do this correctly. You can > argue that something that handles the most common cases is enough, but > realistically, by the time you've handled all the common cases correctly, > you've just re-invented the lexer. Sure. I understand that part of Josh argument is that we are discussing pgbench test scripts here, not real full-blown applications, and these are expected to be quite basic, plain mostly SQL things. > The home-grown lexer is missing e.g. dollar-quoting support, so this is not > be parsed correctly: > > do $$ > begin > ... > end; > $$; Hmmm, good one, if indeed you want to use PL/pgSQL or even any arbitrary language in a pgbench scripts... I would rather have created functions (once, outside of pgbench) and would call them from the script, so that would be a simple SELECT. > That would be very nice to handle correctly, I've used DO-blocks in pgbench > scripts many times, and it's a pain to have to write them in a single line. Yep. With some languages I'm not sure that it is even possible. > Also worth noting that you can currently test so-called multi-statements > with pgbench, by putting multiple statements on a single line. Yes indeed, behind the hood pgbench expects just one line, or you have to change significantly the way statements are handled, which is way beyond my initial intentions on this one, and this would mean quite a lot of changes for more or less corner cases. > Your patch seems to still do that, but if we went with a full-blown SQL > lexer, they would probably be split into two statements. > I think we should either bite the bullet and include the full SQL lexer in > pgbench, or come up with some new syntax for marking the beginning and end of > a statement. We could do something like bash here-documents or Postgres > dollar-quoting, for example: > > \set ... > select 1234; -- A statement on a single line, no change here > > -- Begin a multi-line statement > \multi-line-statement END_TOKEN > select * > from complicated; > END_TOKEN I do not like the aesthetic of the above much. I really liked the idea of simply writing SQL queries just as in psql. So maybe just handling $$-quoting would be enough to handle reasonable use-cases without troubling pgbench internal working too much? That would be a simple local changes in the line reader. -- Fabien.
> I'm pretty clearly in favor of doing correct lexing. I think we should > generalize that and make it reusable. psql has it's own hacked up > version already, there seems little point in having variedly good copies > around. I must admit that I do not know how to share lexer rules but have different actions on them (psql vs sql parser vs ...), as the action code is intrinsically intertwined with expressions. Maybe some hack is possible. Having yet another SQL-lexer to maintain seems highly undesirable, especially just for pgbench. > I could see including something esimpler, in addition to the lexer, to > allow sending multiple statements in one go. Currently, probably SELECT 1; SELECT 1; Does 2 statements in one go, but it is on one line. May by allowing both continuations and ";" at the same time: -- two statements in one go SELECT 1; \ SELECT 1; -- next statement on it's own SELECT 1; Which could be reasonnably neat, and easy to implement. -- Fabien.
> The home-grown lexer is missing e.g. dollar-quoting support, so this is not > be parsed correctly: > > do $$ > begin > ... > end; > $$; > That would be very nice to handle correctly, I've used DO-blocks in pgbench > scripts many times, and it's a pain to have to write them in a single line. Attached is a version which does that (I think), and a test script. - backslash-commands can be \-continuated - sql-commands may include $$-quotes and must end with a ';' - double-dash commentsand blank line are skipped. Obviously it is still a non-lexer hack which can be easily defeated, but ISTM that it handles non-contrived cases well. Anyway ISTM that dollar quoting cannot be handle as such and simply by a lexer, it is really an exception mechanism. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> I'm pretty clearly in favor of doing correct lexing. I think we should >> generalize that and make it reusable. psql has it's own hacked up >> version already, there seems little point in having variedly good copies >> around. > I must admit that I do not know how to share lexer rules but have > different actions on them (psql vs sql parser vs ...), as the action code > is intrinsically intertwined with expressions. Obviously this is scope creep of the first magnitude, but ISTM that it would be possible to share a lexer between psql and pgbench, since in both of them the basic requirement is "break SQL commands apart and identify newline-terminated backslash commands". If we're gonna break pgbench's backwards compatibility anyway, there would be a whole lot to be said for just going over to psql's input parsing rules, lock stock 'n barrel; and this would be a good way to achieve that. As it stands, psqlscan.l has some external dependencies on the rest of psql, but we could perhaps refactor some of those away, and provide dummy implementations to satisfy others (eg pgbench could provide a dummy GetVariable() that just always returns NULL). So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it as-is (possibly after refactoring in psql). A possible issue is avoiding unnecessary invocations of flex, though. Maybe symlinking the .c file would work better. regards, tom lane
I wrote: > As it stands, psqlscan.l has some external dependencies on the rest of > psql, but we could perhaps refactor some of those away, and provide dummy > implementations to satisfy others (eg pgbench could provide a dummy > GetVariable() that just always returns NULL). > So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it > as-is (possibly after refactoring in psql). A possible issue is avoiding > unnecessary invocations of flex, though. Maybe symlinking the .c file > would work better. A quick experiment with compiling psqlscan inside pgbench yields the following failures: pgbench.o: In function `psql_scan_setup': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1239: undefined reference to `pset' pgbench.o: In function `escape_variable': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `GetVariable' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1956: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1957: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1971: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1973: undefined reference to `psql_error' pgbench.o: In function `evaluate_backtick': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1701: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1712: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1722: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1728: undefined reference to `psql_error' pgbench.o: In function `yylex': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:511: undefined reference to `standard_strings' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `GetVariable' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:751: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `GetVariable' pgbench.o: In function `psql_scan_slash_option': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1619: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1628: undefined reference to `psql_error' The pset references are to pset.encoding, pset.db, or pset.vars. I'd think the best way to deal with the encoding and connection are to pass them as parameters to psql_scan_setup() which'd store them in the PsqlScanState. pset.vars is only passed to GetVariable. We could refactor that away somehow (although actually, why wouldn't we want to just implement variable substitution exactly like it is in psql? Maybe the right answer is to import psql/variables.c lock stock n barrel too...) psql_error() and standard_strings() wouldn't be hard to provide. So this is looking *eminently* doable. regards, tom lane
> (although actually, why wouldn't we want to just implement variable > substitution exactly like it is in psql? Pgbench variable substitution is performed when the script is run, not while the file is being processed for being split, which is when a lexer would be used. The situation is not the same with psql. The most it could do would be to keep track of what substitution are done in queries. > So this is looking *eminently* doable. Possibly. How much more effort would be involved compared to the quick patch I did, I wonder:-) -- Fabien.
Hello Tom, >> (although actually, why wouldn't we want to just implement variable >> substitution exactly like it is in psql? > > Pgbench variable substitution is performed when the script is run, not while > the file is being processed for being split, which is when a lexer would be > used. The situation is not the same with psql. The most it could do would be > to keep track of what substitution are done in queries. > >> So this is looking *eminently* doable. > > Possibly. How much more effort would be involved compared to the quick patch > I did, I wonder:-) I had a quick look at the code, and although it seems doable to hack the psql lexer for pgbench benefit, I do not think it is a good idea: - improving pgbench scripts is really a small feature which requires a light weight approach in my opinion. There is noreal benefit of having a lexer solution which can handle contrived cases, because they would be contrived cases andnot the kind of tests really written by pgbench users. - the solution would probably be fragile to changes in psql, or could prevent such changes because of the pgbench dependency, and this does not look desirable. - it would involve much more time than I'm ready to give on such a small feature. So the current patch, or possibly variants of this patch to fix issues that may be raised, is what I'm ready to put forward on this. If you feel that this feature only deserve a lexer solution, then the patch should be "returned with feedback". -- Fabien.
Hi, > If you feel that this feature only deserve a lexer solution, then the > patch should be "returned with feedback". It's unfortunate to abandon this idea so I tried this and made it run with psql's parser. I think it works as expected. The attached files are as follwoing. - 0001-Prepare-for-share-psqlscan-with-pgbench.patchA patch to modify psql so that psqlscan can be shared with other modules. - 0002-Make-use-of-psqlscan-in-pgbench.patchA patch to use psqlscan in pgbench. - hoge.sqlA sample custom script including multilne statement and line comment I can't judge wheter this is a new version of Febien's patch following Tom's suggestion or brand-new one. Anyway I'd like to post on this thread. ====== At Fri, 17 Jul 2015 21:26:44 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1507172113080.31314@sto> > > Pgbench variable substitution is performed when the script is run, not > > while the file is being processed for being split, which is when a > > lexer would be used. The situation is not the same with psql. The most > > it could do would be to keep track of what substitution are done in > > queries. > > > >> So this is looking *eminently* doable. > > > > Possibly. How much more effort would be involved compared to the > > quick patch I did, I wonder:-) The patch set consists of two parts. The first modifies psqlscan.l to work in pgbench. Almost along on Tom's suggestion. - Eliminate direct reading of pset and store them into PsqlScanState in psql_scan_setup. - variables, common, settings and prompt in pgbench are the shrinked version from that of psql. The second part modifies pgbench to use the modified version of psqlscan.l. As the result, - Multiline SQLs (not backslash continuation) in custom script is allowed. (also for builtins but it's no use). - backslash commands is handled as the same as before: multiline is not allowed. A sample script is also attached. Suggestions? Opinions? I don't have idea how to deal with the copy of psqlscan.[lh] from psql. Currently they are simply the dead copies of those of psql. - Modifying psqlscan in psql requires consideration on how it is used in pgbench. - They are rather small but common, variables, prompt are essentially needeless files.. reagsrds, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello Kyotaro-san, >> If you feel that this feature only deserve a lexer solution, then the >> patch should be "returned with feedback". > > It's unfortunate to abandon this idea so I tried this and made it run > with psql's parser. I think it works as expected. Wow, you are much more courageous than I am!:-) > - 0001-Prepare-for-share-psqlscan-with-pgbench.patch > A patch to modify psql so that psqlscan can be shared with other modules. > > - 0002-Make-use-of-psqlscan-in-pgbench.patch > A patch to use psqlscan in pgbench. > > - hoge.sql > A sample custom script including multilne statement and line comment > > I can't judge wheter this is a new version of Febien's patch > following Tom's suggestion or brand-new one. Anyway I'd like to > post on this thread. I think it is really a new patch, but posting it is seems logical because that is where the discussion was lead. > - backslash commands is handled as the same as before: multiline > is not allowed. Hmm... that is really the feature I wanted to add initially, too bad it is the dropped one:-) > Suggestions? Opinions? > > I don't have idea how to deal with the copy of psqlscan.[lh] from > psql. Currently they are simply the dead copies of those of psql. I think that there should be no copies, but it should use relative symbolic links so that the files are kept synchronized. > - Modifying psqlscan in psql requires consideration on how it is > used in pgbench. Yep, that is one of the reason why I did not want to go this way, bar my natural lazyness. -- Fabien.
Hi, all. Attatched is the revised version of this patch. The first patch is not changed from before. The second is fixed a kind of bug. Ths third is the new one to allow backslash continuation for backslash commands. hoge.sql is the test custom script. ====== At Fri, 24 Jul 2015 07:39:16 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1507240731050.12839@sto> > > - backslash commands is handled as the same as before: multiline > > is not allowed. > > Hmm... that is really the feature I wanted to add initially, too bad > it is the dropped one:-) Ouch. The story has been derailed somewhere. Since SQL statments could be multilined without particluar marker, we cannot implement multilined backslash commands in the same way.. The attached revised patch allows backslash continuation for backslash comands. I suppose this is the same as what you did in behavior. But SQL statements are still can be continued as psql does. I'm not satisfied by the design but I don't see another way.. > > > > I don't have idea how to deal with the copy of psqlscan.[lh] from > > psql. Currently they are simply the dead copies of those of psql. > > I think that there should be no copies, but it should use relative > symbolic links so that the files are kept synchronized. Yeah, I think so but symlinks could harm on git and Windows. The another way would be make copies it from psql directory. They live next door to each other. > > - Modifying psqlscan in psql requires consideration on how it is > > used in pgbench. > > Yep, that is one of the reason why I did not want to go this way, bar > my natural lazyness. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
<oops, sorry, stalled post because of wrong from, posting again...> > Attatched is the revised version of this patch. > > The first patch is not changed from before. > > The second is fixed a kind of bug. > > Ths third is the new one to allow backslash continuation for > backslash commands. Ah, thanks:-) Would you consider adding the patch to the next commitfest? I may put myself as a reviewer... > [...] I'm not satisfied by the design but I don't see another way.. I'll try to have a look. >>> I don't have idea how to deal with the copy of psqlscan.[lh] from >>> psql. Currently they are simply the dead copies of those of psql. >> >> I think that there should be no copies, but it should use relative >> symbolic links so that the files are kept synchronized. > > Yeah, I think so but symlinks could harm on git and Windows. > The another way would be make copies it from psql directory. They live > next door to each other. Indeed there are plenty of links already which are generated by makefiles (see src/bin/pg_xlogdump/*), and probably a copy is made on windows. There should no file duplication within the source tree. -- Fabien.
Hello, > > Attatched is the revised version of this patch. > > > > The first patch is not changed from before. > > > > The second is fixed a kind of bug. > > > > Ths third is the new one to allow backslash continuation for > > backslash commands. > > Ah, thanks:-) > > Would you consider adding the patch to the next commitfest? I may put > myself as a reviewer... No problem. > > [...] I'm not satisfied by the design but I don't see another way.. > > I'll try to have a look. Thanks. > >>> I don't have idea how to deal with the copy of psqlscan.[lh] from > >>> psql. Currently they are simply the dead copies of those of psql. > >> > >> I think that there should be no copies, but it should use relative > >> symbolic links so that the files are kept synchronized. > > > > Yeah, I think so but symlinks could harm on git and Windows. > > The another way would be make copies it from psql directory. They live > > next door to each other. > > Indeed there are plenty of links already which are generated by > makefiles (see src/bin/pg_xlogdump/*), and probably a copy is made on > windows. There should no file duplication within the source tree. Thank you for pointing out that. Makefile of pg_xlogdump re-creates symlinks to those files and they are replaced with cp for the platforms where symlinks are not usable. But the files are are explicitly added to .sln file on msvc. Anyway I'll address it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
> Attatched is the revised version of this patch. > > The first patch is not changed from before. > > The second is fixed a kind of bug. > > Ths third is the new one to allow backslash continuation for > backslash commands. Ah, thanks:-) Would you consider adding the patch to the next commitfest? I may put myself as a reviewer... > [...] I'm not satisfied by the design but I don't see another way.. I'll try to have a look. >>> I don't have idea how to deal with the copy of psqlscan.[lh] from >>> psql. Currently they are simply the dead copies of those of psql. >> >> I think that there should be no copies, but it should use relative >> symbolic links so that the files are kept synchronized. > > Yeah, I think so but symlinks could harm on git and Windows. > The another way would be make copies it from psql directory. They live > next door to each other. Indeed there are plenty of links already which are generated by makefiles (see src/bin/pg_xlogdump/*), and probably a copy is made on windows. There should no file duplication within the source tree. -- Fabien.
On 07/24/2015 11:36 AM, Kyotaro HORIGUCHI wrote: > At Fri, 24 Jul 2015 07:39:16 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1507240731050.12839@sto> >>> - backslash commands is handled as the same as before: multiline >>> is not allowed. >> >> Hmm... that is really the feature I wanted to add initially, too bad >> it is the dropped one:-) > > Ouch. The story has been derailed somewhere. > > Since SQL statments could be multilined without particluar > marker, we cannot implement multilined backslash commands in the > same way.. I don't think we actually want backslash-continuations. The feature we want is "allow SQL statements span multiple lines", and using the psql lexer solves that. We don't need the backslash-continuations when we have that. On 07/25/2015 05:53 PM, Fabien COELHO wrote: >>>> I don't have idea how to deal with the copy of psqlscan.[lh] from >>>> psql. Currently they are simply the dead copies of those of psql. >>> >>> I think that there should be no copies, but it should use relative >>> symbolic links so that the files are kept synchronized. >> >> Yeah, I think so but symlinks could harm on git and Windows. >> The another way would be make copies it from psql directory. They live >> next door to each other. > > Indeed there are plenty of links already which are generated by makefiles > (see src/bin/pg_xlogdump/*), and probably a copy is made on windows. There > should no file duplication within the source tree. Yeah, following the example of pg_xlogdump and others is the way to go. Docs need updating, and there's probably some cleanup to do before this is ready for committing, but overall I think this is definitely the right direction. I complained upthread that this makes it impossible to use "multi-statements" in pgbench, as they would be split into separate statements, but looking at psqlscan.l there is actually a syntax for that in psql already. You escape the semicolon as \;, e.g. "SELECT 1 \; SELECT 2;", and then both queries will be sent to the server as one. So even that's OK. - Heikki
Hello Heikki, > I don't think we actually want backslash-continuations. The feature we want > is "allow SQL statements span multiple lines", and using the psql lexer > solves that. We don't need the backslash-continuations when we have that. Sure. The feature *I* initially wanted was to have multi-line meta-commands. For this feature ISTM that continuations are, alas, the solution. >> Indeed there are plenty of links already which are generated by makefiles >> (see src/bin/pg_xlogdump/*), and probably a copy is made on windows. There >> should no file duplication within the source tree. > > Yeah, following the example of pg_xlogdump and others is the way to go. > > Docs need updating, and there's probably some cleanup to do before this is > ready for committing, but overall I think this is definitely the right > direction. I've created an entry for the next commitfest, and put the status to "waiting on author". > I complained upthread that this makes it impossible to use "multi-statements" > in pgbench, as they would be split into separate statements, but looking at > psqlscan.l there is actually a syntax for that in psql already. You escape > the semicolon as \;, e.g. "SELECT 1 \; SELECT 2;", and then both queries will > be sent to the server as one. So even that's OK. Good! -- Fabien.
Hi, all. > > I don't think we actually want backslash-continuations. The feature we > > want is "allow SQL statements span multiple lines", and using the psql > > lexer solves that. We don't need the backslash-continuations when we > > have that. > > Sure. The feature *I* initially wanted was to have multi-line > meta-commands. For this feature ISTM that continuations are, alas, the > solution. > > >> Indeed there are plenty of links already which are generated by > >> makefiles > >> (see src/bin/pg_xlogdump/*), and probably a copy is made on > >> windows. There > >> should no file duplication within the source tree. > > > > Yeah, following the example of pg_xlogdump and others is the way to > > go. > > > > Docs need updating, and there's probably some cleanup to do before > > this is ready for committing, but overall I think this is definitely > > the right direction. > > I've created an entry for the next commitfest, and put the status to > "waiting on author". > > > I complained upthread that this makes it impossible to use > > "multi-statements" in pgbench, as they would be split into separate > > statements, but looking at psqlscan.l there is actually a syntax for > > that in psql already. You escape the semicolon as \;, e.g. "SELECT 1 > > \; SELECT 2;", and then both queries will be sent to the server as > > one. So even that's OK. > > Good! Hmm. psqlscan.l handles multistatement naturally. I worked on that and the attached patche set does, - backslash continuation for pgbench metacommands. set variable \ <some value> - SQL statement natural continuation lines. SELECT :foo FROM :bar; - SQL multi-statement. SELECT 1; SELECT 2; The work to be left is eliminating double-format of Command struct. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, Thank you for registering this to CF-Sep. I missed the regtration window.. It ended earlier than usual.. Most troubles have gone and I'll be back next week. > The work to be left is eliminating double-format of Command > struct. This is done as the additional fourth patch, not merged into previous ones, to show what's changed in the manner of command storing. I repost on this thread the new version of this patch including this and posted before. This is rebased to current master. The changes in behaviors brought by this patch has been described in the privous mail as the following, > Hmm. psqlscan.l handles multistatement naturally. > I worked on that and the attached patche set does, > > - backslash continuation for pgbench metacommands. > > set variable \ > <some value> > > - SQL statement natural continuation lines. > > SELECT :foo > FROM :bar; > > - SQL multi-statement. > > SELECT 1; SELECT 2; Each of the four patches does the following thigs, 1. 0001-Prepare-to-share-psqlscan-with-pgbench.patch The global variable pset, VariableSpace and backslash syntax of psql are the obstacles for psqlscan.l from being used bypgbench. This patch eliminates direct reference to pset and masks VariableSpace feature (looks ugry..), and enables backslashsyntax in psqlscan.l to be hidden from outside psql by defining the symbol OUTSIDE_PSQL. No behavioral changes of pasql are introduced by ths patch. 2. 0002-Make-use-of-psqlscan-for-parsing-of-custom-script.patch This is the core of this patch, which makes pgbench to use psqlscan.l and enables multi-statements, multiline-SQL-statementand backslash-continuation of metacommands. The struct Command is modified that it can be chained in order to convey multistatement in one line. But the commands arestored in an array of Command just outside process_commands as of old. This double-formatting will be removed by the fourthpatch. psqlscan.c is compiled as a part of mainloop.c for some reason described at the end of the file. I haven't confirmed thatthe same thing will happen in pgbench, but I did the same thing for pgbenc.c. Compilation will fail on Windows as of this patch. 3. 0003-Change-MSVC-Build-script.patch Changes the build script for Windows platform. It certainly works but might look a bit odd because of the anormaly of thecompilation way of psqlscan.l 4. 0004-Change-the-way-to-hold-command-list.patch Changes the way to hold commad list from an array to linked list, to remove the double formatting of Command-list introducedby the second patch. This removes the explicit limitation on the number of commands in scripts, as a side-effect. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, Here is a review, sorry for the delay... > This is done as the additional fourth patch, not merged into > previous ones, to show what's changed in the manner of command > storing. > [...] >> - SQL multi-statement. >> >> SELECT 1; SELECT 2; I think this is really "SELECT 1\; SELECT 2;" I join a test script I used. The purpose of this 4 parts patch is to reuse psql scanner from pgbench so that commands are cleanly separated by ";", including managing dollar quoting, having \ continuations in backslash-commands, having multi-statement commands... This review is about 4 part v4 of the patch. The patches apply and compile cleanly. I think that the features are worthwhile. I would have prefer more limited changes to get them, but my earlier attempt was rejected, and the scanner sharing with psql results in reasonably limited changes, so I would go for it. * 0001 patch about psql scanner reworking The relevant features lexer which can be reused by pgbench are isolated and adapted thanks to ifdefs, guards, and putting some stuff in the current state. I'm not sure of the "OUTSIDE_PSQL" macro name. ISTM that "PGBENCH_SCANNER" would be better, as it makes it very clear that it is being used for pgbench, and if someone wants to use it for something else they should define and handle their own case explicitely. Use "void *" instead of "int *" for VariableSpace? Rule ":{variable_char}+": the ECHO works more or less as a side effect, and most of the code is really ignored by pgbench. Instead of the different changes which rely on NULL values, what about a simple ifdef/else/endif to replace the whole stuff by ECHO for pgbench, without changing the current code? Also, on the same part of the code, I'm not sure about having two assignments on the "const char * value" variable, because of "const". The "db" parameter is not used by psql_scan_setup, so the state's db field is never initialized, so probably "psql" does not work properly because it seems used in several places. I'm not sure what would happen if there are reconnections from psql (is that possible? Without reseting the scanner state?), as there are two connections, one in pset and one in the scanner state? Variable lookup guards: why is a database connection necessary for doing ":variables" lookups? It seemed not to be required in the initial version, and is not really needed. Avoid changing style without clear motivation, for instance the PQerrorMessage/psql_error on escape_value==NULL? *** 0002 patch to use the scanner in pgbench There is no documentation nor examples about the new features. I think that the internal builtin commands and the documentation should be used to show the new features where appropriate, and insist on that ";" are now required at the end of sql commands. If the file starts with a -- comment followed by a backslash-command, eg: -- there is only one foo currently available \set foo 1 an error is reported: the comment should probably just be ignored. I'm not sure that the special "next" command parsing management is useful. I do not see a significant added value to managing especially a list of commands for commands which happened to be on the same line but separated by a simple ";". Could not the code be simplified by just restarting the scanner where it left, instead of looping in "process_commands"? It seems that part of the changes are just reindentations, especially all the parsing code for backslash commands. This should be avoided if possible. Some spurious spaces after "next_command:". *** 0003 patch for ms build I don't do windows:-) The perl script changes look pretty reasonable, although the copied comments refer to pg_xlogdump, I guess it should rather refer to pgbench. *** 0004 command list patch This patch changes the command array to use a linked list. As the command number is needed in several places and has to be replaced by a function call which scans the list, I do not think it is a good idea, and I recommand not to consider this part for inclusion. -- Fabien.
On Wed, Oct 14, 2015 at 5:49 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello, > > Here is a review, sorry for the delay... > >> This is done as the additional fourth patch, not merged into >> previous ones, to show what's changed in the manner of command >> storing. >> [...] >>> >>> - SQL multi-statement. >>> >>> SELECT 1; SELECT 2; > > > I think this is really "SELECT 1\; SELECT 2;" > > I join a test script I used. > > > The purpose of this 4 parts patch is to reuse psql scanner from pgbench > so that commands are cleanly separated by ";", including managing dollar > quoting, having \ continuations in backslash-commands, having > multi-statement commands... > > This review is about 4 part v4 of the patch. The patches apply and compile > cleanly. > > I think that the features are worthwhile. I would have prefer more limited > changes to get them, but my earlier attempt was rejected, and the scanner > sharing with psql results in reasonably limited changes, so I would go for > it. Regarding that: +#if !defined OUTSIDE_PSQL +#include "variables.h" +#else +typedef int * VariableSpace; +#endif And that: +/* Provide dummy macros when no use of psql variables */ +#if defined OUTSIDE_PSQL +#define GetVariable(space,name) NULL +#define standard_strings() true +#define psql_error(fmt,...) do { \ + fprintf(stderr, "psql_error is called. abort.\n");\ + exit(1);\ +} while(0) +#endif That's ugly... Wouldn't it be better with something say in src/common which is frontend-only? We could start with a set of routines allowing commands to be parsed. That gives us more room for future improvement. + # fix up pg_xlogdump once it's been set up + # files symlinked on Unix are copied on windows + my $pgbench = AddSimpleFrontend('pgbench'); + $pgbench->AddDefine('FRONTEND'); + $pgbench->AddDefine('OUTSIDE_PSQL'); + $pgbench->AddFile('src/bin/psql/psqlscan.l'); + $pgbench->AddIncludeDir('src/bin/psql'); This is a simple copy-paste, with an incorrect comment at least (haven't tested compilation with MSVC, I suspect that this is going to fail still the flags are correctly set). This patch is waiting for input from its author for quite some time now, and the structure of this patch needs a rework. Are folks on this thread fine if it is returned with feedback? -- Michael
I'm terribly sorry to have overlooked Febien's comment. I'll rework on this considering Febien's previous comment and Michael's this comment in the next CF. At Tue, 22 Dec 2015 16:17:07 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqRX_-VymeEH-3ChoPrQLgKh=EGgQ2GUtZ53ccO9uLGmxA@mail.gmail.com> > On Wed, Oct 14, 2015 at 5:49 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > > Hello, > > > > Here is a review, sorry for the delay... > > > >> This is done as the additional fourth patch, not merged into > >> previous ones, to show what's changed in the manner of command > >> storing. > >> [...] > >>> > >>> - SQL multi-statement. > >>> > >>> SELECT 1; SELECT 2; > > > > > > I think this is really "SELECT 1\; SELECT 2;" Mmm. It is differenct from my recognition. I'll confirm that. > > I join a test script I used. Thank you. I'll work on with it. > > The purpose of this 4 parts patch is to reuse psql scanner from pgbench > > so that commands are cleanly separated by ";", including managing dollar > > quoting, having \ continuations in backslash-commands, having > > multi-statement commands... > > > > This review is about 4 part v4 of the patch. The patches apply and compile > > cleanly. > > > > I think that the features are worthwhile. I would have prefer more limited > > changes to get them, but my earlier attempt was rejected, and the scanner > > sharing with psql results in reasonably limited changes, so I would go for > > it. > > Regarding that: > +#if !defined OUTSIDE_PSQL > +#include "variables.h" > +#else > +typedef int * VariableSpace; > +#endif > > And that: > +/* Provide dummy macros when no use of psql variables */ > +#if defined OUTSIDE_PSQL > +#define GetVariable(space,name) NULL > +#define standard_strings() true > +#define psql_error(fmt,...) do { \ > + fprintf(stderr, "psql_error is called. abort.\n");\ > + exit(1);\ > +} while(0) > +#endif > That's ugly... I have to admit that I think just the same. > Wouldn't it be better with something say in src/common > which is frontend-only? We could start with a set of routines allowing > commands to be parsed. That gives us more room for future improvement. If I read you correctly, I should cut it out into a new file and include it. Is it correct? > + # fix up pg_xlogdump once it's been set up > + # files symlinked on Unix are copied on windows > + my $pgbench = AddSimpleFrontend('pgbench'); > + $pgbench->AddDefine('FRONTEND'); > + $pgbench->AddDefine('OUTSIDE_PSQL'); > + $pgbench->AddFile('src/bin/psql/psqlscan.l'); > + $pgbench->AddIncludeDir('src/bin/psql'); > This is a simple copy-paste, with an incorrect comment at least > (haven't tested compilation with MSVC, I suspect that this is going to > fail still the flags are correctly set). Oops. Thank you for pointing out. It worked for me but, honestly saying, I couldn't another clean way to do that but I'll reconsider it. > This patch is waiting for input from its author for quite some time > now, and the structure of this patch needs a rework. Are folks on this > thread fine if it is returned with feedback? It's fine for me. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Dec 22, 2015 at 5:34 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I wrote: >> Wouldn't it be better with something say in src/common >> which is frontend-only? We could start with a set of routines allowing >> commands to be parsed. That gives us more room for future improvement. > > If I read you correctly, I should cut it out into a new file and > include it. Is it correct? Not really, I meant to see if it would be possible to include this set of routines directly in libpqcommon (as part of OBJS_FRONTEND). This way any client applications could easily reuse it. If we found that what was in psql is now useful to pgbench, I have little doubt that some other folks would make good use of that. I honestly have not looked at the code to see if that's doable or not, but soft-linking directly in pgbench a file of psql will not help future maintenance for sure. This increases the complexity of the code. -- Michael
Hello Michaël, >> If I read you correctly, I should cut it out into a new file and >> include it. Is it correct? > > Not really, I meant to see if it would be possible to include this set > of routines directly in libpqcommon (as part of OBJS_FRONTEND). This > way any client applications could easily reuse it. If we found that > what was in psql is now useful to pgbench, I have little doubt that > some other folks would make good use of that. I honestly have not > looked at the code to see if that's doable or not, but soft-linking > directly in pgbench a file of psql will not help future maintenance > for sure. This increases the complexity of the code. Just my 0.02€: I understand that you suggest some kind of dynamic parametrization... this is harder to do and potentially as fragile as the link/ifdef option, with an undertermined set of callbacks to provide... the generic version would be harder to debug, and this approach would prevent changing lexer options. Basically I'm not sure that doing all that for improving the handling of pgbench scripts is worth the effort. I would go with the simpler option. -- Fabien.
Hello, At Thu, 24 Dec 2015 07:40:19 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1512240729160.17411@sto> > > Hello Michaël, > > >> If I read you correctly, I should cut it out into a new file and > >> include it. Is it correct? > > > > Not really, I meant to see if it would be possible to include this set > > of routines directly in libpqcommon (as part of OBJS_FRONTEND). This > > way any client applications could easily reuse it. If we found that > > what was in psql is now useful to pgbench, I have little doubt that > > some other folks would make good use of that. I honestly have not > > looked at the code to see if that's doable or not, but soft-linking > > directly in pgbench a file of psql will not help future maintenance > > for sure. This increases the complexity of the code. Thanks. I understand it and I agree to the last sentense. I don't want them to be exposed as generic features. Instaed, I'd like to separate backslash commands from psqlscan.l and use callbacks to access variables by ":name" syntax (it is a common syntax between pgbench and psql). Current psqlscan.l simply exits noticing of leading backslash of backslash commands to the caller so it would be separated without heavy surgery. This can put away the ugliness of VariableSpace overriding. standard_strings() checks standard_comforming_strins on the session. This is necessarily called on parsing so it can be moved out into PsqlScanState. psql_error() redirects messages according to queryFout and adds input file information. They are heavily dependent to psql. So I'd like to make them a callback in PsqlScanState and do fprintf as the default behavior (=NULL). These measures will get rid of the ugliness. What do you think about this? > Just my 0.02€: > > I understand that you suggest some kind of dynamic > parametrization... this is harder to do and potentially as fragile as > the link/ifdef option, with an undertermined set of callbacks to > provide... the generic version would be harder to debug, and this > approach would prevent changing lexer options. Basically I'm not sure > that doing all that for improving the handling of pgbench scripts is > worth the effort. I would go with the simpler option. Undetermined set of callbacks would do so but it seems to me a set of few known functions to deal with as shown above. The shared lexer deals only with SQL and a backslash at the top of a command. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, At Fri, 25 Dec 2015 14:18:24 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20151225.141824.75912397.horiguchi.kyotaro@lab.ntt.co.jp> > > > Not really, I meant to see if it would be possible to include this set > > > of routines directly in libpqcommon (as part of OBJS_FRONTEND). This > > > way any client applications could easily reuse it. If we found that > > > what was in psql is now useful to pgbench, I have little doubt that > > > some other folks would make good use of that. I honestly have not > > > looked at the code to see if that's doable or not, but soft-linking > > > directly in pgbench a file of psql will not help future maintenance > > > for sure. This increases the complexity of the code. > > Thanks. I understand it and I agree to the last sentense. I don't > want them to be exposed as generic features. > > Instaed, I'd like to separate backslash commands from psqlscan.l > and use callbacks to access variables by ":name" syntax (it is a > common syntax between pgbench and psql). Current psqlscan.l > simply exits noticing of leading backslash of backslash commands > to the caller so it would be separated without heavy > surgery. This can put away the ugliness of VariableSpace > overriding. Done. Lexer for backslash commands is moved out of SQL lexer as a standalone lexer. Finally the SQL lexer left behind no longer is aware of VariableSpace and simply used from pgbench by linking the object file in psql directory. Addition to that, PGconn required by escape_variable() so the core of the escape feature is moved out into callback function side. > standard_strings() checks standard_comforming_strins on the > session. This is necessarily called on parsing so it can be moved > out into PsqlScanState. Such behavior is not compatible with the current behavior. So I made starndard_strings() a callback, too. > psql_error() redirects messages according to queryFout and adds > input file information. They are heavily dependent to psql. So > I'd like to make them a callback in PsqlScanState and do fprintf > as the default behavior (=NULL). Done. > These measures will get rid of the ugliness. What do you think > about this? > > > Just my 0.02€: > > > > I understand that you suggest some kind of dynamic > > parametrization... this is harder to do and potentially as fragile as > > the link/ifdef option, with an undertermined set of callbacks to > > provide... the generic version would be harder to debug, and this > > approach would prevent changing lexer options. Basically I'm not sure > > that doing all that for improving the handling of pgbench scripts is > > worth the effort. I would go with the simpler option. > > Undetermined set of callbacks would do so but it seems to me a > set of few known functions to deal with as shown above. The > shared lexer deals only with SQL and a backslash at the top of a > command. Finally, PsqlScanState has four callback funcions and all pgbench needs to do to use it is setting NULL to all of them and link the object file in psql directory. No link switch/ifdef are necessary. | const char get_variable(const char *, bool escape, bool as_ident, void (**free_fn)(void *)); | int enc_mblen(const char *); | bool standard_strings(void); | void error_out(const char *fmt, ...) The attached patches are the following. - 0001-Prepare-for-sharing-psqlscan-with-pgbench.patch This diff looks a bit large but most of them is cut'n-paste work and the substantial change is rather small. This refactors psqlscan.l into two .l files. The additional psqlscan_slash.l is a bit tricky in the point that recreatingscan_state on transition between psqlscan.l. .c files generated from .c are no longer complied as a part of mainloop.c. Thay have dedicated envelope .c files so thatthe sql lexer can easily used outside psql. - 0002-Change-the-access-method-to-shell-variables.patch Detaches VariableSpace from psqlscan. - 0003-Detach-common.c-from-psqlscan.patch Detaches PGconn from psqlscan by throwing out the session encoding stuff by ading two callbacks. - 0004-pgbench-uses-common-frontend-SQL-parser.patch This looks larger than what actually it does because an if branch with a large block in process_commands() was removed.'git diff -w' shows far small, substantial changes. - 0005-Change-the-way-to-hold-command-list.patch In the changes in 0004, SQL multistatement is passed as a linked list to the caller, then the caller assigns them on anarray with the same data type. This patch changes the way to hold commands entirely to linked list. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jan 7, 2016 at 5:36 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Finally, PsqlScanState has four callback funcions and all pgbench > needs to do to use it is setting NULL to all of them and link the > object file in psql directory. No link switch/ifdef are necessary. Am I missing something? This patch is not registered in the CF app. Horiguchi-san, if you expect feedback, it would be good to get it there. -- Michael
Mmm. I believed that this is on CF app.. At Tue, 19 Jan 2016 15:41:54 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQFnORG=LjDiGgD_hy_M00scx1ihn89QHx_1B9+3Vz7tQ@mail.gmail.com> > On Thu, Jan 7, 2016 at 5:36 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Finally, PsqlScanState has four callback funcions and all pgbench > > needs to do to use it is setting NULL to all of them and link the > > object file in psql directory. No link switch/ifdef are necessary. > > Am I missing something? This patch is not registered in the CF app. > Horiguchi-san, if you expect feedback, it would be good to get it > there. Thank you very much Michael but the CF app doesn't allow me to regsiter new one. Filling the Description field with "pgbench - allow backslash-continuations in custom scripts" and chose a topic then "Find thread" shows nothing. Filling the search text field on the "Attach thread" dialogue with the description or giving the exact message-id gave me nothing to choose. Maybe should I repost the patch so that the "Attach thread" can find it as a "recent" email? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello Kyotaro-san, > Thank you very much Michael but the CF app doesn't allow me to > regsiter new one. Filling the Description field with "pgbench - > allow backslash-continuations in custom scripts" and chose a > topic then "Find thread" shows nothing. Filling the search text > field on the "Attach thread" dialogue with the description or > giving the exact message-id gave me nothing to choose. Strange. You could try taking the old entry and selecting state "move to next CF"? -- Fabien.
On Tue, Jan 26, 2016 at 6:51 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Mmm. I believed that this is on CF app.. > > At Tue, 19 Jan 2016 15:41:54 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQFnORG=LjDiGgD_hy_M00scx1ihn89QHx_1B9+3Vz7tQ@mail.gmail.com> >> On Thu, Jan 7, 2016 at 5:36 PM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> > Finally, PsqlScanState has four callback funcions and all pgbench >> > needs to do to use it is setting NULL to all of them and link the >> > object file in psql directory. No link switch/ifdef are necessary. >> >> Am I missing something? This patch is not registered in the CF app. >> Horiguchi-san, if you expect feedback, it would be good to get it >> there. > > Thank you very much Michael but the CF app doesn't allow me to > regsiter new one. That's perhaps a bit late I am afraid for this CF, we are at the end of January already... > Filling the Description field with "pgbench - > allow backslash-continuations in custom scripts" and chose a > topic then "Find thread" shows nothing. Filling the search text > field on the "Attach thread" dialogue with the description or > giving the exact message-id gave me nothing to choose. Really? That's because the patch is marked as returned with feedback here: https://commitfest.postgresql.org/7/319/ > Maybe should I repost the patch so that the "Attach thread" can > find it as a "recent" email? What if you just add it to next CF with a new entry? You are actually proposing an entirely new patch. -- Michael
Hello, thank you, Febien, Micael. # Though I have made almost no activity in the last month... At Tue, 26 Jan 2016 13:53:33 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1601261352210.6482@sto> > > Hello Kyotaro-san, > > > Thank you very much Michael but the CF app doesn't allow me to > > regsiter new one. Filling the Description field with "pgbench - > > allow backslash-continuations in custom scripts" and chose a > > topic then "Find thread" shows nothing. Filling the search text > > field on the "Attach thread" dialogue with the description or > > giving the exact message-id gave me nothing to choose. > > Strange. > > You could try taking the old entry and selecting state "move to next > CF"? Hmm. The state of the old entry in CF2015-11 is already "Move*d* to next CF" and it is not found in CF2016-01, as far as I saw. At Tue, 26 Jan 2016 22:21:49 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqRtaz2nGb-7wQJ+w1-rFyFFxkruesNBM3RcPCgXaCoSmQ@mail.gmail.com> > > Filling the Description field with "pgbench - > > allow backslash-continuations in custom scripts" and chose a > > topic then "Find thread" shows nothing. Filling the search text > > field on the "Attach thread" dialogue with the description or > > giving the exact message-id gave me nothing to choose. > > Really? That's because the patch is marked as returned with feedback here: > https://commitfest.postgresql.org/7/319/ Ah, I have many candidates in "Attach thread" dialog. That would be a temporary symptom of a kind of the CF-seaon-wise meaintenance. > > Maybe should I repost the patch so that the "Attach thread" can > > find it as a "recent" email? > > What if you just add it to next CF with a new entry? You are actually > proposing an entirely new patch. So, I finally could register an entry for CF2016-3. Thank you all for the suggestion. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jan 7, 2016 at 3:36 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > - 0001-Prepare-for-sharing-psqlscan-with-pgbench.patch > > This diff looks a bit large but most of them is cut'n-paste > work and the substantial change is rather small. > > This refactors psqlscan.l into two .l files. The additional > psqlscan_slash.l is a bit tricky in the point that recreating > scan_state on transition between psqlscan.l. I've looked at this patch a few times now but find it rather hard to verify. I am wondering if you would be willing to separate 0001 into subpatches. For example, maybe there could be one or two patches that ONLY move code around and then one or more patches that make the changes to that code. Right now, for example, psql_scan_setup() is getting three additional arguments, but it's also being moved to a different file. Perhaps those two things could be done one at a time. I also think this patch could really benefit from a detailed set of submission notes that specifically lay out why each change was made and why. For instance, I see that psqlscan.l used yyless() while psqlscanbody.l uses a new my_yyless() you've defined. There is probably a great reason for that and I'm sure if I stare at this for long enough I can figure out what that reason is, but it would be better if you had a list of bullet points explaining what was changed and why. I would really like to see this patch committed; my problem is that I don't have enough braincells to be sure that it's correct in the present form. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, thank you for reviewing this. > On Thu, Jan 7, 2016 at 3:36 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > - 0001-Prepare-for-sharing-psqlscan-with-pgbench.patch > > > > This diff looks a bit large but most of them is cut'n-paste > > work and the substantial change is rather small. > > > > This refactors psqlscan.l into two .l files. The additional > > psqlscan_slash.l is a bit tricky in the point that recreating > > scan_state on transition between psqlscan.l. > > I've looked at this patch a few times now but find it rather hard to > verify. I am wondering if you would be willing to separate 0001 into > subpatches. For example, maybe there could be one or two patches that > ONLY move code around and then one or more patches that make the > changes to that code. Right now, for example, psql_scan_setup() is > getting three additional arguments, but it's also being moved to a > different file. Perhaps those two things could be done one at a time. I tried to split it into patches with some meaningful (I thought) steps, but I'll arrange them if it is not easy to read. > I also think this patch could really benefit from a detailed set of > submission notes that specifically lay out why each change was made > and why. For instance, I see that psqlscan.l used yyless() while > psqlscanbody.l uses a new my_yyless() you've defined. There is > probably a great reason for that and I'm sure if I stare at this for > long enough I can figure out what that reason is, but it would be > better if you had a list of bullet points explaining what was changed > and why. I'm sorry, but I didn't understood the 'submission notes' exactly means. Is it precise descriptions in source comments? or commit message of git-commit? > I would really like to see this patch committed; my problem is that I > don't have enough braincells to be sure that it's correct in the > present form. Thank you. I'll send the rearranged patch sooner. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Feb 15, 2016 at 1:04 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I'm sorry, but I didn't understood the 'submission notes' exactly > means. Is it precise descriptions in source comments? or commit > message of git-commit? Write a detailed email explaining each change that is part of the patch and why it is there. Attach the patch to that same email. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, At Tue, 16 Feb 2016 08:05:10 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoavXzXVV_k-89SbgMKB-Eyp+RpSKu_0tPGqx_ceEk=kCQ@mail.gmail.com> > On Mon, Feb 15, 2016 at 1:04 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > I'm sorry, but I didn't understood the 'submission notes' exactly > > means. Is it precise descriptions in source comments? or commit > > message of git-commit? > > Write a detailed email explaining each change that is part of the > patch and why it is there. Attach the patch to that same email. Sorry for the silly question. I'll try to describe this in detail. I hope you are patient enough to read my clumsy (but a bit long) text. First, I rebased the previous patch set and merged three of them. Now they are of three patches. 1. Making SQL parser part of psqlscan independent from psql. Moved psql's baskslsh command stuff out of original psqlscan.l and some psql stuff the parser directly looked are usedvia a set of callback functions, which can be all NULL for usages from other than psql. 2. Making pgbench to use the new psqlscan parser. 3. Changing the way to hold SQL/META commands from array to linked list. The #2 introduced linked list to store SQL multistatement but immediately the caller moves the elements into an array.This patch totally changes the way to linked list. I'll explain in detail of each patch. 1. Make SQL parser part of psqlscan independent from psql. The new file psqlscan_slashbody.l is the moved-out part of psqlscan.l. The prefix of flex symbols are changed to "yys". The psqlscan(_slash).c are container C source which does what is previously done by mainloop.c. This makes other bin/ programs to link psqlscan.o directly, without compilation. 1.1. The psqlscanbody.l It is the SQL part of old psqlscan.l but the difference between them is a bit bothersome to see. I attached the diff between them as "psqlscanbody.l.diff" for convenience. 1.1.1. Switching between two lexers. The SQL parser and the metacommand parser should be alternately callable each other but yyparse() cannot parse from intermediate position of the input text. So I provided functions to remake a parser state using an input text beginning at just after the position already parsed. psql_scan_switch_lexer() and psql_scan_slash_command_switch_lexer() do that for psqlscan.l and psqlscan_backslash.l respectively. I haven't found no variable available to know how many characters the parser has eaten so I decided to have a counter for the usage as PsqlScanState.curpos and keep it pointing just after the last letter already parsed. It is rather a common way to advance it using YY_USER_ACTION and I did so, but the parser occasionally steps back using yyless() when it reads some sequences. Hence yyless() should decrement .curpos, but flex has no hook point for the purpose. I defined my_yyless() (the name should need to be changed) macro to do this. 1.1.2. Detaching psqlscan from psql stuff psqlscan.l had depended on psql through pset.vars, pset.encoding, pset.db and psql_erorr(). I have modified the parser to access them via callback functions. psql_scan_setup() now has the new fourth parameter PsqlScanCallbacks to receive them. The two callbacks of them, standard_strings and error_out wouldn't need detailed explanation, but enc_mblen and get_variable would need. pset.encoding was used only to check if PQmblen is applicable then to be given to PQmblen. So I provided the callback enc_mblen which could be given if strings can be encoded using it. The another one, get_variable() is an equivalent of a wrapper function of GetVariable(). GetVariable() was called directly in lexer definitions and indirectly via escape_variable() in psqlscan.l but escape_variable() was accessing pset.db only in order to check existence of an active connection. I could give it via another callback, but I have moved out the part of the function accessing it because it is accessed only in the function. Finally, I defined the callbacks in common.c. 1.2. The reason for the names psqlscanbody.l and psqlscan_slashbody.l. psqlscan.l was finally included as psqlscan.c by mainloop.c. The reason is postgresql_fe.h must be read before psqlscan.c on some platform, according to the comment at the end of mainloop.c. But it is an annoyance when using it from other bin/ programs. Therefore, I provided dedicated .c files to do so for the two lexer .c files. In order to make the name of the file to be linked from outside psql be psqlscan.o, I have renamed the *.l files to *body.l. 1.3 The psqlscan_int.h file As the new psqlscan.o is used from outside psql, psqlscan.h should have only definitions needed to do so. psqlscan_int.h defines some stuffs used by the lexers but not necessary to use them. 1.4 Other files Other files that are not mentioned so far, Makefile, common.h, psqlscan_slash.h and startup.c would'nt be neccesary to be explained. 2. Making pgbench to use the new psqlscan parser. By the patch #2, pgbench.c gets mainly two major modifications. Splitting of process_commands() and adding backslash-continuation feature to the added function process_backslash_commands(). 2.1. process_commands() has been splitted into two functions The function process_commands() has been splitted into process_commands() and new function process_backslash_commands(). The former has been made to use psqlscan. In contrast to psql, pgbench first checks if the input on focus is a backslash command or not in process_commands(), then parses it using psqlscan if it was not a backslash command. process_backslash_commands() is a cut out from the original process_command() and it is modified to handle backslash continuation. Both functions read multiple lines in a context so the processing context is to be made by the caller (i.e. process_file) and kept throughout all input lines. 2.2 backslash continuation in process_backslash_commands(). The loop over input lines in the old process_commands() is refactored to handle SQL statements and backslash commands separately. The most part of the new process_backslash_commands() is almost the same with the corresponding part in the old process_commands() except indentation. "git diff -b -patience" gives the far-less-noisy differences so I attached it as the "pgbench.c.patient.diff" for convenience. 3. Changing the way to hold SQL/META commands from array to linked list. The patch #2 adds a new member "next" to Command_t. It is necessary to handle SQL multistatements. But the caller process_file holds the commands in an array of Commant_t. This is apparently not in undesirable form. Since the liked list seems simpler for this usage (for me), I decided to unify them to the linked list. Most of this patch is rather simple one by one replacement. I hope this submission note makes sense and this patch becomes easy to read. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 2/18/16 6:54 AM, Kyotaro HORIGUCHI wrote: > First, I rebased the previous patch set and merged three of > them. Now they are of three patches. > > > 1. Making SQL parser part of psqlscan independent from psql. > > Moved psql's baskslsh command stuff out of original psqlscan.l > and some psql stuff the parser directly looked are used via a > set of callback functions, which can be all NULL for usages > from other than psql. > > 2. Making pgbench to use the new psqlscan parser. > > 3. Changing the way to hold SQL/META commands from array to > linked list. > > The #2 introduced linked list to store SQL multistatement but > immediately the caller moves the elements into an array. This > patch totally changes the way to linked list. Any takers to review this updated patch? -- -David david@pgmasters.net
Hello David, > Any takers to review this updated patch? I intend to have a look at it, I had a look at a previous instance, but I'm ok if someone wants to proceed. -- Fabien.
Hi Fabien, On 3/14/16 3:27 PM, Fabien COELHO wrote: >> Any takers to review this updated patch? > > I intend to have a look at it, I had a look at a previous instance, but > I'm ok if someone wants to proceed. There's not exactly a long line of reviewers at the moment so if you could do a followup review that would be great. Thanks, -- -David david@pgmasters.net
>> I intend to have a look at it, I had a look at a previous instance, but >> I'm ok if someone wants to proceed. > > There's not exactly a long line of reviewers at the moment so if you > could do a followup review that would be great. Ok. It is in the queue, not for right know, though. -- Fabien.
At Tue, 15 Mar 2016 14:55:52 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1603151455170.23831@sto> > > >> I intend to have a look at it, I had a look at a previous instance, > >> but > >> I'm ok if someone wants to proceed. > > > > There's not exactly a long line of reviewers at the moment so if you > > could do a followup review that would be great. > > Ok. It is in the queue, not for right know, though. Thank you. This patchset needs "make maintainer-clean" before applying because it adds src/bin/psql/psqlscan.c, which is currently generated by flex. All of the patches apply but with many offsets so I rebased them. The two subsidiary diffs are the same to the previous patch set. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
<div dir="ltr">On Thu, Feb 18, 2016 at 6:54 AM, Kyotaro HORIGUCHI <span dir="ltr"><<a href="mailto:horiguchi.kyotaro@lab.ntt.co.jp"target="_blank">horiguchi.kyotaro@lab.ntt.co.jp</a>></span> wrote:<br /><divclass="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1pxsolid rgb(204,204,204);padding-left:1ex">It is the SQL part of old psqlscan.l but the difference between<br/> them is a bit bothersome to see. I attached the diff between them<br /> as "psqlscanbody.l.diff" for convenience.<br/></blockquote></div><br /></div><div class="gmail_extra">This is a huge diff, and I don't see that you'veexplained the reason for all the changes. For example:<br /><br />-/*<br />- * We use a stack of flex buffers to handlesubstitution of psql variables.<br />- * Each stacked buffer contains the as-yet-unread text from one psql variable.<br/>- * When we pop the stack all the way, we resume reading from the outer buffer<br />- * identified by scanbufhandle.<br/>- */<br />-typedef struct StackElem<br />-{<br />- YY_BUFFER_STATE buf; /* flex inputcontrol structure */<br />- char *bufstring; /* data actually being scanned by flex *<br />/<br/>- char *origstring; /* copy of original data, if needed */<br />- char *varname; /* name of variable providing data, or N<br />ULL */<br />- struct StackElem *next;<br />-} StackElem;<br/><br /></div><div class="gmail_extra">Perhaps we could separate this part of the code motion into its own preliminarypatch? I see this went to psqlscan_int.h, but there's no obvious reason for that particular name, and the commentsdon't explain it; in fact, they say that's psqlscan.h. psqlscan_slash.h has the same problem; perhaps moving thingsthere could be another preliminary patch.<br /><br />- yyless(0);<br />+ my_yyless(0);<br /><br clear="all" /></div><div class="gmail_extra">Why do we needto do this? Is "my_" really the best prefix? Is this another change that could be its own patch?<br /><br /></div><divclass="gmail_extra">-- <br /><div class="gmail_signature">Robert Haas<br />EnterpriseDB: <a href="http://www.enterprisedb.com"target="_blank">http://www.enterprisedb.com</a><br />The Enterprise PostgreSQL Company</div></div></div>
Thank you for the comment, but could you please tell me what kind of criteria should I take to split this patch? The discussion about splitting criteria is in the following reply (in the sentence begins with "By the way"). At Wed, 16 Mar 2016 11:57:25 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+Tgmobvu1aBRdRaKvqMVp0ifhQpgvnOEZa2Rg3AHfRWPE5-Tg@mail.gmail.com> > On Thu, Feb 18, 2016 at 6:54 AM, Kyotaro HORIGUCHI < > horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > It is the SQL part of old psqlscan.l but the difference between > > them is a bit bothersome to see. I attached the diff between them > > as "psqlscanbody.l.diff" for convenience. > > > > This is a huge diff, and I don't see that you've explained the reason for > all the changes. For example: > > -/* > - * We use a stack of flex buffers to handle substitution of psql variables. > - * Each stacked buffer contains the as-yet-unread text from one psql > variable. > - * When we pop the stack all the way, we resume reading from the outer > buffer > - * identified by scanbufhandle. > - */ > -typedef struct StackElem > -{ > - YY_BUFFER_STATE buf; /* flex input control structure */ > - char *bufstring; /* data actually being scanned by > flex * > / > - char *origstring; /* copy of original data, if needed > */ > - char *varname; /* name of variable providing data, > or N > ULL */ > - struct StackElem *next; > -} StackElem; > > Perhaps we could separate this part of the code motion into its own > preliminary patch? The "preliminary patch" seems to mean the same thing with the first patch in the following message. http://www.postgresql.org/message-id/20160107.173603.31865003.horiguchi.kyotaro@lab.ntt.co.jp The commit log says as the following. | Subject: [PATCH 1/5] Prepare for sharing psqlscan with pgbench. | | Lexer is no longer compiled as a part of mainloop.c. The slash | command lexer is brought out from the command line lexer. psql_scan | no longer accesses directly to pset struct and VariableSpace. This | change allows psqlscan to be used without these things. The following two patches are the follwings. | Subject: [PATCH 2/5] Change the access method to shell variables | | Access to shell variables via a callback function so that the lexer no | longer need to be aware of VariableSpace. | Subject: [PATCH 3/5] Detach common.c from psqlscan | | Call standard_strings() and psql_error() via callback functions so | that psqlscan.l can live totally without common.c stuff. They are | bundled up with get_variable() callback in one struct since now we | have as many as four callback functions. These patches are made so as to keep the compilable and workable state of the source files. It might be a bit more readable if unshackled from such restriction. > I see this went to psqlscan_int.h, but there's no > obvious reason for that particular name, and the comments don't explain it; I assumed that is a convention of naming by looking libpq-int.h (though it doesn't use underscore, but hyphen). But the file needs a comment like libpq-int.h. I'll rename it and add such comment to it. By the way, the patch set mentioned above gives such preliminary steps separately. Should I make the next patch set based on the older one which is separating the preliminary steps? Or in new splitting criteria that is free from the compilable-workable restriction is preferable? > in fact, they say that's psqlscan.h. psqlscan_slash.h has the same > problem; perhaps moving things there could be another preliminary patch. It is also included in the 0001 patch. > - yyless(0); > + my_yyless(0); > > Why do we need to do this? Is "my_" really the best prefix? Is this > another change that could be its own patch? Oops! Sorry for the silly name. I was not able to think up a proper name for it. Does psqlscan_yyless seems good? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Mar 17, 2016 at 4:12 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Thank you for the comment, but could you please tell me what kind > of criteria should I take to split this patch? The discussion > about splitting criteria is in the following reply (in the > sentence begins with "By the way"). Well, I'm trying to find a piece of this patch that is small enough that I can understand it and in good enough shape that I can commit it independently, but I am having some difficulty with that. I keep hoping some other committer is going to come along and be able to grok this well enough to apply it based on what you've already done, but so far it seems to be the all-me show. > These patches are made so as to keep the compilable and workable > state of the source files. It might be a bit more readable if > unshackled from such restriction. Keeping it compilable and workable after each patch is essential, but the first patch is still big and doing a lot of stuff. I'm wondering if it can be further decomposed. >> I see this went to psqlscan_int.h, but there's no >> obvious reason for that particular name, and the comments don't explain it; > > I assumed that is a convention of naming by looking libpq-int.h > (though it doesn't use underscore, but hyphen). But the file > needs a comment like libpq-int.h. I'll rename it and add such > comment to it. OK. >> - yyless(0); >> + my_yyless(0); >> >> Why do we need to do this? Is "my_" really the best prefix? Is this >> another change that could be its own patch? > > Oops! Sorry for the silly name. I was not able to think up a > proper name for it. Does psqlscan_yyless seems good? That does sound better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Well, I'm trying to find a piece of this patch that is small enough > that I can understand it and in good enough shape that I can commit it > independently, but I am having some difficulty with that. I keep > hoping some other committer is going to come along and be able to grok > this well enough to apply it based on what you've already done, but so > far it seems to be the all-me show. This is mostly a flex/bison hack, isn't it? If you like I'll take it. regards, tom lane
On Fri, Mar 18, 2016 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Well, I'm trying to find a piece of this patch that is small enough >> that I can understand it and in good enough shape that I can commit it >> independently, but I am having some difficulty with that. I keep >> hoping some other committer is going to come along and be able to grok >> this well enough to apply it based on what you've already done, but so >> far it seems to be the all-me show. > > This is mostly a flex/bison hack, isn't it? If you like I'll take it. I would be delighted if you would. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Mar 18, 2016 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This is mostly a flex/bison hack, isn't it? If you like I'll take it. > I would be delighted if you would. I've committed changes equivalent to Horiguchi-san's 0001 and 0002 patches, though rather different in detail. I concur with the upthread opinion that 0003 doesn't seem really necessary. This solves the problem of allowing SQL commands in scripts to span lines, but it doesn't do anything about backslash commands, which was the original point according to the thread title ;-). I can think of two somewhat-independent changes we might want to make at this point, since we're breaking exact script compatibility for 9.6 anyway: * Allow multiple backslash commands on one line, eg\set foo 5 \set bar 6 The main reason for that is that psql allows it, and one of the things we're supposedly trying to do here is reduce the behavioral distance between psql and pgbench parsing rules. * Allow backslash commands to span lines, probably by adopting the rule that backslash immediately followed by newline is to be ignored within a backslash command. This would not be compatible with psql, though, at least not unless we wanted to change psql too. I don't have strong feelings about either. regards, tom lane
On Sunday, March 20, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* Allow backslash commands to span lines, probably by adopting the
rule that backslash immediately followed by newline is to be ignored
within a backslash command. This would not be compatible with psql,
though, at least not unless we wanted to change psql too.
This would be appreciated. The main case I find wanting this is writing out long \copy expressions. Solving really complex ones using temporary tables works but being able to spread it out over multiple lines would be a welcomed addition.
David J.
On Sun, Mar 20, 2016 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Mar 18, 2016 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> This is mostly a flex/bison hack, isn't it? If you like I'll take it. > >> I would be delighted if you would. > > I've committed changes equivalent to Horiguchi-san's 0001 and 0002 > patches, though rather different in detail. I concur with the upthread > opinion that 0003 doesn't seem really necessary. > > This solves the problem of allowing SQL commands in scripts to span > lines, ... Excellent. > but it doesn't do anything about backslash commands, which was > the original point according to the thread title ;-). Wait, was it really? I'd been thinking it was mostly to continue queries, not metacommands, but maybe I missed the boat. > I can think of > two somewhat-independent changes we might want to make at this point, > since we're breaking exact script compatibility for 9.6 anyway: > > * Allow multiple backslash commands on one line, eg > \set foo 5 \set bar 6 > The main reason for that is that psql allows it, and one of the things > we're supposedly trying to do here is reduce the behavioral distance > between psql and pgbench parsing rules. This seems to me to be going in the wrong direction. > * Allow backslash commands to span lines, probably by adopting the > rule that backslash immediately followed by newline is to be ignored > within a backslash command. This would not be compatible with psql, > though, at least not unless we wanted to change psql too. This might have some point to it, though, if you want to say \set i <incredibly long expression not easily contained on a single line> -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Mar 20, 2016 at 1:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This solves the problem of allowing SQL commands in scripts to span >> lines, ... > Excellent. >> but it doesn't do anything about backslash commands, which was >> the original point according to the thread title ;-). > Wait, was it really? I'd been thinking it was mostly to continue > queries, not metacommands, but maybe I missed the boat. Nah, you're right, it was about continuing queries. Still, we've had complaints about the other thing too, and I think if we're going to change anything here, we should change it all in the same release. >> I can think of >> two somewhat-independent changes we might want to make at this point, >> since we're breaking exact script compatibility for 9.6 anyway: >> >> * Allow multiple backslash commands on one line, eg >> \set foo 5 \set bar 6 >> The main reason for that is that psql allows it, and one of the things >> we're supposedly trying to do here is reduce the behavioral distance >> between psql and pgbench parsing rules. > This seems to me to be going in the wrong direction. Um, why exactly? That psql behavior is of really ancient standing, and we have not had complaints about it. >> * Allow backslash commands to span lines, probably by adopting the >> rule that backslash immediately followed by newline is to be ignored >> within a backslash command. This would not be compatible with psql, >> though, at least not unless we wanted to change psql too. > This might have some point to it, though, if you want to say \set i > <incredibly long expression not easily contained on a single line> Shall I make a patch that allows backslash-newline to be handled this way in both psql and pgbench backslash commands? At least in psql, there would be no backwards compatibility problem, since right now the case just fails: regression=# \set x y \ Invalid command \. Try \? for help. regards, tom lane
On Mon, Mar 21, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Wait, was it really? I'd been thinking it was mostly to continue >> queries, not metacommands, but maybe I missed the boat. > > Nah, you're right, it was about continuing queries. Still, we've had > complaints about the other thing too, and I think if we're going to > change anything here, we should change it all in the same release. Fair enough. >>> * Allow multiple backslash commands on one line, eg >>> \set foo 5 \set bar 6 >>> The main reason for that is that psql allows it, and one of the things >>> we're supposedly trying to do here is reduce the behavioral distance >>> between psql and pgbench parsing rules. > >> This seems to me to be going in the wrong direction. > > Um, why exactly? That psql behavior is of really ancient standing, and > we have not had complaints about it. I think that's mostly because the psql metacommands are ridiculously impoverished. I'm guessing that pgbench's expression language is eventually going to support strings as a data type, for example, and those strings might want to contain backlashes. There's basically no value in cramming multiple metacommands onto a single line, but there is the risk of creating unnecessary lexing or parsing difficulties in the future. >>> * Allow backslash commands to span lines, probably by adopting the >>> rule that backslash immediately followed by newline is to be ignored >>> within a backslash command. This would not be compatible with psql, >>> though, at least not unless we wanted to change psql too. > >> This might have some point to it, though, if you want to say \set i >> <incredibly long expression not easily contained on a single line> > > Shall I make a patch that allows backslash-newline to be handled this way > in both psql and pgbench backslash commands? At least in psql, there > would be no backwards compatibility problem, since right now the case > just fails: > > regression=# \set x y \ > Invalid command \. Try \? for help. I certainly don't object to such a patch, although if it's between you writing that patch and you getting Tomas Vondra's multivariate statistics stuff committed, I'll take the latter. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 21, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Um, why exactly? That psql behavior is of really ancient standing, and >> we have not had complaints about it. > I think that's mostly because the psql metacommands are ridiculously > impoverished. I'm guessing that pgbench's expression language is > eventually going to support strings as a data type, for example, and > those strings might want to contain backlashes. Sure, but once we define strings, they'll be quoted, and the behavior can/should/will be different for a backslash inside quotes than one outside them --- as it is already in psql. Moreover, if you're on board with the backslash-newline proposal, you've already bought into the idea that backslashes outside quotes will behave differently from those inside. >> Shall I make a patch that allows backslash-newline to be handled this way >> in both psql and pgbench backslash commands? > I certainly don't object to such a patch, although if it's between you > writing that patch and you getting Tomas Vondra's multivariate > statistics stuff committed, I'll take the latter. :-) I'll get to that, but I'd like to get this area fully dealt with before context-swapping to that one. regards, tom lane
On Mon, Mar 21, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Mar 21, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Um, why exactly? That psql behavior is of really ancient standing, and >>> we have not had complaints about it. > >> I think that's mostly because the psql metacommands are ridiculously >> impoverished. I'm guessing that pgbench's expression language is >> eventually going to support strings as a data type, for example, and >> those strings might want to contain backlashes. > > Sure, but once we define strings, they'll be quoted, and the behavior > can/should/will be different for a backslash inside quotes than one > outside them --- as it is already in psql. Moreover, if you're on > board with the backslash-newline proposal, you've already bought into > the idea that backslashes outside quotes will behave differently from > those inside. Mmph. I just don't see any benefit in being able to start a command in the middle of a line. Even if it's not dangerous to future things we might want o implement, I'd argue it's still poor style, and of no practical utility. But if I lose that argument, then I do. >>> Shall I make a patch that allows backslash-newline to be handled this way >>> in both psql and pgbench backslash commands? > >> I certainly don't object to such a patch, although if it's between you >> writing that patch and you getting Tomas Vondra's multivariate >> statistics stuff committed, I'll take the latter. :-) > > I'll get to that, but I'd like to get this area fully dealt with before > context-swapping to that one. Understood. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Mmph. I just don't see any benefit in being able to start a command > in the middle of a line. I agree that there's no functional benefit; it's a matter of consistency. In particular, psql has always allowed you to write multiple SQL commands per line: SELECT 2+2; SELECT x FROM tab; SELECT y FROM othertab; and as of yesterday pgbench supports that as well. So allowing multiple backslash commands on a line improves consistency both with psql and with pgbench's own behavior, IMV. regards, tom lane
So I looked into this, and found that persuading psql to let backslash commands cross line boundaries is a much bigger deal than just fixing the lexer. The problem is that MainLoop would need to grow an understanding of having received only a partial backslash command and needing to go back to readline() for another line. And probably HandleSlashCmds would need to be changed to stop parsing and back out without doing anything when it hits backslash-newline. It's do-able no doubt, but it's not going to be a small and simple patch. However, since pgbench is already set up to slurp the entire file and lex it in one go, it is just a trivial adjustment to the lexer rules in that program. The only thing I found that made it complicated is that syntax_error() had too simplistic an understanding of how to position the error cursor usefully, so that needed a bit of work. I think it'd be okay to commit this without fixing psql at the same time; if you try it in psql you get an error, so on that side it's unimplemented behavior rather than an actual incompatibility. Perhaps somebody will be motivated to fix it later, but I'm not going to spend that kind of time on it right now. I've not written a docs update, but otherwise I think this is committable. Comments? regards, tom lane diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l index 825dacc..12b5f7e 100644 *** a/src/bin/pgbench/exprscan.l --- b/src/bin/pgbench/exprscan.l *************** *** 6,17 **** * * This lexer supports two operating modes: * ! * In INITIAL state, just parse off whitespace-separated words (this mode ! * is basically equivalent to strtok(), which is what we used to use). * * In EXPR state, lex for the simple expression syntax of exprparse.y. * ! * In either mode, stop upon hitting newline or end of string. * * Note that this lexer operates within the framework created by psqlscan.l, * --- 6,18 ---- * * This lexer supports two operating modes: * ! * In INITIAL state, just parse off whitespace-separated words. (This mode ! * is basically equivalent to strtok(), which is what we used to use.) * * In EXPR state, lex for the simple expression syntax of exprparse.y. * ! * In either mode, stop upon hitting newline, end of string, or unquoted ! * backslash (except that backslash-newline is silently swallowed). * * Note that this lexer operates within the framework created by psqlscan.l, * *************** extern void expr_yyset_column(int column *** 61,69 **** alpha [a-zA-Z_] digit [0-9] alnum [a-zA-Z0-9_] ! /* {space} + {nonspace} + {newline} should cover all characters */ space [ \t\r\f\v] ! nonspace [^ \t\r\f\v\n] newline [\n] /* Exclusive states */ --- 62,71 ---- alpha [a-zA-Z_] digit [0-9] alnum [a-zA-Z0-9_] ! /* {space} + {nonspace} + {backslash} + {newline} should cover all characters */ space [ \t\r\f\v] ! nonspace [^ \t\r\f\v\\\n] ! backslash [\\] newline [\n] /* Exclusive states */ *************** newline [\n] *** 98,103 **** --- 100,113 ---- {space}+ { /* ignore */ } + {backslash}{newline} { /* ignore */ } + + {backslash} { + /* do not eat, and report end of command */ + yyless(0); + return 0; + } + {newline} { /* report end of command */ last_was_newline = true; *************** newline [\n] *** 130,143 **** return FUNCTION; } {newline} { /* report end of command */ last_was_newline = true; return 0; } - {space}+ { /* ignore */ } - . { /* * must strdup yytext so that expr_yyerror_more doesn't --- 140,161 ---- return FUNCTION; } + {space}+ { /* ignore */ } + + {backslash}{newline} { /* ignore */ } + + {backslash} { + /* do not eat, and report end of command */ + yyless(0); + return 0; + } + {newline} { /* report end of command */ last_was_newline = true; return 0; } . { /* * must strdup yytext so that expr_yyerror_more doesn't *************** expr_yyerror_more(yyscan_t yyscanner, co *** 177,183 **** /* * While parsing an expression, we may not have collected the whole line * yet from the input source. Lex till EOL so we can report whole line. ! * (If we're at EOF, it's okay to call yylex() an extra time.) */ if (!last_was_newline) { --- 195,201 ---- /* * While parsing an expression, we may not have collected the whole line * yet from the input source. Lex till EOL so we can report whole line. ! * (If we're at backslash/EOF, it's okay to call yylex() an extra time.) */ if (!last_was_newline) { diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 4196b0e..e947f77 100644 *** a/src/bin/pgbench/pgbench.c --- b/src/bin/pgbench/pgbench.c *************** syntax_error(const char *source, int lin *** 2413,2426 **** fprintf(stderr, "\n"); if (line != NULL) { ! fprintf(stderr, "%s\n", line); ! if (column >= 0) { int i; ! for (i = 0; i < column; i++) ! fprintf(stderr, " "); ! fprintf(stderr, "^ error found here\n"); } } exit(1); --- 2413,2455 ---- fprintf(stderr, "\n"); if (line != NULL) { ! /* ! * Multi-line backslash commands make this harder than you'd think; we ! * have to identify which line to put the error cursor on. So print ! * one line at a time. ! */ ! for (;;) { + const char *nlpos = strchr(line, '\n'); + int len; int i; ! if (nlpos) ! { ! /* ! * It's tempting to use fprintf("%.*s"), but that can fail if ! * glibc has a different idea of the encoding than we do. ! */ ! len = nlpos - line + 1; ! for (i = 0; i < len; i++) ! fputc(line[i], stderr); ! } ! else ! { ! len = column + 1; /* ensure we print ^ if not done */ ! fprintf(stderr, "%s\n", line); ! } ! if (column >= 0 && column < len) ! { ! for (i = 0; i < column; i++) ! fputc(' ', stderr); ! fprintf(stderr, "^ error found here\n"); ! } ! column -= len; ! if (nlpos) ! line = nlpos + 1; ! else ! break; } } exit(1);
First, thank you all involved, and thank you for polishing this and committing, Tom. At Mon, 21 Mar 2016 17:15:18 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <1596.1458594918@sss.pgh.pa.us> > So I looked into this, and found that persuading psql to let backslash > commands cross line boundaries is a much bigger deal than just fixing the > lexer. The problem is that MainLoop would need to grow an understanding > of having received only a partial backslash command and needing to go back > to readline() for another line. And probably HandleSlashCmds would need > to be changed to stop parsing and back out without doing anything when it > hits backslash-newline. It's do-able no doubt, but it's not going to be a > small and simple patch. I agree. > However, since pgbench is already set up to slurp the entire file and > lex it in one go, it is just a trivial adjustment to the lexer rules in > that program. The only thing I found that made it complicated is that > syntax_error() had too simplistic an understanding of how to position > the error cursor usefully, so that needed a bit of work. The modified lexer treats {backslash}{newline} as the same with whitespace and it looks ok for me. > /test.sql:6: syntax error, unexpected FUNCTION, expecting $end in command "set" > \set naccounts\ > 10x0 > ^ error found here The error message seems fine. (The mysterious message would be another problem.) But it prints the lines after the error indicator. > \set naccounts\ > 10x0\ > ^ error found here > * :scale I suppose that the trailing lines might be better not be printed. (gcc doesn't seem to do so.) > I think it'd be okay to commit this without fixing psql at the same time; > if you try it in psql you get an error, so on that side it's unimplemented > behavior rather than an actual incompatibility. Perhaps somebody will be > motivated to fix it later, but I'm not going to spend that kind of time > on it right now. > > I've not written a docs update, but otherwise I think this is committable. > Comments? > > regards, tom lane regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Tom Lane wrote: > So I looked into this, and found that persuading psql to let backslash > commands cross line boundaries is a much bigger deal than just fixing the > lexer. The problem is that MainLoop would need to grow an understanding > of having received only a partial backslash command and needing to go back > to readline() for another line. And probably HandleSlashCmds would need > to be changed to stop parsing and back out without doing anything when it > hits backslash-newline. It's do-able no doubt, but it's not going to be a > small and simple patch. FWIW, I would love to see this in some future release: particularly for \copy lines with large queries, the limitation that only single-line input is accepted is very annoying -- much more so when the query comes pasted from some other input, or when you have a file with a query and just want to add a quick \copy prefix. (Hmm, a much simpler alternative would be to allow \g-like syntax, i.e. the query is already in the query buffer and the \copy line just specifies the output file. In fact, for queries in history this is much more convenient than the current syntax ...) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services