Thread: pgbench - allow to store select results into variables
Hello devs,

I mentioned my intention to add some features to pgbench back in March: https://www.postgresql.org/message-id/alpine.DEB.2.10.1603301618570.5677@sto

The attached patch adds an \into meta command to store results of preceding SELECTs into pgbench variables, so that they can be reused afterwards. The feature is useful for making more realistic scripts: currently pgbench scripts cannot really interact with the database, as results are discarded.

The chosen syntax is easy to understand and the implementation is quite light, with minimal impact on the code base. I think that this is a reasonable compromise.

The SELECTs must yield exactly one row, and the number of variables must not exceed the number of columns.

Also attached is a set of test scripts, especially to trigger various error cases.

--
Fabien.
Attachment
Hi
2016-07-09 10:20 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello devs,
I mentioned my intention to add some features to pgbench back in March:
https://www.postgresql.org/message-id/alpine.DEB.2.10.1603301618570.5677@sto
The attached patch adds an \into meta command to store results of preceding SELECTs into pgbench variables, so that they can be reused afterwards.
The feature is useful for making more realistic scripts: currently pgbench scripts cannot really interact with the database, as results are discarded.
The chosen syntax is easy to understand and the implementation is quite light, with minimal impact on the code base. I think that this is a reasonable compromise.
The SELECTs must yield exactly one row, and the number of variables must not exceed the number of columns.
Also attached is a set of test scripts, especially to trigger various error cases.
Why are you introducing \into and not \gset, as psql does?
Regards
Pavel
--
Fabien.
Hello Pavel,

> Why are you introducing \into and not \gset, as psql does?

Good question.

The \into syntax I implemented is more generic: you can send a bunch of queries together and extract the results, which makes sense from a client perspective where reducing latency is important:

SELECT 1, 2 \; SELECT 3;
\into one two three

However "gset" only works on the last SELECT, and only if all columns have a name. This feature probably makes sense interactively, but for a script it seems more useful to allow batch processing and to collect the results afterwards.

Also a more subjective argument: I do not like the gset automagic naming feature. I was more inspired by PL/pgSQL and ECPG, which both have an "into" syntax with explicit variable names that leaves nothing to guesswork. I like things to be simple and explicit, hence the proposed \into.

--
Fabien.
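[To make the intended usage concrete, here is a small sketch of a script under this proposal; the variable names are illustrative, and debug() is pgbench's existing expression function for printing a value:]

-- one round trip for two independent queries
SELECT 1, 2 \; SELECT 3;
\into one two three
-- the stored values can then be reused by later commands
\set total debug(:one + :two + :three)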
2016-07-09 11:19 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,

> Why are you introducing \into and not \gset, as psql does?
Good question.
The \into syntax I implemented is more generic: you can send a bunch of queries together and extract the results, which makes sense from a client perspective where reducing latency is important:
SELECT 1, 2 \; SELECT 3;
\into one two three
I understand, but it looks a little bit scary - though the argument about reducing latency can be valid
However "gset" only works on the last SELECT and if all columns have a name. This feature probably makes sense interactively, but for a script it seems more useful to allow batch processing and collect results afterwards.
Also a more subjective argument: I do not like the gset automagic naming feature. I was more inspired by PL/pgSQL and ECPG, which both have an "into" syntax with explicit variable names that leaves nothing to guesswork. I like things to be simple and explicit, hence the proposed \into.
the gset was originally designed differently - but now it is here - and it is not practical to have two different but pretty similar commands in psql and pgbench.
Regards
Pavel
--
Fabien.
>> Also a more subjective argument: I do not like the gset automagic naming
>> feature. I was more inspired by PL/pgSQL and ECPG, which both have an "into"
>> syntax with explicit variable names that leaves nothing to guesswork. I like
>> things to be simple and explicit, hence the proposed \into.
>
> the gset was originally designed differently - but now it is here - and it
> is not practical to have two different but pretty similar commands in
> psql and pgbench.

In my view they are unrelated: on the one hand, "gset" is really an interactive feature, where typing is costly, so "automagic" might make sense; on the other hand, "into" is a scripting feature, where you want to understand the code and have something as readable as possible, without surprises. The commands are named differently and behave differently.

If someone thinks that "gset" is a good idea for pgbench, which I don't, it could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG, makes more sense for scripting.

--
Fabien.
On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
> makes more sense for scripting.

I agree: I like \into. But:

> SELECT 1, 2 \; SELECT 3;
> \into one two three

I think that's pretty weird.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
>> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
>> makes more sense for scripting.

> I agree: I like \into.
> But:
>> SELECT 1, 2 \; SELECT 3;
>> \into one two three
> I think that's pretty weird.

Yeah, that's seriously nasty action-at-a-distance in my view. I'd be okay with

SELECT 1, 2 \into one two
SELECT 3 \into three

but I do not think that a metacommand on a following line should retroactively affect the execution of a prior command, much less commands before the last one. Even if this happens to be easy to do in pgbench's existing over-contorted logic, it's tremendously confusing to the user; and it might be much less easy if we try to refactor that logic.

And I'm with Pavel on this: it should work exactly like \gset. Inventing \into to do almost the same thing in a randomly different way exhibits a bad case of NIH syndrome. Sure, you can argue about how it's not quite the same use-case and so you could micro-optimize by doing it differently, but that's ignoring the cognitive load on users who have to remember two different commands. Claiming that plpgsql's SELECT INTO is a closer analogy than psql's \gset is quite bogus, too: the environment is different (client side vs server side, declared vs undeclared target variables), and the syntax is different (backslash or not, commas or not, just for starters). I note also that we were talking a couple months ago about trying to align psql and pgbench backslash commands more closely. This would not be a good step in that direction.

regards, tom lane
On Wed, Jul 13, 2016 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
>>> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
>>> makes more sense for scripting.
>
>> I agree: I like \into.
>
>> But:
>
>>> SELECT 1, 2 \; SELECT 3;
>>> \into one two three
>
>> I think that's pretty weird.
>
> Yeah, that's seriously nasty action-at-a-distance in my view. I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three
>
> but I do not think that a metacommand on a following line should
> retroactively affect the execution of a prior command, much less commands
> before the last one. Even if this happens to be easy to do in pgbench's
> existing over-contorted logic, it's tremendously confusing to the user;
> and it might be much less easy if we try to refactor that logic.
>
> And I'm with Pavel on this: it should work exactly like \gset. Inventing
> \into to do almost the same thing in a randomly different way exhibits a
> bad case of NIH syndrome. Sure, you can argue about how it's not quite
> the same use-case and so you could micro-optimize by doing it differently,
> but that's ignoring the cognitive load on users who have to remember two
> different commands. Claiming that plpgsql's SELECT INTO is a closer
> analogy than psql's \gset is quite bogus, too: the environment is
> different (client side vs server side, declared vs undeclared target
> variables), and the syntax is different (backslash or not, commas or not,
> just for starters). I note also that we were talking a couple months ago
> about trying to align psql and pgbench backslash commands more closely.
> This would not be a good step in that direction.

True, but I'd still argue that \into is a lot more readable than \gset. Maybe both programs should support both commands.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jul 13, 2016 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I note also that we were talking a couple months ago
>> about trying to align psql and pgbench backslash commands more closely.
>> This would not be a good step in that direction.

> True, but I'd still argue that \into is a lot more readable than
> \gset. Maybe both programs should support both commands.

Meh, personal preference there no doubt. I'd be okay with both programs supporting both commands, but the \into command as described here would be quite unreasonable to implement in psql. It needs to act more like a semicolon-substitute, as \g and the rest of its family do. (I think I'd also lobby for spelling it \ginto.)

regards, tom lane
Hello Robert,

> I agree: I like \into.

Great!

> But:
>
>> SELECT 1, 2 \; SELECT 3;
>> \into one two three
>
> I think that's pretty weird.

I agree that it is weird, but I do not think that it is bad.

Sending a batch of requests is a feature of libpq which is accessible through pgbench by using "\;", although the fact is not documented. It makes sense for a client to send independent queries together so as to reduce latency.

From pgbench's perspective, I would find it pretty weird as well if one could send several queries together but only read the answer from... say the first one, and the others were lost.

From an implementation perspective doing it is straightforward, and rejecting it would require some more logic. An obvious nicer feature would be to allow intermixing \into & \;, but ISTM that would require deeply rethinking pgbench's lexing/parsing, which has just been merged with psql's by Tom and others...

If I had not pointed out the fact that it works, maybe no one would have noticed... so a compromise could be not to advertise that it works (the \; feature is not advertised anyway), but to let the implementation do it because it is simple and may be useful, and to rephrase the documentation so that it is just about the previous select and not the previous select*S*.

--
Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Sending a batch of requests is a feature of libpq which is accessible
> through pgbench by using "\;", although the fact is not documented. It
> makes sense for a client to send independent queries together so as to
> reduce latency.

You're putting an awful lot of weight on an unsupported assertion about latency. If a user cares about that, why would she not simply merge the commands into "SELECT 1, 2, 3 \into one two three" ?

And I still say that what you're proposing might be easy right now, but it might also be next door to impossible in a refactored implementation. I don't think we should go there on the basis of a weak argument about latency. \into should retrieve data only from the last PGresult.

regards, tom lane
Hello Tom,

>> Sending a batch of requests is a feature of libpq which is accessible
>> through pgbench by using "\;", although the fact is not documented. It
>> makes sense for a client to send independent queries together so as to
>> reduce latency.
>
> You're putting an awful lot of weight on an unsupported assertion about
> latency.

For support, I would submit that many applications today are web/mobile apps which are quite sensitive to latency. See for instance the FAST 2016 white paper by people at Google, which discusses in depth "tail latency" as a key measure of quality for IO systems used for live services, or the new HTTP/2 protocol (based on Google's SPDY), which aims at reducing latency through multiple features (compression, server push, pipelining...).

> If a user cares about that, why would she not simply merge the
> commands into "SELECT 1, 2, 3 \into one two three" ?

Because the code would look pretty awful:

SELECT
(SELECT first data FROM ... JOIN ... WHERE ... ),
(SELECT second data FROM ... JOIN ... WHERE ...),
(SELECT third data FROM ... JOIN ... WHERE ...);

> And I still say that what you're proposing might be easy right now, but
> it might also be next door to impossible in a refactored implementation.

I do not understand. There is one "multi" SQL command followed by a meta command; how would a refactored implementation have trouble with that?

> I don't think we should go there on the basis of a weak argument about
> latency. \into should retrieve data only from the last PGresult.

This looks pretty arbitrary: why not the first one, since I asked for it first? Anyway, why allow sending several queries if you are not allowed to extract their results?

--
Fabien.
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
>> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
>> makes more sense for scripting.
> I agree: I like \into.
> But:
>> SELECT 1, 2 \; SELECT 3;
>> \into one two three
> I think that's pretty weird.
Yeah, that's seriously nasty action-at-a-distance in my view. I'd be okay
with
SELECT 1, 2 \into one two
SELECT 3 \into three
but I do not think that a metacommand on a following line should
retroactively affect the execution of a prior command, much less commands
before the last one.
You need a test and a definition for:
SELECT 1, 2;
SELECT 3;
\into x, y, z
It should fail - "too many variables" - right?
David J.
Hello Tom,

>>> SELECT 1, 2 \; SELECT 3;
>>> \into one two three
>
> Yeah, that's seriously nasty action-at-a-distance in my view. I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three

ISTM that is not the same, because then you would have two queries (over the network) instead of one, so you pay the network latency twice?

> but I do not think that a metacommand on a following line should
> retroactively affect the execution of a prior command, much less commands
> before the last one.

Nope. The meta-command applies to the preceding SQL command... which happens to be a \;-compound command. ISTM that all is logically fine.

Some motivation for the feature (not its syntax or implementation), from a benchmarking perspective:

- clients MUST read the server answers and possibly reuse them, hence the proposed \into feature. Discarding the answer as pgbench does does not really comply with typical benchmark rules, eg from TPC-B:

"""1.3.2 Each transaction shall return to the driver the Account_Balance resulting from successful commit of the transaction.
Comment: It is the intent of this clause that the account balance in the database be returned to the driver, i.e., that the application retrieve the account balance."""

- latency is important to applications (eg web applications), thus the ability to compound statements is a good thing. However, if in a bench one can compound statements but not retrieve their values, it fails the previous "retrieve the value" requirement.

So basically I wish to avoid splitting compound queries and paying the latency just because of a lack of syntax to do the right thing, hence the proposed feature, which can retrieve data from various parts of a compound statement.

--
Fabien.
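[To make the TPC-B requirement concrete, a transaction along these lines -- a sketch using pgbench's standard table and variable names, not code from the patch -- would actually retrieve the Account_Balance:]

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-- clause 1.3.2: the application must actually retrieve the balance
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
\into abalance
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;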
Hello Tom,

> Yeah, that's seriously nasty action-at-a-distance in my view. I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three

After giving it some thought, it could work on compound commands if \into does not close the current SQL command. Something like:

SELECT 1, 2 ; \into one two
SELECT 3 ; \into three
=> 2 SQL commands

SELECT 1, 2 \; \into one two
SELECT 3 ; \into three
=> 1 compound SQL command

I'd like \; or ; to stay mandatory as separators, though. Or at least to be allowed. I'm not quite sure how it could be implemented, though.

> And I'm with Pavel on this: it should work exactly like \gset.

Hmmm. Maybe I'll do that thing in the end, but I really think that gset only makes sense in an interactive context, and is pretty ugly for scripting.

> Inventing \into to do almost the same thing in a randomly different way
> exhibits a bad case of NIH syndrome.

No, it is a question of a design suitable for programming:

SELECT 1, 2 \gset
could not set variable "?column?"

> Sure, you can argue about how it's not quite the same use-case

Indeed, that is my point.

> and so you could micro-optimize by doing it differently,

No, the underlying implementation is basically the same.

> but that's ignoring the cognitive load on users who have to remember two
> different commands.

I do not buy this argument: it is easier for me to remember that the keyword INTO happens to do the same thing the same way in PL/pgSQL and ECPG, although with slightly different syntaxes, than to have to remember the psql-specific "gset", which does the same thing but in quite a different way, because it means both another name and another concept.

> Claiming that plpgsql's SELECT INTO is a closer analogy than psql's
> \gset is quite bogus, too:

I disagree. I mentioned ECPG as well. Both ECPG & PL/pgSQL are "programming"; psql is interactive.

> the environment is different (client side vs server side,

ECPG is client side. I think that the side does not matter.

> declared vs undeclared target variables),

Sure, the "gset" hack is only possible for a language without variable declarations... but that does not make it a good idea.

> and the syntax is different (backslash or not, commas or not, just for
> starters).

Sure, different languages do not have the same syntax.

--
Fabien.
Hello again,

> I'd be okay with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three

Here is a v2 with more or less this approach, although \into does not end the query; it applies to the current or last SQL command. A query is still terminated with a ";".

Now it handles things like:

-- standard sql command
SELECT balance FROM bank WHERE id=1; \into balance

-- compound sql command, three == 3.
SELECT 1, 2 \; SELECT 3 ; \into three

-- compound query with 2 selects & 3 variables
SELECT i \into one
FROM generate_series(1, 1) AS i \;
SELECT i+1, i+2 \into two three
FROM generate_series(1, 1) AS i ;

I had to add a few lines to the psql scanner to count "\;", so the parsing logic is a little more complicated than before.

--
Fabien.
Attachment
Hi Fabien,

On 2016/07/16 1:33, Fabien COELHO wrote:
> Here is a v2 with more or less this approach, although \into does not end
> the query; it applies to the current or last SQL command. A query is
> still terminated with a ";".

This patch needs to be rebased because of commit 64710452 (on 2016-08-19).

Thanks,
Amit
Hello Amit,

> This patch needs to be rebased because of commit 64710452 (on 2016-08-19).

Here it is!

--
Fabien.
Attachment
Hi Fabien,

On 2016/09/03 2:47, Fabien COELHO wrote:
>> This patch needs to be rebased because of commit 64710452 (on 2016-08-19).
>
> Here it is!

Thanks for sending the updated patch. Here are some (mostly cosmetic) comments.

Before the comments, let me confirm whether the following result is odd (a bug) or whether I am just using it wrong. A custom script looks like:

\;
select a \into a
from tab where a = 1;
\set i debug(:a)

I get the following error:

undefined variable "a"
client 0 aborted in state 1; execution of meta-command failed

Even the following script gives the same result:

\;
select a from a where a = 1;
\into a
\set t debug(:a)

However with the following there is no error, and a gets set to 2:

select a from a where a = 1 \;
select a+1 from a where a = 1;
\into a
\set t debug(:a)

Comments on the patch follow:

+ <listitem>
+ <para>
+ Stores the first fields of the resulting row from the current or preceding
+ <command>SELECT</> command into these variables.

Any command returning rows ought to work, no? For example, the following works:

update a set a = a+1 returning *;
\into a
\set t debug(:a)

- char *line; /* text of command line */
+ char *line; /* first line for short display */
+ char *lines; /* full multi-line text of command */

When I looked at this and related hunks (and also the hunks related to semicolons), it made me wonder whether this patch contains two logical changes. Is this just a refactoring for the \into implementation, or does this provide some new externally visible feature?

char *argv[MAX_ARGS]; /* command word list */
+ int compound; /* last compound command (number of \;) */
+ char ***intos; /* per-compound command \into variables */

Need an extra space for intos to align with earlier fields. Also I wonder if intonames or intoargs would be a slightly better name for the field.

+static bool
+read_response(CState *st, char ** intos[])

Usual style seems to be to use ***intos here.

+ fprintf(stderr,
+ "client %d state %d compound %d: "
+ "cannot apply \\into to non SELECT statement\n",
+ st->id, st->state, compound);

How about making this error message read like the other messages related to \into, perhaps something like: "\\into cannot follow non SELECT statements\n"

/*
* Read and discard the query result; note this is not included in
- * the statement latency numbers.
+ * the statement latency numbers (above), thus if reading the
+ * response fails the transaction is counted nevertheless.
*/

Does this comment need to mention that the result is not discarded when \into is specified?

+ my_command->intos = pg_malloc0(sizeof(char**) * (compounds+1));

Need space: s/char**/char **/g. This comment applies also to a couple of nearby places.

- my_command->line = pg_malloc(nlpos - p + 1);
+ my_command->line = pg_malloc(nlpos - p + 1 + 3);

A comment mentioning what the above means would be helpful.

+ bool sql_command_in_progress = false;

This variable's name could be slightly confusing to readers. If I understand its purpose correctly, perhaps it could be called sql_command_continues.

+ if (index == 0)
+ syntax_error(desc, lineno, NULL, NULL,
+ "\\into cannot start a script",
+ NULL, -1);

How about: "\\into cannot be at the beginning of a script" ?

Thanks,
Amit
Hello Amit,

> Custom script looks like:
>
> \;
> select a \into a
> from tab where a = 1;
> \set i debug(:a)
>
> I get the following error:
>
> undefined variable "a"
> client 0 aborted in state 1; execution of meta-command failed

Good catch!

Indeed, there is a problem with empty commands, which are simply ignored by libpq/postgres if there are other commands around, so my synchronization between results & commands was too naive. In order to fix this, I made the scanner also count empty commands and ignore comments. I guessed that proposing to change libpq/postgres behavior was not an option.

> + <listitem>
> + <para>
> + Stores the first fields of the resulting row from the current or preceding
> + <command>SELECT</> command into these variables.
>
> Any command returning rows ought to work, no?

Yes. I put "SQL command" instead.

> - char *line; /* text of command line */
> + char *line; /* first line for short display */
> + char *lines; /* full multi-line text of command */
>
> When I looked at this and related hunks (and also the hunks related to
> semicolons), it made me wonder whether this patch contains two logical
> changes. Is this just a refactoring for the \into implementation, or
> does this provide some new externally visible feature?

There is essentially a refactoring that I did when updating how Command is managed, because it has to be done in several stages to fit "into" into it and to take care of compounds. However, there was a small "new externally visible feature": with -r, instead of cutting a multiline command abruptly at the end of the first line, it appended "..." as an ellipsis because it looked nicer. I've removed this small visible change.

> char *argv[MAX_ARGS]; /* command word list */
> + int compound; /* last compound command (number of \;) */
> + char ***intos; /* per-compound command \into variables */
>
> Need an extra space for intos to align with earlier fields.

Ok.

> Also I wonder if intonames or intoargs would be a slightly better name
> for the field.

I put "intovars" as they are variable names.

> +static bool
> +read_response(CState *st, char ** intos[])
>
> Usual style seems to be to use ***intos here.

Ok.

> How about making this error message read like the other messages
> related to \into, perhaps something like: "\\into cannot follow non SELECT
> statements\n"

As you pointed out above, there may be statements without "SELECT" which return a row. I wrote "\\into expects a row" instead.

> Does this comment need to mention that the result is not discarded when
> \into is specified?

Hmmm. The result structure is discarded, but the results are copied into variables before that, so the comment seems ok...

> + my_command->intos = pg_malloc0(sizeof(char**) * (compounds+1));
>
> Need space: s/char**/char **/g

Ok.

> This comment applies also to a couple of nearby places.

Indeed.

> - my_command->line = pg_malloc(nlpos - p + 1);
> + my_command->line = pg_malloc(nlpos - p + 1 + 3);
>
> A comment mentioning what the above means would be helpful.

Ok. I removed the "+ 3" anyway.

> + bool sql_command_in_progress = false;
>
> This variable's name could be slightly confusing to readers. If I
> understand its purpose correctly, perhaps it could be called
> sql_command_continues.

It is possible. I like 'in progress' though. Why is it confusing? It means that the current command is not finished yet and more is expected, that is, the final ';' has not been encountered.

> + if (index == 0)
> + syntax_error(desc, lineno, NULL, NULL,
> + "\\into cannot start a script",
> + NULL, -1);
>
> How about: "\\into cannot be at the beginning of a script" ?

It is possible, but it's quite longer... I'm not a native speaker, so I do not know whether it would be better.

The attached patch takes into account all your comments but:
- the comment about discarded results...
- the sql_command_in_progress variable name change
- the longer message on into at the start of a script

--
Fabien.
Attachment
Hi Fabien,

On 2016/09/07 23:01, Fabien COELHO wrote:
>> Custom script looks like:
>>
>> \;
>> select a \into a
>> from tab where a = 1;
>> \set i debug(:a)
>>
>> I get the following error:
>>
>> undefined variable "a"
>> client 0 aborted in state 1; execution of meta-command failed
>
> Good catch!
>
> Indeed, there is a problem with empty commands, which are simply ignored by
> libpq/postgres if there are other commands around, so my synchronization
> between results & commands was too naive.
>
> In order to fix this, I made the scanner also count empty commands and
> ignore comments. I guessed that proposing to change libpq/postgres
> behavior was not an option.

Seems to be fixed.

>> Any command returning rows ought to work, no?
>
> Yes. I put "SQL command" instead.

Check.

>> When I looked at this and related hunks (and also the hunks related to
>> semicolons), it made me wonder whether this patch contains two logical
>> changes. Is this just a refactoring for the \into implementation, or
>> does this provide some new externally visible feature?
>
> There is essentially a refactoring that I did when updating how Command is
> managed, because it has to be done in several stages to fit "into" into it
> and to take care of compounds.
>
> However, there was a small "new externally visible feature": with -r,
> instead of cutting a multiline command abruptly at the end of the first
> line, it appended "..." as an ellipsis because it looked nicer.
> I've removed this small visible change.

There still seems to be a change in behavior of the -r option due to the patch. Consider the following example:

# no \into used in the script
$ cat /tmp/into.sql
select a from a where a = 1 \;
select a+1 from a where a = 1;
\set a 1
\set b 2
\set i debug(:a)
\set i debug(:b)

$ pgbench -r -n -t 1 -f /tmp/into.sql postgres
<snip>
- statement latencies in milliseconds:
2.889 select a from a where a = 1 ;
0.012 \set a 1
0.009 \set b 2
0.031 \set i debug(:a)
0.014 \set i debug(:b)

# behavior wrt compound statement changes when \into is used
$ cat /tmp/into.sql
select a from a where a = 1 \; \into a
select a+1 from a where a = 1; \into b
\set i debug(:a)
\set i debug(:b)

$ pgbench -r -n -t 1 -f /tmp/into.sql postgres
<snip>
- statement latencies in milliseconds:
2.093 select a from a where a = 1 ; select a+1 from a where a = 1;
0.034 \set i debug(:a)
0.015 \set i debug(:b)

One more thing I observed, which I am not sure is a fault of this patch, is illustrated below:

$ cat /tmp/into.sql
\;
select a from a where a = 1 \;
select a+1 from a where a = 1;
\set a 1
\set b 2
\set i debug(:a)
\set i debug(:b)

$ pgbench -r -n -t 1 -f /tmp/into.sql postgres
<snip>
- statement latencies in milliseconds:
2.349 ;
0.013 \set a 1
0.009 \set b 2
0.029 \set i debug(:a)
0.015 \set i debug(:b)

Note that the compound select statement is nowhere to be seen in the latencies output. The output remains the same even if I use the \into's. What seems to be going on is that the empty statement on the first line (\;) is the only part kept of the compound statement spanning lines 1-3.

>> Also I wonder if intonames or intoargs would be a slightly better name
>> for the field.
>
> I put "intovars" as they are variable names.

Sounds fine.

>> How about making this error message read like the other messages
>> related to \into, perhaps something like: "\\into cannot follow non SELECT
>> statements\n"
>
> As you pointed out above, there may be statements without "SELECT" which
> return a row. I wrote "\\into expects a row" instead.

Sounds fine.

>> Does this comment need to mention that the result is not discarded when
>> \into is specified?
>
> Hmmm. The result structure is discarded, but the results are copied into
> variables before that, so the comment seems ok...

Hmm, OK.

>> + bool sql_command_in_progress = false;
>>
>> This variable's name could be slightly confusing to readers. If I
>> understand its purpose correctly, perhaps it could be called
>> sql_command_continues.
>
> It is possible. I like 'in progress' though. Why is it confusing?
> It means that the current command is not finished yet and more is
> expected, that is, the final ';' has not been encountered.

I understood that it refers to what you explain here. But to me it sounded like the name is referring to the progress of *execution* of a SQL command, whereas the code in question is simply expecting to finish *parsing* the SQL command using the next lines. It may be fine though.

>> How about: "\\into cannot be at the beginning of a script" ?
>
> It is possible, but it's quite longer... I'm not a native speaker, so I
> do not know whether it would be better.

Me neither; let's leave it for the committer to decide.

> The attached patch takes into account all your comments but:
> - the comment about discarded results...
> - the sql_command_in_progress variable name change
> - the longer message on into at the start of a script

The patch seems fine without these, although please consider the concern I raised with regard to the -r option above.

Thanks,
Amit
Hello Amit,

> [...]
> There still seems to be a change in behavior of the -r option due to the
> patch. Consider the following example:
>
> select a from a where a = 1 \;
> select a+1 from a where a = 1;
> ...
> - statement latencies in milliseconds:
> 2.889 select a from a where a = 1 ;

vs

> select a from a where a = 1 \; \into a
> select a+1 from a where a = 1; \into b
> ...
> 2.093 select a from a where a = 1 ; select a+1 from a where a = 1;

Yep.

Note that there is a small logical conundrum in this argument: as the script is not the same, especially as into was not possible before, strictly speaking there is no behavior "change".

This said, what you suggest can be done.

After giving it some thought, I suggest that it is neither needed nor desirable. If you want to achieve the initial effect, you just have to put the "into a" on the next line:

select a from a where a = 1 \;
\into a
select a+1 from a where a = 1; \into b

Then you would get the -r cut at the end of the compound command. Thus the current version gives full control of what will appear in the summary. If I change "\into xxx\n" to mean "also cut here", then there is less control over when the cut occurs when into is used.

> One more thing I observed, which I am not sure is a fault of this
> patch, is illustrated below:
>
> $ cat /tmp/into.sql
> \;
> select a from a where a = 1 \;
> select a+1 from a where a = 1;
>
> $ pgbench -r -n -t 1 -f /tmp/into.sql postgres
> <snip>
> - statement latencies in milliseconds:
> 2.349 ;
>
> Note that the compound select statement is nowhere to be seen in the
> latencies output. The output remains the same even if I use the \into's.
> What seems to be going on is that the empty statement on the first line
> (\;) is the only part kept of the compound statement spanning lines 1-3.

Yes.

This is really the (debatable) current behavior, and it is not affected by the patch. The "-r" summary takes the first line of the command, whatever it is. In your example the first line is "\;", so you get what you asked for, even if it looks rather strange, obviously.

>>> + bool sql_command_in_progress = false;
> [...]
> I understood that it refers to what you explain here. But to me it
> sounded like the name is referring to the progress of *execution* of a SQL
> command, whereas the code in question is simply expecting to finish
> *parsing* the SQL command using the next lines.

Ok. I changed it to "sql_command_lexing_in_progress".

>> The attached patch takes into account all your comments but:
>> - the comment about discarded results...
>> - the sql_command_in_progress variable name change
>> - the longer message on into at the start of a script
>
> The patch seems fine without these, although please consider the concern I
> raised with regard to the -r option above.

I have considered it. As the legitimate behavior you suggested can be achieved just by putting the into on the next line, ISTM that the current proposition gives more control than doing a mandatory cut when into is used.

Attached is a new version with the boolean renaming.

The other thing I have considered is whether to implement a "\gset" syntax, as suggested by Pavel and Tom. Bar the aesthetics, the main issue I have with it is that it does not work with compound commands, and what I want is to get the values out of compound commands... because of my focus on latency... so basically "\gset" does not do the job I want... Now I recognize that other people would like it, so probably I'll do it anyway in another patch.

--
Fabien.
Attachment
Hi Fabien,

On 2016/09/13 17:41, Fabien COELHO wrote:
> Hello Amit,
>
>> [...]
>> There still seems to be a change in behavior of the -r option due to the
>> patch. Consider the following example:
>>
>> select a from a where a = 1 \;
>> select a+1 from a where a = 1;
>> ...
>> - statement latencies in milliseconds:
>> 2.889 select a from a where a = 1 ;
>
> vs
>
>> select a from a where a = 1 \; \into a
>> select a+1 from a where a = 1; \into b
>> ...
>> 2.093 select a from a where a = 1 ; select a+1 from a where a = 1;
>
> Yep.
>
> Note that there is a small logical conundrum in this argument: as the
> script is not the same, especially as into was not possible before,
> strictly speaking there is no behavior "change".

Sure, the scripts are not the same, but it seemed like showing the whole compound query, whereas previously only part of it was shown, may be an implementation artifact of \into.

> This said, what you suggest can be done.
>
> After giving it some thought, I suggest that it is neither needed nor
> desirable. If you want to achieve the initial effect, you just have to put
> the "into a" on the next line:
>
> select a from a where a = 1 \;
> \into a
> select a+1 from a where a = 1; \into b
>
> Then you would get the -r cut at the end of the compound command. Thus the
> current version gives full control of what will appear in the summary. If
> I change "\into xxx\n" to mean "also cut here", then there is less control
> over when the cut occurs when into is used.

So it means that the position of \into affects where a compound command gets cut for -r display. I was just wondering if that was unintentional.

>> One more thing I observed, which I am not sure is a fault of this
>> patch, is illustrated below:
>>
>> $ cat /tmp/into.sql
>> \;
>> select a from a where a = 1 \;
>> select a+1 from a where a = 1;
>>
>> $ pgbench -r -n -t 1 -f /tmp/into.sql postgres
>> <snip>
>> - statement latencies in milliseconds:
>> 2.349 ;
>>
>> Note that the compound select statement is nowhere to be seen in the
>> latencies output. The output remains the same even if I use the \into's.
>> What seems to be going on is that the empty statement on the first line
>> (\;) is the only part kept of the compound statement spanning lines 1-3.
>
> Yes.
>
> This is really the (debatable) current behavior, and it is not affected by
> the patch. The "-r" summary takes the first line of the command, whatever
> it is. In your example the first line is "\;", so you get what you asked
> for, even if it looks rather strange, obviously.

Yep, perhaps it's strange to write a script like that anyway :)

>>>> + bool sql_command_in_progress = false;
>> [...]
>> I understood that it refers to what you explain here. But to me it
>> sounded like the name is referring to the progress of *execution* of a SQL
>> command, whereas the code in question is simply expecting to finish
>> *parsing* the SQL command using the next lines.
>
> Ok. I changed it to "sql_command_lexing_in_progress".
>
>>> The attached patch takes into account all your comments but:
>>> - the comment about discarded results...
>>> - the sql_command_in_progress variable name change
>>> - the longer message on into at the start of a script
>>
>> The patch seems fine without these, although please consider the concern I
>> raised with regard to the -r option above.
>
> I have considered it. As the legitimate behavior you suggested can be
> achieved just by putting the into on the next line, ISTM that the current
> proposition gives more control than doing a mandatory cut when into is used.
>
> Attached is a new version with the boolean renaming.

Thanks.

> The other thing I have considered is whether to implement a "\gset"
> syntax, as suggested by Pavel and Tom. Bar the aesthetics, the main issue I
> have with it is that it does not work with compound commands, and what I
> want is to get the values out of compound commands... because of my focus
> on latency... so basically "\gset" does not do the job I want... Now I
> recognize that other people would like it, so probably I'll do it anyway
> in another patch.

So, psql's \gset does not work as desired for compound commands (viz. it is only able to save the result of the last sub-command). You need to use \gset with each sub-command separately if no result should be discarded. Because of the undesirable latency characteristics of sending multiple commands, you want to be able to modify compound command handling such that every sub-command's result can be saved. In that regard, it's good that pgbench does not use PQexec(), which only returns the result of the last sub-command if a compound command was issued through it.

pgbench's doCustom() currently discards all results by issuing discard_response(). You propose to change things such that results are intercepted in a new function read_response(), values assigned to intovars corresponding to each sub-command, and then discarded. The intovars arrays are allocated within each sub-command's Command struct when parsing the compound command, based on where \into is located wrt the sub-command. So, most of the code in the patch is about handling compound statements to be able to save the results of all sub-commands, not just the last.

Do you think it would be OK to suffer the bad latency of multiple round trips and implement a version of \into (or \gset or \ginto) for pgbench scripts that behaves exactly like psql's \gset as a first step? But you say you will do it as another patch.

By the way, I tend to agree with your point about \gset syntax being suitable (only) in an interactive context such as psql; it's not as readable as \into x y ... when used in a script.

Thanks,
Amit
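[To visualize the scheme Amit describes, here is a rough sketch of such a read_response() loop. It is illustrative only, not the patch's actual code: putVariable() is pgbench's existing variable setter, the NULL-terminated layout of intovars is assumed, and cleanup on error paths is elided.]

static bool
read_response(CState *st, char ***intovars)
{
    PGresult   *res;
    int         compound = 0;

    /* one PGresult per \;-separated sub-command */
    while ((res = PQgetResult(st->con)) != NULL)
    {
        if (intovars[compound] != NULL)
        {
            int         f;

            /* \into expects exactly one row */
            if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1)
                return false;

            /* copy the first fields of the row into the target variables */
            for (f = 0; intovars[compound][f] != NULL; f++)
                if (!putVariable(st, "into", intovars[compound][f],
                                 PQgetvalue(res, 0, f)))
                    return false;
        }
        PQclear(res);           /* the result structure is then discarded */
        compound++;
    }
    return true;
}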
Hello Amit,

>> [...]
>> Then you would get the -r cut at the end of the compound command. Thus the
>> current version gives full control of what will appear in the summary. If
>> I change "\into xxx\n" to mean "also cut here", then there is less control
>> over when the cut occurs when into is used.
>
> So it means that the position of \into affects where a compound command gets
> cut for -r display. I was just wondering if that was unintentional.

Yes, but it happens to work reasonably :-)

>> The other thing I have considered is whether to implement a "\gset"
>> syntax, as suggested by Pavel and Tom. [...]
>
> So, psql's \gset does not work as desired for compound commands (viz. it
> is only able to save the result of the last sub-command).

Yes.

> You need to use \gset with each sub-command separately if no result
> should be discarded. Because of the undesirable latency characteristics of
> sending multiple commands, you want to be able to modify compound
> command handling such that every sub-command's result can be saved.

Exactly.

> In that regard, it's good that pgbench does not use PQexec(), which only
> returns the result of the last sub-command if a compound command was
> issued through it.

Indeed!

> pgbench's doCustom() currently discards all results by issuing
> discard_response(). You propose to change things such that results are
> intercepted in a new function read_response(), values assigned to intovars
> corresponding to each sub-command, and then discarded. The intovars arrays
> are allocated within each sub-command's Command struct when parsing the
> compound command, based on where \into is located wrt the sub-command.

Yep.

> So, most of the code in the patch is about handling compound statements to
> be able to save the results of all sub-commands, not just the last.

Yep. Previously pgbench did not need to handle compound commands, which were just seen as one large string. Note that the added machinery is also a first step towards allowing prepared statements on compound commands, which I think is a desirable feature for benchmarks.

> Do you think it would be OK to suffer the bad latency of multiple round
> trips and implement a version of \into (or \gset or \ginto) for pgbench
> scripts that behaves exactly like psql's \gset as a first step?

I do not see gset as a reasonable "first step" because:

(0) "\into" already works, while "\gset" in pgbench would need some time that I do not have at the moment;
(1) it is not what I want/need to do a clean bench;
(2) the feature is not orthogonal to compound statements -- which is what I want;
(3) I do not like the "implicit" naming thing -- but this is really just a matter of taste.

I simply recognize that Pavel & Tom have a point: whatever I think of gset, it is there in "psql", so it makes some sense to have it as well in "pgbench". So I agree to put that on my pgbench todo list.

> But you say you will do it as another patch.

Yep. I suggested another patch because it is a different feature, and previous submissions where I mixed features, even closely related ones, all resulted in me having to separate the features into distinct patches.

> By the way, I tend to agree with your point about \gset syntax being
> suitable (only) in an interactive context such as psql; it's not as
> readable as \into x y ... when used in a script.

Yep, but as people already know it, it makes sense to provide it as well at some point.

--
Fabien.
Hi Fabien,

I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I have no further comments at the moment.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1609130730380.10870%40lancre
On 2016/09/26 16:12, Amit Langote wrote:
> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I
> have no further comments at the moment.

Wait... Heikki's latest commit now requires this patch to be rebased.

commit 12788ae49e1933f463bc59a6efe46c4a01701b76
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon Sep 26 10:56:02 2016 +0300

    Refactor script execution state machine in pgbench.

So, will change the status to "Waiting on Author".

Thanks,
Amit
Hello Amit,

>> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I
>> have no further comments at the moment.
>
> Wait... Heikki's latest commit now requires this patch to be rebased.

Indeed. Here is the rebased version, which still gets through my various tests.

--
Fabien.
Attachment
On 2016/09/26 20:27, Fabien COELHO wrote:
> Hello Amit,
>
>>> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I
>>> have no further comments at the moment.
>>
>> Wait... Heikki's latest commit now requires this patch to be rebased.
>
> Indeed. Here is the rebased version, which still gets through my various
> tests.

Thanks, Fabien. It seems to work here too.

Regards,
Amit
On Tue, Sep 27, 2016 at 10:41 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/09/26 20:27, Fabien COELHO wrote:
>> Hello Amit,
>>
>>>> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I
>>>> have no further comments at the moment.
>>>
>>> Wait... Heikki's latest commit now requires this patch to be rebased.
>>
>> Indeed. Here is the rebased version, which still gets through my various
>> tests.
>
> Thanks, Fabien. It seems to work here too.

Moved to next CF with same status, ready for committer.

--
Michael
On Mon, Oct 3, 2016 at 12:43 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Sep 27, 2016 at 10:41 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/09/26 20:27, Fabien COELHO wrote:
>>
>> Hello Amit,
>>
>>>> I am marking the pgbench-into-5.patch [1] as "Ready for Committer" as I
>>>> have no further comments at the moment.
>>>
>>> Wait... Heikki's latest commit now requires this patch to be rebased.
>>
>> Indeed. Here is the rebased version, which still get through my various
>> tests.
>
> Thanks, Fabien. It seems to work here too.
> Moved to next CF with same status, ready for committer.
Moved to next CF with same status (ready for committer).
Regards,
Hari Babu
Fujitsu Australia
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Indeed. Here is the rebased version, which still gets through my various
> tests.

I looked through this again, and I still think that the syntactic design of the new command is seriously misguided, leading to an ugly and unmaintainable implementation that may well block further innovation in pgbench. I will not commit it in this form. Possibly you can find some other committer whom you can convince this is a good design --- but since the patch has gone untouched for two full commitfest cycles, I rather imagine that whoever else has looked at it has likewise decided they didn't want to be responsible for it.

Please look at changing \into to be a SQL-command-ending backslash command as we previously discussed. I think you will find that the implementation is a great deal simpler that way and doesn't require weird hackery on the shared lexer. (BTW, said hackery is not just weird but broken. You can't simply remove comments. Consider something like "SELECT foo/*as*/bar". This code reduces that to "SELECT foobar" which is wrong.)

If you won't do that, and you can't find another committer who will accept responsibility for this patch before the end of the current commitfest, I think we should mark it Rejected.

regards, tom lane
Hello Tom,

> Please look at changing \into to be a SQL-command-ending backslash
> command as we previously discussed.

Hmmm. I do want storing results and compound-command ending to be orthogonal. In order to keep this feature, I think that I can move the "into/ginto/gset/..." to the end of the command. For the compound command list to necessarily end, I can probably do some reassembly as a post phase on Commands in pgbench, so that the impact on the lexer is much reduced, in particular without undue "hackery" as you put it.

--
Fabien.
Hello Tom,

> Please look at changing \into to be a SQL-command-ending backslash
> command as we previously discussed.

Done. There are two variants: \gset, and \gcset for compound commands (it ends the SQL query and sets variables, but does not end the command, so that several settings are allowed within a compound command, a key feature for performance testing).

Personally, I find the end-of-query semicolon-replacing syntax ugly. However, I'm more interested in the feature than in elegance on this one, so I'll put up with it.

> I think you will find that the implementation is a great deal simpler
> that way and doesn't require weird hackery on the shared lexer.

I have removed the "hackery"; only counting embedded semicolons remains, to keep track of compound queries. Note that the (somewhat buggy and indeed not too clean) hackery was not related to the into syntax, but to detecting empty queries, which are silently skipped by the server.

> If you won't do that, [...]

I think that I have done what you required. I have documented the fact that the feature now does not work if compound commands contain empty queries, which is a very minor drawback for a pgbench script anyway.

Attached are the patch, a test script for the feature, and various test scripts to trigger error cases.

--
Fabien.
Attachment
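[For reference, the two variants as exercised by the attached gset-1.sql (quoted later in this thread) look like this; \gset ends the SQL command and stores the row's fields under their column aliases, while \gcset stores a sub-query's result and lets the compound command continue to the final ";":]

SELECT 1 AS a \gset
\set i debug(:a)
SELECT 2 AS a \gcset SELECT 3;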
<APOLOGY> Please pardon the redundancy: this is a slightly edited repost from another thread where the motivation for this patch was discussed, so that it appears in the relevant thread. </APOLOGY>

Tom> [...] there was immediately objection as to whether his idea of TPC-B
Tom> compliance was actually right.

From my point of view, TPC-* are simply objective examples of typical benchmark requirements, which show which features are needed in a tool for doing this activity. Once features are available, I think that pgbench should also be a show-case for their usage. Currently a few functions (for implementing the bench as specified) and the ability to actually extract results into variables (for suspicious auditors and bench relevance, see below) are missing.

Tom> I remember complaining that he had a totally artificial idea of what
Tom> "fetching a data value" requires.

Yep. I think that the key misunderstanding is that you are honest and assume that other people are honest too. This is naïve: there is a long history of vendors creatively "cheating" to get better-than-deserved benchmark results. Benchmark specifications try to prevent such behaviors by laying down careful requirements and procedures.

In this instance, you "know" that when pg has returned the result of the query the data is actually on the client side, so you consider it fetched. That is fine for you, but from a benchmarking perspective, with external auditors, your belief/knowledge is not good enough. For instance, the vendor could implement a new version of the protocol where the data are only transferred on demand, and the result just tells that the data is indeed somewhere on the server (eg on "SELECT abalance" it could just check that the key exists -- no need to actually fetch the data from the table, so no need to read the table, the index is enough...). That would be pretty stupid for real application performance, but the benchmark would get better tps by doing so. Without even intentionally cheating, this could be part of a useful "streaming mode" protocol option which makes sense for very large results but would be activated even for a small result. Another point is that decoding the message may be a little expensive, so that by not actually extracting the data into the client but just keeping it in the connection/OS one gets better performance.

Thus, the TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance resulting from successful commit of the transaction.
Comment: It is the intent of this clause that the account balance in the database be returned to the driver, i.e., that the application retrieve the account balance."

For me the correct interpretation of "the APPLICATION retrieve the account balance" is that the client application code, pgbench in this context, did indeed get the value from the vendor code, here "libpq", which is handling the connection. Having the value discarded from libpq by calling PQclear instead of PQntuples/PQgetvalue/... skips a key part of the client code that no real application would skip. This looks strange and is not representative of real client code: as a potential auditor, because of this performance-impact doubt and lack of relevance, I would not check the corresponding item in the audit check list:

"11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.

Another trivial reason to be able to actually retrieve data is that for benchmarking purposes it is very easy to want to test a scenario where you do different things based on the data received, which implies that the data can be manipulated somehow on the benchmarking client side, which is currently not possible.

--
Fabien.
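[A sketch of that last point under the proposed syntax -- a hypothetical fragment using pgbench's standard tables; since pgbench has no conditionals, "different things" here just means reusing the received value in a later statement:]

SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
\into bal
-- the value fetched by the client can now drive a later statement
UPDATE pgbench_accounts SET abalance = :bal + 1 WHERE aid = :aid;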
On Sat, Jan 7, 2017 at 6:25 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I think that I have done what you required.
>
> I have documented the fact that the feature now does not work if compound
> commands contain empty queries, which is a very minor drawback for a pgbench
> script anyway.
>
> Attached are the patch, a test script for the feature, and various test
> scripts to trigger error cases.

I have moved this patch to next CF as the last status is a new patch set with no further reviews. I did not check if the comments have been applied though; this is a bit too much for me now..

--
Michael
Bonjour Michaël,

>> Attached are the patch, a test script for the feature, and various test
>> scripts to trigger error cases.
>
> I have moved this patch to next CF

Ok.

> as the last status is a new patch set with no further reviews.

Indeed.

> I did not check if the comments have been applied though; this is a bit
> too much for me now...

Sure.

--
Fabien.
On Tue, Jan 31, 2017 at 11:54 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
--
Bonjour Michaël,
Attached are the patch, a test script for the feature, and various test
scripts to trigger error cases.
I have moved this patch to next CF
Ok.
as the last status is a new patch set with no further reviews.
Indeed.
I did not check if the comments have been applied though, this is a bit too much for me now...
Sure.
I was reviewing v7 of this patch; to start with, I found the following whitespace errors when applying with git apply:
/home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing whitespace.
char *line; /* first line for short display */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:67: trailing whitespace.
char *lines; /* full multi-line text of command */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:72: trailing whitespace.
int compound; /* last compound command (number of \;) */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:73: trailing whitespace.
char **gset; /* per-compound command prefix */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:81: trailing whitespace.
/* read all responses from backend */
error: patch failed: doc/src/sgml/ref/pgbench.sgml:815
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.c:375
error: src/bin/pgbench/pgbench.c: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.h:11
error: src/bin/pgbench/pgbench.h: patch does not apply
error: patch failed: src/fe_utils/psqlscan.l:678
error: src/fe_utils/psqlscan.l: patch does not apply
error: patch failed: src/include/fe_utils/psqlscan_int.h:112
error: src/include/fe_utils/psqlscan_int.h: patch does not apply
Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT 2 AS a \gcset SELECT 3; as given in your provided script gset-1.sql, it gives the error "Invalid command \gcset". Not sure what the intention of this script is anyway? Also, instead of so many different files for errors, why don't you combine them into one?
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Hello Rafia,

> I was reviewing v7 of this patch; to start with, I found the following
> whitespace errors when applying with git apply:
> /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
> whitespace.

Yep.

I do not know why "git apply" sometimes complains. All is fine for me both with "git apply" and "patch".

Last time it was because my mailer uses text/x-diff for the mime type, as defined by the system in "/etc/mime.types", which some mailers then interpret as a license to change the eol-style when saving, resulting in this kind of behavior. Could you tell your mailer just to save the file as is?

> Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT 2
> AS a \gcset SELECT 3; as given in your provided script gset-1.sql, it gives
> the error "Invalid command \gcset".

Are you sure that you are using the compiled pgbench, not a previously installed one?

bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
SQL/gset-1.sql:1: invalid command in command "gset"
\gset

bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
starting vacuum...end.
debug(script=0,command=2): int 1
debug(script=0,command=4): int 2
...

> Not sure what the intention of this script is anyway?

The intention is to test that gset & gcset work as expected in various settings; especially with combined queries (\;), the right result must be extracted in the sequence.

> Also, instead of so many different files for errors, why don't you combine
> them into one?

Because a pgbench script stops on the first error, and I wanted to test what happens with several kinds of errors.

-- Fabien.
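For reference, a script in the spirit of gset-1.sql, reconstructed from the quoted commands and the output above (a sketch, not the exact attached file):

SELECT 1 AS a \gset
\set i debug(:a)
SELECT 2 AS a \gcset
SELECT 3;
\set j debug(:a)

The first debug prints 1; the second prints 2, because the \gcset in the combined command overwrote :a while the trailing SELECT 3 result was discarded.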
On Thu, Mar 16, 2017 at 12:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Rafia,
>
>> I was reviewing v7 of this patch; to start with, I found the following
>> whitespace errors when applying with git apply:
>> /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
>> whitespace.
>
> Yep.
>
> I do not know why "git apply" sometimes complains. All is fine for me both
> with "git apply" and "patch".
>
> Last time it was because my mailer uses text/x-diff for the mime type, as
> defined by the system in "/etc/mime.types", which some mailers then
> interpret as a license to change the eol-style when saving, resulting in
> this kind of behavior. Could you tell your mailer just to save the file as is?
>
>> Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT 2
>> AS a \gcset SELECT 3; as given in your provided script gset-1.sql, it gives
>> the error "Invalid command \gcset".
>
> Are you sure that you are using the compiled pgbench, not a previously
> installed one?
>
> bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
> SQL/gset-1.sql:1: invalid command in command "gset"
> \gset
>
> bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
> starting vacuum...end.
> debug(script=0,command=2): int 1
> debug(script=0,command=4): int 2
> ...
>
>> Not sure what the intention of this script is anyway?
>
> The intention is to test that gset & gcset work as expected in various
> settings; especially with combined queries (\;), the right result must be
> extracted in the sequence.
>
>> Also, instead of so many different files for errors, why don't you combine
>> them into one?
>
> Because a pgbench script stops on the first error, and I wanted to test
> what happens with several kinds of errors.

if (my_command->argc > 2)
+ syntax_error(source, lineno, my_command->line, my_command->argv[0],
+ "at most on argument expected", NULL, -1);

I suppose you mean 'one' argument here. Apart from that, the indentation is not correct as per pgindent; please check.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Hello Rafia,

> if (my_command->argc > 2)
> + syntax_error(source, lineno, my_command->line, my_command->argv[0],
> + "at most on argument expected", NULL, -1);
>
> I suppose you mean 'one' argument here.

Indeed.

> Apart from that, the indentation is not correct as per pgindent; please check.

I guess that you are referring to the switch/case indentation, which my emacs does not do as expected.

Please find attached a v8 which hopefully fixes these two issues.

-- Fabien.
Attachment
On Fri, Mar 24, 2017 at 8:59 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Rafia,
>
>> if (my_command->argc > 2)
>> + syntax_error(source, lineno, my_command->line, my_command->argv[0],
>> + "at most on argument expected", NULL, -1);
>>
>> I suppose you mean 'one' argument here.
>
> Indeed.
>
>> Apart from that, the indentation is not correct as per pgindent; please check.
>
> I guess that you are referring to the switch/case indentation, which my emacs
> does not do as expected.
>
> Please find attached a v8 which hopefully fixes these two issues.

Looks good to me, marking as ready for committer.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
>> Please find attached a v8 which hopefully fixes these two issues.

> Looks good to me, marking as ready for committer.

I have looked into this a little bit. It seems the new feature \gset doesn't work with tables having non-ASCII column names:

$ src/bin/pgbench/pgbench -t 1 -f /tmp/f test
starting vacuum...end.
gset: invalid variable name: "数字"
client 0 file 0 command 0 compound 0: error storing into var 数字
transaction type: /tmp/f
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 1
number of transactions actually processed: 0/1

This is because pgbench variable names are limited to ASCII ranges. IMO the limitation is unnecessary and should be removed. (I know that the limitation was brought in by someone a long time ago and the patch author is not responsible for that.) Anyway, now that the feature reveals the undocumented limitation, we should at least document the limitation of \gset.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Hello Tatsuo-san,

> It seems the new feature \gset doesn't work with tables having non-ASCII
> column names:

Indeed. The same error is triggered with the \set syntax, which does not involve any query execution.

I have added a sentence mentioning the restriction where variables are first discussed in the documentation; see the attached patch.

-- Fabien.
Attachment
Tom and others,

I still wonder whether I should commit this or not, because this patch does not allow non-ASCII column names. We know pgbench variable names have been restricted since the functionality was born. When users explicitly define a pgbench variable using \set, it is not too strong a limitation, because it's in the "pgbench world" anyway and nothing is related to PostgreSQL core specs. However, \gset is not happy with perfectly valid column names in PostgreSQL core, which looks too inconsistent and confusing for users.

So the choices are:

1) commit the patch now, documenting the limitation (the patch looks good to me except the issue above)

2) move it to the next cf, hoping that someone starts the implementation to eliminate the limitation on non-ASCII variable names.

Comments?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hello Tatsuo-san,
>
>> It seems the new feature \gset doesn't work with tables having non-ASCII
>> column names:
>
> Indeed. The same error is triggered with the \set syntax, which does
> not involve any query execution.
>
> I have added a sentence mentioning the restriction where variables are
> first discussed in the documentation; see the attached patch.
>
> --
> Fabien.
Tatsuo Ishii <ishii@sraoss.co.jp> writes: > I still wonder whether I should commit this or not because this patch > does not allow none ascii column names. Well, personally, as an all-ASCII guy I'm not too fussed about that, but I can see that other people might be annoyed. The main problem in dealing with it seems to be whether you're willing to support pgbench running in non-backend-safe encodings (eg SJIS). If we rejected that case then it'd be a relatively simple change to allow pgbench to treat any high-bit-set byte as a valid variable name character. (I think anyway, haven't checked the code.) Although ... actually, psql allows any high-bit-set byte in variable names (cf valid_variable_name()) without concern about encoding. That means it's formally incorrect in SJIS, but it's been like that for an awful lot of years and nobody's complained. Maybe it'd be fine for pgbench to act the same. Having said all that, I think we're at the point in the commitfest where if there's any design question at all about a patch, it should get booted to the next cycle. We are going to have more than enough to do to stabilize what's already committed, we don't need to be adding more uncertainty. regards, tom lane
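For what it's worth, the relaxed check Tom describes could be as small as this sketch (the function name is illustrative; this is not the actual psql or pgbench code):

#include <stdbool.h>

/* Accept ASCII letters, digits and underscore, plus any high-bit-set
 * byte, without trying to validate the multi-byte encoding itself;
 * this mirrors the behavior described for psql's valid_variable_name(). */
static bool
is_variable_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_' ||
           (c & 0x80) != 0;     /* any high-bit-set byte */
}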
On 2017-04-05 20:24:19 -0400, Tom Lane wrote: > Having said all that, I think we're at the point in the commitfest > where if there's any design question at all about a patch, it should > get booted to the next cycle. We are going to have more than enough > to do to stabilize what's already committed, we don't need to be > adding more uncertainty. +1
> Well, personally, as an all-ASCII guy I'm not too fussed about that, > but I can see that other people might be annoyed. > > The main problem in dealing with it seems to be whether you're willing > to support pgbench running in non-backend-safe encodings (eg SJIS). > If we rejected that case then it'd be a relatively simple change to allow > pgbench to treat any high-bit-set byte as a valid variable name character. > (I think anyway, haven't checked the code.) > > Although ... actually, psql allows any high-bit-set byte in variable > names (cf valid_variable_name()) without concern about encoding. > That means it's formally incorrect in SJIS, but it's been like that > for an awful lot of years and nobody's complained. Maybe it'd be fine > for pgbench to act the same. That's my thought too. > Having said all that, I think we're at the point in the commitfest > where if there's any design question at all about a patch, it should > get booted to the next cycle. We are going to have more than enough > to do to stabilize what's already committed, we don't need to be > adding more uncertainty. Ok, I will move the patch to the next cf. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> It seems the new feature \gset doesn't work with tables having none >> ascii column names: > > Indeed. The same error is triggered with the \set syntax, which does not > involve any query execution. > > I have added a sentence mentionning the restriction when variables are first > discussed in the documentation, see attached patch. Here is a v10: - does not talk about ASCII variable name constraint, as a patch has been submitted independently to lift this constraint. - rename gcset to cset (compound set, \; + \set), where gset is ; + \set, because "\gcset" looked really strange. - simplify the code a little bit. Also attached is an updated test script. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Here is a v11.

It is basically a simple rebase after Tom committed the "pgbench -M order" patch. It interfered because the compound command management also needs to delay part of the SQL command initialization. Some patches are luckier than others :-)

> Here is a v10:
>
> - does not talk about ASCII variable name constraint, as a patch has been
> submitted independently to lift this constraint.
>
> - rename gcset to cset (compound set, \; + \set), where gset is ; + \set,
> because "\gcset" looked really strange.
>
> - simplify the code a little bit.
>
> Also attached is an updated test script.
>
> -- Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi
2017-08-13 20:33 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Here is a v11.
It is basically a simple rebase after Tom committed the "pgbench -M order" patch. It interfered because the compound command management also needs
to delay part of the SQL command initialization. Some patches are luckier than others :-)
Here is a v10:
- does not talk about ASCII variable name constraint, as a patch has been
submitted independently to lift this constraint.
- rename gcset to cset (compound set, \; + \set), where gset is ; + \set,
because "\gcset" looked really strange.
- simplify the code a little bit.
Also attached is an updated test script.
looks a little bit strange, but it makes sense
+1
Pavel
--
Fabien.
Here is a v12. There are no changes in the code or documentation; only TAP tests are added. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
> Here is a v12. Here is a v13, which is just a rebase after the documentation xml-ization. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi
2017-10-20 18:37 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Here is a v12.
Here is a v13, which is just a rebase after the documentation xml-ization.
I am looking at this patch.
Not sure if "cset" is best name - maybe "eset" .. like embeded set?
The code of append_sql_command is not too readable and is not enough commented.
I don't understand why you pass a param compounds to append_sql_command, when this value is stored in my_command->compound from create_sql_command?
Or maybe some unhappy field or variable names was chosen.
Regards
Pavel
--
Fabien.
Hello Pavel,

>> Here is a v13, which is just a rebase after the documentation xml-ization.

Here is a v14, after yet another rebase, and some comments added to answer your new comments.

> I am looking at this patch.
>
> Not sure if "cset" is the best name - maybe "eset", like embedded set?

I used c for "compound", because they compound SQL commands. Now I do not have a very strong opinion, only that it should be some letter with some logic behind it, followed by "set". The variables and fields in the source currently use "compound" everywhere; if this is changed they should be updated accordingly. ISTM that the ";" is embedded, but the commands are compound, so "compound" seems the better word to me. However it is debatable. If there is some standard naming for the concept, it should be used.

> The code of append_sql_command is not too readable and is not commented
> enough.

Ok. I have added comments in the code.

> I don't understand why you pass a compounds parameter to append_sql_command,
> when this value is stored in my_command->compound from create_sql_command?

This is the number of compound commands in the "more" string. It must be updated as well as the command text, so that the my_command text and number of compounds are consistent.

Think of one initialization followed by two appends:

SELECT 1 AS x \cset
SELECT 2 \; SELECT 3 AS y \cset
SELECT 4 \; SELECT 5 \; SELECT 6 AS z \gset

In the end, we must have the full 6 queries

"SELECT 1 AS x \; SELECT 2 \; SELECT 3 AS y \; SELECT 4 \; SELECT 5 \; SELECT 6 AS z"

and know that we want to set variables from queries 1, 3 and 6 and ignore the 3 others.

> Or maybe some unhappy field or variable names were chosen.

It seems ok to me. What would you suggest?

-- Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
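To make the bookkeeping concrete, here is a tiny standalone sketch (illustrative only, not pgbench's code) of the per-compound mapping that results from the three commands above; a NULL entry means the corresponding result is discarded:

#include <stdio.h>

int
main(void)
{
    /* the six queries from the example, after both appends */
    const char *queries[] = {
        "SELECT 1 AS x", "SELECT 2", "SELECT 3 AS y",
        "SELECT 4", "SELECT 5", "SELECT 6 AS z"
    };
    /* per-compound prefix: NULL = discard, "" = store without a prefix */
    const char *gset[] = { "", NULL, "", NULL, NULL, "" };

    for (int i = 0; i < 6; i++)
        printf("query %d (%s): %s\n", i + 1, queries[i],
               gset[i] ? "store result into variables" : "discard result");
    return 0;
}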
> Here is a v14, after yet another rebase, and some comments added to answer
> your new comments.

Attached v15 is a simple rebase after Teodor's push of new functions & operators in pgbench.

-- Fabien.
Attachment
>> Here is a v14, after yet another rebase, and some comments added to answer
>> your new comments.
>
> Attached v15 is a simple rebase after Teodor's push of new functions &
> operators in pgbench.

Patch v16 is a rebase.

-- Fabien.
Attachment
Greetings Fabien,

* Fabien COELHO (fabien.coelho@mines-paristech.fr) wrote:
> >>Here is a v14, after yet another rebase, and some comments added to
> >>answer your new comments.
> >
> >Attached v15 is a simple rebase after Teodor's push of new functions &
> >operators in pgbench.
>
> Patch v16 is a rebase.

Thank you for the diligent efforts to keep this patch moving forward. I know it's been a long time coming, but it's also been through a number of reviews and improvements. With the UTF bits addressed previously, the rest of this patch doesn't strike me as overly controversial. I'm working through reviewing it, and barring any big issues I'll see if I can get it committed before feature freeze yet again ends up swallowing it.

I expect to get the review done tonight, at which point I'll either announce plans to commit it sometime tomorrow, or not, if there are issues still to be resolved. :)

As such, taking it in the CF app as committer.

Thanks!

Stephen
Attachment
Fabien, * Fabien COELHO (fabien.coelho@mines-paristech.fr) wrote: > Patch v16 is a rebase. Here's that review. > diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml > index d52d324..203b6bc 100644 > --- a/doc/src/sgml/ref/pgbench.sgml > +++ b/doc/src/sgml/ref/pgbench.sgml > @@ -900,6 +900,51 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d > </para> > > <variablelist> > + <varlistentry id='pgbench-metacommand-gset'> > + <term> > + <literal>\cset [<replaceable>prefix</replaceable>]</literal> or > + <literal>\gset [<replaceable>prefix</replaceable>]</literal> > + </term> Seems like this should really be moved down below the section for '\set'. > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > index 894571e..4a8595f 100644 > --- a/src/bin/pgbench/pgbench.c > +++ b/src/bin/pgbench/pgbench.c > @@ -434,12 +434,15 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"}; > > typedef struct > { > - char *line; /* text of command line */ > + char *line; /* first line for short display */ > + char *lines; /* full multi-line text of command */ Not really a fan of such closely-named variables... Maybe first_line instead? > +/* read all responses from backend */ > +static bool > +read_response(CState *st, char **gset) > +{ > + PGresult *res; > + int compound = 0; > + > + while ((res = PQgetResult(st->con)) != NULL) > + { > + switch (PQresultStatus(res)) > + { > + case PGRES_COMMAND_OK: /* non-SELECT commands */ > + case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */ > + if (gset[compound] != NULL) > + { > + fprintf(stderr, > + "client %d file %d command %d compound %d: " > + "\\gset expects a row\n", > + st->id, st->use_file, st->command, compound); > + st->ecnt++; > + return false; > + } > + break; /* OK */ > + > + case PGRES_TUPLES_OK: > + if (gset[compound] != NULL) Probably be good to add some comments here, eh: /* * The results are intentionally thrown away if we aren't under a gset * call. */ > + { > + /* store result into variables */ > + int ntuples = PQntuples(res), > + nfields = PQnfields(res), > + f; > + > + if (ntuples != 1) > + { > + fprintf(stderr, > + "client %d file %d command %d compound %d: " > + "expecting one row, got %d\n", > + st->id, st->use_file, st->command, compound, ntuples); > + st->ecnt++; > + PQclear(res); > + discard_response(st); > + return false; > + } Might be interesting to support mutli-row (or no row?) returns in the future, but not something this patch needs to do, so this looks fine to me. > + for (f = 0; f < nfields ; f++) > + { > + char *varname = PQfname(res, f); > + if (*gset[compound] != '\0') > + varname = psprintf("%s%s", gset[compound], varname); > + > + /* store result as a string */ > + if (!putVariable(st, "gset", varname, > + PQgetvalue(res, 0, f))) > + { > + /* internal error, should it rather abort? */ > + fprintf(stderr, > + "client %d file %d command %d compound %d: " > + "error storing into var %s\n", > + st->id, st->use_file, st->command, compound, > + varname); > + st->ecnt++; > + PQclear(res); > + discard_response(st); > + return false; > + } I'm a bit on the fence about if we should abort in this case or not. A failure here seems likely to happen consistently (whereas the ntuples case might be a fluke of some kind), which tends to make me think we should abort, but still thinking about it. 
> + if (*gset[compound] != '\0') > + free(varname); A comment here, and above where we're assigning the result of the psprintf(), to varname probably wouldn't hurt, explaining that the variable is sometimes pointing into the query result structure and sometimes not... Thinking about it a bit more, wouldn't it be cleaner to just always use psprintf()? eg: char *varname = psprintf("%s%s", *gset[compound] != '\0' ? gset[compound] : "", PQfname(res, f)); ... free(varname); > + /* read and discard the query results */ That comment doesn't feel quite right now. ;) > @@ -3824,8 +3910,7 @@ parseQuery(Command *cmd) > char *sql, > *p; > > - /* We don't want to scribble on cmd->argv[0] until done */ > - sql = pg_strdup(cmd->argv[0]); > + sql = pg_strdup(cmd->lines); The function-header comment for parseQuery() could really stand to be improved. > + /* merge gset variants into preceeding SQL command */ > + if (pg_strcasecmp(bs_cmd, "gset") == 0 || > + pg_strcasecmp(bs_cmd, "cset") == 0) > + { > + int cindex; > + Command *sql_cmd; > + > + is_compound = bs_cmd[0] == 'c'; > + > + if (index == 0) > + syntax_error(desc, lineno, NULL, NULL, > + "\\gset cannot start a script", > + NULL, -1); > + > + sql_cmd = ps.commands[index-1]; > + > + if (sql_cmd->type != SQL_COMMAND) > + syntax_error(desc, lineno, NULL, NULL, > + "\\gset must follow a SQL command", > + sql_cmd->line, -1); > + > + /* this \gset applies to the last sub-command */ > + cindex = sql_cmd->compound; > + > + if (sql_cmd->gset[cindex] != NULL) > + syntax_error(desc, lineno, NULL, NULL, > + "\\gset cannot follow one another", > + NULL, -1); > + > + /* get variable prefix */ > + if (command->argc <= 1 || command->argv[1][0] == '\0') > + sql_cmd->gset[cindex] = ""; > + else > + sql_cmd->gset[cindex] = command->argv[1]; > + > + /* cleanup unused backslash command */ > + pg_free(command); These errors should probably be '\\gset and \\cset' or similar, no? Since we fall into this for both.. Probably not a big deal to someone using pgbench, but still. So, overall, looks pretty good to me. There's definitely some cleanup work to be done with variable names and comments and such, but nothing too terrible and I should have time to go through those changes and then go back over the patch again tomorrow with an eye towards committing it tomorrow afternoon, barring objections, etc. Thanks! Stephen
Attachment
Hello Stephen, > Here's that review. Thanks for the review. >> <variablelist> >> + <varlistentry id='pgbench-metacommand-gset'> >> + <term> >> + <literal>\cset [<replaceable>prefix</replaceable>]</literal> or >> + <literal>\gset [<replaceable>prefix</replaceable>]</literal> >> + </term> > > Seems like this should really be moved down below the section for > '\set'. Dunno. I put them there because it is in alphabetical order (for cset at least) and because "set" documentation is heavy about expressions (operators, functions, constants, ...) which have nothing to do with cset/gset, so I felt that having them clearly separated and in abc order was best. >> - char *line; /* text of command line */ >> + char *line; /* first line for short display */ >> + char *lines; /* full multi-line text of command */ > > Not really a fan of such closely-named variables... Maybe first_line > instead? Indeed, looks better. >> + case PGRES_TUPLES_OK: >> + if (gset[compound] != NULL) > > Probably be good to add some comments here, eh: > /* > * The results are intentionally thrown away if we aren't under a gset > * call. > */ I added a (shorter) comment. >> + if (ntuples != 1) >> + { >> + fprintf(stderr, >> + "client %d file %d command %d compound %d: " >> + "expecting one row, got %d\n", >> + st->id, st->use_file, st->command, compound, ntuples); >> + st->ecnt++; >> + PQclear(res); >> + discard_response(st); >> + return false; >> + } > > Might be interesting to support mutli-row (or no row?) returns in the > future, but not something this patch needs to do, so this looks fine to > me. It could match PL/pgSQL's INTO, that is first row or NULL if none. >> + >> + /* store result as a string */ >> + if (!putVariable(st, "gset", varname, >> + PQgetvalue(res, 0, f))) >> + { >> + /* internal error, should it rather abort? */ > > I'm a bit on the fence about if we should abort in this case or not. A > failure here seems likely to happen consistently (whereas the ntuples > case might be a fluke of some kind), which tends to make me think we > should abort, but still thinking about it. I'm also still thinking about it:-) For me it is an abort, but there is this whole "return false" internal error management in pgbench the purpose of which fails me a little bit, so I stick to that anyway. >> + if (*gset[compound] != '\0') >> + free(varname); > > A comment here, and above where we're assigning the result of the > psprintf(), to varname probably wouldn't hurt, explaining that the > variable is sometimes pointing into the query result structure and > sometimes not... I added two comments to avoid a malloc/free when there are no prefixes. ISTM that although it might be a border-line over-optimization case, it is a short one, the free is a few lines away, so it might be ok. >> + /* read and discard the query results */ > > That comment doesn't feel quite right now. ;) Indeed. Changed with "store or discard". >> >> - /* We don't want to scribble on cmd->argv[0] until done */ >> - sql = pg_strdup(cmd->argv[0]); >> + sql = pg_strdup(cmd->lines); > > The function-header comment for parseQuery() could really stand to be > improved. Indeed. >> + /* merge gset variants into preceeding SQL command */ >> + if (pg_strcasecmp(bs_cmd, "gset") == 0 || >> + pg_strcasecmp(bs_cmd, "cset") == 0) >> + { >> + "\\gset cannot start a script", >> + "\\gset must follow a SQL command", >> + "\\gset cannot follow one another", > > These errors should probably be '\\gset and \\cset' or similar, no? > Since we fall into this for both.. Indeed. 
Attached is an improved version, mostly comments and error message fixes.

I have not changed the 1-row constraint for now. That could be a later extension.

If this patch gets through, it might be handy for simplifying tests in current pgbench submissions, especially the "error handling" one.

-- Fabien.
Attachment
Fabien, * Fabien COELHO (coelho@cri.ensmp.fr) wrote: > >> <variablelist> > >>+ <varlistentry id='pgbench-metacommand-gset'> > >>+ <term> > >>+ <literal>\cset [<replaceable>prefix</replaceable>]</literal> or > >>+ <literal>\gset [<replaceable>prefix</replaceable>]</literal> > >>+ </term> > > > >Seems like this should really be moved down below the section for > >'\set'. > > Dunno. > > I put them there because it is in alphabetical order (for cset at least) and > because "set" documentation is heavy about expressions (operators, > functions, constants, ...) which have nothing to do with cset/gset, so I > felt that having them clearly separated and in abc order was best. Ah, hmmm, yes, alphabetical order is sensible, certainly. > >>- char *line; /* text of command line */ > >>+ char *line; /* first line for short display */ > >>+ char *lines; /* full multi-line text of command */ > > > >Not really a fan of such closely-named variables... Maybe first_line > >instead? > > Indeed, looks better. Great, thanks. > >>+ case PGRES_TUPLES_OK: > >>+ if (gset[compound] != NULL) > > > >Probably be good to add some comments here, eh: > > >/* > >* The results are intentionally thrown away if we aren't under a gset > >* call. > >*/ > > I added a (shorter) comment. Ok. > >>+ if (ntuples != 1) > >>+ { > >>+ fprintf(stderr, > >>+ "client %d file %d command %d compound %d: " > >>+ "expecting one row, got %d\n", > >>+ st->id, st->use_file, st->command, compound, ntuples); > >>+ st->ecnt++; > >>+ PQclear(res); > >>+ discard_response(st); > >>+ return false; > >>+ } > > > >Might be interesting to support mutli-row (or no row?) returns in the > >future, but not something this patch needs to do, so this looks fine to > >me. > > It could match PL/pgSQL's INTO, that is first row or NULL if none. Yeah, but that's not really something that needs to go into this patch. > >>+ > >>+ /* store result as a string */ > >>+ if (!putVariable(st, "gset", varname, > >>+ PQgetvalue(res, 0, f))) > >>+ { > >>+ /* internal error, should it rather abort? */ > > > >I'm a bit on the fence about if we should abort in this case or not. A > >failure here seems likely to happen consistently (whereas the ntuples > >case might be a fluke of some kind), which tends to make me think we > >should abort, but still thinking about it. > > I'm also still thinking about it:-) For me it is an abort, but there is this > whole "return false" internal error management in pgbench the purpose of > which fails me a little bit, so I stick to that anyway. Yeah. > >>+ if (*gset[compound] != '\0') > >>+ free(varname); > > > >A comment here, and above where we're assigning the result of the > >psprintf(), to varname probably wouldn't hurt, explaining that the > >variable is sometimes pointing into the query result structure and > >sometimes not... > > I added two comments to avoid a malloc/free when there are no prefixes. ISTM > that although it might be a border-line over-optimization case, it is a > short one, the free is a few lines away, so it might be ok. Ok, having the comments is definitely good as it was a bit confusing as to what was going on. :) > >>+ /* read and discard the query results */ > > > >That comment doesn't feel quite right now. ;) > > Indeed. Changed with "store or discard". Ok. > >> > >>- /* We don't want to scribble on cmd->argv[0] until done */ > >>- sql = pg_strdup(cmd->argv[0]); > >>+ sql = pg_strdup(cmd->lines); > > > >The function-header comment for parseQuery() could really stand to be > >improved. > > Indeed. 
> > >>+ /* merge gset variants into preceeding SQL command */ > >>+ if (pg_strcasecmp(bs_cmd, "gset") == 0 || > >>+ pg_strcasecmp(bs_cmd, "cset") == 0) > >>+ { > >>+ "\\gset cannot start a script", > >>+ "\\gset must follow a SQL command", > >>+ "\\gset cannot follow one another", > > > >These errors should probably be '\\gset and \\cset' or similar, no? > >Since we fall into this for both.. > > Indeed. > > Attached an improved version, mostly comments and error message fixes. > > I have not changed the 1 row constraint for now. Could be an later > extension. > > If this patch get through, might be handy for simplifying tests in > current pgbench submissions, especially the "error handling" one. Glad to hear that. Unfortunately, I didn't end up having time to wrap this up to a satisfactory level for myself to get it into PG11. I know it's been a long time coming, and thank you for continuing to push on it; I'll try to make sure that I take some time in the first CF for PG12 to wrap this up and get it in. I'm all for these improvements in pgbench in general, it's a really useful tool and it's great to see work going into it. Thanks again! Stephen
Attachment
Hello Stephen,

>>> Might be interesting to support mutli-row (or no row?) returns in the
>>> future, but not something this patch needs to do, so this looks fine to
>>> me.
>>
>> It could match PL/pgSQL's INTO, that is first row or NULL if none.
>
> Yeah, but that's not really something that needs to go into this patch.

Sure. I did not. I checked psql \gset behavior:

psql> SELECT 1 AS stuff WHERE false \gset
no rows returned for \gset
psql> \echo :stuff
:stuff -- "stuff" var was not set
psql> SELECT i AS stuff FROM generate_series(1,5) AS i \gset
more than one row returned for \gset
psql> \echo :stuff
:stuff -- "stuff" var was not set either

If the semantics is changed in any way, ISTM that psql & pgbench should be kept consistent.

>> If this patch gets through, it might be handy for simplifying tests in
>> current pgbench submissions, especially the "error handling" one.

> Glad to hear that. Unfortunately, I didn't end up having time to wrap
> this up to a satisfactory level for myself to get it into PG11.

No problem with waiting for PG<N+1>. Whatever N :-)

> I know it's been a long time coming, and thank you for continuing to
> push on it;

Yeah, I'm kind of stubborn. Sometimes a quality, often a flaw.

> I'll try to make sure that I take some time in the first CF
> for PG12 to wrap this up and get it in. I'm all for these improvements
> in pgbench in general, it's a really useful tool and it's great to see
> work going into it.

Thanks for scheduling a try! :-)

When it gets in, I'll submit, eventually, a "tpcb-strict" builtin benchmarking script for illustration, which would implement the bench requirement that clients more often query in their own branch. This would take advantage of the recently (PG11) added \if and logical expressions (for correlating clients to their branch) and gset (the benchmark states that the client must retrieve the value, whereas it is currently discarded).

-- Fabien.
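To give the flavor, the gset part of such a script might be as simple as this sketch (variable names are illustrative; this is not the actual builtin):

\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid \gset
\set val debug(:abalance)

Here the client-side script actually holds the balance after the query, as the TPC-B clause quoted earlier in the thread requires.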
Hello Stephen, Attached is v18, another basic rebase after some perl automatic reindentation. -- Fabien.
Attachment
On 04/27/2018 12:28 PM, Fabien COELHO wrote: > > Hello Stephen, > > Attached is v18, another basic rebase after some perl automatic > reindentation. > This patch contains CRLF line endings - and in any case it doesn't apply any more. Please fix those things. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Andrew, >> Attached is v18, another basic rebase after some perl automatic >> reindentation. > > This patch contains CRLF line endings Alas, not according to "file" nor "hexdump" (only 0A, no 0D) on my local version, AFAICS. What happens on the path and what is done by mail clients depending on the mime type is another question (eg text/x-diff or text/plain). > - and in any case it doesn't apply any more. Please fix those things. Here is the new generated version, v19, that I just tested on my linux ubuntu bionic laptop: sh> git checkout -b test master Switched to a new branch 'test' sh> cksum ~/pgbench-into-19.patch 3375461661 26024 ~/pgbench-into-19.patch sh> hexdump ~/pgbench-into-19.patch 0000000 6964 6666 2d20 672d 7469 6120 642f 636f 0000010 732f 6372 732f 6d67 2f6c 6572 2f66 6770 0000020 6562 636e 2e68 6773 6c6d 6220 642f 636f 0000030 732f 6372 732f 6d67 2f6c 6572 2f66 6770 0000040 6562 636e 2e68 6773 6c6d 690a 646e .... # no 0d in front of 0a ^^ sh> git apply ~/pgbench-into-19.patch sh> git status On branch test ... modified: doc/src/sgml/ref/pgbench.sgml modified: src/bin/pgbench/pgbench.c modified: src/bin/pgbench/pgbench.h modified: src/bin/pgbench/t/001_pgbench_with_server.pl modified: src/fe_utils/psqlscan.l modified: src/include/fe_utils/psqlscan_int.h -- Fabien.
Attachment
On 08/13/2018 06:30 PM, Fabien COELHO wrote: > > Hello Andrew, > >>> Attached is v18, another basic rebase after some perl automatic >>> reindentation. >> >> This patch contains CRLF line endings > > Alas, not according to "file" nor "hexdump" (only 0A, no 0D) on my > local version, AFAICS. > > What happens on the path and what is done by mail clients depending on > the mime type is another question (eg text/x-diff or text/plain). It's not done by my MUA, and it's present in your latest posted patch. If anything I'd suspect your MUA: andrew@emma*$ curl -s https://www.postgresql.org/message-id/attachment/64237/pgbench-into-19.patch | head -n 3 | od -c 0000000 d i f f - - g i t a / d o c 0000020 / s r c / s g m l / r e f / p g 0000040 b e n c h . s g m l b / d o c 0000060 / s r c / s g m l / r e f / p g 0000100 b e n c h . s g m l \r \n i n d e 0000120 x 8 8 c f 8 b 3 9 3 3 . . 9 4 0000140 6 f 0 8 0 0 5 d 1 0 0 6 4 4 \r 0000160 \n - - - a / d o c / s r c / s 0000200 g m l / r e f / p g b e n c h . 0000220 s g m l \r \n The gzipped version you also attached did not have the CRs. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> This patch contains CRLF line endings >> >> Alas, not according to "file" nor "hexdump" (only 0A, no 0D) on my >> local version, AFAICS. >> >> What happens on the path and what is done by mail clients depending >> on the mime type is another question (eg text/x-diff or text/plain). Andrew> It's not done by my MUA, and it's present in your latest posted Andrew> patch. If anything I'd suspect your MUA: The patch in the original email is in text/plain with base64 transfer encoding, which means that CRLF line endings are mandatory. It's actually up to the receiving MUA (or the archives webserver) to undo that. If the archives webserver isn't handling that then it's a bug there. -- Andrew (irc:RhodiumToad)
Hello Andrew,

> It's not done by my MUA, and it's present in your latest posted patch. If
> anything I'd suspect your MUA:
>
> andrew@emma*$ curl -s
> https://www.postgresql.org/message-id/attachment/64237/pgbench-into-19.patch

Argh. Indeed, this downloaded version has CRLF. Now when I save the attachment in my MUA, I only have LF... Let us look at the raw format:

Content-Type: text/plain; name=pgbench-into-19.patch
Content-Transfer-Encoding: BASE64
...
ZGlmZiAtLWdpdCBhL2RvYy9zcmMvc2dtbC9yZWYvcGdiZW5jaC5zZ21sIGIv
ZG9jL3NyYy9zZ21sL3JlZi9wZ2JlbmNoLnNnbWwNCmluZGV4IDg4Y2Y4YjM5
...

Where you immediately see that it indeed has CRLF at the end of the second line :-). So you are right, and my trusted mailer is encoding *AND* decoding silently. Why would it do that? After some googling, this is because RFC 2046 (MIME) says you "MUST":

https://tools.ietf.org/html/rfc2046#section-4.1.1

So I'm right in the end, and the whole world is wrong, which is a relief :-)

As I cannot expect everybody to have an RFC 2046-compliant MUA, and after some meddling in "/etc/mime.types", I now have:

Content-Type: application/octet-stream; name=pgbench-into-19.patch
Content-Transfer-Encoding: BASE64
...
ZGlmZiAtLWdpdCBhL2RvYy9zcmMvc2dtbC9yZWYvcGdiZW5jaC5zZ21sIGIv
ZG9jL3NyYy9zZ21sL3JlZi9wZ2JlbmNoLnNnbWwKaW5kZXggODhjZjhiMzkz

Which is much better :-)

I re-attached the v19 for a check on the list.

-- Fabien.
Attachment
> Andrew> It's not done by my MUA, and it's present in your latest posted > Andrew> patch. If anything I'd suspect your MUA: > > The patch in the original email is in text/plain with base64 transfer > encoding, which means that CRLF line endings are mandatory. It's > actually up to the receiving MUA (or the archives webserver) to undo > that. I came to the same conclusion. This is hidden because most people post patches as "application/octet-stream", where no meddling is allowed. I'll try to do that in the future. > If the archives webserver isn't handling that then it's a bug there. I'm not sure that the webserver is at fault either: it sends "CRLF" on "text/plain", which seems okay, even required, by MIME. Maybe the web user agent should do the translating if appropriate to the receiving system... Obviously "curl" does not do it, nor "firefox" on "save as". I have no doubt that some MUA around would forget to do the conversion. -- Fabien.
On 08/14/2018 07:37 AM, Andrew Gierth wrote: >>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > >>> This patch contains CRLF line endings > >> > >> Alas, not according to "file" nor "hexdump" (only 0A, no 0D) on my > >> local version, AFAICS. > >> > >> What happens on the path and what is done by mail clients depending > >> on the mime type is another question (eg text/x-diff or text/plain). > > Andrew> It's not done by my MUA, and it's present in your latest posted > Andrew> patch. If anything I'd suspect your MUA: > > The patch in the original email is in text/plain with base64 transfer > encoding, which means that CRLF line endings are mandatory. It's > actually up to the receiving MUA (or the archives webserver) to undo > that. > > If the archives webserver isn't handling that then it's a bug there. > Probably a good reason not to use text/plain for patches, ISTM. I do note that my MUA (Thunderbird) uses text/x-patch and probably violates RFC2046 4.1.1 cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> The patch in the original email is in text/plain with base64 transfer >> encoding, which means that CRLF line endings are mandatory. It's >> actually up to the receiving MUA (or the archives webserver) to undo >> that. >> >> If the archives webserver isn't handling that then it's a bug there. Andrew> Probably a good reason not to use text/plain for patches, ISTM. Andrew> I do note that my MUA (Thunderbird) uses text/x-patch and Andrew> probably violates RFC2046 4.1.1 The first patch of yours I found was in text/x-patch with 7bit transfer encoding, so the line endings are actually those of the mail message itself (i.e. CRLF on the wire). -- Andrew (irc:RhodiumToad)
Fabien COELHO <coelho@cri.ensmp.fr> writes: > I have no doubt that some MUA around would forget to do the conversion. FWIW, one reason that I invariably use patch(1) to apply submitted patches is that it will take care of stripping any CRs that may have snuck in. So I'm not particularly fussed about the problem. I'm not excited about encouraging people to use application/octet-stream rather than text/something for submitted patches. If you use text then the patch can easily be examined in the web archives; with application/octet-stream the patch has to be downloaded, adding a lot of manual overhead. (At least, that's true with my preferred web browser, maybe it's different for other people?) regards, tom lane
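For the record, both workflows mentioned in this sub-thread amount to the following (ordinary patch(1) and tr(1) invocations, nothing exotic):

patch -p1 < pgbench-into-19.patch                 # patch(1) strips stray CRs itself
tr -d '\r' < pgbench-into-19.patch | git apply    # or remove them explicitly first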
On Tue, Aug 14, 2018 at 1:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fabien COELHO <coelho@cri.ensmp.fr> writes: >> I have no doubt that some MUA around would forget to do the conversion. > > FWIW, one reason that I invariably use patch(1) to apply submitted patches > is that it will take care of stripping any CRs that may have snuck in. > So I'm not particularly fussed about the problem. Yeah. I think that we shouldn't care about this, or about context/unified diffs, or really anything other than that patch can apply it. Once you apply it, you can issue the correct incantation to see it in whatever format you prefer. If it's a whole patch stack, it makes sense to use 'git format-patch' to generate the patches, because then it's a lot easier to apply the whole stack, but for a single patch it really doesn't matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 08/14/2018 01:44 PM, Tom Lane wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> I have no doubt that some MUA around would forget to do the conversion.
> FWIW, one reason that I invariably use patch(1) to apply submitted patches
> is that it will take care of stripping any CRs that may have snuck in.
> So I'm not particularly fussed about the problem.

I also use patch(1), although probably mainly out of laziness in not changing habits going back to the CVS days ;-) You obviously commit far more patches than I do, but I don't normally see patch complaining about CRs, which is why I raised the issue.

> I'm not excited about encouraging people to use application/octet-stream
> rather than text/something for submitted patches. If you use text then
> the patch can easily be examined in the web archives; with
> application/octet-stream the patch has to be downloaded, adding a lot of
> manual overhead. (At least, that's true with my preferred web browser,
> maybe it's different for other people?)

You might have a point. Compressed patches can be particularly annoying. Most other things my browser will download and pop into geany for me.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> I have no doubt that some MUA around would forget to do the conversion.
>
> FWIW, one reason that I invariably use patch(1) to apply submitted patches
> is that it will take care of stripping any CRs that may have snuck in.
> So I'm not particularly fussed about the problem.

Good to know.

> I'm not excited about encouraging people to use application/octet-stream
> rather than text/something for submitted patches.

I'm not happy with that either; it is just to avoid complaints.

> If you use text then the patch can easily be examined in the web
> archives; with application/octet-stream the patch has to be downloaded,
> adding a lot of manual overhead.

Indeed.

> (At least, that's true with my preferred web browser, maybe it's
> different for other people?)

So if I send with text/x-diff or text/plain I get complaints, and if I send with application/octet-stream it is not right either :-) Everybody being somehow right. Sigh.

-- Fabien.
On 2018-Aug-14, Fabien COELHO wrote: > > (At least, that's true with my preferred web browser, maybe it's > > different for other people?) > > So if I send with text/x-diff or text/plain I've got complaints, if I send > with application/octet-stream, it is not right either:-) Everybody being > somehow right. I like that I can look at the text/* ones directly in the browser instead of having to download, but I can handle whatever (and I expect the same for most people, except maybe those who work directly on Windows). I just wish people would not send tarballs, which aren't as comfy to page through with "zcat | cdiff" ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Stephen, On Tue, Aug 14, 2018 at 01:38:21PM +0200, Fabien COELHO wrote: > I re-attached the v19 for a check on the list. You are marked as the committer of this patch in the CF app since last April and this patch is marked as ready for committer. Are you planning to look at it soon? -- Michael
Attachment
Bonjour Michaël, > On Tue, Aug 14, 2018 at 01:38:21PM +0200, Fabien COELHO wrote: >> I re-attached the v19 for a check on the list. > > You are marked as the committer of this patch in the CF app since last > April and this patch is marked as ready for committer. Are you planning > to look at it soon? Here is yet another rebase. Whether Stephen will have time to commit this patch is unclear. I'd suggest that I remove his name from the committer column so that another committer may consider it. What do you think? For me this patch is fundamental because it allows a test script to interact both ways with the database, and to act on database data (in particular thanks to \if and expressions already added), and also actually retrieving results is a key benchmark compliance constraint that pgbench does not meet. -- Fabien.
Attachment
> Here is yet another rebase. Here is yet another rebase after Peter's exit status changes on pgbench. > Whether Stephen will have time to commit this patch is unclear. I'd suggest > that I remove his name from the committer column so that another committer > may consider it. > > What do you think? ? > For me this patch is fundamental because it allows a test script to interact > both ways with the database, and to act on database data (in particular > thanks to \if and expressions already added), and also actually retrieving > results is a key benchmark compliance constraint that pgbench does not meet. -- Fabien.
Attachment
On 2017-Nov-04, Fabien COELHO wrote: > Think of one initialization followed by two appends: > > SELECT 1 AS x \cset > SELECT 2 \; SELECT 3 AS y \cset > SELECT 4 \; SELECT 5 \; SELECT 6 AS z \gset > > In the end, we must have the full 6 queries > > "SELECT 1 AS x \; SELECT 2 \; SELECT 3 AS y \; SELECT 4 \; SELECT 5 \; SELECT 6 AS z" > > and know that we want to set variables from queries 1, 3 and 6 and ignore > the 3 others. I'm not sure I understand this. Why is the "SELECT 2" ignored? (I can see why the 4 and 5 are ignored: they are not processed by gset). What exactly does \cset do? I thought "SELECT 2 \; SELECT 3 AS y \cset" would search for the \; and process *both* queries. I think the doc addition should be split. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Nov-16, Alvaro Herrera wrote: > On 2017-Nov-04, Fabien COELHO wrote: > > > Think of one initialization followed by two appends: > > > > SELECT 1 AS x \cset > > SELECT 2 \; SELECT 3 AS y \cset > > SELECT 4 \; SELECT 5 \; SELECT 6 AS z \gset > > > > In the end, we must have the full 6 queries > > > > "SELECT 1 AS x \; SELECT 2 \; SELECT 3 AS y \; SELECT 4 \; SELECT 5 \; SELECT 6 AS z" > > > > and know that we want to set variables from queries 1, 3 and 6 and ignore > > the 3 others. > > I'm not sure I understand this. Why is the "SELECT 2" ignored? (I can > see why the 4 and 5 are ignored: they are not processed by gset). > > What exactly does \cset do? Oh! I understand it now. You say "replace a semicolon" to mean "works as if it were a semicolon, and also captures the result". So \cset means "works as if it were an escaped semicolon". It all suddenly makes sense now! I think I'll propose some rewording of that explanation, as it was very confusing to me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think this patch's Command->lines would benefit from using PQExpBuffer (or maybe StringInfo?) for the command string instead of open-coding string manipulation and allocation. I'm not sure that Command->first_line is really all that useful. It seems we go to a lot of trouble to keep it up to date. Isn't it easier to chop Command->lines at the first newline when it is needed? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,

Thanks for having a look at this patch.

>> Think of one initialization followed by two appends:
>>
>> SELECT 1 AS x \cset
>> SELECT 2 \; SELECT 3 AS y \cset
>> SELECT 4 \; SELECT 5 \; SELECT 6 AS z \gset
>>
>> In the end, we must have the full 6 queries
>>
>> "SELECT 1 AS x \; SELECT 2 \; SELECT 3 AS y \; SELECT 4 \; SELECT 5 \; SELECT 6 AS z"
>>
>> and know that we want to set variables from queries 1, 3 and 6 and ignore
>> the 3 others.
>
> I'm not sure I understand this. Why is the "SELECT 2" ignored?

Because there is no \cset or \gset attached to it, so the command does not say to put the result into variables.

> (I can see why the 4 and 5 are ignored: they are not processed by gset).

Same thing with SELECT 2, which is followed by "\;", meaning execute and that's all.

> What exactly does \cset do? I thought "SELECT 2 \; SELECT 3 AS y \cset"
> would search for the \; and process *both* queries.

No, "\cset" does not end the compound query; only ";" or "\gset" do that. \cset separates queries (like \;) and adds the fact that the just-preceding query result is to be put into variables when received.

\cset = \; + store result into variables
\gset = ; + store result into variables

From a performance point of view, the point is to be able to use compound queries, which reduce the number of round trips and so improve latency.

> I think the doc addition should be split.

Indeed, as the current version is confusing. Attached is an attempt at clarifying the documentation on this point. The doc is split as suggested; descriptions and examples are specific to each presented case.

-- Fabien.
Attachment
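Spelled out as a script fragment, that gives something like the following sketch (one compound command, hence a single client-server round trip; names are illustrative):

SELECT 1 AS one \cset
SELECT 2 \;
SELECT 3 AS three \gset
\set total debug(:one + :three)

Here the first result is stored into :one, the second is discarded, and the third is stored into :three.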
> I think this patch's Command->lines would benefit from using PQExpBuffer
> (or maybe StringInfo?) for the command string instead of open-coding
> string manipulation and allocation.

Indeed it could be used, but it is not used anywhere in "pgbench": not for lines, not for variable substitutions, not for the PREPARE stuff... So I did not think that it was time to start; I just kept the current style.

Probably it could be a refactoring patch to replace all basic string stuff with the PQExpBuffer infrastructure within pgbench.

> I'm not sure that Command->first_line is really all that useful. It
> seems we go to a lot of trouble to keep it up to date. Isn't it easier
> to chop Command->lines at the first newline when it is needed?

Hmmm, it is needed quite often (about 12 times) to report errors; that would mean having to handle the truncation in many places, so I felt it was worth the trouble.

-- Fabien.
On 2018-Nov-17, Fabien COELHO wrote: > > > I think this patch's Command->lines would benefit from using PQExpBuffer > > (or maybe StringInfo?) for the command string instead of open-coding > > string manipulation and allocation. > > Indeed it could be used, but it is not used anywhere in "pgbench": not for > lines, not for variable subtitutions, not for the PREPARE stuff... So I did > not think that it was time to start, I just kept the current style. > > Probably it could be a refactoring patch to replace all basic string stuff > with PQExpBuffer infrastructure within pgbench. Well, I think command handling was mostly straightforward before, but it's not as straightforward after your patch. Also, PQExpBuffer is already in use in pgbench when interacting with the lexer, so yeah I think we should fix that. I don't think we should replace *ALL* string handling in pgbench with PQExpBuffer, just the command stuff. > > I'm not sure that Command->first_line is really all that useful. It > > seems we go to a lot of trouble to keep it up to date. Isn't it easier > > to chop Command->lines at the first newline when it is needed? > > Hmmm, it is needed quite often (about 12 times) to report errors, that would > mean having to handle the truncation in many places, so I felt it was worth > the trouble. Ok, as long as we don't repeat the work during script execution. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro, >>> I think this patch's Command->lines would benefit from using PQExpBuffer >>> (or maybe StringInfo?) for the command string instead of open-coding >>> string manipulation and allocation. > > [...] Ok. >>> I'm not sure that Command->first_line is really all that useful. It >>> seems we go to a lot of trouble to keep it up to date. Isn't it easier >>> to chop Command->lines at the first newline when it is needed? >> >> Hmmm, it is needed quite often (about 12 times) to report errors, that would >> mean having to handle the truncation in many places, so I felt it was worth >> the trouble. > > Ok, as long as we don't repeat the work during script execution. Sure, the point of first_line is that it is computed once at parse time. Attached a v23 with PQExpBuffer for managing lines. I've also added a function to compute the summary first line, which handles carriage-return. -- Fabien.
Attachment
On 2018-Nov-18, Fabien COELHO wrote: > Attached is a v23 with PQExpBuffer for managing lines. > > I've also added a function to compute the summary first line, which handles > carriage returns. Thanks. When you rebase, please consider these (minor) changes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello Alvaro, > When you rebase, please consider these (minor) changes. Here it is. -- Fabien.
Attachment
FWIW I think the terminology in this patch is wrong. You use the term "compound" to mean "one query within a string containing multiple queries", but that's not what compound means. The compound is the whole thing, composed of the multiple queries. Maybe "query" is the right word to use there, not sure. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> FWIW I think the terminology in this patch is wrong. You use the term > "compound" to mean "one query within a string containing multiple > queries", but that's not what compound means. The compound is the whole > thing, composed of the multiple queries. Indeed. Compounded query? > Maybe "query" is the right word to use there, not sure. I do not understand: "query queries"? I think it should avoid using sql-related words, such as "group", "aggregate", "merge", "join"... I thought of "combined", meaning the queries are combined into a single message at the protocol level. Basically I'm ok with any better idea. -- Fabien.
I revised this patch a bit. Here's v25, where some finishing touches are needed -- see below. I think with these changes the patch would become committable, at least for me. I didn't like that you were adding an #include of psqlscan_int.h into pgbench.c, when there's a specific comment in the header saying not to do that, so I opted for adding a new accessor function in psqlscan.h. I renamed some of your parameter additions. I think the new names are clearer, but they meant the +1's in your code are now in illogical places. (I moved some; probably not enough). Please review/fix that. I think "gset" is not a great name for the new struct member; please find a better name. I suggest "targetvar" but feel free to choose a name that suits your taste. There are two XXX comments. One is about a function needing a comment atop it. The other is about realloc behavior. To fix this one I would add a new struct member indicating the allocated size of the array, then grow exponentially instead of one at a time. For most cases you can probably get away with never reallocating beyond an initial allocation of, say, 8 members. In the docs, the [prefix] part needs to be explained in the \cset entry; right now it's in \gset, which comes afterwards. Let's move the explanation up, and then in \gset say "prefix behaves as in \cset". Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
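The growth strategy suggested above might look like this sketch (the Command members prefixes, num_prefixes and prefix_alloc are hypothetical names; pg_malloc/pg_realloc are the frontend allocators pgbench already uses, from common/fe_memutils.h):

    #define INITIAL_PREFIX_ALLOC 8

    static void
    add_prefix(Command *cmd, char *prefix)
    {
        if (cmd->prefixes == NULL)
        {
            /* first use: allocate a small fixed-size array */
            cmd->prefix_alloc = INITIAL_PREFIX_ALLOC;
            cmd->prefixes = pg_malloc(cmd->prefix_alloc * sizeof(char *));
        }
        else if (cmd->num_prefixes >= cmd->prefix_alloc)
        {
            /* double instead of growing one element at a time */
            cmd->prefix_alloc *= 2;
            cmd->prefixes = pg_realloc(cmd->prefixes,
                                       cmd->prefix_alloc * sizeof(char *));
        }
        cmd->prefixes[cmd->num_prefixes++] = prefix;
    }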
Hello Alvaro, > I revised this patch a bit. Here's v25, where some finishing touches > are needed -- see below. I think with these changes the patch would > become committable, at least for me. Thanks a lot for having a look at this patch, and improving it. The updated version did not work, but the fix was easy. I do not understand why you have removed the num_commands stuff, which indeed is not very useful but could be for debugging. No big deal. > I didn't like that you were adding an #include of psqlscan_int.h into > pgbench.c, when there's a specific comment in the header saying not to > do that, Oops, I did not notice the comment. > so I opted for adding a new accessor function in psqlscan.h. Ok. > I renamed some of your parameter additions. I think the new names are > clearer, but they meant the +1's in your code are now in illogical > places. (I moved some; probably not enough). Please review/fix that. It needed some fixing. I understood that you suggested avoiding keeping the last index in favour of the number of elements, so I applied that to the Command struct as well. > I think "gset" is not a great name for the new struct member; Indeed. > please find a better name. I suggest "targetvar" but feel free to > choose a name that suits your taste. Ok. Note that it is not a variable name but a prefix, so I named it "prefix". > There are two XXX comments. One is about a function needing a comment > atop it. Ok. > The other is about realloc behavior. To fix this one I would add a new > struct member indicating the allocated size of the array, then grow > exponentially instead of one at a time. For most cases you can probably > get away with never reallocating beyond an initial allocation of, say, 8 > members. Yep. I guess I did it one at a time because it should be a rare case and that saved a counter. I've implemented exponential growth instead. > In the docs, the [prefix] part needs to be explained in the \cset entry; > right now it's in \gset, which comes afterwards. Let's move the > explanation up, and then in \gset say "prefix behaves as in \cset". I do not understand: the very same explanation text about prefix appears in both entries? The examples are different; is that what you mean? They are about different backslash commands, so they have an interest of their own. Attached is a v26 with, I hope, everything fixed, except for the documentation, which I'm unsure how to improve. -- Fabien.
Attachment
> Attached is a v26 with, I hope, everything fixed, except for the documentation, which I'm unsure how to improve. Attached v27 is the same but with improved documentation where full examples, with and without prefix, are provided for both \cset and \gset. -- Fabien.
Attachment
On 2019-Jan-09, Fabien COELHO wrote: > > Attached is a v26 with, I hope, everything fixed, except for the documentation, > > which I'm unsure how to improve. > > Attached v27 is the same but with improved documentation where full > examples, with and without prefix, are provided for both \cset and \gset. I have already made changes on top of v26. Can you please submit this as a delta patch on top of v26? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Attached v27 is the same but with improved documentation where full >> examples, with and without prefix, are provided for both \cset and \gset. > > I have already made changes on top of v26. Can you please submit this > as a delta patch on top of v26? Attached. -- Fabien.
Attachment
Here are my proposed final changes. I noticed that you were allocating the prefixes for all cases, even when there were no cset/gset in the command; I changed it to delay the allocation until needed. I also noticed you were skipping the Meta enum dance for no good reason; I added that, which makes the conditionals simpler. The read_response routine seemed misplaced, and I gave it a name in a style closer to the rest of the pgbench code. Also, you were failing to free the ->lines PQExpBuffer when the command is discarded. I grant that the free_command() stuff might be bogus since it's only tested with a command that's barely initialized, but it seems better to make it bogus in this way (fixable if we ever extend its use) than to forever leak memory silently. I didn't test this beyond running "make check". -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello Alvaro, > Here are my proposed final changes. Thanks again for improving the patch! > I noticed that you were allocating the prefixes for all cases, even when > there were no cset/gset in the command; I changed it to delay the > allocation until needed. Ok, why not. > I also noticed you were skipping the Meta enum dance for no good reason; Indeed. I think that the initial version of the patch was written before the "dance" was added, and it did not keep up with it. > I added that, which makes the conditionals simpler. The read_response routine seemed > misplaced, and I gave it a name in a style closer to the rest of the > pgbench code. Fine. > Also, you were failing to free the ->lines PQExpBuffer when the command > is discarded. I grant that the free_command() stuff might be bogus > since it's only tested with a command that's barely initialized, but it > seems better to make it bogus in this way (fixable if we ever extend its > use) than to forever leak memory silently. Ok. However, I switched "pg_free" to "termPQExpBuffer", which seems more appropriate, even if it just does the same thing. I also ensured that prefixes are allocated & freed. I've added a comment about expr, which is not freed. > I didn't test this beyond running "make check". That's a good start. I'm not keen on having the command array size checked and updated *after* the command is appended, even if the initial allocation ensures that there is no overflow, but I left it as is. Further tests did not yield any new issue. Attached is a v29 with the above minor changes wrt your version. -- Fabien.
Attachment
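A sketch of the cleanup being discussed (all field names beyond lines are hypothetical; termPQExpBuffer releases the buffer's storage without freeing the struct itself, which is why it suits an embedded PQExpBufferData):

    static void
    free_command(Command *command)
    {
        termPQExpBuffer(&command->lines);   /* releases the line buffer */
        pg_free(command->first_line);
        if (command->prefixes != NULL)
        {
            for (int i = 0; i < command->num_prefixes; i++)
                pg_free(command->prefixes[i]);
            pg_free(command->prefixes);
        }
        /* command->expr is deliberately not freed; see the remark above */
        pg_free(command);
    }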
On 2019-Jan-10, Fabien COELHO wrote: > However, I switched "pg_free" to "termPQExpBuffer", which seems more > appropriate, even if it just does the same thing. I also ensured that > prefixes are allocated & freed. I've added a comment about expr, which is not > freed. Oops, of course, thanks. > I'm not keen on having the command array size checked and updated *after* > the command is appended, even if the initial allocation ensures that there > is no overflow, but I left it as is. It was already done that way, only it was done in two places rather than one. I just refactored it. (In fairness, I think the assignment of the new command to the array could also be done in one place instead of two, but it seems slightly clearer like this.) > Attached is a v29 with the above minor changes wrt your version. Thanks, pushed. I fixed a couple of very minor issues in the docs. Now let's see how the buildfarm likes this ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Now let's see how the buildfarm likes this ... This \cset thing seems like an incredibly badly thought-out kluge. What is its excuse to live? regards, tom lane
On 2019-Jan-10, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Now let's see how the buildfarm likes this ... > > This \cset thing seems like an incredibly badly thought-out kluge. > What is its excuse to live? The reason is that you can set variables from several queries in one network trip. We can take it out, I guess, but my impression was that we already pretty much had a consensus that it was wanted. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2019-Jan-10, Tom Lane wrote: >> This \cset thing seems like an incredibly badly thought-out kluge. >> What is its excuse to live? > The reason is that you can set variables from several queries in one > network trip. So who needs that? Just merge the queries, if it's so important that you avoid multiple round trips. > We can take it out, I guess, but my impression was that we already pretty > much had a consensus that it was wanted. Maybe if the implementation weren't a pile of junk it'd be all right, but as-is this is a mess. The dependency on counting \; in particular is setting me off, because that has little if anything to do with the number of query results to be expected. I imagine the argument will be that nobody would write the sort of queries that break that assumption in a pgbench script; but I don't find that kind of design to be up to project standards, especially not when the argument for the feature is tissue-thin in the first place. regards, tom lane
On 2019-Jan-10, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2019-Jan-10, Tom Lane wrote: > >> This \cset thing seems like an incredibly badly thought-out kluge. > >> What is its excuse to live? > > The reason is that you can set variables from several queries in one > > network trip. > > So who needs that? Just merge the queries, if it's so important that > you avoid multiple round trips. Hmm, I suppose that's true. > > We can take it out, I guess, but my impression was that we already pretty > > much had a consensus that it was wanted. > > Maybe if the implementation weren't a pile of junk it'd be all right, > but as-is this is a mess. The dependency on counting \; in particular > is setting me off, because that has little if anything to do with the > number of query results to be expected. I imagine the argument will > be that nobody would write the sort of queries that break that assumption > in a pgbench script; but I don't find that kind of design to be up > to project standards, especially not when the argument for the feature > is tissue-thin in the first place. There's a lot of the new code in pgbench that can be simplified if we remove \cset. I'll leave time for others to argue for or against \cset, and then act accordingly. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Tom, > So who needs that? Just merge the queries, if it's so important that > you avoid multiple round trips. Pgbench is about testing (measuring) performance in various settings and realistic scenarios: queries prepared or not, possibly combined, and so on. As postgres allows sending several queries in one message, I think it is interesting to be able to test the performance impact of doing that (the answer is: very significant, especially wrt latency), and doing so should not preclude getting the results out, as a client application could do. Queries can be "merged", but ISTM the syntax is not especially friendly (UNION, SELECT of SELECT, CROSS JOIN... not sure which one you had in mind) nor representative of what a client application would really do, and it would mess with readability, message size, planning and so on. Also, merged queries would need to return everything in one row, whereas the one-row constraint is only needed for results stored into variables. >> We can take it out, I guess, but my impression was that we already pretty >> much had a consensus that it was wanted. > Maybe if the implementation weren't a pile of junk it'd be all right, > but as-is this is a mess. Thanks :-) > The dependency on counting \; in particular is setting me off, because > that has little if anything to do with the number of query results to be > expected. The kludge exists because there is a kludge (aka an optimization :-) on the server side that silently ignores empty queries. On "SELECT 1 \; /**/ \; SELECT 2 ;" the server sends two results back instead of three, whereas it should logically return an empty result for the empty query. Now pgbench could detect that there is an empty query (possibly skipping comments and so on); an early version of the patch did that AFAICR, but the code did not seem worth it: it seemed cleaner to just document the restriction, so it was removed. > I imagine the argument will be that nobody would write the sort of > queries that break that assumption in a pgbench script; Detecting empty queries is possible, although the code for doing that is kind of ugly and would look bad in the lexer, to which it seemed desirable to make minimal changes. > but I don't find that kind of design to be up to project standards, > especially not when the argument for the feature is tissue-thin in the > first place. The "first place" is to be able to implement more realistic scenarios, and to keep getting results into variables orthogonal to combined queries. Although I'm not especially thrilled by the resulting syntax, the point is to provide a feature pertinent to performance testing, not to have an incredibly well-designed syntax. It just carries on with the existing backslash approach used by psql & pgbench, which has the significant advantage that it is mostly SQL with a few things around. -- Fabien.
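To make the empty-query restriction concrete, here is an illustrative fragment (hypothetical script, following the behavior described above) that would trip the \;-counting assumption:

    -- pgbench counts three queries here, but the server silently drops the
    -- empty query between the two \; separators and returns only two results,
    -- so results would be matched to the wrong queries; the patch documents
    -- this case as unsupported rather than detecting it
    SELECT 1 AS a \; /* empty query */ \; SELECT 2 AS b \gset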
Hello Alvaro, > There's a lot of the new code in pgbench that can be simplified if we > remove \cset. I'm not very happy with the resulting syntax, but IMO the feature is useful. My initial design was to copy PL/pgSQL's "into", with some "\into" orthogonal to \; and ;, but the implementation was not especially nice and I was told to use psql's \gset approach, which I did. If we do not provide \cset, then combined queries and getting results into variables are not orthogonal, although from a performance-testing point of view an application could do both, and the point is to allow pgbench to measure the performance impact of doing that. There are other existing restrictions which are arbitrary, e.g. you cannot use prepared statements with combined queries. I would rather not add more restrictions of this kind, which are not "up to project standards" in my opinion. I may try to remove this particular restriction in the future. Not many people know that queries can be combined, but if you are interested in latency it is really an interesting option, and being able to check how much can be gained from doing that is the point of a tool like pgbench. -- Fabien.