Thread: pgbench and timestamps

pgbench and timestamps

From
Jaime Soler
Date:
Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql: 
 (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
query executed successfullly with psql
/usr/lib/postgresql/12/bin/psql -p 5432 -h localhost -d picp -U postgres -f time.sql
BEGIN
?column?
--------------------------
2005-11-24 13:22:02.4781
(1 fila)COMMIT
psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
but look at what happen with pgbench
pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
pghost: localhost pgport: 5432 nclients: 2 nxacts: 10 dbName: picp
starting vacuum...ERROR: no existe la relación «pgbench_branches»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_tellers»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_history»
(ignoring this error and continuing anyway)
end.
client 0 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^
client 0 sending P0_0
client 0 receiving
client 0 receiving
client 0 sending P0_1
client 0 receiving
client 0 receiving
client 0 script 0 aborted in command 1 query 0: ERROR: no existe la sentencia preparada «P0_1»
client 1 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^Run was aborted; the above results are incomplete.
pgbench (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00' 

Regards

Re: pgbench and timestamps

From
David Rowley
Date:
On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:
>
> Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in
ainterval time.sql: 
>
>  (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
> pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
> ERROR:  la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
>
> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David



Re: pgbench and timestamps

From
Jaime Soler
Date:
Hi,

Thanks for your comments, I worked around that problem because I was able to truncate the timestamp and use only the date part , alsoit might works the use of to_timestamp.  But I would like to understand what is happening , I realized that pgbench is identified erroneously  the minutes and seconds parts :00:00 as two variables .

Regards

El mié., 24 jun. 2020 a las 14:50, David Rowley (<dgrowleyml@gmail.com>) escribió:
On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:
>
> Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql:
>
>  (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
> pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
> ERROR:  la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
>
> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David

Re: pgbench and timestamps

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:
>> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

> I've not debugged it, but it looks like pgbench thinks that :00 is a
> pgbench variable and is replacing each instance with a query
> parameter.

Yeah.

> I don't often do much with pgbench and variables, but there are a few
> things that surprise me here.

> 1) That pgbench replaces variables within single quotes, and;
> 2) that we still think it's a variable name when it starts with a digit, and;
> 3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

I can think of use-cases for substituting variables inside quotes,
so maybe (1) isn't a bug; but it sure seems like (3) and (4) are.

In any case, the documentation about this seems pretty inadequate.

            regards, tom lane



Re: pgbench and timestamps

From
Tom Lane
Date:
I wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> I don't often do much with pgbench and variables, but there are a few
>> things that surprise me here.
>> 1) That pgbench replaces variables within single quotes, and;
>> 2) that we still think it's a variable name when it starts with a digit, and;
>> 3) We replace variables that are undefined.

> Also (4) this only happens when in non-simple query mode --- the
> example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode.  So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet.  We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode.  The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

            regards, tom lane



Re: pgbench and timestamps

From
Fabien COELHO
Date:
I'll look into it. Thanks for the analysis and CC-ing.

-- 
Fabien.



Re: pgbench and timestamps

From
Jaime Soler
Date:
Thanks for your analysis.


Regards

El mié., 24 jun. 2020 a las 17:17, Tom Lane (<tgl@sss.pgh.pa.us>) escribió:
I wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> I don't often do much with pgbench and variables, but there are a few
>> things that surprise me here.
>> 1) That pgbench replaces variables within single quotes, and;
>> 2) that we still think it's a variable name when it starts with a digit, and;
>> 3) We replace variables that are undefined.

> Also (4) this only happens when in non-simple query mode --- the
> example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode.  So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet.  We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode.  The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

                        regards, tom lane