Thread: Selecting a constant question
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">SELECT 1 FROM test.dbo.a_003 </span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><spanstyle="font-size: 12.0pt"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">gets about 60,000 records per second</span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><spanstyle="font-size: 12.0pt"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">SELECT '1' FROM test.dbo.a_003 </span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><spanstyle="font-size: 12.0pt"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">gets about 600 records per second.</span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><spanstyle="font-size: 12.0pt"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">The cause is that postgres describes the return column as "unknown" length 65534 in the 2nd case.</span></font><pclass="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">Since the value is a constant, it seems rather odd to make the length 65534 characters. Why not makeit char(1) or some other appropriate and less costly data type? After all, it isn’t going to grow during the query.</span></font><pclass="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font></div>
"Dann Corbit" <DCorbit@connx.com> writes: > SELECT 1 FROM test.dbo.a_003 > gets about 60,000 records per second > SELECT '1' FROM test.dbo.a_003 > gets about 600 records per second. > The cause is that postgres describes the return column as "unknown" > length 65534 in the 2nd case. Postgres describes it in no such fashion --- unknown will always have a typmod of -1 which means "unspecified". Possibly you have some client code that knows much less than it thinks it does about the meanings of typmod values? The actual volume of data transmitted is going to be just about the same either way, so I'm not sure you've diagnosed the cause of slowdown correctly. Trying the example in psql seems to be about the same speed both ways, with if anything a slight advantage to select '1'. regards, tom lane
"Dann Corbit" <DCorbit@connx.com> writes: > SELECT 1 FROM test.dbo.a_003 > > gets about 60,000 records per second > > SELECT '1' FROM test.dbo.a_003 > > gets about 600 records per second. > > The cause is that postgres describes the return column as "unknown" > length 65534 in the 2nd case. Wait, back up. How does this cause it to go slower? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> -----Original Message----- > From: Gregory Stark [mailto:stark@enterprisedb.com] > Sent: Monday, June 11, 2007 12:48 PM > To: Dann Corbit > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > "Dann Corbit" <DCorbit@connx.com> writes: > > > SELECT 1 FROM test.dbo.a_003 > > > > gets about 60,000 records per second > > > > SELECT '1' FROM test.dbo.a_003 > > > > gets about 600 records per second. > > > > The cause is that postgres describes the return column as "unknown" > > length 65534 in the 2nd case. > > Wait, back up. How does this cause it to go slower? The issue is this: Postgres describes the column with a typmod of -1 (unknown) and a length of 65534. This means that any client application must reserve 65534 bytes of spaces for every row of data (like a grid control for example), which postgres should know (and report) that the maximum length of the column is 1. This is not a PSQL issue, it's an issue with other products relying on the accuracy of the reported postgres metadata for a given SQL statement.
On Mon, Jun 11, 2007 at 12:55:55PM -0700, Dann Corbit wrote: > The issue is this: > > Postgres describes the column with a typmod of -1 (unknown) and a length > of 65534. Postgres does no such thing. How can it possibly know the maximum size of a column before executing the query? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
"Dann Corbit" <DCorbit@connx.com> writes: > The issue is this: > Postgres describes the column with a typmod of -1 (unknown) and a length > of 65534. Oh, you're looking at typlen not typmod. Please observe the comments in pg_type.h: /* * For a fixed-size type, typlen is the number of bytes we use to * represent a value of this type, e.g. 4 for an int4. But for a * variable-length type, typlen is negative. We use -1 to indicate a * "varlena" type (one that has a lengthword), -2 to indicate a * null-terminated C string. */int2 typlen; You should be treating typlen as signed not unsigned, and not assuming a fixed width for any negative value. Since the width refers to the server internal representation, and not to what comes down the wire, I find it pretty strange for an application to be using typlen for anything at all actually. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Monday, June 11, 2007 1:32 PM > To: Dann Corbit > Cc: Gregory Stark; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > "Dann Corbit" <DCorbit@connx.com> writes: > > The issue is this: > > Postgres describes the column with a typmod of -1 (unknown) and a length > > of 65534. > > Oh, you're looking at typlen not typmod. Please observe the comments in > pg_type.h: > > /* > * For a fixed-size type, typlen is the number of bytes we use to > * represent a value of this type, e.g. 4 for an int4. But for > a > * variable-length type, typlen is negative. We use -1 to indicate > a > * "varlena" type (one that has a length word), -2 to indicate a > * null-terminated C string. > */ > int2 typlen; > > You should be treating typlen as signed not unsigned, and not assuming a > fixed width for any negative value. > > Since the width refers to the server internal representation, and not to > what comes down the wire, I find it pretty strange for an application to > be using typlen for anything at all actually. Thanks for the response. Since libpq function PQfsize returns -2 for all constant character strings in SQL statements ... What is the proper procedure to determine the length of a constant character column after query execution but before fetching the first row of data?
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Trying the example in psql seems to be about the same speed both ways, with > if anything a slight advantage to select '1'. Fwiw I see a slight advantage for '1' as well. I wonder why. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> -----Original Message----- > From: Martijn van Oosterhout [mailto:kleptog@svana.org] > Sent: Monday, June 11, 2007 1:46 PM > To: Dann Corbit > Subject: Re: [HACKERS] Selecting a constant question > > On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote: > > Our application is using the libPQ interface to access postgres. > > > > The query is "select '123' from <tablename>" .. the table is not > > important. > > > > After executing the query, we interrogate the metadata of the result set > > using the PQfsize, PQfmod and PQftype functions. > > Did you read the documentation of the PQfsize function? > > PQfsize returns the space allocated for this column in a database row, > in other words the size of the server's internal representation of the > data type. (Accordingly, it is not really very useful to clients.) A > negative value indicates the data type is variable-length. > > http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE C- > SELECT-INFO > > > The size of the column is returned as 65534 (or -2 if you consider this > > a signed short value) > > It's variable length, you can't say anything more. So what you are saying is that the constant '1' is variable length, and there is no way to find out the maximum length from the database.
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Dann Corbit > Sent: Monday, June 11, 2007 1:52 PM > To: Martijn van Oosterhout > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > > -----Original Message----- > > From: Martijn van Oosterhout [mailto:kleptog@svana.org] > > Sent: Monday, June 11, 2007 1:46 PM > > To: Dann Corbit > > Subject: Re: [HACKERS] Selecting a constant question > > > > On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote: > > > Our application is using the libPQ interface to access postgres. > > > > > > The query is "select '123' from <tablename>" .. the table is not > > > important. > > > > > > After executing the query, we interrogate the metadata of the result > set > > > using the PQfsize, PQfmod and PQftype functions. > > > > Did you read the documentation of the PQfsize function? > > > > PQfsize returns the space allocated for this column in a database row, > > in other words the size of the server's internal representation of the > > data type. (Accordingly, it is not really very useful to clients.) A > > negative value indicates the data type is variable-length. > > > > > http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE > C- > > SELECT-INFO > > > > > The size of the column is returned as 65534 (or -2 if you consider > this > > > a signed short value) > > > > It's variable length, you can't say anything more. > > So what you are saying is that the constant '1' is variable length, and > there is no way to find out the maximum length from the database. I have a PostgreSQL feature request: Report the maximum size of a variable length string from the server. Surely, we cannot be the only people who will need this information. If (for example) someone wants to bind to a grid, then the maximum size has to be known in advance.
"Dann Corbit" <DCorbit@connx.com> writes: > Surely, we cannot be the only people who will need this information. If > (for example) someone wants to bind to a grid, then the maximum size has > to be known in advance. In fact psql needs it and implements this. It has to skim through the entire result set to calculate the column widths. It's quite a lot of work but the server is in no better position to do it than psql. On the contrary the server is missing quite a bit of information of how you intend to display the information. Do you need the number of bytes or characters? Are all the characters the same width in your display system? What about currency symbols? Do you intend to reverse any quoting or just display backslashes? Even knowing how many characters and assuming fixed character widths that wouldn't even be enough to set your grid control widths. Usually people like numeric quantities decimal aligned and so two records "1.00" and "0.01" will take much more width than two records with "1.00" and "2.00". -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> -----Original Message----- > From: Gregory Stark [mailto:stark@enterprisedb.com] > Sent: Monday, June 11, 2007 2:41 PM > To: Dann Corbit > Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org; Larry McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > "Dann Corbit" <DCorbit@connx.com> writes: > > > Surely, we cannot be the only people who will need this information. If > > (for example) someone wants to bind to a grid, then the maximum size has > > to be known in advance. > > In fact psql needs it and implements this. It has to skim through the > entire > result set to calculate the column widths. It's quite a lot of work but > the > server is in no better position to do it than psql. Reading the data twice sounds a little painful. What if there are 30 million rows? > On the contrary the server is missing quite a bit of information of how > you > intend to display the information. Do you need the number of bytes or > characters? Are all the characters the same width in your display system? > What > about currency symbols? Do you intend to reverse any quoting or just > display > backslashes? Giving me the information about the data type will be enough. As an example, in this case we have varchar data. If the server should be so kind as to report varchar(1) for '1' or varchar(3) for '123' then I would not have any difficulty binding the data to a grid. > Even knowing how many characters and assuming fixed character widths that > wouldn't even be enough to set your grid control widths. Usually people > like > numeric quantities decimal aligned and so two records "1.00" and "0.01" > will > take much more width than two records with "1.00" and "2.00". SQL*Server, Oracle, Ingres, DB/2 and other database systems somehow manage to do it, so I guess it is not technically intractable. I suspect that your own ODBC/JDBC and other drivers suffer from this same effect. Now, I do recognize that sometimes nobody is going to know how big something is, including the server. But with a query using a constant it seems like it ought to be well defined to me. Perhaps the difficulties are escaping me because I am not familiar with the low level guts of this problem. But I suspect that lots of people besides me would benefit if sizes of things were known when it is possible to know them. As I said before, I see that it cannot be known right now. So I am putting it in as a feature request. If you could be so kind as to point out the right spot to look in the server code, I imagine we could fix it and check in the patch ourselves.
Dann Corbit wrote: > > "Dann Corbit" <DCorbit@connx.com> writes: > > > > In fact psql needs it and implements this. It has to skim through the > > entire > > result set to calculate the column widths. It's quite a lot of work > but > > the > > server is in no better position to do it than psql. > > Reading the data twice sounds a little painful. What if there are 30 > million rows? You get an "out of memory" error. > > On the contrary the server is missing quite a bit of information of > > how you intend to display the information. Do you need the number of > > bytes or characters? Are all the characters the same width in your > > display system? What about currency symbols? Do you intend to > > reverse any quoting or just display backslashes? > > Giving me the information about the data type will be enough. As an > example, in this case we have varchar data. If the server should be so > kind as to report varchar(1) for '1' or varchar(3) for '123' then I > would not have any difficulty binding the data to a grid. Oh, you have the length information for each datum all right. It's on the first four bytes of it. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "The West won the world not by the superiority of its ideas or values or religion but rather by its superiority in applying organized violence. Westerners often forget this fact, non-Westerners never do." (Samuel P. Huntington)
> -----Original Message----- > From: Alvaro Herrera [mailto:alvherre@commandprompt.com] > Sent: Monday, June 11, 2007 3:16 PM > To: Dann Corbit > Cc: Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org; > Larry McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > Dann Corbit wrote: > > > > "Dann Corbit" <DCorbit@connx.com> writes: > > > > > > In fact psql needs it and implements this. It has to skim through the > > > entire > > > result set to calculate the column widths. It's quite a lot of work > > but > > > the > > > server is in no better position to do it than psql. > > > > Reading the data twice sounds a little painful. What if there are 30 > > million rows? > > You get an "out of memory" error. > > > > On the contrary the server is missing quite a bit of information of > > > how you intend to display the information. Do you need the number of > > > bytes or characters? Are all the characters the same width in your > > > display system? What about currency symbols? Do you intend to > > > reverse any quoting or just display backslashes? > > > > Giving me the information about the data type will be enough. As an > > example, in this case we have varchar data. If the server should be so > > kind as to report varchar(1) for '1' or varchar(3) for '123' then I > > would not have any difficulty binding the data to a grid. > > Oh, you have the length information for each datum all right. It's on > the first four bytes of it. Sure, but when I bind to a grid, I need to know a-priori how big the biggest returned instance can be. Reading the entire data set twice to learn the size of a constant seems rather conceptually odd to me.
Dann Corbit wrote: > > Oh, you have the length information for each datum all right. It's on > > the first four bytes of it. > > Sure, but when I bind to a grid, I need to know a-priori how big the > biggest returned instance can be. Reading the entire data set twice to > learn the size of a constant seems rather conceptually odd to me. Did you read up on typmod already? I think that's part of the info sent down in the query response. -- Alvaro Herrera http://www.flickr.com/photos/alvherre/ "No single strategy is always right (Unless the boss says so)" (Larry Wall)
On Mon, Jun 11, 2007 at 03:18:33PM -0700, Dann Corbit wrote: > Sure, but when I bind to a grid, I need to know a-priori how big the > biggest returned instance can be. Reading the entire data set twice to > learn the size of a constant seems rather conceptually odd to me. To be honest, the concept that a widget requires a constant that can't be changed later is also a bit odd. There are many times you won't know beforehand how big the data is, surely the framework should be smart enough to handle these cases? Start the width at 100, if it turns out to be too small, make it bigger... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Dann Corbit wrote: > I have a PostgreSQL feature request: > > Report the maximum size of a variable length string from the server. > > Surely, we cannot be the only people who will need this information. If > (for example) someone wants to bind to a grid, then the maximum size has > to be known in advance. > > > > Does PQfmod not tell you what you need if the field is varchar(n) ? cheers andrew
"Dann Corbit" <DCorbit@connx.com> writes: > Giving me the information about the data type will be enough. As an > example, in this case we have varchar data. If the server should be so > kind as to report varchar(1) for '1' or varchar(3) for '123' then I > would not have any difficulty binding the data to a grid. This seems merest fantasy. Reflect on multibyte character sets for a bit --- even if it's known that the column is varchar(3) there is no guarantee that the value will fit in 3 bytes. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Monday, June 11, 2007 3:35 PM > To: Dann Corbit > Cc: Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org; > Larry McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > "Dann Corbit" <DCorbit@connx.com> writes: > > Giving me the information about the data type will be enough. As an > > example, in this case we have varchar data. If the server should be so > > kind as to report varchar(1) for '1' or varchar(3) for '123' then I > > would not have any difficulty binding the data to a grid. > > This seems merest fantasy. Reflect on multibyte character sets for a > bit --- even if it's known that the column is varchar(3) there is no > guarantee that the value will fit in 3 bytes. If the server bound the data as UNICODE, then it will tell me UNICODE(3). I know how big this will be. In the worst case scenario it will fit in 3*4 = 12 bytes. If the server is built without UNICODE enabled, then it will definitely fit in 3 bytes.
> -----Original Message----- > From: Martijn van Oosterhout [mailto:kleptog@svana.org] > Sent: Monday, June 11, 2007 3:29 PM > To: Dann Corbit > Cc: Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Larry > McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > On Mon, Jun 11, 2007 at 03:18:33PM -0700, Dann Corbit wrote: > > Sure, but when I bind to a grid, I need to know a-priori how big the > > biggest returned instance can be. Reading the entire data set twice to > > learn the size of a constant seems rather conceptually odd to me. > > To be honest, the concept that a widget requires a constant that can't > be changed later is also a bit odd. Not when the data itself is a constant that cannot be changed. > There are many times you won't know > beforehand how big the data is, surely the framework should be smart > enough to handle these cases? If it were impossible to know the size of a string constant supplied in the query, then I think I would agree with you here. However, it seems to me that the maximum possible size of such a known, constant-width string is not hard to determine. > Start the width at 100, if it turns out to be too small, make it > bigger... If that were a good idea, then why report data sizes at all? Just let it always be a surprise when it comes streaming down the pipe. Honestly, I cannot fathom this answer.
Dann Corbit wrote: > If the server bound the data as UNICODE, then it will tell me > UNICODE(3). I know how big this will be. > > In the worst case scenario it will fit in 3*4 = 12 bytes. > > If the server is built without UNICODE enabled, then it will definitely > fit in 3 bytes. Unless it's some other multibyte encoding. And nowadays, the server is always "unicode enabled". The stuff sent down the wire is unicode or not depending on a configuration parameter. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Dann Corbit" <DCorbit@connx.com> writes: >> To be honest, the concept that a widget requires a constant that can't >> be changed later is also a bit odd. > Not when the data itself is a constant that cannot be changed. Surely this case is not sufficiently important to justify designing your entire application (not to mention the client/server protocol) around it. You're always going to have variable-width columns in there somewhere. regards, tom lane
> -----Original Message----- > From: Alvaro Herrera [mailto:alvherre@commandprompt.com] > Sent: Monday, June 11, 2007 3:44 PM > To: Dann Corbit > Cc: Tom Lane; Gregory Stark; Martijn van Oosterhout; pgsql- > hackers@postgresql.org; Larry McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > Dann Corbit wrote: > > > If the server bound the data as UNICODE, then it will tell me > > UNICODE(3). I know how big this will be. > > > > In the worst case scenario it will fit in 3*4 = 12 bytes. > > > > If the server is built without UNICODE enabled, then it will definitely > > fit in 3 bytes. > > Unless it's some other multibyte encoding. And nowadays, the server is > always "unicode enabled". The stuff sent down the wire is unicode or > not depending on a configuration parameter. Even at that, we still know an absolute maximum of 12 bytes.
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Monday, June 11, 2007 3:50 PM > To: Dann Corbit > Cc: Martijn van Oosterhout; Alvaro Herrera; Gregory Stark; pgsql- > hackers@postgresql.org; Larry McGhaw > Subject: Re: [HACKERS] Selecting a constant question > > "Dann Corbit" <DCorbit@connx.com> writes: > >> To be honest, the concept that a widget requires a constant that can't > >> be changed later is also a bit odd. > > > Not when the data itself is a constant that cannot be changed. > > Surely this case is not sufficiently important to justify designing > your entire application (not to mention the client/server protocol) > around it. You're always going to have variable-width columns in there > somewhere. Right. But normally I get back a length for those variable length columns, or I can collect it from the metadata of the database. Surely, PostgreSQL can determine the size of a constant string. Otherwise it would be impossible to know if it would be safe to insert a constant string into a database column. PostgreSQL has decided upon a data type, and gives me data bound in that type. It is only the length that it is unwilling to divulge.
I think perhaps we have lost sight of the main issue: 1) libpq can properly describe the maximum internal data size of any numeric or char column in a table via Pqfsize 2) libpq can properly describe the maximum internal data size of any varchar column via Pqfmod 3) libpq can properly describe the maximum internal data size of any numeric constant in a SQL statement via Pqfsize 4) libpq **cannot** describe the maximum internal data size of a char or varchar constant! Example: select '123' from <any table> This is clearly a bug or serious oversight in libpq that should be addressed. The database *knows* this size of the char constant (obviously), and should report the size via a metadata call, as all other relational databases do. Thanks lm -----Original Message----- From: Alvaro Herrera [mailto:alvherre@commandprompt.com] Sent: Monday, June 11, 2007 3:44 PM To: Dann Corbit Cc: Tom Lane; Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org; Larry McGhaw Subject: Re: [HACKERS] Selecting a constant question Dann Corbit wrote: > If the server bound the data as UNICODE, then it will tell me > UNICODE(3). I know how big this will be. > > In the worst case scenario it will fit in 3*4 = 12 bytes. > > If the server is built without UNICODE enabled, then it will > definitely fit in 3 bytes. Unless it's some other multibyte encoding. And nowadays, the server is always "unicode enabled". The stuff sent down the wire is unicode or not depending on a configuration parameter. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
I think perhaps we have lost sight of the main issue: 1) libpq can properly describe the maximum internal data size of any numeric or char column in a table via Pqfsize 2) libpq can properly describe the maximum internal data size of any varchar column via Pqfmod 3) libpq can properly describe the maximum internal data size of any numeric constant in a SQL statement via Pqfsize 4) libpq **cannot** describe the maximum internal data size of a char or varchar constant! Example: select '123' from <any table> This is clearly a bug or serious oversight in libpq that should be addressed. The database *knows* this size of the char constant (obviously), and should report the size via a metadata call, as all other relational databases do. Thanks lm -----Original Message----- From: Alvaro Herrera [mailto:alvherre@commandprompt.com] Sent: Monday, June 11, 2007 3:44 PM To: Dann Corbit Cc: Tom Lane; Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org; Larry McGhaw Subject: Re: [HACKERS] Selecting a constant question Dann Corbit wrote: > If the server bound the data as UNICODE, then it will tell me > UNICODE(3). I know how big this will be. > > In the worst case scenario it will fit in 3*4 = 12 bytes. > > If the server is built without UNICODE enabled, then it will > definitely fit in 3 bytes. Unless it's some other multibyte encoding. And nowadays, the server is always "unicode enabled". The stuff sent down the wire is unicode or not depending on a configuration parameter. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, 11 Jun 2007, Larry McGhaw wrote: > I think perhaps we have lost sight of the main issue: > > 2) libpq can properly describe the maximum internal data size of any > varchar column via Pqfmod SELECT cola || colb FROM tab; > 3) libpq can properly describe the maximum internal data size of any > numeric constant in a SQL statement via Pqfsize SELECT 3::numeric; Kris Jurka
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Kris Jurka > Sent: Monday, June 11, 2007 5:04 PM > To: Larry McGhaw > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > > > On Mon, 11 Jun 2007, Larry McGhaw wrote: > > > I think perhaps we have lost sight of the main issue: > > > > 2) libpq can properly describe the maximum internal data size of any > > varchar column via Pqfmod > > SELECT cola || colb FROM tab; Suggestion: Return (column size of cola) + (column size of colb) in the maximum length field. > > 3) libpq can properly describe the maximum internal data size of any > > numeric constant in a SQL statement via Pqfsize > > SELECT 3::numeric; Suggestion: Return sizeof (numeric(1,0)) -- after all, it's a constant here. In the words of the great poet "Spike Lee": 'Always do the right thing.'
Larry McGhaw wrote: > 4) libpq **cannot** describe the maximum internal data size of a char or > varchar constant! > Example: select '123' from <any table> > > This is clearly a bug or serious oversight in libpq that should be > addressed. > > The database *knows* this size of the char constant (obviously), and > should report the size via a metadata call, as all other relational > databases do. > > > What is not clear to me is why it is so important for you to know the length of a piece of data you are supplying. If it is so vitally important, you could always cast it, e.g. select '123'::varchar(3) cheers andrew
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Andrew Dunstan > Sent: Monday, June 11, 2007 5:12 PM > To: Larry McGhaw > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > > > Larry McGhaw wrote: > > 4) libpq **cannot** describe the maximum internal data size of a char or > > varchar constant! > > Example: select '123' from <any table> > > > > This is clearly a bug or serious oversight in libpq that should be > > addressed. > > > > The database *knows* this size of the char constant (obviously), and > > should report the size via a metadata call, as all other relational > > databases do. > > > > > > > > What is not clear to me is why it is so important for you to know the > length of a piece of data you are supplying. If it is so vitally > important, you could always cast it, e.g. select '123'::varchar(3) We're a middleware company. We are not in control of the queries that are sent. We can intercept and reformat them, and perhaps that is what we will need to do for PostgreSQL
"Larry McGhaw" <lmcghaw@connx.com> writes: > I think perhaps we have lost sight of the main issue: > 1) libpq can properly describe the maximum internal data size of any > numeric or char column in a table via Pqfsize > 2) libpq can properly describe the maximum internal data size of any > varchar column via Pqfmod > 3) libpq can properly describe the maximum internal data size of any > numeric constant in a SQL statement via Pqfsize None of the above statements are actually true, at least not when you take off your blinders and note the existence of unconstrained-width numeric and text columns. > The database *knows* this size of the char constant (obviously), No, what it knows (and reports) is type information. There are a small number of datatypes where you can infer a maximum width from knowledge of the datatype. There are many others where you can't set an upper bound from this knowledge --- at least not a usefully tight one. Anyway, if we were to cast those constants to something other than unknown, it would be text, not varchar, and you'd still have the same issue. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Monday, June 11, 2007 5:32 PM > To: Larry McGhaw > Cc: Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van Oosterhout; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > "Larry McGhaw" <lmcghaw@connx.com> writes: > > I think perhaps we have lost sight of the main issue: > > 1) libpq can properly describe the maximum internal data size of any > > numeric or char column in a table via Pqfsize > > 2) libpq can properly describe the maximum internal data size of any > > varchar column via Pqfmod > > 3) libpq can properly describe the maximum internal data size of any > > numeric constant in a SQL statement via Pqfsize > > None of the above statements are actually true, at least not when you > take off your blinders and note the existence of unconstrained-width > numeric and text columns. Unconstrained width columns are not what are being discussed here. It is constant expressions of known width. > > The database *knows* this size of the char constant (obviously), > > No, what it knows (and reports) is type information. There are a small > number of datatypes where you can infer a maximum width from knowledge > of the datatype. There are many others where you can't set an upper > bound from this knowledge --- at least not a usefully tight one. If you do not know how large 1::numeric is, then how can you know whether it is safe or not to insert it into a column of type numeric(12,4)? If you do not know how large 'Joe'::varchar is, then how can you know whether it is safe to insert it into a column of type varchar(256)? Clearly, neither of these operations will cause any problems and so the size of a constant can be determined. > Anyway, if we were to cast those constants to something other than > unknown, it would be text, not varchar, and you'd still have the same > issue. Other database systems can manage this, and the programmers of those database systems are not smarter than the programmers of the PostgreSQL group. Therefore I can conclude that if the PostgreSQL group decides it is important, then they can figure out the size of a string or numeric constant.
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Mon 6/11/2007 5:32 PM
To: Larry McGhaw
Cc: Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Selecting a constant question
"Larry McGhaw" <lmcghaw@connx.com> writes:
> I think perhaps we have lost sight of the main issue:
> 1) libpq can properly describe the maximum internal data size of any
> numeric or char column in a table via Pqfsize
> 2) libpq can properly describe the maximum internal data size of any
> varchar column via Pqfmod
> 3) libpq can properly describe the maximum internal data size of any
> numeric constant in a SQL statement via Pqfsize
None of the above statements are actually true, at least not when you
take off your blinders and note the existence of unconstrained-width
numeric and text columns.
> The database *knows* this size of the char constant (obviously),
No, what it knows (and reports) is type information. There are a small
number of datatypes where you can infer a maximum width from knowledge
of the datatype. There are many others where you can't set an upper
bound from this knowledge --- at least not a usefully tight one.
Anyway, if we were to cast those constants to something other than
unknown, it would be text, not varchar, and you'd still have the same
issue.
regards, tom lane
Ühel kenal päeval, E, 2007-06-11 kell 13:38, kirjutas Dann Corbit: > > -----Original Message----- > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Sent: Monday, June 11, 2007 1:32 PM > > To: Dann Corbit > > Cc: Gregory Stark; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Selecting a constant question ... > > You should be treating typlen as signed not unsigned, and not assuming > a > > fixed width for any negative value. > > > > Since the width refers to the server internal representation, and not > to > > what comes down the wire, I find it pretty strange for an application > to > > be using typlen for anything at all actually. > > Thanks for the response. > > Since libpq function PQfsize returns -2 for all constant character > strings in SQL statements ... What is the proper procedure to determine > the length of a constant character column after query execution but > before fetching the first row of data? Why not just get the first row and determine the width from it before you actually use any of tha data ? ---------- Hannu
Ühel kenal päeval, E, 2007-06-11 kell 22:11, kirjutas Larry McGhaw: > As far as I am aware these statements are true. If you have a > specific example you could provide to the contrary that would be > interesting. > > Even if there are such conditions it does not change the fact that > libpq and/or postgresql is deficient in this area. > > For any query, the database should be capable of describing the > metadata for the columns, which includes > 1) the column type > and > 2) the column maximum length. > > This is such a basic database interface principle that I very > disappointed that someone has not recognized this and simply said " > yes, we see the issue we will work on it". > > Again, *all* other major relational databases do this ... even blob > fields have a maximum length reported from the database. > > I hope someone who truly understands database interfaces will read > this thread and address the issue. > For now we will have to "special case" postgres in our application > until it is addressed. > or redesign your application so that it allocates memory as needed and won't waste client memory by allocating maximum possible amount for each and every grid cell weather needed or not ;) As I understand from this discussion you are writing some kind of middleware (i.e. tools), and I'd expect toolmakers to do the right thing. allocating as much as possibly ever needed is something that would be excusable in quick-n-dirty end user application, but not in a tool. ---------------- Hannu
> -----Original Message----- > From: Hannu Krosing [mailto:hannu@skype.net] > Sent: Monday, June 11, 2007 8:42 PM > To: Dann Corbit > Cc: Tom Lane; Gregory Stark; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > Ühel kenal päeval, E, 2007-06-11 kell 13:38, kirjutas Dann Corbit: > > > -----Original Message----- > > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > > Sent: Monday, June 11, 2007 1:32 PM > > > To: Dann Corbit > > > Cc: Gregory Stark; pgsql-hackers@postgresql.org > > > Subject: Re: [HACKERS] Selecting a constant question > ... > > > You should be treating typlen as signed not unsigned, and not assuming > > a > > > fixed width for any negative value. > > > > > > Since the width refers to the server internal representation, and not > > to > > > what comes down the wire, I find it pretty strange for an application > > to > > > be using typlen for anything at all actually. > > > > Thanks for the response. > > > > Since libpq function PQfsize returns -2 for all constant character > > strings in SQL statements ... What is the proper procedure to determine > > the length of a constant character column after query execution but > > before fetching the first row of data? > > Why not just get the first row and determine the width from it before > you actually use any of tha data ? What if the second row is 1000x longer?
> -----Original Message----- > From: Hannu Krosing [mailto:hannu@skype.net] > Sent: Monday, June 11, 2007 10:43 PM > To: Larry McGhaw > Cc: Tom Lane; Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van > Oosterhout; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Selecting a constant question > > Ühel kenal päeval, E, 2007-06-11 kell 22:11, kirjutas Larry McGhaw: > > As far as I am aware these statements are true. If you have a > > specific example you could provide to the contrary that would be > > interesting. > > > > Even if there are such conditions it does not change the fact that > > libpq and/or postgresql is deficient in this area. > > > > For any query, the database should be capable of describing the > > metadata for the columns, which includes > > 1) the column type > > and > > 2) the column maximum length. > > > > This is such a basic database interface principle that I very > > disappointed that someone has not recognized this and simply said " > > yes, we see the issue we will work on it". > > > > Again, *all* other major relational databases do this ... even blob > > fields have a maximum length reported from the database. > > > > I hope someone who truly understands database interfaces will read > > this thread and address the issue. > > For now we will have to "special case" postgres in our application > > until it is addressed. > > > > or redesign your application so that it allocates memory as needed and > won't waste client memory by allocating maximum possible amount for each > and every grid cell weather needed or not ;) > > As I understand from this discussion you are writing some kind of > middleware (i.e. tools), and I'd expect toolmakers to do the right > thing. In this case the middleware is: ODBC/JDBC/OLEDB/.NET data drivers for PostgreSQL. There are other related tools, but the above is the product for which the bug needs corrected. > allocating as much as possibly ever needed is something that would be > excusable in quick-n-dirty end user application, but not in a tool. It's a requirement of the ODBC/JDBC/OLEDB/.NET specifications. I suppose we could scan the table twice to figure out howlarge a column might be, but that would make the PostgreSQL driver run at 1/2 speed. Not a very appetizing solution. None of the other database vendors has any trouble reporting this information correctly.
Ühel kenal päeval, E, 2007-06-11 kell 22:55, kirjutas Dann Corbit: > > -----Original Message----- ... > > > I hope someone who truly understands database interfaces will read > > > this thread and address the issue. > > > For now we will have to "special case" postgres in our application > > > until it is addressed. > > > > > > > or redesign your application so that it allocates memory as needed and > > won't waste client memory by allocating maximum possible amount for each > > and every grid cell weather needed or not ;) > > > > As I understand from this discussion you are writing some kind of > > middleware (i.e. tools), and I'd expect toolmakers to do the right > > thing. > > In this case the middleware is: > ODBC/JDBC/OLEDB/.NET data drivers for PostgreSQL. > > There are other related tools, but the above is the product for which the bug needs corrected. You mean you use some kind of "Grid" inside JDBC/.NET drivers , and it needs to know max size for a column ? can't you replace it with a dynamically allocated Grid component, which would also work well for other expressions, not just constants ? > > allocating as much as possibly ever needed is something that would be > > excusable in quick-n-dirty end user application, but not in a tool. > > It's a requirement of the ODBC/JDBC/OLEDB/.NET specifications. Is that a requirement only for "constants" or for any expression, say "SELECT substring(reallybigblob, 1, random(1000000)) from somebigtable" ? > I suppose we could scan the > table twice to figure out how large a column might be, but that would make the PostgreSQL > driver run at 1/2 speed. Not a very appetizing solution. by scanninc twice you find out how big the largest column _is_, not might be . > None of the other database vendors has any trouble reporting this information correctly. By "this information" you mean the max possible size of data returned by and expression ? ----------------- Hannu
From: Hannu Krosing [mailto:hannu@skype.net]
Sent: Mon 6/11/2007 10:43 PM
To: Larry McGhaw
Cc: Tom Lane; Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Selecting a constant question
Ühel kenal päeval, E, 2007-06-11 kell 22:11, kirjutas Larry McGhaw:
> As far as I am aware these statements are true. If you have a
> specific example you could provide to the contrary that would be
> interesting.
>
> Even if there are such conditions it does not change the fact that
> libpq and/or postgresql is deficient in this area.
>
> For any query, the database should be capable of describing the
> metadata for the columns, which includes
> 1) the column type
> and
> 2) the column maximum length.
>
> This is such a basic database interface principle that I very
> disappointed that someone has not recognized this and simply said "
> yes, we see the issue we will work on it".
>
> Again, *all* other major relational databases do this ... even blob
> fields have a maximum length reported from the database.
>
> I hope someone who truly understands database interfaces will read
> this thread and address the issue.
> For now we will have to "special case" postgres in our application
> until it is addressed.
>
or redesign your application so that it allocates memory as needed and
won't waste client memory by allocating maximum possible amount for each
and every grid cell weather needed or not ;)
As I understand from this discussion you are writing some kind of
middleware (i.e. tools), and I'd expect toolmakers to do the right
thing.
allocating as much as possibly ever needed is something that would be
excusable in quick-n-dirty end user application, but not in a tool.
----------------
Hannu
Larry McGhaw wrote: > Again, *all* other major relational databases do this ... even blob fields have a maximum length reported from the database. So what are you doing with the max length? Not all data types and values have a meaningful max length, so you have to be able to deal with variable length data anyway. For blobs, exactly what max length would you like to get; 1GB? 1TB? Why, what good is that for? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> > Again, *all* other major relational databases do this ... > even blob fields have a maximum length reported from the database. > > So what are you doing with the max length? Not all data types > and values have a meaningful max length, so you have to be > able to deal with variable length data anyway. Imho it has a lot to do with optimizing the interface. If you know, that the max length is e.g. 16 bytes in UTF-8 for the 3 chars, you will probably not want any on the fly allocation smarts and preallocate and bind those 16 bytes. When the max length value gets larger, and it is a variable lenght type, the overhead of varlen allocation smarts starts to pay off. A generic interface should keep the sql parsing smarts at a minimum, thus it cannot know that a returned column is actually a text constant. Imho the request for a max length is very reasonable, but has no value once it exceeds a certain size e.g. 64k. Andreas
"Larry McGhaw" <lmcghaw@connx.com> writes: > The database *knows* this size of the char constant (obviously), and > should report the size via a metadata call, as all other relational > databases do. I'm not even clear whether you and Dan are talking about the same thing. He's talking about the number of bytes required hold the constant. You seem to be talking about the character length of strings. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Dann Corbit wrote: >> -----Original Message----- >> From: Hannu Krosing [mailto:hannu@skype.net] >>> Since libpq function PQfsize returns -2 for all constant character >>> strings in SQL statements ... What is the proper procedure to determine >>> the length of a constant character column after query execution but >>> before fetching the first row of data? >> Why not just get the first row and determine the width from it before >> you actually use any of tha data ? > > What if the second row is 1000x longer? Thats exactly the point. Consider select mytext from mytable ; How can PostgreSQL possibly know the maximum length of the returned values *before* it has scanned the whole table? greetings, Florian Pflug
> Thats exactly the point. Consider > select mytext from mytable ; > > How can PostgreSQL possibly know the maximum length of the > returned values *before* it has scanned the whole table? I think this focuses too much on those cases where it is not possible. When it is not feasible like with a text column, clients deal with it already (obviously some better than others). It is for those cases where it would be feasible, like constants (or concateneted columns), where the max length if properly returned could be used to improve performance. Andreas
Zeugswetter Andreas ADI SD wrote: >> Thats exactly the point. Consider >> select mytext from mytable ; >> >> How can PostgreSQL possibly know the maximum length of the >> returned values *before* it has scanned the whole table? > > I think this focuses too much on those cases where it is not possible. > When it is not feasible like with a text column, clients deal with it > already (obviously some better than others). > It is for those cases where it would be feasible, like constants (or > concateneted columns), where the max length if properly returned could > be used to improve performance. I doubt there's any measurable performance benefit here. You might as well allocate a buffer of say 128 bytes, and enlarge it from there when you see a value larger than that. Even in the worst case, you'll only need to enlarge the buffer a few times per query until you reach the real max length. Actually, if you're in such a high throughput, client-side CPU-intensive situation that this makes any difference, why areyou copying the value to another buffer in the first place? Just access it directly in the libpq buffer returned by PQgetvalue, and move on. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Actually, if you're in such a high throughput, client-side CPU-intensive > situation that this makes any difference, why are you copying the value > to another buffer in the first place? Just access it directly in the > libpq buffer returned by PQgetvalue, and move on. That's a *very* good point. The original design for the pgAdmin query tool made it's own copy of the data to display in the grid which is exactly why we used to get complaints about having a query time and a display time. The modern versions use a virtual table which enables the grid to retrieve the data directly from the libpq buffer when it needs to draw each cell which has effectively eliminated that display time. Regards, Dave.
On Tue, Jun 12, 2007 at 12:47:55PM +0200, Zeugswetter Andreas ADI SD wrote: > I think this focuses too much on those cases where it is not possible. > When it is not feasible like with a text column, clients deal with it > already (obviously some better than others). > It is for those cases where it would be feasible, like constants (or > concateneted columns), where the max length if properly returned could > be used to improve performance. For constants there is a basic problem that Postgres, if at all possible, doesn't even analyse the string at all. If it's not part of a join or sort, then in every likelyhood it's passed through the entire execution untouched and comes out the other end as type unknown. The length indicator of -2 indicates a null-terminated string, postgres never even bothered calculating the length of it. For the situation of concatinating varchar columns, it's a fairly special case. The typmod, in the *special case* of varchar is the maximum length, but for other types it means something else. Additionally, the planner doesn't know that || is concatination, a consequence of the user-defined operators. So to make this work you need to change the planner so that: 1. It special cases varchar to know what the typmod means 2. It special cases the || operator to add the typmods together. 3. Has to take special care not to break user-defined operators All a pile of hacks and special cases to handle something that, to be honest, the vast majority of people never notice. So no, no patch is going to be accepted to handle this special case, because it's far too hacky for a corner case. On the other hand, if you can piggyback it into something like the "user-defined typmod" stuff, it may have a better chance, though I really think the first problem is basically "won't fix" from an optimisation point of view. Hope this clarifies things a bit, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Larry McGhaw wrote: > Again, the issue is not our tool, but the deficiency in libpq/postgres > ... even mysql gets its right .. why not Postgres? > > Its not hard for a database to report metadata properly. > > if I issue a sql statement: > select '123' from <any table> > the database should report that the maximum length of the 1st column > in the resultset is 3 ... it cant be any more plain than that. > > > Making assertions like this does not make your case for you. If you think it's that easy then send in a patch. I suspect that doing what you want in the cases where it could be supported would require a protocol change, with possibly an extra field in the RowDescription object. If that's true you'd need to make a very good and compelling case indeed for such a change. If this is so vital I'm curious to know why driver authors haven't been screaming about it until now. I'm not dismissing what you want, but just waving your hand and saying "it's not hard" really won't do. cheers andrew
I'm really frustrated by this process I'm not trying to attack anyone here. I'm just surprised that no one will even entertain the idea that this is an issue that needs to be addressed. Instead nearly all of the responses have been attacking the applications that rely on the metadata. Let me back up and explain the situation. This issue came to light for us when we were using a query tool to examine performance of postgres queries. We were not only measuring the performance of the database itself, but also the TCP/IP transport, And the rendering of the data .. Comparing SQL Server, Oracle, and Postgres head to head with the same queries. We noticed inexplicably that when we used a constant with a postgres query, our records per second dropped From 60,000 records per second to 600 records per second, so we started digging into the issue. We discovered that libpq was not describing the metadata properly for the constant column, and it appears That the 3rd party grid control was relying on that metadata somehow .. The bottom line is that there was A huge performance drag. * OK ... I agree that the memory handling in the grid control could be better, but I would imagine that this issue is not an isolated to this one particular control, and that other applications and controls that rely on resultset metadata may have this issue. Bottom line, we only reported this problem because we thought you would be interested in doing everything possible to make postgres more mainstream and conform to SQL standards. In the past such suggestions have been absorbed with zeal. I have no vested interest in you improving the interface or not, and I'm not going to "plead a case" for you To do something that every other commercial database has done out of the box. It is in your hands now :) Thanks lm -----Original Message----- From: Andrew Dunstan [mailto:andrew@dunslane.net] Sent: Tuesday, June 12, 2007 4:50 AM To: Larry McGhaw Cc: Hannu Krosing; Tom Lane; Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van Oosterhout; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Selecting a constant question Larry McGhaw wrote: > Again, the issue is not our tool, but the deficiency in libpq/postgres > ... even mysql gets its right .. why not Postgres? > > Its not hard for a database to report metadata properly. > > if I issue a sql statement: > select '123' from <any table> > the database should report that the maximum length of the 1st column > in the resultset is 3 ... it cant be any more plain than that. > > > Making assertions like this does not make your case for you. If you think it's that easy then send in a patch. I suspect that doing what you want in the cases where it could be supported would require a protocol change, with possibly an extra field in the RowDescription object. If that's true you'd need to make a very good and compelling case indeed for such a change. If this is so vital I'm curious to know why driver authors haven't been screaming about it until now. I'm not dismissing what you want, but just waving your hand and saying "it's not hard" really won't do. cheers andrew
Hi, Nobody is tring to attack anyone, but we're all surprised this is an issue since you're the first person to have mentioned it. I have some a query to test below: On Tue, Jun 12, 2007 at 10:21:09AM -0700, Larry McGhaw wrote: > We noticed inexplicably that when we used a constant with a postgres > query, our records per second dropped > From 60,000 records per second to 600 records per second, so we started > digging into the issue. > > We discovered that libpq was not describing the metadata properly for > the constant column, and it appears > That the 3rd party grid control was relying on that metadata somehow .. > The bottom line is that there was > A huge performance drag. What I don't understand is *why* it's complaining about the constant column and not, for example, any other variable length column. There are a very small number of cases where a useful length is returned, 99% of the time it doesn't, yet you're obviously not get any performance problems there. Just a quick test, does the problem go away if you do: SELECT '1'::varchar FROM table; If that fixes it then the bug is (probably) that the middleware thinks that a length of -2 means it's 65534 bytes long. Note, in the test query I gave, it will return -1 for the length. I don't want to blame the middleware, but I want to make sure we're diagnosing the problem correctly. If that query has the same problem, then we really need to think of something else. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Larry McGhaw wrote: >I'm really frustrated by this process I'm not trying to attack anyone >here. I'm just surprised that no one will even entertain the idea that >this is an issue that needs to be addressed. > >Instead nearly all of the responses have been attacking the applications >that rely on the metadata. > > Having been following this debate, I think what people have really been attacking is the idea that the metadata for: SELECT '1' AS varchar_column; should be different from the metadata for: SELECT varchar_column FROM really_big_table; or for: SELECT varchar_column FROM really_small_table; Or at least that's what I've taken away from the dicussion- it's not so much that the metadata shouldn't be relied on, it's that the metadata may be more generic than theoretically necessary. And that the metadata may not contain the length of a variable length field even when that length is known. Brian
>> What I don't understand is *why* it's complaining about the constant column >> and not, for example, any other variable length column. There are a very small >> number of cases where a useful length is returned, 99% of the time it doesn't, >> yet you're obviously not get any performance problems there. The statement above is contrary to my actual results. The proper length is returned in all non-const cases. Here is a specific example: test=# create table test1 ( a varchar(20), b char(10), c integer ); CREATE TABLE test=# Note .. The table is empty, and contains no data at this point: Select a, b, c, '123' , '123'::char(3), '123'::varchar(3) from test1 For column a libpq returns the following: Pqfsize returns -1 Pqfmod (-4) returns 20 For column b libpq returns the following: Pqfsize returns -1 Pqfmod (-4) returns 10 For column c libpq returns the following: Pqfsize returns 4 For constant '123' libpq returns the following: Pqfsize returns -2 Pqfmod returns -1 For constant '123'::char(3) libpq returns the following: Pqfsize returns -1 Pqfmod (-4) returns 3 For constant '123'::varchar(3) libpq returns the following: Pqfsize returns -1 Pqfmod returns -1 Thanks lm -----Original Message----- From: Brian Hurt [mailto:bhurt@janestcapital.com] Sent: Tuesday, June 12, 2007 11:09 AM To: Larry McGhaw Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Selecting a constant question Larry McGhaw wrote: >I'm really frustrated by this process I'm not trying to attack anyone >here. I'm just surprised that no one will even entertain the idea that >this is an issue that needs to be addressed. > >Instead nearly all of the responses have been attacking the >applications that rely on the metadata. > > Having been following this debate, I think what people have really been attacking is the idea that the metadata for: SELECT '1' AS varchar_column; should be different from the metadata for: SELECT varchar_column FROM really_big_table; or for: SELECT varchar_column FROM really_small_table; Or at least that's what I've taken away from the dicussion- it's not so much that the metadata shouldn't be relied on, it's that the metadata may be more generic than theoretically necessary. And that the metadata may not contain the length of a variable length field even when that length is known. Brian
Larry McGhaw wrote: > For constant '123'::varchar(3) libpq returns the following: > Pqfsize returns -1 > Pqfmod returns -1 > > That one certainly looks odd. cheers andrew
That one surprised me as well. Thanks lm -----Original Message----- From: Andrew Dunstan [mailto:andrew@dunslane.net] Sent: Tuesday, June 12, 2007 12:00 PM To: Larry McGhaw Cc: Brian Hurt; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Selecting a constant question Larry McGhaw wrote: > For constant '123'::varchar(3) libpq returns the following: > Pqfsize returns -1 > Pqfmod returns -1 > > That one certainly looks odd. cheers andrew
"Larry McGhaw" <lmcghaw@connx.com> writes: > The statement above is contrary to my actual results. The proper length > is returned in all non-const cases. > > Here is a specific example: > > test=# create table test1 ( a varchar(20), b char(10), c integer ); > CREATE TABLE It's not returning a length at all though. It's returning the typmod, ie, the thing in the parentheses above. In that respect it's perfectly correct to return -1 for the '123' case as well since it's interpreted as an unbounded string and has no maximum length. It happens to only be three characters but then the values in the table could happen to be much less than the 10 or 20 characters you declared them as. The reason you might want to get this has more to do with understanding the semantics of the data you're receiving than optimizing storage. If you queried a Numeric column you would get something very different from the length from which you could extract the maximum precision and scale. This might help you display or work with the results maintaining the precision and scale a user expects. One reason why it might be useful to add an actual measure of the expected length (Postgres does make guesses about the length for planning purposes) would be to so a driver could size buffers appropriately. For example, in psql where we use cursors to process rows, we might want to automatically use a fetch count calculated to be large enough to receive approximately one ethernet frame of data. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
For what its worth .. Your statement about why we are the first people to mention this problem really got me thinking. Anyone who would attempt to write an ODBC driver for Postgres would run into the exact same issue. So I installed the official Postgres ODBC driver, and ran the identical query and here are my results: I probably should have looked at this first .... There is a whole Postgres ODBC dialog dedicated to the very subject of this thread: Handling of "unknown" data sizes. The pgodbc driver is configured to treat unknowns as varchar(255) by default, As shown by my example below. This can be configured up or down as desired. SQLExecDirect: In: hstmt = 0x003C18E0, szSqlStr = "Select a,b,c, '123' , '123'::char(3), '123'::varchar(3) from...", cbSqlStr = -3 Return: SQL_SUCCESS=0 Describe Column All: icol, szColName, *pcbColName, *pfSqlType, *pcbColDef, *pibScale, *pfNullable 1, a, 1, SQL_VARCHAR=12, 20, 0, SQL_NULLABLE=1 2, b, 1, SQL_CHAR=1, 10, 0, SQL_NULLABLE=1 3, c, 1, SQL_INTEGER=4, 10, 0, SQL_NULLABLE=1 4, ?column?, 8, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1 5, bpchar, 6, SQL_CHAR=1, 3, 0, SQL_NULLABLE=1 6, varchar, 7, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1 From psqlodbc.h #define MAX_VARCHAR_SIZE 255 /* default maximum size of * varchar fields (not including null term) */ So I guess the bottom line is that we are not the first to encounter this problem .. Its just been covered up by assigning An arbitrary maximum size .. So I guess we will do the same and make it configurable like the official postgres driver. Thanks lm -----Original Message----- From: Martijn van Oosterhout [mailto:kleptog@svana.org] Sent: Tuesday, June 12, 2007 10:43 AM To: Larry McGhaw Cc: Andrew Dunstan; Hannu Krosing; Tom Lane; Alvaro Herrera; Dann Corbit; Gregory Stark; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Selecting a constant question Hi, Nobody is tring to attack anyone, but we're all surprised this is an issue since you're the first person to have mentioned it. I have some a query to test below: On Tue, Jun 12, 2007 at 10:21:09AM -0700, Larry McGhaw wrote: > We noticed inexplicably that when we used a constant with a postgres > query, our records per second dropped From 60,000 records per second > to 600 records per second, so we started digging into the issue. > > We discovered that libpq was not describing the metadata properly for > the constant column, and it appears That the 3rd party grid control > was relying on that metadata somehow .. > The bottom line is that there was > A huge performance drag. What I don't understand is *why* it's complaining about the constant column and not, for example, any other variable length column. There are a very small number of cases where a useful length is returned, 99% of the time it doesn't, yet you're obviously not get any performance problems there. Just a quick test, does the problem go away if you do: SELECT '1'::varchar FROM table; If that fixes it then the bug is (probably) that the middleware thinks that a length of -2 means it's 65534 bytes long. Note, in the test query I gave, it will return -1 for the length. I don't want to blame the middleware, but I want to make sure we're diagnosing the problem correctly. If that query has the same problem, then we really need to think of something else. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
> -----Original Message----- > From: Larry McGhaw > Sent: Tuesday, June 12, 2007 1:40 PM > To: Martijn van Oosterhout > Cc: Andrew Dunstan; Hannu Krosing; Tom Lane; Alvaro Herrera; Dann Corbit; > Gregory Stark; pgsql-hackers@postgresql.org > Subject: RE: [HACKERS] Selecting a constant question > > For what its worth .. Your statement about why we are the first people to > mention this problem really got me thinking. Anyone who would attempt to > write an ODBC driver for Postgres would run into the exact same issue. > So I installed the official Postgres ODBC driver, and ran the identical > query and here are my results: > > I probably should have looked at this first .... There is a whole Postgres > ODBC dialog dedicated to the very subject of this thread: > Handling of "unknown" data sizes. The pgodbc driver is configured to > treat unknowns as varchar(255) by default, > As shown by my example below. This can be configured up or down as > desired. > > SQLExecDirect: > In: hstmt = 0x003C18E0, szSqlStr = "Select a,b,c, '123' , '123'::char(3), > '123'::varchar(3) from...", cbSqlStr = -3 > Return: SQL_SUCCESS=0 > > Describe Column All: > icol, szColName, *pcbColName, *pfSqlType, *pcbColDef, *pibScale, > *pfNullable > 1, a, 1, SQL_VARCHAR=12, 20, 0, SQL_NULLABLE=1 > 2, b, 1, SQL_CHAR=1, 10, 0, SQL_NULLABLE=1 > 3, c, 1, SQL_INTEGER=4, 10, 0, SQL_NULLABLE=1 > 4, ?column?, 8, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1 > 5, bpchar, 6, SQL_CHAR=1, 3, 0, SQL_NULLABLE=1 > 6, varchar, 7, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1 > > From psqlodbc.h > > #define MAX_VARCHAR_SIZE 255 /* default maximum size of > * varchar fields (not including null > term) */ > > So I guess the bottom line is that we are not the first to encounter this > problem .. Its just been covered up by assigning > An arbitrary maximum size .. So I guess we will do the same and make it > configurable like the official postgres driver. Of course, the downside here is that choosing a default will truncate the data when the actual data is larger than the default chosen.
Ühel kenal päeval, T, 2007-06-12 kell 13:40, kirjutas Larry McGhaw: > For what its worth .. Your statement about why we are the first people > to mention this problem really got me thinking. Anyone who would > attempt to write an ODBC driver for Postgres would run into the exact > same issue. So I installed the official Postgres ODBC driver, and ran > the identical query and here are my results: > > I probably should have looked at this first .... There is a whole > Postgres ODBC dialog dedicated to the very subject of this thread: > Handling of "unknown" data sizes. The pgodbc driver is configured to > treat unknowns as varchar(255) by default, > As shown by my example below. This can be configured up or down as > desired. BTW, what is the reason you are writing your own ODBC driver ? What problems in the official one are you trying to solve ? -------------- Hannu
Hannu Krosing wrote: > Ühel kenal päeval, T, 2007-06-12 kell 13:40, kirjutas Larry McGhaw: >> For what its worth .. Your statement about why we are the first people >> to mention this problem really got me thinking. Anyone who would >> attempt to write an ODBC driver for Postgres would run into the exact >> same issue. So I installed the official Postgres ODBC driver, and ran >> the identical query and here are my results: >> >> I probably should have looked at this first .... There is a whole >> Postgres ODBC dialog dedicated to the very subject of this thread: >> Handling of "unknown" data sizes. The pgodbc driver is configured to >> treat unknowns as varchar(255) by default, >> As shown by my example below. This can be configured up or down as >> desired. > > BTW, what is the reason you are writing your own ODBC driver ? They aren't I don't think. I think they are using the ODBC driver as an example. Joshua D. Drake > > What problems in the official one are you trying to solve ? > > -------------- > Hannu > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/