Thread: Performance differences using varchar, char and text
In the Postgres 7.3.3. User Guide section 5.3, it says that there's no performance difference between the three type. But in 'PostgreSQL Database Performance Tuning' by Jean-Paul ARGUDO (section 5: use correct datatype), it says there is a performance difference because of you might need more I/O to read the data (which makes sense). So, is there a performance difference?
Yusuf wrote: > In the Postgres 7.3.3. User Guide section 5.3, it says that there's no > performance difference between the three type. But in 'PostgreSQL > Database Performance Tuning' by Jean-Paul ARGUDO (section 5: use correct > datatype), it says there is a performance difference because of you > might need more I/O to read the data (which makes sense). > > So, is there a performance difference? Yusuf, Here is what I noticed - I am using Postgres 7.3.2 (on a Sun box, not sure what OS version) with the latest ODBC driver on a Windows XP Pro box. I am in the process of conversion of a legacy VB app from using Access 97 to PostgreSQL via an ODBC connection (DSN-less). I had in my Access DB several tables which utilized Memo-type fields to store data in a "multi-value" delimited format (will never do that again). I tried to first convert them to TEXT type fields on the PostgreSQL tables. These didn't work right (more on that later), so I converted them to large VARCHAR field (ie, VARCHAR(100000), and larger, in some cases). I noticed when doing selects (via the ODBC driver) that any accesses to these fields caused MASSIVE slowdowns on the select - whether I was selecting for them, or if the field was part of the WHERE clause of the SQL statement. I found that if I decreased the size of the field, the speed would increase. One thing I did note was that in using psql, selects on these fields didn't matter - they seemed fast. I tend to think it was the ODBC driver in some manner. Later, I determined that the reason I was having troubles with the TEXT fields was because I was using DAO. Switching the code to use ADO calls instead fixed the issue, and I switched to using the TEXT type on the fields. However, by then I had installed many "workarounds" to avoid those fields as much as possible in my code that I don't know if the selects on them would be the same, faster, or slower... Hope this helps a little... Andrew -- CONFIDENTIALITY NOTICE -- This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain informationthat is privileged, confidential and exempt from disclosure under applicable law. If you are not the intendedaddressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use, copy,disclose or distribute to anyone the message or any information contained in the message. If you have received thismessage in error, please immediately advise the sender by reply email, and delete the message. Thank you.
Andrew Ayers <aayers@eldocomp.com> writes: > I had in my Access DB several tables which utilized Memo-type fields to > store data in a "multi-value" delimited format (will never do that > again). I tried to first convert them to TEXT type fields on the > PostgreSQL tables. These didn't work right (more on that later), so I > converted them to large VARCHAR field (ie, VARCHAR(100000), and larger, > in some cases). > I noticed when doing selects (via the ODBC driver) that any accesses to > these fields caused MASSIVE slowdowns on the select - whether I was > selecting for them, or if the field was part of the WHERE clause of the > SQL statement. I found that if I decreased the size of the field, the > speed would increase. I believe what you're reporting here is problems on the Access side, not problems in the underlying database. (That doesn't make them any less of a real problem if you're using Access, of course.) Access doesn't work very well with datatypes that aren't found in MS SQL Server... I think if you look in the pgsql-odbc list archives you will find some discussion of workarounds for Access with TEXT fields. As far as the original question goes: there is no reason within Postgres to choose one of these three types on performance grounds; you should make the choice based on the semantics you want. Do you really want every value blank-padded to exactly N characters? Use char(N). If you don't want padding, but do want a specific upper limit on the field width, use varchar(N). If you haven't got any specific upper limit in mind (and if you're putting in numbers like 100000 then you don't ;-)) then use text. The performance differences that exist come directly from the cycles expended to add padding blanks, check that the width limit is not exceeded, etc. regards, tom lane
Tom Lane wrote: > I believe what you're reporting here is problems on the Access side, not > problems in the underlying database. (That doesn't make them any less > of a real problem if you're using Access, of course.) Access doesn't > work very well with datatypes that aren't found in MS SQL Server... No - I had converted an Access 97 DB over to Postgres, and my VB app was updating the PG database via the ODBC driver. My application is written in VB (version 6, Enterprise Edition) - it isn't a case of an Access/VB for Applications front-end to PG backend. > I think if you look in the pgsql-odbc list archives you will find some > discussion of workarounds for Access with TEXT fields. I believe my problems extended from using DAO, instead of ADO - once I switched to using ADO, my problems went away (of course, other problems cropped up, but that was due to code differences between DAO and ADO usage). I could use TEXT datatypes on the PG database/tables without problems. Honestly, most of my issues was my bad decision to "bastardize" relational DBs - I have a lot of background in PICK/MultiValue databases, where you can store and retrieve from data-cubes, and I wanted this functionality for my application. I have since learned that this approach is far from correct (and I probably have Codd or Cobb or whatever his name is spinning in his new grave - RIP). In the process of converting to PG, a few of the tables I *had* to normalize and split into proper relational tables in order for it to work properly and quickly when I was using VARCHAR fields - so some good has come out of this. > As far as the original question goes: there is no reason within Postgres > to choose one of these three types on performance grounds; you should > make the choice based on the semantics you want. Do you really want > every value blank-padded to exactly N characters? Use char(N). If you > don't want padding, but do want a specific upper limit on the field > width, use varchar(N). If you haven't got any specific upper limit in > mind (and if you're putting in numbers like 100000 then you don't ;-)) > then use text. The performance differences that exist come directly > from the cycles expended to add padding blanks, check that the width > limit is not exceeded, etc. I agree in practice, when you are creating and maintaining a proper relational database, and accessing it properly. However, because of DAO I was hitting a big problem on updates of TEXT fields, which I didn't have in regards to VARCHAR, or any other datatype field in PG - only the TEXT datatype. Furthermore as I have noted, many of these large fields were being used in a manner inconsistant with the rules of relational database structure. I shouldn't have done that. I first thought it was the ODBC driver, but after much discussion and such, I finally tried using ADO instead of DAO, and the update issue with TEXT fields went away. I don't know if the performance issue is still there, though. I kinda suspect it is. However, I have since coded around it (when I was using VARCHAR fields), so the problem isn't a problem for me anymore (I had fields that were very large holding descriptions, and I wanted to do a "keyword" LIKE search on them - I ended up with a different strategy, which keeps me away from TEXT and large VARCHAR fields, and have a keyword table for allowances of lexical stats and such now). Andrew -- CONFIDENTIALITY NOTICE -- This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain informationthat is privileged, confidential and exempt from disclosure under applicable law. If you are not the intendedaddressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use, copy,disclose or distribute to anyone the message or any information contained in the message. If you have received thismessage in error, please immediately advise the sender by reply email, and delete the message. Thank you.