Thread: A simple question about text fields
Are there performance issues with the choice of 'text' vs. varchar and some character limit? For instance, if I have a tablewith ten million records and text fields that may range in length from 15 to 150, can I expect a measurable improvementin response time for using varchar(150) or will text do just or nearly as well. If the latter is the case, using text across the board is a simpler choice
Martin Mueller <martinmueller@northwestern.edu> writes: > Are there performance issues with the choice of 'text' vs. varchar and some character limit? For instance, if I have atable with ten million records and text fields that may range in length from 15 to 150, can I expect a measurable improvementin response time for using varchar(150) or will text do just or nearly as well. There is no situation where varchar outperforms text in Postgres. If you need to apply a length constraint for application semantic reasons, do so ... otherwise, text is the native type. It's useful to think of varchar as being a domain over text, though for various reasons it's not implemented quite that way. regards, tom lane
On 17 Jun 2021, at 1:08, Tom Lane wrote: > Martin Mueller <martinmueller@northwestern.edu> writes: > >> Are there performance issues with the choice of 'text' vs. varchar and some character limit? For instance, if I havea table with ten million records and text fields that may range in length from 15 to 150, can I expect a measurable improvementin response time for using varchar(150) or will text do just or nearly as well. > > There is no situation where varchar outperforms text in Postgres. > If you need to apply a length constraint for application semantic > reasons, do so ... otherwise, text is the native type. It's > useful to think of varchar as being a domain over text, though > for various reasons it's not implemented quite that way. > This reminds of my days converting from MySQL to PostgreSQL. MySQL, along with other databases, seemed to have a strong preferencefor setting a length on character strings. And all this from before the advent of UTF encoding which has made theconcept of string ‘length’ very messy. Database guru and SQL author Joe Celko asserts in his ’SQL for Smarties’ that if he finds a text field without a length limithe will input the Heart Sutra (presumably in ASCII :) to demonstrate the design error. (Of course he is ignoring thepotential for this input to help the database achieve inner consistency. :) . But taking Joe’s central point there doseem to be grounds for restricting user input text fields to a reasonable length according to the business need… if onlyto limit the damage of a cat sitting on the keyboard. My approach is to define such fields as ‘text’ and set a constraint using char_length(). This allows PG to do the businesswith the text in native form, and only imposes the cost of any length check when the field is updated… best of bothworlds. Gavan Schneider —— Gavan Schneider, Sodwalls, NSW, Australia Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat,plausible, and wrong. — H. L. Mencken, 1920
On Thursday, June 17, 2021, Gavan Schneider <list.pg.gavan@pendari.org> wrote:
My approach is to define such fields as ‘text’ and set a constraint using char_length(). This allows PG to do the business with the text in native form, and only imposes the cost of any length check when the field is updated… best of both worlds.
Those are basically the same world…your alternative probably is strictly worse than varchar(n) because of its novel way of implementing the same functionality.
For most text fields any good constraint is going be done in the form of a regular expression, one that at minimum prevents non-printable characters (linefeed and carriage return being obvious examples).
David J.
On 18 Jun 2021, at 9:34, David G. Johnston wrote: > On Thursday, June 17, 2021, Gavan Schneider <list.pg.gavan@pendari.org> > wrote: > >> >> My approach is to define such fields as ‘text’ and set a constraint using >> char_length(). This allows PG to do the business with the text in native >> form, and only imposes the cost of any length check when the field is >> updated… best of both worlds. >> > > Those are basically the same world…your alternative probably is strictly > worse than varchar(n) because of its novel way of implementing the same > functionality. > Not sure if this is strictly true. Novelty per se is not always worse. :) The design advantage is in all text fields being defined the same — no built in length. When it becomes apparent a length constraint is needed it can be added for the relevant field(s), e.g., when the system doesnot impose proper restraint at the input stage. > For most text fields any good constraint is going be done in the form of a > regular expression, one that at minimum prevents non-printable characters > (linefeed and carriage return being obvious examples). > Agree. If the business rules need some additional restrictions they can go here as well. Not so obvious that newlines, etc. are unwelcome. Consider the need for human readable text in comment fields (or other annotationwhere the readability is enhanced by such layout). There is always the consideration of SQL injection (but it’smostly too late if we’re leaving this to a constraint) and other toxic string sequences. But this is all business logic.The database just needs to handle the ‘text’ and we need to design the restraint around the content. Gavan Schneider —— Gavan Schneider, Sodwalls, NSW, Australia Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat,plausible, and wrong. — H. L. Mencken, 1920
On Fri, 2021-06-18 at 10:28 +1000, Gavan Schneider wrote: > On 18 Jun 2021, at 9:34, David G. Johnston wrote: > > On Thursday, June 17, 2021, Gavan Schneider <list.pg.gavan@pendari.org> wrote: > > > > > My approach is to define such fields as ‘text’ and set a constraint using > > > char_length(). This allows PG to do the business with the text in native > > > form, and only imposes the cost of any length check when the field is > > > updated… best of both worlds. > > > > > > > Those are basically the same world…your alternative probably is strictly > > worse than varchar(n) because of its novel way of implementing the same > > functionality. > > Not sure if this is strictly true. Novelty per se is not always worse. :) True in general, but not in this case. There is no advantage in a "text" with a check constraint on the length, that is, no added functionality. And it is worse for these reasons: - the performance will be worse (big reason) - the length limit is less obvious if you look at the table definition (small reason) Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com