Thread: Optimization of this SQL sentence
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This SQL sentence is very simple. I need to get better results. I have tried some posibilities and I didn't get good results. SELECT max(idcomment) FROM ficha vf INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR idestado=4)) WHERE idstatus=3 AND ctype=1 QUERY PLAN Aggregate (cost=2730.75..2730.76 rows=1 width=4) (actual time=188.463..188.469 rows=1 loops=1) -> Hash Join (cost=1403.44..2730.72 rows=11 width=4) (actual time=141.464..185.404 rows=513 loops=1) Hash Cond: ("outer".idfile = "inner".idficha) -> Seq Scan on "comment" c (cost=0.00..1321.75 rows=1083 width=8) (actual time=0.291..36.112 rows=642 loops=1) Filter: ((idstatus = 3) AND (ctype = 1)) -> Hash (cost=1403.00..1403.00 rows=178 width=4) (actual time=141.004..141.004 rows=6282 loops=1) -> Seq Scan on ficha vf (cost=0.00..1403.00 rows=178 width=4) (actual time=0.071..97.885 rows=6282 loops=1) Filter: (((idestado)::text = '3'::text) OR ((idestado)::text = '4'::text)) Total runtime: 188.809 ms Thanks in advance, Ruben Rubio -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD4DBQFFNJzfIo1XmbAXRboRAgPRAJ99+S9wL21b+JN14bQbAoREFXYUcQCYpfEZ p1MCcDMWqTxzSdtssUFWOw== =rUHB -----END PGP SIGNATURE-----
Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and some infoon what version of postgres you are using. Are the tables recently analyzed ? How many rows in them ? Greg Williamson DBA GlobeXplorer LLC -----Original Message----- From: pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio Sent: Tue 10/17/2006 2:05 AM To: pgsql-performance@postgresql.org Cc: Subject: [PERFORM] Optimization of this SQL sentence -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This SQL sentence is very simple. I need to get better results. I have tried some posibilities and I didn't get good results. SELECT max(idcomment) FROM ficha vf INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR idestado=4)) WHERE idstatus=3 AND ctype=1 QUERY PLAN Aggregate (cost=2730.75..2730.76 rows=1 width=4) (actual time=188.463..188.469 rows=1 loops=1) -> Hash Join (cost=1403.44..2730.72 rows=11 width=4) (actual time=141.464..185.404 rows=513 loops=1) Hash Cond: ("outer".idfile = "inner".idficha) -> Seq Scan on "comment" c (cost=0.00..1321.75 rows=1083 width=8) (actual time=0.291..36.112 rows=642 loops=1) Filter: ((idstatus = 3) AND (ctype = 1)) -> Hash (cost=1403.00..1403.00 rows=178 width=4) (actual time=141.004..141.004 rows=6282 loops=1) -> Seq Scan on ficha vf (cost=0.00..1403.00 rows=178 width=4) (actual time=0.071..97.885 rows=6282 loops=1) Filter: (((idestado)::text = '3'::text) OR ((idestado)::text = '4'::text)) Total runtime: 188.809 ms Thanks in advance, Ruben Rubio -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD4DBQFFNJzfIo1XmbAXRboRAgPRAJ99+S9wL21b+JN14bQbAoREFXYUcQCYpfEZ p1MCcDMWqTxzSdtssUFWOw== =rUHB -----END PGP SIGNATURE----- ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ------------------------------------------------------- Click link below if it is SPAM gsw@globexplorer.com "https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1" !DSPAM:45349c86275246672479766! -------------------------------------------------------
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Indexes in comment Comment rows: 17.250 CREATE INDEX usuariofichaoncommnet ON "comment" USING btree (idusuarioficha); Comment structure: CREATE TABLE "comment" ( idcomment int4 NOT NULL DEFAULT nextval('comment_idcomment_seq'::regclass), score int4, title varchar, ctext varchar, idusuarioficha int4, galleta varchar, navlang int4, cdate timestamp DEFAULT now(), idstatus int4, ctype int4 NOT NULL, idfile int4 NOT NULL, nick varchar, nombre varchar, apellidos varchar, dni varchar, nacionalidad varchar, email varchar, telefono varchar, code varchar, memo varchar, c_ip varchar(30), codpais char(2), replay varchar, replaydate timestamp, advsent int4, usrwarn int4, nouserlink int4, aviso_confirmacion_15 timestamp, aviso_confirmacion_60 timestamp, CONSTRAINT comment_pkey PRIMARY KEY (idcomment) ) Ficha structure: No indexes in ficha Ficha rows: 17.850 CREATE TABLE ficha ( idficha int4 NOT NULL DEFAULT nextval('ficha_idficha_seq'::regclass), email varchar(255), web varchar(255), capacidadmin int4, capacidadmax int4, preciotb float4, preciota float4, cp varchar(20), telefono1 varchar(50), telefono2 varchar(50), fax varchar(50), uprecio varchar, udireccion varchar(512), comentarios varchar, ucapacidad varchar(512), upresentacion varchar, utipoaloj varchar(50), ulugares varchar, ucaracteristica varchar, idusuario int4, idlocacion int4, contacto varchar(255), fuente varchar(512), prefijopais varchar(10), idestado char(1), nombre varchar(255), idtipoalojamiento int4, ulocalidad varchar(255), creado timestamp DEFAULT now(), cachefault int4 DEFAULT 0, idpromotiontype_pc int4 NOT NULL DEFAULT 0, idpromotiontype_ant_pc int4, promostartdate_pc timestamp, promoenddate_pc timestamp, localidadruta varchar(255), urlsufix varchar(32), searchengine1 int4, searchengine2 int4, searchengine3 int4, searchengine4 int4, searchengine5 int4, searchengine6 int4, deseo1 int4, deseo2 int4, deseo3 int4, deseo4 int4, deseo5 int4, deseo6 int4, otherspecs varchar(510), lastchange timestamp, idsubestado int4, environment int4, prefijopais2 varchar, web_agencia varchar(255), lat varchar(25), long varchar(25), zoom int4, swzoombloq bool DEFAULT true, titulomapa_l0 varchar(255), titulomapa_l1 varchar(255), titulomapa_l2 varchar(255), titulomapa_l3 varchar(255), titulomapa_l4 varchar(255), titulomapa_l5 varchar(255), titulomapa_l6 varchar(255), titulomapa_l7 varchar(255), titulomapa_l8 varchar(255), titulomapa_l9 varchar(255), CONSTRAINT pk_ficha PRIMARY KEY (idficha), CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION ) Gregory S. Williamson escribió: > Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and someinfo on what version of postgres you are using. > > Are the tables recently analyzed ? How many rows in them ? > > Greg Williamson > DBA > GlobeXplorer LLC > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio > Sent: Tue 10/17/2006 2:05 AM > To: pgsql-performance@postgresql.org > Cc: > Subject: [PERFORM] Optimization of this SQL sentence > > This SQL sentence is very simple. I need to get better results. I have > tried some posibilities and I didn't get good results. > > SELECT max(idcomment) > FROM ficha vf > INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR > idestado=4)) > WHERE idstatus=3 > AND ctype=1 > > > QUERY PLAN > > Aggregate (cost=2730.75..2730.76 rows=1 width=4) (actual > time=188.463..188.469 rows=1 loops=1) > > -> Hash Join (cost=1403.44..2730.72 rows=11 width=4) (actual > time=141.464..185.404 rows=513 loops=1) > > Hash Cond: ("outer".idfile = "inner".idficha) > > -> Seq Scan on "comment" c (cost=0.00..1321.75 rows=1083 > width=8) (actual time=0.291..36.112 rows=642 loops=1) > > Filter: ((idstatus = 3) AND (ctype = 1)) > > -> Hash (cost=1403.00..1403.00 rows=178 width=4) (actual > time=141.004..141.004 rows=6282 loops=1) > > -> Seq Scan on ficha vf (cost=0.00..1403.00 rows=178 > width=4) (actual time=0.071..97.885 rows=6282 loops=1) > > Filter: (((idestado)::text = '3'::text) OR > ((idestado)::text = '4'::text)) > > Total runtime: 188.809 ms > > > Thanks in advance, > Ruben Rubio - ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq - ------------------------------------------------------- Click link below if it is SPAM gsw@globexplorer.com "https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1" !DSPAM:45349c86275246672479766! - ------------------------------------------------------- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFNKNeIo1XmbAXRboRAsiDAKCce+BeyYK63r24w2E1QNq/3maMJQCeNpNw GiwJ/KixMHH76919wQR31g8= =g/re -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 If just just realized that is a litlle faster (15% faster) with this: CREATE INDEX idx_statustype ON "comment" USING btree (idstatus, ctype); Any other ideas? Gregory S. Williamson escribió: > Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and someinfo on what version of postgres you are using. > > Are the tables recently analyzed ? How many rows in them ? > > Greg Williamson > DBA > GlobeXplorer LLC > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio > Sent: Tue 10/17/2006 2:05 AM > To: pgsql-performance@postgresql.org > Cc: > Subject: [PERFORM] Optimization of this SQL sentence > > This SQL sentence is very simple. I need to get better results. I have > tried some posibilities and I didn't get good results. > > SELECT max(idcomment) > FROM ficha vf > INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR > idestado=4)) > WHERE idstatus=3 > AND ctype=1 > > > QUERY PLAN > > Aggregate (cost=2730.75..2730.76 rows=1 width=4) (actual > time=188.463..188.469 rows=1 loops=1) > > -> Hash Join (cost=1403.44..2730.72 rows=11 width=4) (actual > time=141.464..185.404 rows=513 loops=1) > > Hash Cond: ("outer".idfile = "inner".idficha) > > -> Seq Scan on "comment" c (cost=0.00..1321.75 rows=1083 > width=8) (actual time=0.291..36.112 rows=642 loops=1) > > Filter: ((idstatus = 3) AND (ctype = 1)) > > -> Hash (cost=1403.00..1403.00 rows=178 width=4) (actual > time=141.004..141.004 rows=6282 loops=1) > > -> Seq Scan on ficha vf (cost=0.00..1403.00 rows=178 > width=4) (actual time=0.071..97.885 rows=6282 loops=1) > > Filter: (((idestado)::text = '3'::text) OR > ((idestado)::text = '4'::text)) > > Total runtime: 188.809 ms > > > Thanks in advance, > Ruben Rubio - ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq - ------------------------------------------------------- Click link below if it is SPAM gsw@globexplorer.com "https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1" !DSPAM:45349c86275246672479766! - ------------------------------------------------------- - ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFNKT4Io1XmbAXRboRAurtAKC8YWjgzytaqkPjLfrohZ1aceZivwCgpDii wzxc4fktzIHTZRhPuJLi2Wc= =Korn -----END PGP SIGNATURE-----
am Tue, dem 17.10.2006, um 11:33:18 +0200 mailte Ruben Rubio folgendes: > > > > SELECT max(idcomment) > > FROM ficha vf > > INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR > > idestado=4)) > > WHERE idstatus=3 > > AND ctype=1 check for indexes on vf.idficha, c.idfile, idstatus and ctype. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47215, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Oct 17, 2006, at 11:33 , Ruben Rubio wrote: > CREATE TABLE "comment" > ( > idcomment int4 NOT NULL DEFAULT > nextval('comment_idcomment_seq'::regclass), [snip 28 columns] > CONSTRAINT comment_pkey PRIMARY KEY (idcomment) > ) > > Ficha structure: > No indexes in ficha > Ficha rows: 17.850 > > CREATE TABLE ficha > ( > idficha int4 NOT NULL DEFAULT nextval > ('ficha_idficha_seq'::regclass), [snip 67 (!) columns] > CONSTRAINT pk_ficha PRIMARY KEY (idficha), > CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES > geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION > ) These tables are particularly egregious examples of ignorant database design. You need to understand the relational model (http:// en.wikipedia.org/wiki/Relational_model), specifically data normalization (http://en.wikipedia.org/wiki/Database_normalization) and 3NF (http://en.wikipedia.org/wiki/3NF). These columns are particularly telling: searchengine1 int4, searchengine2 int4, searchengine3 int4, searchengine4 int4, searchengine5 int4, searchengine6 int4, deseo1 int4, deseo2 int4, deseo3 int4, deseo4 int4, deseo5 int4, deseo6 int4, titulomapa_l0 varchar(255), titulomapa_l1 varchar(255), titulomapa_l2 varchar(255), titulomapa_l3 varchar(255), titulomapa_l4 varchar(255), titulomapa_l5 varchar(255), titulomapa_l6 varchar(255), titulomapa_l7 varchar(255), titulomapa_l8 varchar(255), titulomapa_l9 varchar(255), Refactor into three separate tables: create table searchengine ( idficha int references ficha (idficha), searchengine int, primary key (idficha, searchengine) ); create table deseo ( idficha int references ficha (idficha), deseo int, primary key (idficha, deseo) ); create table titulomapa ( idficha int references ficha (idficha), titulomapa int, primary key (idficha, titulomapa) ); Now you can find all search engines for a single ficha row: select searchengine from searchengine where idficha = n This design allows for more than 5 search engines per ficha row, and allows expressive joins such as: select ficha.idficha, searchengine.searchengine inner join searchengine on searchengine.idfciha = ficha.idficha Also, most of your columns are nullable. This alone shows that you don't understand your own data. Lastly, note that in PostgreSQL these length declarations are not necessary: contacto varchar(255), fuente varchar(512), prefijopais varchar(10) Instead, use: contacto text, fuente text, prefijopais text See the PostgreSQL manual for an explanation of varchar vs. text. Alexander.
You could try rewriting the query like this: SELECT MAX(idcomment) FROM comment c WHERE idstatus=3 AND ctype=1 AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND vf.idficha = c.idfile); The planner can then try a backward scan on the comment_pkey index, which should be quicker than the seq scan assuming that there's a lot of rows that match the restrictions (idstatus=3 ctype=1 idestado IN ('3', '4')). But see comments inline below: Ruben Rubio wrote: > CREATE TABLE "comment" > ( > idcomment int4 NOT NULL DEFAULT > nextval('comment_idcomment_seq'::regclass), > score int4, > title varchar, > ctext varchar, > idusuarioficha int4, > galleta varchar, > navlang int4, > cdate timestamp DEFAULT now(), > idstatus int4, > ctype int4 NOT NULL, > idfile int4 NOT NULL, > nick varchar, > nombre varchar, > apellidos varchar, > dni varchar, > nacionalidad varchar, > email varchar, > telefono varchar, > code varchar, > memo varchar, > c_ip varchar(30), > codpais char(2), > replay varchar, > replaydate timestamp, > advsent int4, > usrwarn int4, > nouserlink int4, > aviso_confirmacion_15 timestamp, > aviso_confirmacion_60 timestamp, > CONSTRAINT comment_pkey PRIMARY KEY (idcomment) > ) Without knowing anything about you're application, it looks like there's a some fields in the comment-table that are duplicates of fields in the ficha-table. Telefono and email for example. You should consider doing some normalization. > No indexes in ficha Except for the implicit idficha_pkey index. > CREATE TABLE ficha > ( > ... > idestado char(1), If idestado contains numbers (codes of some kind, I presume), you're better off using the smallint data type. > .... > searchengine1 int4, > searchengine2 int4, > searchengine3 int4, > searchengine4 int4, > searchengine5 int4, > searchengine6 int4, Normalization?! > deseo1 int4, > deseo2 int4, > deseo3 int4, > deseo4 int4, > deseo5 int4, > deseo6 int4, For these as well... > ... > lat varchar(25), > long varchar(25), Isn't there's a better data type for latitude and longitude? Decimal, perhaps? > titulomapa_l0 varchar(255), > titulomapa_l1 varchar(255), > titulomapa_l2 varchar(255), > titulomapa_l3 varchar(255), > titulomapa_l4 varchar(255), > titulomapa_l5 varchar(255), > titulomapa_l6 varchar(255), > titulomapa_l7 varchar(255), > titulomapa_l8 varchar(255), > titulomapa_l9 varchar(255), Again, normalization... - Heikki
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi to everyone, First of all I have to say that I now the database is not ok. There was a people before me that didn't do the thinks right. I would like to normalize the database, but it takes too much time (there is is hundred of SQLs to change and there is not enough technical resources). Also, datacolumns in some places has same names, but the data that is stores has different usages. Thanks everyone for all hints, I ll try to do my best performing the database structure. By other hand, I was able to create the corrects index and with this AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND vf.idficha = c.idfile); it is really fast. Thanks to everybody. Regards, Ruben Rubio Heikki Linnakangas escribió: > You could try rewriting the query like this: > > SELECT MAX(idcomment) > FROM comment c > WHERE idstatus=3 AND ctype=1 > AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND > vf.idficha = c.idfile); > > The planner can then try a backward scan on the comment_pkey index, > which should be quicker than the seq scan assuming that there's a lot of > rows that match the restrictions (idstatus=3 ctype=1 idestado IN ('3', > '4')). > > But see comments inline below: > > Ruben Rubio wrote: >> CREATE TABLE "comment" >> ( >> idcomment int4 NOT NULL DEFAULT >> nextval('comment_idcomment_seq'::regclass), >> score int4, >> title varchar, >> ctext varchar, >> idusuarioficha int4, >> galleta varchar, >> navlang int4, >> cdate timestamp DEFAULT now(), >> idstatus int4, >> ctype int4 NOT NULL, >> idfile int4 NOT NULL, >> nick varchar, >> nombre varchar, >> apellidos varchar, >> dni varchar, >> nacionalidad varchar, >> email varchar, >> telefono varchar, >> code varchar, >> memo varchar, >> c_ip varchar(30), >> codpais char(2), >> replay varchar, >> replaydate timestamp, >> advsent int4, >> usrwarn int4, >> nouserlink int4, >> aviso_confirmacion_15 timestamp, >> aviso_confirmacion_60 timestamp, >> CONSTRAINT comment_pkey PRIMARY KEY (idcomment) >> ) > > Without knowing anything about you're application, it looks like there's > a some fields in the comment-table that are duplicates of fields in the > ficha-table. Telefono and email for example. You should consider doing > some normalization. > >> No indexes in ficha > > Except for the implicit idficha_pkey index. > >> CREATE TABLE ficha >> ( >> ... >> idestado char(1), > > If idestado contains numbers (codes of some kind, I presume), you're > better off using the smallint data type. > >> .... >> searchengine1 int4, >> searchengine2 int4, >> searchengine3 int4, >> searchengine4 int4, >> searchengine5 int4, >> searchengine6 int4, > > Normalization?! > >> deseo1 int4, >> deseo2 int4, >> deseo3 int4, >> deseo4 int4, >> deseo5 int4, >> deseo6 int4, > > For these as well... > >> ... >> lat varchar(25), >> long varchar(25), > > Isn't there's a better data type for latitude and longitude? Decimal, > perhaps? > >> titulomapa_l0 varchar(255), >> titulomapa_l1 varchar(255), >> titulomapa_l2 varchar(255), >> titulomapa_l3 varchar(255), >> titulomapa_l4 varchar(255), >> titulomapa_l5 varchar(255), >> titulomapa_l6 varchar(255), >> titulomapa_l7 varchar(255), >> titulomapa_l8 varchar(255), >> titulomapa_l9 varchar(255), > > Again, normalization... > > - Heikki > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFNK+jIo1XmbAXRboRAu6cAKCMUWHjcAYwN4DhVl1tSjMirgRAawCgvk8c gSB/4p1ZBOrDEwU9EW/yxw8= =yFoD -----END PGP SIGNATURE-----
> These tables are particularly egregious examples of ignorant database > design. You need to understand the relational model This email is a *particularly* egregious example of rudeness. You owe Mr. Staubo, and the Postgress community, an apology. There is absolutely no reason to insult people who come to this forum for help. That's why the forum is here, to help peoplewho are "ignorant" and want to improve their knowledge. Craig
On Oct 17, 2006, at 17:10 , Craig A. James wrote: >> These tables are particularly egregious examples of ignorant >> database design. You need to understand the relational model > > This email is a *particularly* egregious example of rudeness. You > owe Mr. Staubo, and the Postgress community, an apology. I'm sorry you feel that way, but I don't think I was out of line. I did point to several informative sources of documentation, and described some of the problems (but by no means all) with the person's schema and how to solve them. If you think the database design in question is *not* ignorant database design, please do explain why, but on technical grounds. (Ignorance, of course, is not a sin.) Alexander.
On Oct 17, 2006, at 17:29 , Mario Weilguni wrote: > Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: >> Lastly, note that in PostgreSQL these length declarations are not >> necessary: >> >> contacto varchar(255), >> fuente varchar(512), >> prefijopais varchar(10) > > Enforcing length constraints with varchar(xyz) is good database > design, not a > bad one. Using text everywhere might be tempting because it works, > but it's > not a good idea. Enforcing length constraints is generally a bad idea because it assumes you know the data domain as expressed in a quantity of characters. Off the top of your head, do you know the maximum length of a zip code? A street address? The name of a city? In almost all cases the limit you invent is arbitrary, and the probability of being incompatible with any given input is inversely proportional to that arbitrary limit. Encoding specific length constraints in the database makes sense when they relate explicitly to business logic, but I can think of only a few cases where it would make sense: restricting the length of passwords, user names, and so on. In a few cases you do know with 100% certainty the limit of your field, such as with standardized abbreviations: ISO 3166 country codes, for example. And sometimes you want to cap data due to storage or transmission costs. The length constraint on text fields is primarily a historical artifact stemming from the way databases have traditionally been implemented, as fixed-length fields in fixed-length row structures. The inexplicable, improbable space-padded (!) "character" data type in ANSI SQL is a vestige of this legacy. PostgreSQL's variable-length rows and TOAST mechanism makes the point moot. Quoth the PostgreSQL manual, section 8.3: > There are no performance differences between these three types, > apart from the increased storage size when using the blank-padded > type. While character(n) has performance advantages in some other > database systems, it has no such advantages in PostgreSQL. In most > situations text or character varying should be used instead. Alexander.
Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: > Lastly, note that in PostgreSQL these length declarations are not > necessary: > > contacto varchar(255), > fuente varchar(512), > prefijopais varchar(10) > > Instead, use: > > contacto text, > fuente text, > prefijopais text > > See the PostgreSQL manual for an explanation of varchar vs. text. Enforcing length constraints with varchar(xyz) is good database design, not a bad one. Using text everywhere might be tempting because it works, but it's not a good idea.
On 10/17/06, Mario Weilguni <mweilguni@sime.com> wrote: > Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: > > Lastly, note that in PostgreSQL these length declarations are not > > necessary: > > > > contacto varchar(255), > > fuente varchar(512), > > prefijopais varchar(10) > > > > Instead, use: > > > > contacto text, > > fuente text, > > prefijopais text > > > > See the PostgreSQL manual for an explanation of varchar vs. text. > > Enforcing length constraints with varchar(xyz) is good database design, not a > bad one. Using text everywhere might be tempting because it works, but it's > not a good idea. while you are correct, i think the spirit of the argument is wrong becuase there is no constraint to be enforced in those fields. a length constraint of n is only valid is n + 1 characters are an error and should be rejected by the database. anything else is IMO bad form. There are practial exceptions to this rule though, for example client technology that might require a length. so, imo alexander is correct: contacto varchar(255) ...is a false constraint, why exactly 255? is that were the dart landed? specifically limiting text fields so users 'don't enter too much data' is a manifestation c programmer's disease :) note I am not picking on the OP here, just weighing in on the constraint argument. merlin
mmoncure@gmail.com ("Merlin Moncure") writes: > On 10/17/06, Mario Weilguni <mweilguni@sime.com> wrote: >> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: >> > Lastly, note that in PostgreSQL these length declarations are not >> > necessary: >> > >> > contacto varchar(255), >> > fuente varchar(512), >> > prefijopais varchar(10) >> > >> > Instead, use: >> > >> > contacto text, >> > fuente text, >> > prefijopais text >> > >> > See the PostgreSQL manual for an explanation of varchar vs. text. >> >> Enforcing length constraints with varchar(xyz) is good database design, not a >> bad one. Using text everywhere might be tempting because it works, but it's >> not a good idea. > > while you are correct, i think the spirit of the argument is wrong > becuase there is no constraint to be enforced in those fields. a > length constraint of n is only valid is n + 1 characters are an error > and should be rejected by the database. anything else is IMO bad > form. There are practial exceptions to this rule though, for example > client technology that might require a length. > > so, imo alexander is correct: > contacto varchar(255) > > ...is a false constraint, why exactly 255? is that were the dart landed? Yeah, 255 seems silly to me. If I'm going to be arbitrary, there are two better choices: 1. 80, because that's how many characters one can fit across a piece of paper whilst keeping things pretty readable; 2. 64, because that will fit on a screen, and leave some space for a field name/description. > specifically limiting text fields so users 'don't enter too much > data' is a manifestation c programmer's disease :) No, I can't agree. I'm pretty accustomed to languages that don't pinch you the ways C does, and I still dislike having over-wide columns because it makes it more difficult to generate readable reports. -- output = ("cbbrowne" "@" "linuxfinances.info") http://linuxdatabases.info/info/unix.html "Instant coffee is like pouring hot water over the cremated remains of a good friend."
alex@purefiction.net (Alexander Staubo) writes: > On Oct 17, 2006, at 17:29 , Mario Weilguni wrote: > >> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: >>> Lastly, note that in PostgreSQL these length declarations are not >>> necessary: >>> >>> contacto varchar(255), >>> fuente varchar(512), >>> prefijopais varchar(10) >> >> Enforcing length constraints with varchar(xyz) is good database >> design, not a >> bad one. Using text everywhere might be tempting because it works, >> but it's >> not a good idea. > > Enforcing length constraints is generally a bad idea because it > assumes you know the data domain as expressed in a quantity of > characters. Off the top of your head, do you know the maximum length > of a zip code? A street address? The name of a city? In the case of a zip code? Sure. US zip codes are integer values either 5 or 9 characters long. In the case of some of our internal applications, we need to conform to some IETF and ITU standards which actually do enforce some maximum lengths on these sorts of things. > In almost all cases the limit you invent is arbitrary, and the > probability of being incompatible with any given input is inversely > proportional to that arbitrary limit. I'd be quite inclined to limit things like addresses to somewhat smaller sizes than you might expect. If addresses are to be used to generate labels for envelopes, for instance, it's reasonably important to limit sizes to those that might fit on a label or an envelope. > Encoding specific length constraints in the database makes sense > when they relate explicitly to business logic, but I can think of > only a few cases where it would make sense: restricting the length > of passwords, user names, and so on. In a few cases you do know with > 100% certainty the limit of your field, such as with standardized > abbreviations: ISO 3166 country codes, for example. And sometimes > you want to cap data due to storage or transmission costs. There's another reason: Open things up wide, and some people will fill the space with rubbish. -- "cbbrowne","@","acm.org" http://linuxfinances.info/info/internet.html "The Amiga is proof that if you build a better mousetrap, the rats will gang up on you." -- Bill Roberts bill.roberts@ensco.com
Chris Browne wrote: > In the case of a zip code? Sure. US zip codes are integer values > either 5 or 9 characters long. So your app will only work in the US? And only for US companies that only have US clients? Sorry had to dig at that ;-P -- Shane Ambler Postgres@007Marketing.com Get Sheeky @ http://Sheeky.Biz
The world rejoiced as pgsql@007Marketing.com (Shane Ambler) wrote: > Chris Browne wrote: >> In the case of a zip code? Sure. US zip codes are integer values >> either 5 or 9 characters long. > > So your app will only work in the US? > And only for US companies that only have US clients? > > > Sorry had to dig at that ;-P Heh. I'm not in the US, so that's not the sort of mistake I'd be likely to make... The thing is, the only place where they call this sort of thing a "zip code" is the US. Elsewhere, it's called a postal code. -- (reverse (concatenate 'string "gro.mca" "@" "enworbbc")) http://linuxfinances.info/info/finances.html Rules of the Evil Overlord #159. "If I burst into rebel headquarters and find it deserted except for an odd, blinking device, I will not walk up and investigate; I'll run like hell." <http://www.eviloverlord.com/>
Christopher Browne wrote: > The world rejoiced as pgsql@007Marketing.com (Shane Ambler) wrote: >> Chris Browne wrote: >>> In the case of a zip code? Sure. US zip codes are integer values >>> either 5 or 9 characters long. >> So your app will only work in the US? >> And only for US companies that only have US clients? >> >> >> Sorry had to dig at that ;-P > > Heh. I'm not in the US, so that's not the sort of mistake I'd be > likely to make... > > The thing is, the only place where they call this sort of thing a "zip > code" is the US. Elsewhere, it's called a postal code. Same meaning/use different name (that's a locale issue for the client displaying the data) they will all use the same column for that data. -- Shane Ambler Postgres@007Marketing.com Get Sheeky @ http://Sheeky.Biz
Alexander Staubo wrote: > On Oct 17, 2006, at 17:10 , Craig A. James wrote: > >>> These tables are particularly egregious examples of ignorant >>> database design. You need to understand the relational model >> >> This email is a *particularly* egregious example of rudeness. You owe >> Mr. Staubo, and the Postgress community, an apology. > > I'm sorry you feel that way, but I don't think I was out of line. > ... If you think the database design in question is *not* > ignorant database design, please do explain why, but on technical > grounds. (Ignorance, of course, is not a sin.) This is not about design. It's about someone who came for help, and got a derogatory remark. Is it really so hard to behelpful *and* use polite, encouraging language? Craig
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> so, imo alexander is correct: >> contacto varchar(255) Why do we have limits on this, for example? contacto varchar(255) 1) First of all, this is a web application. People use to enter really strange thinks there, and a lot of rubbish. So, as someone commented before, I am ok with the idea of limit the field. 2) Again, this is a web application. We could just limit the "field front end length" but this will be not secure. We could limit the field front end, and limit it with the code that process the data, but is much secure if we just limit field in database. Why 255? This is free field. We are interested in users enter as much data as needed (with a limit, we no not want "The Quijote" in that field) People use to imput as spected: "Contact: Baltolomé Peralez." But people also inserts thinks as: "Contact: In the mornings, Bartolomé Perales, in the afternoons Juan Perales in the same telephone number" In the context of the application, we are not interested in stopping the user with a message "Hey, Your contact is too long". We want the user to go on. That's why 255. We could insert 260. Or maybe 250. But someone decided 255 and for me its OK. Please remember that I just put some data of a large applications, and there is thinks there that has sense in context. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFNdJsIo1XmbAXRboRAuGVAKCupfXOHwxXOPHFdq+K6S0lXWNZUwCgml1i CS0eEJcQndEJb7h7Nsfh1CM= =0gpW -----END PGP SIGNATURE-----
Am Dienstag, 17. Oktober 2006 17:50 schrieb Alexander Staubo: > On Oct 17, 2006, at 17:29 , Mario Weilguni wrote: > > > > Enforcing length constraints with varchar(xyz) is good database > > design, not a > > bad one. Using text everywhere might be tempting because it works, > > but it's > > not a good idea. > > Enforcing length constraints is generally a bad idea because it > assumes you know the data domain as expressed in a quantity of > characters. Off the top of your head, do you know the maximum length > of a zip code? A street address? The name of a city? It's not a bad idea. Usually I use postal codes with 25 chars, and never had any problem. With text, the limit would be ~1 GB. No matter how much testing in the application happens, the varchar(25) as last resort is a good idea. And in most cases, the application itself limits the length, and thus it's good to reflect this in the database design. Feel free to use text anywhere for your application, and feel free to use numeric(1000) instead of numeric(4) if you want to be prepared for really long numbers, but don't tell other people it's bad database design - it isn't.
Mario Weilguni wrote: >> >> Â Â contacto varchar(255), >> Â Â fuente varchar(512), >> Â Â prefijopais varchar(10) >> >> Instead, use: >> >> Â Â contacto text, >> Â Â fuente text, >> Â Â prefijopais text >> >> See the PostgreSQL manual for an explanation of varchar vs. text. > > Enforcing length constraints with varchar(xyz) is good database design, not a > bad one. Using text everywhere might be tempting because it works, but it's > not a good idea. > I've always used the rationale: If you *know* that the data is length constrained, then it is ok to reflect this in the domain you use - err, thats why they have length limits! e.g. if you know that 'prefijopais' can *never* be > 10 chars in length, then varchar(10) is a good choice. If the data length is unknown or known to be unlimited, then reflect that in the domain you use - e.g if 'fuente' or 'contacto' have no reason to be constrained, then just use text. best wishes Mark
On Wed, Oct 18, 2006 at 11:31:44AM +0200, Mario Weilguni wrote: > It's not a bad idea. Usually I use postal codes with 25 chars, and never had > any problem. With text, the limit would be ~1 GB. No matter how much testing > in the application happens, the varchar(25) as last resort is a good idea. > And in most cases, the application itself limits the length, and thus it's > good to reflect this in the database design. > Feel free to use text anywhere for your application, and feel free to use > numeric(1000) instead of numeric(4) if you want to be prepared for really > long numbers, but don't tell other people it's bad database design - it > isn't. It's unnecessary design. Suggestions in this regard lead towards the user seeing a database error, instead of a nice specific message provided by the application. I used to use varchar instead of text, but have since softened, as the number of times it has ever actually saved me is zero, and the number of times it has screwed me up (picking too small of a limit too early) has been a few. It's kind of like pre-optimization before there is a problem. Sometimes it works for you, sometimes it works against. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
On Tue, Oct 17, 2006 at 12:51:19PM -0400, Merlin Moncure wrote: > so, imo alexander is correct: > contacto varchar(255) > > ...is a false constraint, why exactly 255? is that were the dart landed? BTW, if we get variable-length varlena headers at some point, then setting certain limits might make sense to keep performance more consistent. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tue, Oct 17, 2006 at 12:25:39PM +0200, Ruben Rubio wrote: > First of all I have to say that I now the database is not ok. There was > a people before me that didn't do the thinks right. I would like to > normalize the database, but it takes too much time (there is is hundred > of SQLs to change and there is not enough technical resources). Also, > datacolumns in some places has same names, but the data that is stores > has different usages. FWIW, things like views and rules make those transations a lot easier to tackle. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote: > On Tue, Oct 17, 2006 at 12:51:19PM -0400, Merlin Moncure wrote: > > so, imo alexander is correct: > > contacto varchar(255) > > > > ...is a false constraint, why exactly 255? is that were the dart landed? > > BTW, if we get variable-length varlena headers at some point, then > setting certain limits might make sense to keep performance more > consistent. I would argue that it is assumptions about the underlying architecture that got everyone into trouble in the first place :). I would prefer to treat length constraint as a constraint (n + 1 = error), unless there was a *compelling* reason to do otherwise, which currently there isn't (or hasn't been since we got toast) a lot of this stuff s due to legacy thinking, a lot of dbf products had limts to varchar around 255 or so. imo, a proper constraint system would apply everything at the domain level, and minlength and maxlength would get equal weight, and be optional for all types. merlin