Thread: Optimization of this SQL sentence

Optimization of this SQL sentence

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This SQL sentence is very simple. I need to get better results. I have
tried some posibilities and I didn't get good results.

SELECT max(idcomment)
 FROM ficha vf
 INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
idestado=4))
 WHERE idstatus=3
 AND ctype=1


QUERY PLAN

Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual
time=188.463..188.469 rows=1 loops=1)

  ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual
time=141.464..185.404 rows=513 loops=1)

        Hash Cond: ("outer".idfile = "inner".idficha)

        ->  Seq Scan on "comment" c  (cost=0.00..1321.75 rows=1083
width=8) (actual time=0.291..36.112 rows=642 loops=1)

              Filter: ((idstatus = 3) AND (ctype = 1))

        ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual
time=141.004..141.004 rows=6282 loops=1)

              ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178
width=4) (actual time=0.071..97.885 rows=6282 loops=1)

                    Filter: (((idestado)::text = '3'::text) OR
((idestado)::text = '4'::text))

Total runtime: 188.809 ms


Thanks in advance,
Ruben Rubio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFFNJzfIo1XmbAXRboRAgPRAJ99+S9wL21b+JN14bQbAoREFXYUcQCYpfEZ
p1MCcDMWqTxzSdtssUFWOw==
=rUHB
-----END PGP SIGNATURE-----

Re: Optimization of this SQL sentence

From
"Gregory S. Williamson"
Date:
Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and some
infoon what version of postgres you are using. 

Are the tables recently analyzed ? How many rows in them ?

Greg Williamson
DBA
GlobeXplorer LLC


-----Original Message-----
From:    pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio
Sent:    Tue 10/17/2006 2:05 AM
To:    pgsql-performance@postgresql.org
Cc:
Subject:    [PERFORM] Optimization of this SQL sentence

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This SQL sentence is very simple. I need to get better results. I have
tried some posibilities and I didn't get good results.

SELECT max(idcomment)
 FROM ficha vf
 INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
idestado=4))
 WHERE idstatus=3
 AND ctype=1


QUERY PLAN

Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual
time=188.463..188.469 rows=1 loops=1)

  ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual
time=141.464..185.404 rows=513 loops=1)

        Hash Cond: ("outer".idfile = "inner".idficha)

        ->  Seq Scan on "comment" c  (cost=0.00..1321.75 rows=1083
width=8) (actual time=0.291..36.112 rows=642 loops=1)

              Filter: ((idstatus = 3) AND (ctype = 1))

        ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual
time=141.004..141.004 rows=6282 loops=1)

              ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178
width=4) (actual time=0.071..97.885 rows=6282 loops=1)

                    Filter: (((idestado)::text = '3'::text) OR
((idestado)::text = '4'::text))

Total runtime: 188.809 ms


Thanks in advance,
Ruben Rubio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFFNJzfIo1XmbAXRboRAgPRAJ99+S9wL21b+JN14bQbAoREFXYUcQCYpfEZ
p1MCcDMWqTxzSdtssUFWOw==
=rUHB
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:45349c86275246672479766!
-------------------------------------------------------






Re: Optimization of this SQL sentence

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1




Indexes in comment
Comment rows: 17.250

CREATE INDEX usuariofichaoncommnet
  ON "comment"
  USING btree
  (idusuarioficha);

Comment structure:
CREATE TABLE "comment"
(
  idcomment int4 NOT NULL DEFAULT
nextval('comment_idcomment_seq'::regclass),
  score int4,
  title varchar,
  ctext varchar,
  idusuarioficha int4,
  galleta varchar,
  navlang int4,
  cdate timestamp DEFAULT now(),
  idstatus int4,
  ctype int4 NOT NULL,
  idfile int4 NOT NULL,
  nick varchar,
  nombre varchar,
  apellidos varchar,
  dni varchar,
  nacionalidad varchar,
  email varchar,
  telefono varchar,
  code varchar,
  memo varchar,
  c_ip varchar(30),
  codpais char(2),
  replay varchar,
  replaydate timestamp,
  advsent int4,
  usrwarn int4,
  nouserlink int4,
  aviso_confirmacion_15 timestamp,
  aviso_confirmacion_60 timestamp,
  CONSTRAINT comment_pkey PRIMARY KEY (idcomment)
)

Ficha structure:
No indexes in ficha
Ficha rows: 17.850

CREATE TABLE ficha
(
  idficha int4 NOT NULL DEFAULT nextval('ficha_idficha_seq'::regclass),
  email varchar(255),
  web varchar(255),
  capacidadmin int4,
  capacidadmax int4,
  preciotb float4,
  preciota float4,
  cp varchar(20),
  telefono1 varchar(50),
  telefono2 varchar(50),
  fax varchar(50),
  uprecio varchar,
  udireccion varchar(512),
  comentarios varchar,
  ucapacidad varchar(512),
  upresentacion varchar,
  utipoaloj varchar(50),
  ulugares varchar,
  ucaracteristica varchar,
  idusuario int4,
  idlocacion int4,
  contacto varchar(255),
  fuente varchar(512),
  prefijopais varchar(10),
  idestado char(1),
  nombre varchar(255),
  idtipoalojamiento int4,
  ulocalidad varchar(255),
  creado timestamp DEFAULT now(),
  cachefault int4 DEFAULT 0,
  idpromotiontype_pc int4 NOT NULL DEFAULT 0,
  idpromotiontype_ant_pc int4,
  promostartdate_pc timestamp,
  promoenddate_pc timestamp,
  localidadruta varchar(255),
  urlsufix varchar(32),
  searchengine1 int4,
  searchengine2 int4,
  searchengine3 int4,
  searchengine4 int4,
  searchengine5 int4,
  searchengine6 int4,
  deseo1 int4,
  deseo2 int4,
  deseo3 int4,
  deseo4 int4,
  deseo5 int4,
  deseo6 int4,
  otherspecs varchar(510),
  lastchange timestamp,
  idsubestado int4,
  environment int4,
  prefijopais2 varchar,
  web_agencia varchar(255),
  lat varchar(25),
  long varchar(25),
  zoom int4,
  swzoombloq bool DEFAULT true,
  titulomapa_l0 varchar(255),
  titulomapa_l1 varchar(255),
  titulomapa_l2 varchar(255),
  titulomapa_l3 varchar(255),
  titulomapa_l4 varchar(255),
  titulomapa_l5 varchar(255),
  titulomapa_l6 varchar(255),
  titulomapa_l7 varchar(255),
  titulomapa_l8 varchar(255),
  titulomapa_l9 varchar(255),
  CONSTRAINT pk_ficha PRIMARY KEY (idficha),
  CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES
geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION
)



Gregory S. Williamson escribió:
> Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and
someinfo on what version of postgres you are using. 
>
> Are the tables recently analyzed ? How many rows in them ?
>
> Greg Williamson
> DBA
> GlobeXplorer LLC
>
>
> -----Original Message-----
> From:    pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio
> Sent:    Tue 10/17/2006 2:05 AM
> To:    pgsql-performance@postgresql.org
> Cc:
> Subject:    [PERFORM] Optimization of this SQL sentence
>
> This SQL sentence is very simple. I need to get better results. I have
> tried some posibilities and I didn't get good results.
>
> SELECT max(idcomment)
>  FROM ficha vf
>  INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
> idestado=4))
>  WHERE idstatus=3
>  AND ctype=1
>
>
> QUERY PLAN
>
> Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual
> time=188.463..188.469 rows=1 loops=1)
>
>   ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual
> time=141.464..185.404 rows=513 loops=1)
>
>         Hash Cond: ("outer".idfile = "inner".idficha)
>
>         ->  Seq Scan on "comment" c  (cost=0.00..1321.75 rows=1083
> width=8) (actual time=0.291..36.112 rows=642 loops=1)
>
>               Filter: ((idstatus = 3) AND (ctype = 1))
>
>         ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual
> time=141.004..141.004 rows=6282 loops=1)
>
>               ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178
> width=4) (actual time=0.071..97.885 rows=6282 loops=1)
>
>                     Filter: (((idestado)::text = '3'::text) OR
> ((idestado)::text = '4'::text))
>
> Total runtime: 188.809 ms
>
>
> Thanks in advance,
> Ruben Rubio

- ---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq


- -------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:45349c86275246672479766!
- -------------------------------------------------------








-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNKNeIo1XmbAXRboRAsiDAKCce+BeyYK63r24w2E1QNq/3maMJQCeNpNw
GiwJ/KixMHH76919wQR31g8=
=g/re
-----END PGP SIGNATURE-----

Re: Optimization of this SQL sentence

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



If just just realized that is a litlle faster (15% faster) with this:

CREATE INDEX idx_statustype
   ON "comment" USING btree (idstatus, ctype);

Any other ideas?


Gregory S. Williamson escribió:
> Off hanbd I can't recommend anything, bur perhaps you could post the details of the tables (columns, indexes),and
someinfo on what version of postgres you are using. 
>
> Are the tables recently analyzed ? How many rows in them ?
>
> Greg Williamson
> DBA
> GlobeXplorer LLC
>
>
> -----Original Message-----
> From:    pgsql-performance-owner@postgresql.org on behalf of Ruben Rubio
> Sent:    Tue 10/17/2006 2:05 AM
> To:    pgsql-performance@postgresql.org
> Cc:
> Subject:    [PERFORM] Optimization of this SQL sentence
>
> This SQL sentence is very simple. I need to get better results. I have
> tried some posibilities and I didn't get good results.
>
> SELECT max(idcomment)
>  FROM ficha vf
>  INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
> idestado=4))
>  WHERE idstatus=3
>  AND ctype=1
>
>
> QUERY PLAN
>
> Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual
> time=188.463..188.469 rows=1 loops=1)
>
>   ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual
> time=141.464..185.404 rows=513 loops=1)
>
>         Hash Cond: ("outer".idfile = "inner".idficha)
>
>         ->  Seq Scan on "comment" c  (cost=0.00..1321.75 rows=1083
> width=8) (actual time=0.291..36.112 rows=642 loops=1)
>
>               Filter: ((idstatus = 3) AND (ctype = 1))
>
>         ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual
> time=141.004..141.004 rows=6282 loops=1)
>
>               ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178
> width=4) (actual time=0.071..97.885 rows=6282 loops=1)
>
>                     Filter: (((idestado)::text = '3'::text) OR
> ((idestado)::text = '4'::text))
>
> Total runtime: 188.809 ms
>
>
> Thanks in advance,
> Ruben Rubio

- ---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq


- -------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349c86275246672479766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:45349c86275246672479766!
- -------------------------------------------------------






- ---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNKT4Io1XmbAXRboRAurtAKC8YWjgzytaqkPjLfrohZ1aceZivwCgpDii
wzxc4fktzIHTZRhPuJLi2Wc=
=Korn
-----END PGP SIGNATURE-----

Re: Optimization of this SQL sentence

From
"A. Kretschmer"
Date:
am  Tue, dem 17.10.2006, um 11:33:18 +0200 mailte Ruben Rubio folgendes:
> >
> > SELECT max(idcomment)
> >  FROM ficha vf
> >  INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
> > idestado=4))
> >  WHERE idstatus=3
> >  AND ctype=1

check for indexes on vf.idficha, c.idfile, idstatus and ctype.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Optimization of this SQL sentence

From
Alexander Staubo
Date:
On Oct 17, 2006, at 11:33 , Ruben Rubio wrote:

> CREATE TABLE "comment"
> (
>   idcomment int4 NOT NULL DEFAULT
> nextval('comment_idcomment_seq'::regclass),
[snip 28 columns]
>   CONSTRAINT comment_pkey PRIMARY KEY (idcomment)
> )
>
> Ficha structure:
> No indexes in ficha
> Ficha rows: 17.850
>
> CREATE TABLE ficha
> (
>   idficha int4 NOT NULL DEFAULT nextval
> ('ficha_idficha_seq'::regclass),
[snip 67 (!) columns]
>   CONSTRAINT pk_ficha PRIMARY KEY (idficha),
>   CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES
> geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION
> )

  These tables are particularly egregious examples of ignorant
database design. You need to understand the relational model (http://
en.wikipedia.org/wiki/Relational_model), specifically data
normalization (http://en.wikipedia.org/wiki/Database_normalization)
and 3NF (http://en.wikipedia.org/wiki/3NF).

These columns are particularly telling:

   searchengine1 int4,
   searchengine2 int4,
   searchengine3 int4,
   searchengine4 int4,
   searchengine5 int4,
   searchengine6 int4,
   deseo1 int4,
   deseo2 int4,
   deseo3 int4,
   deseo4 int4,
   deseo5 int4,
   deseo6 int4,
   titulomapa_l0 varchar(255),
   titulomapa_l1 varchar(255),
   titulomapa_l2 varchar(255),
   titulomapa_l3 varchar(255),
   titulomapa_l4 varchar(255),
   titulomapa_l5 varchar(255),
   titulomapa_l6 varchar(255),
   titulomapa_l7 varchar(255),
   titulomapa_l8 varchar(255),
   titulomapa_l9 varchar(255),

Refactor into three separate tables:

   create table searchengine (
     idficha int references ficha (idficha),
     searchengine int,
     primary key (idficha, searchengine)
   );

   create table deseo (
     idficha int references ficha (idficha),
     deseo int,
     primary key (idficha, deseo)
   );

   create table titulomapa (
     idficha int references ficha (idficha),
     titulomapa int,
     primary key (idficha, titulomapa)
   );

Now you can find all search engines for a single ficha row:

   select searchengine from searchengine where idficha = n

This design allows for more than 5 search engines per ficha row, and
allows expressive joins such as:

   select ficha.idficha, searchengine.searchengine
   inner join searchengine on searchengine.idfciha = ficha.idficha

Also, most of your columns are nullable. This alone shows that you
don't understand your own data.

Lastly, note that in PostgreSQL these length declarations are not
necessary:

   contacto varchar(255),
   fuente varchar(512),
   prefijopais varchar(10)

Instead, use:

   contacto text,
   fuente text,
   prefijopais text

See the PostgreSQL manual for an explanation of varchar vs. text.

Alexander.


Re: Optimization of this SQL sentence

From
"Heikki Linnakangas"
Date:
You could try rewriting the query like this:

SELECT MAX(idcomment)
FROM comment c
WHERE idstatus=3 AND ctype=1
AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND
vf.idficha = c.idfile);

The planner can then try a backward scan on the comment_pkey index,
which should be quicker than the seq scan assuming that there's a lot of
rows that match the restrictions (idstatus=3 ctype=1 idestado IN ('3',
'4')).

But see comments inline below:

Ruben Rubio wrote:
> CREATE TABLE "comment"
> (
>   idcomment int4 NOT NULL DEFAULT
> nextval('comment_idcomment_seq'::regclass),
>   score int4,
>   title varchar,
>   ctext varchar,
>   idusuarioficha int4,
>   galleta varchar,
>   navlang int4,
>   cdate timestamp DEFAULT now(),
>   idstatus int4,
>   ctype int4 NOT NULL,
>   idfile int4 NOT NULL,
>   nick varchar,
>   nombre varchar,
>   apellidos varchar,
>   dni varchar,
>   nacionalidad varchar,
>   email varchar,
>   telefono varchar,
>   code varchar,
>   memo varchar,
>   c_ip varchar(30),
>   codpais char(2),
>   replay varchar,
>   replaydate timestamp,
>   advsent int4,
>   usrwarn int4,
>   nouserlink int4,
>   aviso_confirmacion_15 timestamp,
>   aviso_confirmacion_60 timestamp,
>   CONSTRAINT comment_pkey PRIMARY KEY (idcomment)
> )

Without knowing anything about you're application, it looks like there's
a some fields in the comment-table that are duplicates of fields in the
ficha-table. Telefono and email for example. You should consider doing
some normalization.

> No indexes in ficha

Except for the implicit idficha_pkey index.

> CREATE TABLE ficha
> (
 > ...
 >   idestado char(1),

If idestado contains numbers (codes of some kind, I presume), you're
better off using the smallint data type.

 > ....
>   searchengine1 int4,
>   searchengine2 int4,
>   searchengine3 int4,
>   searchengine4 int4,
>   searchengine5 int4,
>   searchengine6 int4,

Normalization?!

>   deseo1 int4,
>   deseo2 int4,
>   deseo3 int4,
>   deseo4 int4,
>   deseo5 int4,
>   deseo6 int4,

For these as well...

 > ...
>   lat varchar(25),
>   long varchar(25),

Isn't there's a better data type for latitude and longitude? Decimal,
perhaps?

>   titulomapa_l0 varchar(255),
>   titulomapa_l1 varchar(255),
>   titulomapa_l2 varchar(255),
>   titulomapa_l3 varchar(255),
>   titulomapa_l4 varchar(255),
>   titulomapa_l5 varchar(255),
>   titulomapa_l6 varchar(255),
>   titulomapa_l7 varchar(255),
>   titulomapa_l8 varchar(255),
>   titulomapa_l9 varchar(255),

Again, normalization...

- Heikki

Re: Optimization of this SQL sentence (SOLVED)

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi to everyone,

First of all I have to say that I now the database is not ok. There was
a people before me that didn't do the thinks right. I would like to
normalize the database, but it takes too much time (there is is hundred
of SQLs to change and there is not enough technical resources). Also,
datacolumns in some places has same names, but the data that is stores
has different usages.

Thanks everyone for all hints, I ll try to do my best performing the
database structure.

By other hand, I was able to create the corrects index and with this

AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND
vf.idficha = c.idfile);

it is really fast.

Thanks to everybody.

Regards,
Ruben Rubio



Heikki Linnakangas escribió:
> You could try rewriting the query like this:
>
> SELECT MAX(idcomment)
> FROM comment c
> WHERE idstatus=3 AND ctype=1
> AND EXISTS (SELECT 1 FROM ficha vf WHERE idestado IN ('3', '4') AND
> vf.idficha = c.idfile);
>
> The planner can then try a backward scan on the comment_pkey index,
> which should be quicker than the seq scan assuming that there's a lot of
> rows that match the restrictions (idstatus=3 ctype=1 idestado IN ('3',
> '4')).
>
> But see comments inline below:
>
> Ruben Rubio wrote:
>> CREATE TABLE "comment"
>> (
>>   idcomment int4 NOT NULL DEFAULT
>> nextval('comment_idcomment_seq'::regclass),
>>   score int4,
>>   title varchar,
>>   ctext varchar,
>>   idusuarioficha int4,
>>   galleta varchar,
>>   navlang int4,
>>   cdate timestamp DEFAULT now(),
>>   idstatus int4,
>>   ctype int4 NOT NULL,
>>   idfile int4 NOT NULL,
>>   nick varchar,
>>   nombre varchar,
>>   apellidos varchar,
>>   dni varchar,
>>   nacionalidad varchar,
>>   email varchar,
>>   telefono varchar,
>>   code varchar,
>>   memo varchar,
>>   c_ip varchar(30),
>>   codpais char(2),
>>   replay varchar,
>>   replaydate timestamp,
>>   advsent int4,
>>   usrwarn int4,
>>   nouserlink int4,
>>   aviso_confirmacion_15 timestamp,
>>   aviso_confirmacion_60 timestamp,
>>   CONSTRAINT comment_pkey PRIMARY KEY (idcomment)
>> )
>
> Without knowing anything about you're application, it looks like there's
> a some fields in the comment-table that are duplicates of fields in the
> ficha-table. Telefono and email for example. You should consider doing
> some normalization.
>
>> No indexes in ficha
>
> Except for the implicit idficha_pkey index.
>
>> CREATE TABLE ficha
>> (
>> ...
>>   idestado char(1),
>
> If idestado contains numbers (codes of some kind, I presume), you're
> better off using the smallint data type.
>
>> ....
>>   searchengine1 int4,
>>   searchengine2 int4,
>>   searchengine3 int4,
>>   searchengine4 int4,
>>   searchengine5 int4,
>>   searchengine6 int4,
>
> Normalization?!
>
>>   deseo1 int4,
>>   deseo2 int4,
>>   deseo3 int4,
>>   deseo4 int4,
>>   deseo5 int4,
>>   deseo6 int4,
>
> For these as well...
>
>> ...
>>   lat varchar(25),
>>   long varchar(25),
>
> Isn't there's a better data type for latitude and longitude? Decimal,
> perhaps?
>
>>   titulomapa_l0 varchar(255),
>>   titulomapa_l1 varchar(255),
>>   titulomapa_l2 varchar(255),
>>   titulomapa_l3 varchar(255),
>>   titulomapa_l4 varchar(255),
>>   titulomapa_l5 varchar(255),
>>   titulomapa_l6 varchar(255),
>>   titulomapa_l7 varchar(255),
>>   titulomapa_l8 varchar(255),
>>   titulomapa_l9 varchar(255),
>
> Again, normalization...
>
> - Heikki
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNK+jIo1XmbAXRboRAu6cAKCMUWHjcAYwN4DhVl1tSjMirgRAawCgvk8c
gSB/4p1ZBOrDEwU9EW/yxw8=
=yFoD
-----END PGP SIGNATURE-----

Re: Optimization of this SQL sentence

From
"Craig A. James"
Date:
>  These tables are particularly egregious examples of ignorant database
> design. You need to understand the relational model

This email is a *particularly* egregious example of rudeness.  You owe Mr. Staubo, and the Postgress community, an
apology.

There is absolutely no reason to insult people who come to this forum for help.  That's why the forum is here, to help
peoplewho are "ignorant" and want to improve their knowledge. 

Craig


Re: Optimization of this SQL sentence

From
Alexander Staubo
Date:
On Oct 17, 2006, at 17:10 , Craig A. James wrote:

>>  These tables are particularly egregious examples of ignorant
>> database design. You need to understand the relational model
>
> This email is a *particularly* egregious example of rudeness.  You
> owe Mr. Staubo, and the Postgress community, an apology.

I'm sorry you feel that way, but I don't think I was out of line. I
did point to several informative sources of documentation, and
described some of the problems (but by no means all) with the
person's schema and how to solve them. If you think the database
design in question is *not* ignorant database design, please do
explain why, but on technical grounds. (Ignorance, of course, is not
a sin.)

Alexander.

Re: Optimization of this SQL sentence

From
Alexander Staubo
Date:
On Oct 17, 2006, at 17:29 , Mario Weilguni wrote:

> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo:
>> Lastly, note that in PostgreSQL these length declarations are not
>> necessary:
>>
>>    contacto varchar(255),
>>    fuente varchar(512),
>>    prefijopais varchar(10)
>
> Enforcing length constraints with varchar(xyz) is good database
> design, not a
> bad one. Using text everywhere might be tempting because it works,
> but it's
> not a good idea.

Enforcing length constraints is generally a bad idea because it
assumes you know the data domain as expressed in a quantity of
characters. Off the top of your head, do you know the maximum length
of a zip code? A street address? The name of a city?

In almost all cases the limit you invent is arbitrary, and the
probability of being incompatible with any given input is inversely
proportional to that arbitrary limit.

Encoding specific length constraints in the database makes sense when
they relate explicitly to business logic, but I can think of only a
few cases where it would make sense: restricting the length of
passwords, user names, and so on. In a few cases you do know with
100% certainty the limit of your field, such as with standardized
abbreviations: ISO 3166 country codes, for example. And sometimes you
want to cap data due to storage or transmission costs.

The length constraint on text fields is primarily a historical
artifact stemming from the way databases have traditionally been
implemented, as fixed-length fields in fixed-length row structures.
The inexplicable, improbable space-padded (!) "character" data type
in ANSI SQL is a vestige of this legacy. PostgreSQL's variable-length
rows and TOAST mechanism makes the point moot.

Quoth the PostgreSQL manual, section 8.3:

> There are no performance differences between these three types,
> apart from the increased storage size when using the blank-padded
> type. While character(n) has performance advantages in some other
> database systems, it has no such advantages in PostgreSQL. In most
> situations text or character varying should be used instead.

Alexander.

Re: Optimization of this SQL sentence

From
Mario Weilguni
Date:
Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo:
> Lastly, note that in PostgreSQL these length declarations are not  
> necessary:
>
>    contacto varchar(255),
>    fuente varchar(512),
>    prefijopais varchar(10)
>
> Instead, use:
>
>    contacto text,
>    fuente text,
>    prefijopais text
>
> See the PostgreSQL manual for an explanation of varchar vs. text.

Enforcing length constraints with varchar(xyz) is good database design, not a
bad one. Using text everywhere might be tempting because it works, but it's
not a good idea.

Re: Optimization of this SQL sentence

From
"Merlin Moncure"
Date:
On 10/17/06, Mario Weilguni <mweilguni@sime.com> wrote:
> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo:
> > Lastly, note that in PostgreSQL these length declarations are not
> > necessary:
> >
> > contacto varchar(255),
> > fuente varchar(512),
> > prefijopais varchar(10)
> >
> > Instead, use:
> >
> > contacto text,
> > fuente text,
> > prefijopais text
> >
> > See the PostgreSQL manual for an explanation of varchar vs. text.
>
> Enforcing length constraints with varchar(xyz) is good database design, not a
> bad one. Using text everywhere might be tempting because it works, but it's
> not a good idea.

while you are correct, i think the spirit of the argument is wrong
becuase there is no constraint to be enforced in those fields.  a
length constraint of n is only valid is n + 1 characters are an error
and should be rejected by the database.  anything else is IMO bad
form.  There are practial exceptions to this rule though, for example
client technology that might require a length.

so, imo alexander is correct:
contacto varchar(255)

...is a false constraint, why exactly 255? is that were the dart landed?

specifically limiting text fields so users 'don't enter too much data'
is a manifestation c programmer's disease :)

note I am not picking on the OP here, just weighing in on the
constraint argument.

merlin

Re: Optimization of this SQL sentence

From
Chris Browne
Date:
mmoncure@gmail.com ("Merlin Moncure") writes:
> On 10/17/06, Mario Weilguni <mweilguni@sime.com> wrote:
>> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo:
>> > Lastly, note that in PostgreSQL these length declarations are not
>> > necessary:
>> >
>> > contacto varchar(255),
>> > fuente varchar(512),
>> > prefijopais varchar(10)
>> >
>> > Instead, use:
>> >
>> > contacto text,
>> > fuente text,
>> > prefijopais text
>> >
>> > See the PostgreSQL manual for an explanation of varchar vs. text.
>>
>> Enforcing length constraints with varchar(xyz) is good database design, not a
>> bad one. Using text everywhere might be tempting because it works, but it's
>> not a good idea.
>
> while you are correct, i think the spirit of the argument is wrong
> becuase there is no constraint to be enforced in those fields.  a
> length constraint of n is only valid is n + 1 characters are an error
> and should be rejected by the database.  anything else is IMO bad
> form.  There are practial exceptions to this rule though, for example
> client technology that might require a length.
>
> so, imo alexander is correct:
> contacto varchar(255)
>
> ...is a false constraint, why exactly 255? is that were the dart landed?

Yeah, 255 seems silly to me.

If I'm going to be arbitrary, there are two better choices:

1.  80, because that's how many characters one can fit across a piece
    of paper whilst keeping things pretty readable;

2.  64, because that will fit on a screen, and leave some space for a
    field name/description.

> specifically limiting text fields so users 'don't enter too much
> data' is a manifestation c programmer's disease :)

No, I can't agree.  I'm pretty accustomed to languages that don't
pinch you the ways C does, and I still dislike having over-wide
columns because it makes it more difficult to generate readable
reports.
--
output = ("cbbrowne" "@" "linuxfinances.info")
http://linuxdatabases.info/info/unix.html
"Instant coffee is like pouring hot water over the cremated remains of
a good friend."

Re: Optimization of this SQL sentence

From
Chris Browne
Date:
alex@purefiction.net (Alexander Staubo) writes:

> On Oct 17, 2006, at 17:29 , Mario Weilguni wrote:
>
>> Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo:
>>> Lastly, note that in PostgreSQL these length declarations are not
>>> necessary:
>>>
>>>    contacto varchar(255),
>>>    fuente varchar(512),
>>>    prefijopais varchar(10)
>>
>> Enforcing length constraints with varchar(xyz) is good database
>> design, not a
>> bad one. Using text everywhere might be tempting because it works,
>> but it's
>> not a good idea.
>
> Enforcing length constraints is generally a bad idea because it
> assumes you know the data domain as expressed in a quantity of
> characters. Off the top of your head, do you know the maximum length
> of a zip code? A street address? The name of a city?

In the case of a zip code?  Sure.  US zip codes are integer values
either 5 or 9 characters long.

In the case of some of our internal applications, we need to conform
to some IETF and ITU standards which actually do enforce some maximum
lengths on these sorts of things.

> In almost all cases the limit you invent is arbitrary, and the
> probability of being incompatible with any given input is inversely
> proportional to that arbitrary limit.

I'd be quite inclined to limit things like addresses to somewhat
smaller sizes than you might expect.  If addresses are to be used to
generate labels for envelopes, for instance, it's reasonably important
to limit sizes to those that might fit on a label or an envelope.

> Encoding specific length constraints in the database makes sense
> when they relate explicitly to business logic, but I can think of
> only a few cases where it would make sense: restricting the length
> of passwords, user names, and so on. In a few cases you do know with
> 100% certainty the limit of your field, such as with standardized
> abbreviations: ISO 3166 country codes, for example. And sometimes
> you want to cap data due to storage or transmission costs.

There's another reason: Open things up wide, and some people will fill
the space with rubbish.
--
"cbbrowne","@","acm.org"
http://linuxfinances.info/info/internet.html
"The Amiga  is proof that  if you build  a better mousetrap,  the rats
will gang up on you."  -- Bill Roberts bill.roberts@ensco.com

Re: Optimization of this SQL sentence

From
Shane Ambler
Date:
Chris Browne wrote:
> In the case of a zip code?  Sure.  US zip codes are integer values
> either 5 or 9 characters long.

So your app will only work in the US?
And only for US companies that only have US clients?


Sorry had to dig at that ;-P


--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

Re: Optimization of this SQL sentence

From
Christopher Browne
Date:
The world rejoiced as pgsql@007Marketing.com (Shane Ambler) wrote:
> Chris Browne wrote:
>> In the case of a zip code?  Sure.  US zip codes are integer values
>> either 5 or 9 characters long.
>
> So your app will only work in the US?
> And only for US companies that only have US clients?
>
>
> Sorry had to dig at that ;-P

Heh.  I'm not in the US, so that's not the sort of mistake I'd be
likely to make...

The thing is, the only place where they call this sort of thing a "zip
code" is the US.  Elsewhere, it's called a postal code.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxfinances.info/info/finances.html
Rules of the  Evil Overlord #159. "If I  burst into rebel headquarters
and find  it deserted except for  an odd, blinking device,  I will not
walk up and investigate; I'll run like hell."
<http://www.eviloverlord.com/>

Re: Optimization of this SQL sentence

From
Shane Ambler
Date:
Christopher Browne wrote:
> The world rejoiced as pgsql@007Marketing.com (Shane Ambler) wrote:
>> Chris Browne wrote:
>>> In the case of a zip code?  Sure.  US zip codes are integer values
>>> either 5 or 9 characters long.
>> So your app will only work in the US?
>> And only for US companies that only have US clients?
>>
>>
>> Sorry had to dig at that ;-P
>
> Heh.  I'm not in the US, so that's not the sort of mistake I'd be
> likely to make...
>
> The thing is, the only place where they call this sort of thing a "zip
> code" is the US.  Elsewhere, it's called a postal code.

Same meaning/use different name (that's a locale issue for the client
displaying the data) they will all use the same column for that data.


--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

Re: Optimization of this SQL sentence

From
"Craig A. James"
Date:
Alexander Staubo wrote:
> On Oct 17, 2006, at 17:10 , Craig A. James wrote:
>
>>>  These tables are particularly egregious examples of ignorant
>>> database design. You need to understand the relational model
>>
>> This email is a *particularly* egregious example of rudeness.  You owe
>> Mr. Staubo, and the Postgress community, an apology.
>
> I'm sorry you feel that way, but I don't think I was out of line.
> ... If you think the database design in question is *not*
> ignorant database design, please do explain why, but on technical
> grounds. (Ignorance, of course, is not a sin.)

This is not about design.  It's about someone who came for help, and got a derogatory remark.  Is it really so hard to
behelpful *and* use polite, encouraging language? 

Craig



Re: Optimization of this SQL sentence

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1




>> so, imo alexander is correct:
>> contacto varchar(255)

Why do we have limits on this, for example?
contacto varchar(255)

1) First of all, this is a web application. People use to enter really
strange thinks there, and a lot of rubbish. So, as someone commented
before, I am ok with the idea of limit the field.

2) Again, this is a web application. We could just limit the "field
front end length" but this will be not secure. We could limit the field
front end, and limit it with the code that process the data, but is much
secure if we just limit field in database.

Why 255?
This is free field. We are interested in users enter as much data as
needed (with a limit, we no not want "The Quijote" in that field)

People use to imput as spected:
"Contact: Baltolomé Peralez."
But people also inserts thinks as:
"Contact: In the mornings, Bartolomé Perales, in the afternoons Juan
Perales in the same telephone number"

In the context of the application, we are not interested in stopping the
user with a message "Hey, Your contact is too long". We want the user to
go on. That's why 255. We could insert 260. Or maybe 250. But someone
decided 255 and for me its OK.

Please remember that I just put some data of a large applications, and
there is thinks there that has sense in context.




-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNdJsIo1XmbAXRboRAuGVAKCupfXOHwxXOPHFdq+K6S0lXWNZUwCgml1i
CS0eEJcQndEJb7h7Nsfh1CM=
=0gpW
-----END PGP SIGNATURE-----

Re: Optimization of this SQL sentence

From
Mario Weilguni
Date:
Am Dienstag, 17. Oktober 2006 17:50 schrieb Alexander Staubo:
> On Oct 17, 2006, at 17:29 , Mario Weilguni wrote:
> >
> > Enforcing length constraints with varchar(xyz) is good database
> > design, not a
> > bad one. Using text everywhere might be tempting because it works,
> > but it's
> > not a good idea.
>
> Enforcing length constraints is generally a bad idea because it
> assumes you know the data domain as expressed in a quantity of
> characters. Off the top of your head, do you know the maximum length
> of a zip code? A street address? The name of a city?

It's not a bad idea. Usually I use postal codes with 25 chars, and never had
any problem. With text, the limit would be ~1 GB. No matter how much testing
in the application happens, the varchar(25) as last resort is a good idea.

And in most cases, the application itself limits the length, and thus it's
good to reflect this in the database design.

Feel free to use text anywhere for your application, and feel free to use
numeric(1000) instead of numeric(4) if you want to be prepared for really
long numbers, but don't tell other people it's bad database design - it
isn't.






Re: Optimization of this SQL sentence

From
Mark Kirkwood
Date:
Mario Weilguni wrote:

>>
>> Â  Â contacto varchar(255),
>> Â  Â fuente varchar(512),
>> Â  Â prefijopais varchar(10)
>>
>> Instead, use:
>>
>> Â  Â contacto text,
>> Â  Â fuente text,
>> Â  Â prefijopais text
>>
>> See the PostgreSQL manual for an explanation of varchar vs. text.
>
> Enforcing length constraints with varchar(xyz) is good database design, not a
> bad one. Using text everywhere might be tempting because it works, but it's
> not a good idea.
>

I've always used the rationale:

If you *know* that the data is length constrained, then it is ok to
reflect this in the domain you use - err, thats why they have length
limits! e.g. if you know that 'prefijopais' can *never* be > 10 chars in
length, then varchar(10) is a good choice.

If the data length is unknown or known to be unlimited, then reflect
that in the domain you use - e.g if 'fuente' or 'contacto' have no
reason to be constrained, then just use text.

best wishes

Mark


Re: Optimization of this SQL sentence

From
mark@mark.mielke.cc
Date:
On Wed, Oct 18, 2006 at 11:31:44AM +0200, Mario Weilguni wrote:
> It's not a bad idea. Usually I use postal codes with 25 chars, and never had
> any problem. With text, the limit would be ~1 GB. No matter how much testing
> in the application happens, the varchar(25) as last resort is a good idea.

> And in most cases, the application itself limits the length, and thus it's
> good to reflect this in the database design.

> Feel free to use text anywhere for your application, and feel free to use
> numeric(1000) instead of numeric(4) if you want to be prepared for really
> long numbers, but don't tell other people it's bad database design - it
> isn't.

It's unnecessary design.

Suggestions in this regard lead towards the user seeing a database error,
instead of a nice specific message provided by the application.

I used to use varchar instead of text, but have since softened, as the
number of times it has ever actually saved me is zero, and the number of
times it has screwed me up (picking too small of a limit too early) has
been a few.

It's kind of like pre-optimization before there is a problem. Sometimes
it works for you, sometimes it works against.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Optimization of this SQL sentence

From
"Jim C. Nasby"
Date:
On Tue, Oct 17, 2006 at 12:51:19PM -0400, Merlin Moncure wrote:
> so, imo alexander is correct:
> contacto varchar(255)
>
> ...is a false constraint, why exactly 255? is that were the dart landed?

BTW, if we get variable-length varlena headers at some point, then
setting certain limits might make sense to keep performance more
consistent.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Optimization of this SQL sentence (SOLVED)

From
"Jim C. Nasby"
Date:
On Tue, Oct 17, 2006 at 12:25:39PM +0200, Ruben Rubio wrote:
> First of all I have to say that I now the database is not ok. There was
> a people before me that didn't do the thinks right. I would like to
> normalize the database, but it takes too much time (there is is hundred
> of SQLs to change and there is not enough technical resources). Also,
> datacolumns in some places has same names, but the data that is stores
> has different usages.

FWIW, things like views and rules make those transations a lot easier to
tackle.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Optimization of this SQL sentence

From
"Merlin Moncure"
Date:
On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote:
> On Tue, Oct 17, 2006 at 12:51:19PM -0400, Merlin Moncure wrote:
> > so, imo alexander is correct:
> > contacto varchar(255)
> >
> > ...is a false constraint, why exactly 255? is that were the dart landed?
>
> BTW, if we get variable-length varlena headers at some point, then
> setting certain limits might make sense to keep performance more
> consistent.

I would argue that it is assumptions about the underlying architecture
that got everyone into trouble in the first place :).  I would prefer
to treat length constraint as a constraint (n + 1 = error), unless
there was a *compelling* reason to do otherwise, which currently there
isn't (or hasn't been since we got toast)  a  lot of this stuff s due
to legacy thinking, a lot of dbf products had limts to varchar around
255 or so.

imo, a proper constraint system would apply everything at the domain
level, and minlength and maxlength would get equal weight, and be
optional for all types.

merlin