Re: Load a database into memory - Mailing list pgsql-general
From: Guillaume Lémery
Subject: Re: Load a database into memory
Date:
Msg-id: 3A71348B.9020000@comclick.com
In response to: Load a database into memory (Guillaume Lémery <glemery@comclick.com>)
Responses:
  Re: Load a database into memory (Denis Perchine <dyp@perchine.com>)
  Re: Re: Load a database into memory (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
For memory:

>> CREATE TABLE accord_editeur
>> (
>>     id_regie int4 not null,
>>     num_editeur int4 not null,
>>     num_site int4 not null,
>>     num_emplacement int4 not null,
>>     num_campagne int4 not null,
>>     num_publicite int4 not null,
>>     num_periode int4,
>>     par_id_technologie int4 not null,
>>     affichage_possible int4 default 0,
>>     ponderation_calculee int4,
>>     date_pilotage timestamp NULL,
>>     id_ct1 int4,
>>     PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement,
>>                  num_campagne, num_publicite)
>> );
>>
>> (I've got a primary key on multiple fields because of the complexity of
>> my database. It's the only way to have unique records.)
>>
>> CREATE TABLE parametre
>> (
>>     id_parametre int4 not null primary key,
>>     id_regie int4 NULL,
>>     par_id_parametre int4 NULL,
>>     type int4 not null,
>>     valeur_str varchar null,
>>     valeur_int int4 null,
>>     valeur_fl float8 null,
>>     valeur_txt varchar,
>>     date_pilotage timestamp NULL,
>>     id_ct1 int4
>> );
>> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>>
>> The query:
>>
>> SELECT ae.id_regie,
>>        ae.num_campagne,
>>        ae.num_publicite,
>>        ae.ponderation_calculee * random()
>> FROM accord_editeur ae,
>>      parametre par
>> WHERE ae.id_regie = 1
>>   AND ae.num_editeur = 1494
>>   AND ae.num_site = 1
>>   AND ae.num_emplacement = 1
>>   AND ae.affichage_possible = 1
>>   AND ae.par_id_technologie = par.id_parametre
>>   AND par.type = 10
>>   AND par.valeur_int = 1
>>
>> And the Explain:
>>
>> NOTICE:  QUERY PLAN:
>>
>> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>>   ->  Index Scan using accord_editeur_pkey on accord_editeur ae
>>       (cost=0.00..225.50 rows=1 width=48)
>>   ->  Index Scan using parametre_tracking_idx on parametre par
>>       (cost=0.00..2.02 rows=1 width=8)
>>
>> EXPLAIN

> How many tuples do you usually get from this query?

About 100.

> random() can be quite expensive.
OK, here it is with the change.

The query without the random():

SELECT ae.id_regie,
       ae.num_campagne,
       ae.num_publicite,
       ae.ponderation_calculee,
       ae.num_periode
FROM accord_editeur ae,
     parametre par
WHERE ae.id_regie = 1
  AND ae.num_editeur = 1494
  AND ae.num_site = 1
  AND ae.num_emplacement = 1
  AND ae.affichage_possible = 1
  AND ae.par_id_technologie = par.id_parametre
  AND par.type = 10
  AND par.valeur_int = 1

Explain:

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..228.27 rows=1 width=56)
  ->  Index Scan using parametre_tracking_idx on parametre par
      (cost=0.00..2.02 rows=1 width=8)
  ->  Index Scan using accord_editeur_pkey on accord_editeur ae
      (cost=0.00..225.50 rows=60 width=48)

EXPLAIN

So random() is not that expensive after all... I'll do it in my C code anyway.

> Also please give us the output of
> vacuum verbose analyze parametre;

NOTICE:  --Relation parametre--
NOTICE:  Pages 3: Changed 0, reaped 0, Empty 0, New 0; Tup 235: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 148; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_pkey: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_tracking_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_idct1_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
VACUUM

> vacuum verbose analyze accord_editeur;

NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0.
CPU 0.21s/1.85u sec.
NOTICE:  Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted 0.
CPU 0.19s/1.01u sec.
VACUUM

> But it seems that the problem is in your too-large index. It is quite
> expensive to traverse it (note that all the cost comes from the index scan
> over it).
> I do not know the exact statistics for your data, but if you create an
> index on a small number of fields that account for most of the
> selectivity, the query should become much faster, if the optimizer
> decides to use this index.
> As a first try you can create an index on num_editeur. Do not forget to
> do vacuum analyze accord_editeur; after this.

OK, I dropped the large index and set one on num_editeur only.

vacuum verbose analyze accord_editeur:

NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0.
CPU 0.33s/1.74u sec.
NOTICE:  Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032:
Deleted 0. CPU 0.05s/0.93u sec.
VACUUM

Here is the new Explain (still without the random()):

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..512.65 rows=1 width=56)
  ->  Index Scan using parametre_tracking_idx on parametre par
      (cost=0.00..2.02 rows=1 width=8)
  ->  Index Scan using accord_editeur_tracking_idx on accord_editeur ae
      (cost=0.00..509.88 rows=60 width=48)

EXPLAIN

So the large index seems to be better... Or maybe I should try a hash
index? If I hash num_editeur into another field and set my index on these
two fields, it gives me a better cost, but performance is still slow :-(

>> In fact it is CPU-bound... But why?
>> One query takes up to 14% of CPU usage! The server is a dual-processor
>> PIII 660, and I'd like to run more than 100 queries at the same time...
>>
>> Maybe if I set hash indices on single fields instead of one index on
>> multiple columns?
>>
>> Thanks,
>>
>> Guillaume.
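A sketch of Denis's "index with a small number of fields" idea spelled out with the column names from the schema above — a middle ground between the six-column primary key and the single-column num_editeur index, covering only the accord_editeur columns the WHERE clause tests with equality (whether the planner prefers it depends on the data distribution, so this is only a suggestion to try, not a guaranteed win):

```sql
-- Hypothetical narrower index on the equality columns of the query;
-- index name is made up for illustration.
CREATE INDEX accord_editeur_lookup_idx
    ON accord_editeur (num_editeur, num_site, num_emplacement);

-- refresh planner statistics so the optimizer can consider it
VACUUM ANALYZE accord_editeur;
```

id_regie and affichage_possible are left out on the assumption that they are low-selectivity; if they discriminate well they could be appended to the index too.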