Re: Load a database into memory - Mailing list pgsql-general

From: Guillaume Lémery
Subject: Re: Load a database into memory
Msg-id: 3A71348B.9020000@comclick.com
In response to: Load a database into memory (Guillaume Lémery <glemery@comclick.com>)
Responses: Re: Load a database into memory (Denis Perchine <dyp@perchine.com>)
           Re: Re: Load a database into memory (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
For reference:

>> CREATE TABLE accord_editeur
>> (
>> id_regie int4 not null,
>> num_editeur int4 not null,
>> num_site int4 not null,
>> num_emplacement int4 not null,
>> num_campagne int4 not null,
>> num_publicite int4 not null,
>> num_periode int4,
>> par_id_technologie int4 not null,
>> affichage_possible int4 default 0,
>> ponderation_calculee int4,
>> date_pilotage timestamp NULL,
>> id_ct1 int4,
>> PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement,
>> num_campagne, num_publicite)
>> );
>>
>> (I've got a primary key on multiple fields because of the complexity of
>> my database. It's the only way to guarantee unique records.)
>>
>> CREATE TABLE parametre
>> (
>> id_parametre int4 not null primary key,
>> id_regie int4 NULL ,
>> par_id_parametre int4 NULL ,
>> type INT4 not null,
>> valeur_str varchar null,
>> valeur_int int4 null,
>> valeur_fl float8 null,
>> valeur_txt varchar,
>> date_pilotage timestamp NULL,
>> id_ct1 int4
>> );
>> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>>
>> The query :
>> SELECT ae.id_regie,
>> ae.num_campagne,
>> ae.num_publicite,
>> ae.ponderation_calculee * random()
>> FROM accord_editeur ae,
>> parametre par
>> WHERE ae.id_regie = 1
>> AND ae.num_editeur = 1494
>> AND ae.num_site = 1
>> AND ae.num_emplacement = 1
>> AND ae.affichage_possible = 1
>> AND ae.par_id_technologie = par.id_parametre
>> AND par.type = 10
>> AND par.valeur_int = 1
>>
>> And the Explain:
>> NOTICE:  QUERY PLAN:
>>
>> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>>   ->  Index Scan using accord_editeur_pkey on accord_editeur ae
>> (cost=0.00..225.50 rows=1 width=48)
>>   ->  Index Scan using parametre_tracking_idx on parametre par
>> (cost=0.00..2.02 rows=1 width=8)
>>
>> EXPLAIN
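Given that WHERE clause, one middle ground between the six-column primary key and a single-column index would be a btree on just the four columns the query filters with equality. This is only a sketch; the index name is illustrative and not from the original post:

```sql
-- Hypothetical narrower index matching the query's equality filters
-- (index name is illustrative):
CREATE INDEX accord_editeur_filter_idx
    ON accord_editeur (id_regie, num_editeur, num_site, num_emplacement);
VACUUM ANALYZE accord_editeur;
```

Whether the optimizer prefers this over the primary-key index would depend on the statistics for those columns.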

> How many tuples do you usually get from this query?
About 100.

> random() can be quite expensive.
OK, here it is with the change:

Query without the random():
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee,
ae.num_periode
FROM accord_editeur ae,
parametre par
WHERE ae.id_regie=1
AND ae.num_editeur = 1494
AND ae.num_site = 1
AND ae.num_emplacement = 1
AND ae.affichage_possible = 1
AND ae.par_id_technologie = par.id_parametre
AND par.type = 10
AND par.valeur_int = 1

Explain:
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..228.27 rows=1 width=56)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
   ->  Index Scan using accord_editeur_pkey on accord_editeur ae
(cost=0.00..225.50 rows=60 width=48)

EXPLAIN

So random() is not that expensive after all (the estimated cost is 228.27 either way)... I'll compute it in my C code anyway...


> Also please give us output of
> vacuum verbose analyze parametre;
NOTICE:  --Relation parametre--
NOTICE:  Pages 3: Changed 0, reaped 0, Empty 0, New 0; Tup 235: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 148; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_pkey: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_tracking_idx: Pages 2; Tuples 235. CPU
0.00s/0.00u sec.
NOTICE:  Index parametre_idct1_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u
sec.
VACUUM

> vacuum verbose analyze accord_editeur;
NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.21s/1.85u sec.
NOTICE:  Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted
0. CPU 0.19s/1.01u sec.
VACUUM

> But it seems that the problem is in your too-large index. It is quite
> expensive to traverse (note that almost all of the cost comes from the
> index scan over it).
>
> I do not know the exact statistics for your data, but if you create an
> index on a small number of fields that accounts for most of the
> selectivity, the query should become much faster, if the optimizer
> decides to use this index.
> As a first try you can create an index on num_editeur. Do not forget to
> run vacuum analyze accord_editeur; afterwards.

OK, I dropped the large index and only set one on num_editeur.
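Roughly, that change would look like the following sketch (assuming this PostgreSQL version lets you drop the primary-key index directly with DROP INDEX; the new index name matches the one in the VACUUM output below):

```sql
-- Sketch of the change described above:
DROP INDEX accord_editeur_pkey;
CREATE INDEX accord_editeur_tracking_idx ON accord_editeur (num_editeur);
VACUUM ANALYZE accord_editeur;
```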

Vacuum verbose analyze accord_editeur:
NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.33s/1.74u sec.
NOTICE:  Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032:
Deleted 0. CPU 0.05s/0.93u sec.
VACUUM

Here is the new Explain (still without the random()):
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..512.65 rows=1 width=56)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
   ->  Index Scan using accord_editeur_tracking_idx on accord_editeur ae
  (cost=0.00..509.88 rows=60 width=48)

EXPLAIN

So the large index seems to be better after all...
Or maybe I should try a hash index?

If I hash num_editeur into another field and put my index on those 2
fields, it gives me a better cost, but performance is still slow :-(
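For the hash-index idea, a sketch is below. Hash indexes in PostgreSQL support only equality lookups, which is exactly what this query does on num_editeur; the index name is illustrative:

```sql
-- Hypothetical hash-index variant (index name is illustrative):
CREATE INDEX accord_editeur_hash_idx
    ON accord_editeur USING hash (num_editeur);
VACUUM ANALYZE accord_editeur;
```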


>> In fact it is CPU-bound... But why?
>> One query takes up to 14% of CPU usage! The server is a dual-processor
>> PIII 660, and I'd like to run more than 100 queries at the same time...
>>
>> Maybe I should set hash indices on single fields instead of one index
>> on multiple columns?
>>
>> Thanks,
>>
>> Guillaume.

