Thread: Load a database into memory

Load a database into memory

From: Guillaume Lémery

Hi,

To increase the performance of my database, I have tried to tune some
options: I disabled fsync(), and increased the shared memory both on my
Linux box AND in PostgreSQL.
But I'd like to know whether it is possible to load the database into
memory, or whether there are other options that affect memory use?

What else should I know about memory with PostgreSQL?

I'm open to any suggestion,

Many thanks...

Guillaume.


Re: Load a database into memory

From: "Steve Wolfe"

> To increase the performance of my database, I have tried to tune some
> options: I disabled fsync(), and increased the shared memory both on my
> Linux box AND in PostgreSQL.
> But I'd like to know whether it is possible to load the database into
> memory, or whether there are other options that affect memory use?

  If you have enough RAM, the database will already be in memory, in a
manner of speaking - your kernel will have all of the files held in disk
cache.
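
  You can see this in action on Linux: the "cached" column of free(1)
is the kernel's file cache, which is where your table and index files
end up once they've been read.  (A quick check -- the numbers below are
only an illustration, not from a real box:)

$ free -m
             total       used       free     shared    buffers     cached
Mem:           512        498         14         72         35        310

Here roughly 310 MB of files, Postgres data files included, are already
sitting in RAM.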

> What else should I know about memory with PostgreSQL?

   You can increase the shared memory, and you can increase the amount of
memory used for sorts/joins, but I haven't run across much more than that.
Postgres is just a very memory-efficient piece of work.  I laugh when I hear
of people using MS's database, and adding gigs and gigs of RAM for even
modest databases to perform well.  Our server has half a gig, and I
just can't get Postgres to use it all.  I guess I better come up with a few
more large tables to join to. : )
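
  For instance, something along these lines when starting the
postmaster (just a sketch -- size the numbers to your own RAM; -B
counts 8 KB shared disk buffers, and -S, passed to each backend via
-o, is per-backend sort memory in kilobytes):

postmaster -D $PGDATA -B 3900 -o '-S 16384'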

steve


Re: Load a database into memory

From: Guillaume Lémery

>   If you have enough RAM, the database will already be in memory, in a
> manner of speaking - your kernel will have all of the files held in disk
> cache.

OK, but if so, why do 10 simultaneous runs of the same query on a
well-indexed table with only 500,000 records take so long? :-/

>
>> What else should I know about memory with PostgreSQL?
>
>
>    You can increase the shared memory, and you can increase the amount of
> memory used for sorts/joins, but I haven't run across much more than that.
> Postgres is just a very memory-efficient piece of work.  I laugh when I hear
> of people using MS's database, and adding gigs and gigs of RAM for even
> modest databases to perform well.  Our server has half a gig, and I
> just can't get Postgres to use it all.  I guess I better come up with a few
> more large tables to join to. : )

My server has 1 GB of RAM and yes, I still have 800 MB free when PG is
running... But I try to avoid joins and sorts in my queries, so I don't
think I need to tune the amount of memory used for sorts/joins...

Guillaume.



Re: Re: Load a database into memory

From: Denis Perchine

> >   If you have enough RAM, the database will already be in memory, in a
> > manner of speaking - your kernel will have all of the files held in disk
> > cache.
>
> OK, but if so, why do 10 simultaneous runs of the same query on a
> well-indexed table with only 500,000 records take so long? :-/

Which queries? Could you please provide the EXPLAIN output for them? And
could you provide execution stats for them under load?

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Re: Load a database into memory

From: Poul Laust Christiansen

You need to specify the memory usage when you start the postmaster.

My /etc/rc.d/init.d/postgresql script has this line:
 su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o
'-B 4096 -i' start >/dev/null 2>&1"

The -B option tells postgres to use 4096 buffers of 8 KB each = 32 MB per
postmaster. The default value is somewhere around 1 MB.

I also have this line in my start script:
echo 104857600 >/proc/sys/kernel/shmmax

Which defines the maximum amount of shared memory.
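
If you want the setting to survive a reboot, you can also put it in
/etc/sysctl.conf (assuming your distribution applies that file at boot
with "sysctl -p", as most recent ones do):

# /etc/sysctl.conf
kernel.shmmax = 104857600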

HTH,
Poul L. Christiansen

On Thu, 25 Jan 2001, Guillaume Lémery wrote:

> >   If you have enough RAM, the database will already be in memory, in a
> > manner of speaking - your kernel will have all of the files held in disk
> > cache.
>
> OK, but if so, why do 10 simultaneous runs of the same query on a
> well-indexed table with only 500,000 records take so long? :-/
>
> >
> >> What else should I know about memory with PostgreSQL?
> >
> >
> >    You can increase the shared memory, and you can increase the amount of
> > memory used for sorts/joins, but I haven't run across much more than that.
> > Postgres is just a very memory-efficient piece of work.  I laugh when I hear
> > of people using MS's database, and adding gigs and gigs of RAM for even
> > modest databases to perform well.  Our server has half a gig, and I
> > just can't get Postgres to use it all.  I guess I better come up with a few
> > more large tables to join to. : )
>
> My server has 1 GB of RAM and yes, I still have 800 MB free when PG is
> running... But I try to avoid joins and sorts in my queries, so I don't
> think I need to tune the amount of memory used for sorts/joins...
>
> Guillaume.
>
>


Re: Re: Load a database into memory

From: "Mitch Vincent"

> My /etc/rc.d/init.d/postgresql script has this line:
>  su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o
> '-B 4096 -i' start >/dev/null 2>&1"

Isn't that BLCKSZ x 4096? I'm almost sure it is...

Some people (me) have BLCKSZ set to 32k, so... Just an FYI.
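
(BLCKSZ is a compile-time constant in the 7.x sources, so if you don't
remember what your build uses, you can grep the source tree it was built
from -- a sketch; the header location may vary between versions:

$ grep '#define BLCKSZ' src/include/config.h
#define BLCKSZ  8192

The shared buffer pool is then roughly BLCKSZ times the -B value.)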

-Mitch



Re: Re: Load a database into memory

From: Guillaume Lémery

Steve Wolfe wrote:

>>>   If you have enough RAM, the database will already be in memory, in a
>>> manner of speaking - your kernel will have all of the files held in disk
>>> cache.
>>
>> OK, but if so, why do 10 simultaneous runs of the same query on a
>> well-indexed table with only 500,000 records take so long? :-/
>
>   Good question.  My first guess is that it's CPU-bound, or that the indexes
> aren't working for some reason.  We'd need to see the table structure, and
> the query.  And a "vacuum analyze" never hurts. : )

Here are the tables :
CREATE TABLE accord_editeur
(
id_regie int4 not null,
num_editeur int4 not null,
num_site int4 not null,
num_emplacement int4 not null,
num_campagne int4 not null,
num_publicite int4 not null,
num_periode int4,
par_id_technologie int4 not null,
affichage_possible int4 default 0,
ponderation_calculee int4,
date_pilotage timestamp NULL,
id_ct1 int4,
PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement, num_campagne, num_publicite)
);

(I've got a primary key on multiple fields because of the complexity of
my database. It's the only way to get unique records.)

CREATE TABLE parametre
(
id_parametre int4 not null primary key,
id_regie int4 NULL ,
par_id_parametre int4 NULL ,
type INT4 not null,
valeur_str varchar null,
valeur_int int4 null,
valeur_fl float8 null,
valeur_txt varchar,
date_pilotage timestamp NULL,
id_ct1 int4
);
CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);

The query :
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee * random(),
ae.num_periode
FROM accord_editeur ae,
parametre par
WHERE ae.id_regie=1
AND ae.num_editeur = 1494
AND ae.num_site = 1
AND ae.num_emplacement = 1
AND ae.affichage_possible = 1
AND ae.par_id_technologie = par.id_parametre
AND par.type = 10
AND par.valeur_int = 1

And the Explain :
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..15422.73 rows=1 width=56)
   ->  Index Scan using accord_editeur_pkey on accord_editeur ae
(cost=0.00..15420.71 rows=1 width=48)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)

EXPLAIN


In fact it is CPU-bound... But why?
One query takes up to 14% of CPU usage! The server is a dual-processor
PIII 660, and I'd like to run more than 100 of these at the same time...
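
(I watched it in top while firing the queries -- the output looked
roughly like this, reconstructed from memory:

$ top
  PID USER     ... %CPU %MEM COMMAND
 1234 postgres ... 14.0  1.2 postmaster
)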

Maybe I should set hash indices on single fields instead of one index on
multiple columns?

Thanks,

Guillaume.


Re: Re: Load a database into memory

From: Denis Perchine

> >   Good question.  My first guess is that it's CPU-bound, or that the
> > indexes aren't working for some reason.  We'd need to see the table
> > structure, and the query.  And a "vacuum analyze" never hurts. : )
>
> Here are the tables :
> CREATE TABLE accord_editeur
> (
> id_regie int4 not null,
> num_editeur int4 not null,
> num_site int4 not null,
> num_emplacement int4 not null,
> num_campagne int4 not null,
> num_publicite int4 not null,
> num_periode int4,
> par_id_technologie int4 not null,
> affichage_possible int4 default 0,
> ponderation_calculee int4,
> date_pilotage timestamp NULL,
> id_ct1 int4,
> PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement, num_campagne, num_publicite)
> );
>
> (I've got a primary key on multiple fields because of the complexity of
> my database. It's the only way to get unique records.)
>
> CREATE TABLE parametre
> (
> id_parametre int4 not null primary key,
> id_regie int4 NULL ,
> par_id_parametre int4 NULL ,
> type INT4 not null,
> valeur_str varchar null,
> valeur_int int4 null,
> valeur_fl float8 null,
> valeur_txt varchar,
> date_pilotage timestamp NULL,
> id_ct1 int4
> );
> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>
> The query :
> SELECT ae.id_regie,
> ae.num_campagne,
> ae.num_publicite,
> ae.ponderation_calculee * random(),

How many tuples do you usually get from this query? random() can be quite
expensive.

Also please give us output of
vacuum verbose analyze parametre;
vacuum verbose analyze accord_editeur;

But it seems that the problem is your overly large index. It is quite
expensive to traverse (note that almost all of the cost comes from the
index scan over it).

I do not know the exact statistics for your data, but if you create an
index on a small number of fields that account for most of the
selectivity, the query should become much faster, provided the optimizer
decides to use this index.
As a first try you can create an index on num_editeur. Do not forget to
run vacuum analyze accord_editeur; after this.
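Something like this (the index name is only an example):

CREATE INDEX accord_editeur_num_editeur_idx
    ON accord_editeur (num_editeur);
VACUUM ANALYZE accord_editeur;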

> ae.num_periode
> FROM accord_editeur ae,
> parametre par
> WHERE ae.id_regie=1
> AND ae.num_editeur = 1494
> AND ae.num_site = 1
> AND ae.num_emplacement = 1
> AND ae.affichage_possible = 1
> AND ae.par_id_technologie = par.id_parametre
> AND par.type = 10
> AND par.valeur_int = 1
>
> And the Explain :
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..15422.73 rows=1 width=56)
>    ->  Index Scan using accord_editeur_pkey on accord_editeur ae
> (cost=0.00..15420.71 rows=1 width=48)
>    ->  Index Scan using parametre_tracking_idx on parametre par
> (cost=0.00..2.02 rows=1 width=8)
>
> EXPLAIN
>
>
> In fact it is CPU-bound... But why?
> One query takes up to 14% of CPU usage! The server is a dual-processor
> PIII 660, and I'd like to run more than 100 of these at the same time...
>
> Maybe I should set hash indices on single fields instead of one index on
> multiple columns?
>
> Thanks,
>
> Guillaume.

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Load a database into memory

From: Guillaume Lémery

As a reminder:

>> CREATE TABLE accord_editeur
>> (
>> id_regie int4 not null,
>> num_editeur int4 not null,
>> num_site int4 not null,
>> num_emplacement int4 not null,
>> num_campagne int4 not null,
>> num_publicite int4 not null,
>> num_periode int4,
>> par_id_technologie int4 not null,
>> affichage_possible int4 default 0,
>> ponderation_calculee int4,
>> date_pilotage timestamp NULL,
>> id_ct1 int4,
>> PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement, num_campagne, num_publicite)
>> );
>>
>> (I've got a primary key on multiple fields because of the complexity of
>> my database. It's the only way to get unique records.)
>>
>> CREATE TABLE parametre
>> (
>> id_parametre int4 not null primary key,
>> id_regie int4 NULL ,
>> par_id_parametre int4 NULL ,
>> type INT4 not null,
>> valeur_str varchar null,
>> valeur_int int4 null,
>> valeur_fl float8 null,
>> valeur_txt varchar,
>> date_pilotage timestamp NULL,
>> id_ct1 int4
>> );
>> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>>
>> The query :
>> SELECT ae.id_regie,
>> ae.num_campagne,
>> ae.num_publicite,
>> ae.ponderation_calculee * random(),
>> ae.num_periode
>> FROM accord_editeur ae,
>> parametre par
>> WHERE ae.id_regie=1
>> AND ae.num_editeur = 1494
>> AND ae.num_site = 1
>> AND ae.num_emplacement = 1
>> AND ae.affichage_possible = 1
>> AND ae.par_id_technologie = par.id_parametre
>> AND par.type = 10
>> AND par.valeur_int = 1
>>
>> And the Explain:
>> NOTICE:  QUERY PLAN:
>>
>> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>> -> Index Scan using accord_editeur_pkey on accord_editeur ae
>> (cost=0.00..225.50 rows=1 width=48)
>> -> Index Scan using parametre_tracking_idx on parametre par
>> (cost=0.00..2.02 rows=1 width=8)
>>
>> EXPLAIN

> How many tuples do you usually get from this query?
About 100.

> random() can be quite expensive.
OK, here it is with that change:

The query without the random():
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee,
ae.num_periode
FROM accord_editeur ae,
parametre par
WHERE ae.id_regie=1
AND ae.num_editeur = 1494
AND ae.num_site = 1
AND ae.num_emplacement = 1
AND ae.affichage_possible = 1
AND ae.par_id_technologie = par.id_parametre
AND par.type = 10
AND par.valeur_int = 1

Explain :
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..228.27 rows=1 width=56)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
   ->  Index Scan using accord_editeur_pkey on accord_editeur ae
(cost=0.00..225.50 rows=60 width=48)

EXPLAIN

So random() is not that expensive... I'll do the randomization in my C
code anyway...


> Also please give us output of
> vacuum verbose analyze parametre;
NOTICE:  --Relation parametre--
NOTICE:  Pages 3: Changed 0, reaped 0, Empty 0, New 0; Tup 235: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 148; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_pkey: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_tracking_idx: Pages 2; Tuples 235. CPU
0.00s/0.00u sec.
NOTICE:  Index parametre_idct1_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u
sec.
VACUUM

> vacuum verbose analyze accord_editeur;
NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.21s/1.85u sec.
NOTICE:  Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted
0. CPU 0.19s/1.01u sec.
VACUUM

> But it seems that the problem is your overly large index. It is quite
> expensive to traverse (note that almost all of the cost comes from the
> index scan over it).
>
> I do not know the exact statistics for your data, but if you create an
> index on a small number of fields that account for most of the
> selectivity, the query should become much faster, provided the optimizer
> decides to use this index.
> As a first try you can create an index on num_editeur. Do not forget to
> run vacuum analyze accord_editeur; after this.

OK, I dropped the large index and only set one on num_editeur.

Vacuum verbose analyze accord_editeur :
NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.33s/1.74u sec.
NOTICE:  Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032:
Deleted 0. CPU 0.05s/0.93u sec.
VACUUM

Here is the new EXPLAIN (still without the random()):
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..512.65 rows=1 width=56)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
   ->  Index Scan using accord_editeur_tracking_idx on accord_editeur ae
  (cost=0.00..509.88 rows=60 width=48)

EXPLAIN

So the large index seems to be better...
Or maybe I should try a hash index?

If I hash num_editeur into another field and set my index on these 2
fields, it gives me a better cost, but performance is still slow :-(
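
(For the record, the syntax I would try for that -- though as far as I
know btree has historically been at least as fast as hash in Postgres,
so this is only an experiment:

CREATE INDEX ae_num_editeur_hash_idx
    ON accord_editeur USING hash (num_editeur);
VACUUM ANALYZE accord_editeur;
)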


>> In fact it is CPU-bound... But why?
>> One query takes up to 14% of CPU usage! The server is a dual-processor
>> PIII 660, and I'd like to run more than 100 of these at the same time...
>>
>> Maybe I should set hash indices on single fields instead of one index on
>> multiple columns?
>>
>> Thanks,
>>
>> Guillaume.


Re: Load a database into memory

From: Denis Perchine

> > How many tuples do you usually get from this query?
>
> About 100.
>
> > random() can be quite expensive.

Should not change too much...

> Explain :
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>    ->  Index Scan using parametre_tracking_idx on parametre par
> (cost=0.00..2.02 rows=1 width=8)
>    ->  Index Scan using accord_editeur_pkey on accord_editeur ae
> (cost=0.00..225.50 rows=60 width=48)
>
> EXPLAIN

But why do you now have a cost of 225.50 for the index scan, when in your
last mail it was 15420.71???

> > vacuum verbose analyze accord_editeur;
>
> NOTICE:  --Relation accord_editeur--
> NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
> Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
> Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
> 0.21s/1.85u sec.
> NOTICE:  Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted
> 0. CPU 0.19s/1.01u sec.
> VACUUM

As you can see, your index is almost half the size of the table itself.
That is quite a lot of bloat.

> OK, I dropped the large index and only set one on num_editeur.
>
> Vacuum verbose analyze accord_editeur :
> NOTICE:  --Relation accord_editeur--
> NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
> Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
> Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
> 0.33s/1.74u sec.
> NOTICE:  Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032:
> Deleted 0. CPU 0.05s/0.93u sec.
> VACUUM
>
> Here is the new EXPLAIN (still without the random()):
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..512.65 rows=1 width=56)
>    ->  Index Scan using parametre_tracking_idx on parametre par
> (cost=0.00..2.02 rows=1 width=8)
>    ->  Index Scan using accord_editeur_tracking_idx on accord_editeur ae
>   (cost=0.00..509.88 rows=60 width=48)
>
> EXPLAIN
>
> So the large index seems to be better...
> Or maybe I should try a hash index?
>
> If I hash num_editeur into another field and set my index on these 2
> fields, it gives me a better cost, but performance is still slow :-(

Please give the output of:

SELECT count(id_regie) FROM accord_editeur WHERE
id_regie=1
AND num_editeur = 1494
AND num_site = 1
AND num_emplacement = 1
AND affichage_possible = 1;

SELECT count(id_regie) FROM accord_editeur WHERE
id_regie=1
AND num_editeur = 1494
AND num_site = 1
AND num_emplacement = 1;

SELECT count(id_regie) FROM accord_editeur WHERE
id_regie=1
AND num_editeur = 1494
AND num_site = 1;

SELECT count(id_regie) FROM accord_editeur WHERE
id_regie=1
AND num_editeur = 1494;

SELECT count(id_regie) FROM accord_editeur WHERE
id_regie=1;

SELECT count(id_regie) FROM accord_editeur WHERE
num_editeur = 1494;

This is basically gathering some statistics to choose which fields to
create the index on.

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Load a database into memory

From: Guillaume Lémery

>> Explain :
>> NOTICE:  QUERY PLAN:
>>
>> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>>    ->  Index Scan using parametre_tracking_idx on parametre par
>> (cost=0.00..2.02 rows=1 width=8)
>>    ->  Index Scan using accord_editeur_pkey on accord_editeur ae
>> (cost=0.00..225.50 rows=60 width=48)
>>
>> EXPLAIN
>
>
> But why do you now have a cost of 225.50 for the index scan, when in your
> last mail it was 15420.71???
Because I'm dumb...
In the first mail, I didn't have the right index in place...
Sorry.

>
>>> vacuum verbose analyze accord_editeur;
>>
>> NOTICE:  --Relation accord_editeur--
>> NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
>> Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
>> Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
>> 0.21s/1.85u sec.
>> NOTICE:  Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted
>> 0. CPU 0.19s/1.01u sec.
>> VACUUM
>
>
> As you can see, your index is almost half the size of the table itself.
> That is quite a lot of bloat.

I've seen that :-/

> Please give the output of:
>
> SELECT count(id_regie) FROM accord_editeur WHERE
> id_regie=1
> AND num_editeur = 1494
> AND num_site = 1
> AND num_emplacement = 1
> AND affichage_possible = 1;
  count
-------
    103
(1 row)

> SELECT count(id_regie) FROM accord_editeur WHERE
> id_regie=1
> AND num_editeur = 1494
> AND num_site = 1
> AND num_emplacement = 1;
  count
-------
    103
(1 row)

(because for a previous test I deleted the records where
affichage_possible = 0, since I don't need them...)

> SELECT count(id_regie) FROM accord_editeur WHERE
> id_regie=1
> AND num_editeur = 1494
> AND num_site = 1;
  count
-------
    179
(1 row)

> SELECT count(id_regie) FROM accord_editeur WHERE
> id_regie=1
> AND num_editeur = 1494;
  count
-------
    352
(1 row)

> SELECT count(id_regie) FROM accord_editeur WHERE
> id_regie=1;
  count
--------
  447032
(1 row)
(the complete table; that's expected...)

> SELECT count(id_regie) FROM accord_editeur WHERE
> num_editeur = 1494;
  count
-------
    352
(1 row)

Maybe I should create an index on (num_editeur, num_site, num_emplacement)?


Re: Load a database into memory

From: Denis Perchine

> Maybe I should create an index on (num_editeur, num_site, num_emplacement)?

Yes, try it. This is the best choice.
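
For example (dropping the previous single-column index first; the index
names are only examples):

DROP INDEX accord_editeur_tracking_idx;
CREATE INDEX ae_tracking_idx
    ON accord_editeur (num_editeur, num_site, num_emplacement);
VACUUM ANALYZE accord_editeur;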

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Load a database into memory

From: Guillaume Lémery

>> Maybe I should create an index on (num_editeur, num_site, num_emplacement)?
>
>
> Yes, try it. This is the best choice.

Here it comes :

VACUUM VERBOSE ANALYZE accord_editeur;
NOTICE:  --Relation accord_editeur--
NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.39s/1.67u sec.
NOTICE:  Index ae_tracking_idx: Pages 2300; Tuples 447032: Deleted 0.
CPU 0.07s/0.90u sec.
VACUUM

EXPLAIN :
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..228.27 rows=1 width=56)
   ->  Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
   ->  Index Scan using ae_tracking_idx on accord_editeur ae
(cost=0.00..225.50 rows=60 width=48)

EXPLAIN

So, no change in the cost...
Only in the number of pages...

So, if I can't optimize my index any further, what else should I try?
It uses too much CPU for me...


Re: Load a database into memory

From: Denis Perchine

> Here it comes :
>
> VACUUM VERBOSE ANALYZE accord_editeur;
> NOTICE:  --Relation accord_editeur--
> NOTICE:  Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
> Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
> Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
> 0.39s/1.67u sec.
> NOTICE:  Index ae_tracking_idx: Pages 2300; Tuples 447032: Deleted 0.
> CPU 0.07s/0.90u sec.
> VACUUM
>
> EXPLAIN :
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>    ->  Index Scan using parametre_tracking_idx on parametre par
> (cost=0.00..2.02 rows=1 width=8)
>    ->  Index Scan using ae_tracking_idx on accord_editeur ae
> (cost=0.00..225.50 rows=60 width=48)
>
> EXPLAIN
>
> So, no change in the cost...
> Only in the number of pages...
>
> So, if I can't optimize my index any further, what else should I try?
> It uses too much CPU for me...

These numbers may not be completely accurate. I would rather suggest you
look at the actual query speed.

Say:
SET pg_options TO 'executorstats=1';

Then run your query with different indices. You will see something like
this in your postgres log:

! Executor Stats:
! system usage stats:
!       1.409678 elapsed 0.480000 user 0.630000 system sec
!       [0.500000 user 0.640000 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       4/1790 [256/2238] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:       7035 read,          7 written, buffer hit rate = 1.24%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written

The first number is the elapsed time in seconds.

This will be the real measure.

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Re: Load a database into memory

From: Tom Lane

Guillaume Lémery <glemery@comclick.com> writes:
> CREATE TABLE parametre
> (
> id_parametre int4 not null primary key,
> id_regie int4 NULL ,
> par_id_parametre int4 NULL ,
> type INT4 not null,
> valeur_str varchar null,
> valeur_int int4 null,
> valeur_fl float8 null,
> valeur_txt varchar,
> date_pilotage timestamp NULL,
> id_ct1 int4
> );
> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>
> The query :
> SELECT ae.id_regie,
> ae.num_campagne,
> ae.num_publicite,
> ae.ponderation_calculee * random(),
> ae.num_periode
> FROM accord_editeur ae,
> parametre par
> WHERE ae.id_regie=1
> AND ae.num_editeur = 1494
> AND ae.num_site = 1
> AND ae.num_emplacement = 1
> AND ae.affichage_possible = 1
> AND ae.par_id_technologie = par.id_parametre
> AND par.type = 10
> AND par.valeur_int = 1
>
> And the Explain :
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>     -> Index Scan using accord_editeur_pkey on accord_editeur ae
>     (cost=0.00..225.50 rows=1 width=48)
>     -> Index Scan using parametre_tracking_idx on parametre par
>     (cost=0.00..2.02 rows=1 width=8)

I think the parametre_tracking_idx index is actually counterproductive
here.  You'd probably be better off without it, since then the
planner would (presumably) choose to use the primary-key index,
which is unique.  I suspect that "WHERE par.type = 10 AND par.valeur_int
= 1" retrieves multiple records which then get thrown away.  It'd be
better to use "WHERE ae.par_id_technologie = par.id_parametre" to pull
out the single candidate row --- but the planner is evidently getting
fooled by the availability of two clauses instead of one for the extra
index, and is mistakenly guessing that the extra one is more selective.
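
If you want to test that theory, something like this should be enough
(you can always recreate the index afterwards if it turns out to help
other queries):

DROP INDEX parametre_tracking_idx;
VACUUM ANALYZE parametre;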

Other than that, perhaps you need to readjust your notion of what's
slow.  100 concurrent transactions on a PIII is a pretty stiff workload;
if you're expecting millisecond turnaround then I think you're out of
luck.

            regards, tom lane