Thread: Optimizing a request

Optimizing a request

From
Jean-Max Reymond
Date:
hi,
I have the following structure in my database, version 7.4.2:

CREATE TABLE "public"."article" (
 "art_id" INTEGER NOT NULL,
 "rub_id" INTEGER DEFAULT '0' NOT NULL,
 "art_titre" VARCHAR(100) DEFAULT '' NOT NULL,
 "art_texte" TEXT NOT NULL,
 "art_date" DATE NOT NULL,
 "aut_id" INTEGER,
 CONSTRAINT "article_pkey" PRIMARY KEY("art_id")
) WITH OIDS;

CREATE INDEX "article_art_date_index" ON "public"."article"
USING btree ("art_date");


CREATE INDEX "article_aut_id_index" ON "public"."article"
USING btree ("aut_id");


CREATE INDEX "article_rub_id_index" ON "public"."article"
USING btree ("rub_id");


CREATE INDEX "article_titre" ON "public"."article"
USING btree ("art_id", "art_titre");


CREATE TABLE "public"."auteur" (
 "aut_id" INTEGER NOT NULL,
 "aut_name" VARCHAR(100) DEFAULT '' NOT NULL,
 CONSTRAINT "auteur_pkey" PRIMARY KEY("aut_id")
) WITH OIDS;


CREATE TABLE "public"."rubrique" (
 "rub_id" INTEGER NOT NULL,
 "rub_titre" VARCHAR(100) DEFAULT '' NOT NULL,
 "rub_parent" INTEGER DEFAULT '0' NOT NULL,
 "rub_date" DATE,
 CONSTRAINT "rubrique_pkey" PRIMARY KEY("rub_id")
) WITH OIDS;

CREATE INDEX "rub_rub" ON "public"."rubrique"
USING btree ("rub_parent");

CREATE INDEX "rubrique_rub_date_index" ON "public"."rubrique"
USING btree ("rub_date");

CREATE INDEX "rubrique_rub_titre_index" ON "public"."rubrique"
USING btree ("rub_titre");

I want to optimize the following request and avoid the seq scan on the
table article (10000000 rows).



explain SELECT art_id, art_titre, art_texte, rub_titre
FROM article inner join rubrique on article.rub_id = rubrique.rub_id
where rub_parent = 8;

Hash Join  (cost=8.27..265637.59 rows=25 width=130)
  Hash Cond: ("outer".rub_id = "inner".rub_id)
  ->  Seq Scan on article  (cost=0.00..215629.00 rows=10000000 width=108)
  ->  Hash  (cost=8.26..8.26 rows=3 width=22)
        ->  Index Scan using rubrique_parent on rubrique
(cost=0.00..8.26 rows=3 width=22)
              Index Cond: (rub_parent = 8)


thanks for your answers,

--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com

Re: Optimizing a request

From
Josh Berkus
Date:
Jean,

> I have the following structure in my database, version 7.4.2:

Upgrade to 7.4.5.  The version you're using has several known issues with data
restore in the event of system failure.

> Hash Join  (cost=8.27..265637.59 rows=25 width=130)
>   Hash Cond: ("outer".rub_id = "inner".rub_id)
>   ->  Seq Scan on article  (cost=0.00..215629.00 rows=10000000 width=108)
>   ->  Hash  (cost=8.26..8.26 rows=3 width=22)
>         ->  Index Scan using rubrique_parent on rubrique
> (cost=0.00..8.26 rows=3 width=22)
>               Index Cond: (rub_parent = 8)

Those look suspiciously like stock estimates.  When was the last time you ran
ANALYZE?


--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Optimizing a request

From
Jean-Max Reymond
Date:
On Tue, 31 Aug 2004 12:15:40 -0700, Josh Berkus <josh@agliodbs.com> wrote:

> Those look suspiciously like stock estimates.  When was the last time you ran
> ANALYZE?

I ran VACUUM ANALYZE just before running the EXPLAIN.

--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com

Re: Optimizing a request

From
"Gary Doades"
Date:
On 31 Aug 2004 at 20:59, Jean-Max Reymond wrote:

> hi,
>
> I want to optimize the following request and avoid the seq scan on the
> table article (10000000 rows).
>
> explain SELECT art_id, art_titre, art_texte, rub_titre
> FROM article inner join rubrique on article.rub_id = rubrique.rub_id
> where rub_parent = 8;
>
> Hash Join  (cost=8.27..265637.59 rows=25 width=130)
>   Hash Cond: ("outer".rub_id = "inner".rub_id)
>   ->  Seq Scan on article  (cost=0.00..215629.00 rows=10000000 width=108)
>   ->  Hash  (cost=8.26..8.26 rows=3 width=22)
>         ->  Index Scan using rubrique_parent on rubrique
> (cost=0.00..8.26 rows=3 width=22)
>               Index Cond: (rub_parent = 8)
>
>
> thanks for your answers,
>
> --

Have you run ANALYZE on this database after creating the indexes or loading the data?

What percentage of rows in the "article" table are likely to match the keys selected from the "rubrique" table?

If it is likely to fetch a high proportion of the rows from article then it may be best that a seq scan is performed.

What are your non-default postgresql.conf settings? It may be better to increase the default_statistics_target (to say 100 to 200) before running ANALYZE and then re-run the query.
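As a sketch of that suggestion (table and column names are taken from the schema posted above; on 7.4 the target can also be raised per column rather than globally):

```sql
-- Raise the statistics target for this session, then re-gather stats
SET default_statistics_target = 200;
ANALYZE article;
ANALYZE rubrique;

-- Alternatively, raise it only for the columns used in the join and filter
-- (this form is stored in the catalog and survives the session):
ALTER TABLE article ALTER COLUMN rub_id SET STATISTICS 200;
ALTER TABLE rubrique ALTER COLUMN rub_parent SET STATISTICS 200;
ANALYZE article;
ANALYZE rubrique;
```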

Cheers,
Gary.

Re: Optimizing a request

From
Jean-Max Reymond
Date:
----- Original Message -----
From: Gary Doades <gpd@gpdnet.co.uk>
Date: Tue, 31 Aug 2004 20:21:49 +0100
Subject: Re: [PERFORM] Optimizing a request
To: pgsql-performance@postgresql.org



> Have you run ANALYZE on this database after creating the indexes or loading the data?

The indexes are created and the data loaded, and then I run VACUUM ANALYZE.

> What percentage of rows in the "article" table are likely to match
> the keys selected from the "rubrique" table?

only 1 record.

> If it is likely to fetch a high proportion of the rows from article
> then it may be best that a seq scan is performed.
>
> What are your non-default postgresql.conf settings? It may be better
> to increase the default_statistics_target (to say 100 to 200) before
> running ANALYZE and then re-run the query.

Yes, default_statistics_target is set to the default value.
I have just increased shared_buffers and effective_cache_size to take
advantage of 1 Mb RAM.




--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com

Re: Optimizing a request

From
Tom Lane
Date:
Jean-Max Reymond <jmreymond@gmail.com> writes:
> explain SELECT art_id, art_titre, art_texte, rub_titre
> FROM article inner join rubrique on article.rub_id = rubrique.rub_id
> where rub_parent = 8;

> Hash Join  (cost=8.27..265637.59 rows=25 width=130)
>   Hash Cond: ("outer".rub_id = "inner".rub_id)
>   ->  Seq Scan on article  (cost=0.00..215629.00 rows=10000000 width=108)
>   ->  Hash  (cost=8.26..8.26 rows=3 width=22)
>         ->  Index Scan using rubrique_parent on rubrique
> (cost=0.00..8.26 rows=3 width=22)
>               Index Cond: (rub_parent = 8)

That seems like a very strange plan choice given those estimated row
counts.  I'd have expected it to use a nestloop with inner index scan
on article_rub_id_index.  You haven't done anything odd like disable
nestloop, have you?

What plan do you get if you turn off enable_hashjoin?  (If it's a merge
join, then turn off enable_mergejoin and try again.)  Also, could we see
EXPLAIN ANALYZE not just EXPLAIN output for all these cases?
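Spelled out, that experiment looks like this (the SET commands are session-local; the query is the one from the original post):

```sql
-- Disable hash joins for this session only, then re-run with real timings
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT art_id, art_titre, art_texte, rub_titre
FROM article INNER JOIN rubrique ON article.rub_id = rubrique.rub_id
WHERE rub_parent = 8;

-- If the planner falls back to a merge join, disable that too and retry
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT art_id, art_titre, art_texte, rub_titre
FROM article INNER JOIN rubrique ON article.rub_id = rubrique.rub_id
WHERE rub_parent = 8;
```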

            regards, tom lane

Re: Optimizing a request

From
"Gary Doades"
Date:
On 31 Aug 2004 at 21:42, Jean-Max Reymond wrote:

> ----- Original Message -----
> From: Gary Doades <gpd@gpdnet.co.uk>
> Date: Tue, 31 Aug 2004 20:21:49 +0100
> Subject: Re: [PERFORM] Optimizing a request
> To: pgsql-performance@postgresql.org
>
>
>
> > Have you run ANALYZE on this database after creating the indexes or loading the data?
>
> the indexes are created and the data loaded and then, I run vacuum analyze.
>
> > What percentage of rows in the "article" table are likely to match
> > the keys selected from the "rubrique" table?
>
> only 1 record.
>
> > If it is likely to fetch a high proportion of the rows from article
> > then it may be best that a seq scan is performed.
> >
> > What are your non-default postgresql.conf settings? It may be better
> > to increase the default_statistics_target (to say 100 to 200) before
> > running ANALYZE and then re-run the query.
>
> yes,  default_statistics_target is set to the default_value.
> I have just increased  shared_buffers and effective_cache_size to give
> advantage of 1 Mb RAM
>

I can only presume you mean 1 GB RAM. What exactly are your
settings for shared buffers and effective_cache_size?

Can you increase default_statistics_target and re-test? It is possible
that with such a large table the distribution of values is skewed and
postgres does not realise that an index scan would be better.

It seems very odd otherwise that only one row out of 10,000,000 could
match and postgres does not realise this.

Can you post an explain analyse (not just explain) for this query?

Cheers,
Gary.


Re: Optimizing a request

From
Jean-Max Reymond
Date:
On Tue, 31 Aug 2004 21:16:46 +0100, Gary Doades <gpd@gpdnet.co.uk> wrote:

> I can only presume you mean 1 GB RAM. What exactly are your
> settings for shared buffers and effective_cache_size?

for 1 GB RAM,
shared_buffers = 65536
effective_cache_size = 16384

>
> Can you increase default_statistics_target and re-test? It is possible
> that with such a large table that the distribution of values is skewed and
> postgres does not realise that an index scan would be better.

OK, tomorrow, I'll try with the new value of default_statistics_target

> It seems very odd otherwise that only on row out of 10,000,000 could
> match and postgres does not realise this.
>
> Can you post an explain analyse (not just explain) for this query?

yes, of course


--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com

Re: Optimizing a request

From
Jean-Max Reymond
Date:
On Tue, 31 Aug 2004 16:13:58 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> That seems like a very strange plan choice given those estimated row
> counts.  I'd have expected it to use a nestloop with inner index scan
> on article_rub_id_index.  You haven't done anything odd like disable
> nestloop, have you?
>

No, no optimizer options are disabled.

> What plan do you get if you turn off enable_hashjoin?  (If it's a merge
> join, then turn off enable_mergejoin and try again.)  Also, could we see
> EXPLAIN ANALYZE not just EXPLAIN output for all these cases?
>
>                         regards, tom lane
>

OK, Tom, thanks for your help.

--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com

Re: Optimizing a request

From
"Gary Doades"
Date:
On 31 Aug 2004 at 22:24, Jean-Max Reymond wrote:

> On Tue, 31 Aug 2004 21:16:46 +0100, Gary Doades <gpd@gpdnet.co.uk> wrote:
>
> > I can only presume you mean 1 GB RAM. What exactly are your
> > settings for shared buffers and effective_cache_size?
>
> for 1 GB RAM,
> shared_buffers = 65536
> effective_cache_size = 16384

These also seem to be the wrong way round.

You might try:

shared_buffers = 10000
effective_cache_size = 60000
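For scale: in 7.4 both settings are counted in 8 kB disk pages (assuming the default block size), so the two configurations compare as follows:

```sql
-- Original: 512 MB of shared buffers but only a 128 MB cache estimate.
-- Suggested: ~78 MB of shared buffers and a ~468 MB cache estimate,
-- which fits a 1 GB machine much better.
SELECT 65536 * 8 / 1024 AS old_shared_buffers_mb,   -- 512
       16384 * 8 / 1024 AS old_effective_cache_mb,  -- 128
       10000 * 8 / 1024 AS new_shared_buffers_mb,   -- 78
       60000 * 8 / 1024 AS new_effective_cache_mb;  -- 468
```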

Cheers,
Gary.


Re: Optimizing a request

From
Hervé Piedvache
Date:
Hi,

On Tuesday 31 August 2004 at 20:59, Jean-Max Reymond wrote:
> explain SELECT art_id, art_titre, art_texte, rub_titre
> FROM article inner join rubrique on article.rub_id = rubrique.rub_id
> where rub_parent = 8;
>
> Hash Join  (cost=8.27..265637.59 rows=25 width=130)
>   Hash Cond: ("outer".rub_id = "inner".rub_id)
>   ->  Seq Scan on article  (cost=0.00..215629.00 rows=10000000 width=108)
>   ->  Hash  (cost=8.26..8.26 rows=3 width=22)
>         ->  Index Scan using rubrique_parent on rubrique
> (cost=0.00..8.26 rows=3 width=22)
>               Index Cond: (rub_parent = 8)
>

What are the values in rub_parent ... is there much disparity in the values?
Maybe you have most of the values set to 8, and maybe the optimizer thinks
a seq scan is better than using an index ...

Could you run a simple:
SELECT rub_parent, count(rub_id)
   FROM rubrique
 GROUP BY rub_parent;

Just to see the distribution of the values ...

regards,
--
Bill Footcow