Home > mailing lists

Thread: Query performance problem

Query performance problem

From

"Danilo Mota"

Date:

20 August 2004, 13:20:41

HI All,

I have a big performance issue concerning a PostgreSQL database.

I have the following server configuration:

Pentium 4 2.4 GHz

1 GB RAM

36 GB SCSI

And the following tables:

TABLES

--------------------------------------------------------------------------

================== r_cliente: 75816 records ============================

CREATE TABLE "public"."r_cliente" (

"pkcliente" INTEGER NOT NULL,

"cpfcnpj" VARCHAR(20) NOT NULL,

PRIMARY KEY("pkcliente")

) WITH OIDS;

CREATE UNIQUE INDEX "un_cliente_cpfcnpj" ON "public"."r_cliente"

USING btree ("cpfcnpj");

================== sav_cliente_lg: 65671 records =======================

CREATE TABLE "public"."sav_cliente_lg" (

"codigo" INTEGER NOT NULL,

"cpfcnpj" VARCHAR(15) NOT NULL,

PRIMARY KEY("codigo")

) WITH OIDS;

CREATE INDEX "ix_savclientelg_cpfcnpj" ON "public"."sav_cliente_lg"

USING btree ("cpfcnpj");

Which I would like to run the following query:

QUERY

--------------------------------------------------------------------------

SELECT

rc.pkcliente

FROM r_cliente AS rc

INNER JOIN sav_cliente_lg AS sc ON sc.cpfcnpj = sc.cpfcnpj;

The problem is, it takes a long time to run, I wait up to half an hour and I get no result.

So, I executed the explain on the query and got the following results:

QUERY PLAN

--------------------------------------------------------------------------

Nested Loop (cost=0.00..16696.87 rows=75816 width=4)

-> Seq Scan on sav_cliente_cf sc (cost=0.00..3047.55 rows=1 width=0)

Filter: ((cpfcnpj)::text = (cpfcnpj)::text)

-> Seq Scan on r_cliente rc (cost=0.00..12891.16 rows=75816 width=4)

And made the following modifications on my POSTGRESQL.CONF file:

POSTGRESQL.CONF

--------------------------------------------------------------------------

### VERSION: Postgresql 7.4.2 ###

shared_buffers = 7800

sort_mem = 4096

checkpoint_segments = 5

effective_cache_size = 12000

cpu_operator_cost = 0.0015

stats_start_collector = false

Hope you can help me, I really need to get this running faster, and I am out of ideas.

Since now, thanks a lot for your attention,

Danilo Mota

Re: Query performance problem

From

Tom Lane

Date:

20 August 2004, 13:34:53

"Danilo Mota" <dmota@nexen.com.br> writes:
> SELECT
>  rc.pkcliente
> FROM r_cliente AS rc
> INNER JOIN sav_cliente_lg AS sc ON sc.cpfcnpj = sc.cpfcnpj;

Surely you meant
  INNER JOIN sav_cliente_lg AS sc ON rc.cpfcnpj = sc.cpfcnpj;

I would also venture that your statistics are desperately out of date,
because if the planner's estimates are close to reality, even this
unconstrained-cross-product join shouldn't have taken that long.

            regards, tom lane

Re: Query performance problem

From

Bruno Wolff III

Date:

20 August 2004, 13:39:51

On Fri, Aug 20, 2004 at 13:25:30 -0300,
  Danilo Mota <dmota@nexen.com.br> wrote:
>
> And the following tables:
>                                     TABLES
> ------------------------------------------------------------------------
> --
> ==================  r_cliente: 75816 records
> ============================
> CREATE TABLE "public"."r_cliente" (
>   "pkcliente" INTEGER NOT NULL,
>   "cpfcnpj" VARCHAR(20) NOT NULL,
>   PRIMARY KEY("pkcliente")
> ) WITH OIDS;
>
> CREATE UNIQUE INDEX "un_cliente_cpfcnpj" ON "public"."r_cliente"
> USING btree ("cpfcnpj");
>
> ==================  sav_cliente_lg: 65671 records
> =======================
> CREATE TABLE "public"."sav_cliente_lg" (
>   "codigo" INTEGER NOT NULL,
>   "cpfcnpj" VARCHAR(15) NOT NULL,
>   PRIMARY KEY("codigo")
> ) WITH OIDS;
>
> CREATE INDEX "ix_savclientelg_cpfcnpj" ON "public"."sav_cliente_lg"
> USING btree ("cpfcnpj");
>
>
>
> Which I would like to run the following query:
>
>                                     QUERY
> ------------------------------------------------------------------------
> --
> SELECT
>  rc.pkcliente
> FROM r_cliente AS rc
> INNER JOIN sav_cliente_lg AS sc ON sc.cpfcnpj = sc.cpfcnpj;

I am going to assume that one of the sc.cpfcnpj's above is really rc.cpfcnpj
since that corresponds to the explain below.

sc.cpfcnpj and rc.cpfcnpj are different length varchars. You made need
an explicit cast to allow the use of indexes. (Unless there is a real
business rule that mandates the limits you have used, you probably want
to make them both type 'text'.)

Another potential problem is not having analyzed the tables. I don't think
this can be ruled out based on what you have showed us so far.

>
> So, I executed the explain on the query and got the following results:

Generally you want to run EXPLAIN ANALYZE results when submitting questions
about performance problems rather than just EXPLAIN results.

Re: Query performance problem

From

Tom Lane

Date:

20 August 2004, 13:50:01

Bruno Wolff III <bruno@wolff.to> writes:
> I am going to assume that one of the sc.cpfcnpj's above is really rc.cpfcnpj
> since that corresponds to the explain below.

No, actually the explain plan corresponds to the sc.cpfcnpj = sc.cpfcnpj
condition.  I didn't twig to the typo until I started to wonder why the
plan had the condition in the wrong place (attached to the seqscan and
not the join step).

> sc.cpfcnpj and rc.cpfcnpj are different length varchars. You made need
> an explicit cast to allow the use of indexes.

AFAIK the cross-type issues only apply to crossing actual types, not
lengths.  That does look like an error in the database schema, though.

            regards, tom lane