Thread: Large querie with several EXISTS which will be often runned

Large querie with several EXISTS which will be often runned

From
"Bruno BAGUETTE"
Date:
Hello,

I've a performance question that I would like to ask you :

I have to design a DB that will manage products, and I'm adding the
product's options management.

A box can be red or yellow, or with black rubber or with white rubber,
for example.
So I have a product (the box) and two options groups (the box color and
the rubber color) and four options (red,yellow,black,white).

Here's my tables :

/* PRODUCTS OPTIONS : */
/* ------------------ */

CREATE SEQUENCE seq_id_product_option START 1 MINVALUE 1;

CREATE TABLE products_options
(
    pk_prdopt_id INT4 DEFAULT NEXTVAL('seq_id_product_option') NOT
NULL,
    fk_prd_id INT4 NOT NULL,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DOUBLE PRECISION NOT NULL,
    vat_rate NUMERIC(5,2) NOT NULL,
    internal_notes TEXT,
    CONSTRAINT products_options_pk PRIMARY KEY (pk_prdopt_id),
    CONSTRAINT products_options_fk_prdid FOREIGN KEY (fk_prd_id)
REFERENCES products (pk_prd_id),
    CONSTRAINT products_options_vatrate_value CHECK (vat_rate
BETWEEN 0 AND 100)
);

/* PRODUCTS OPTIONS GROUP NAMES : */
/* ------------------------------ */

CREATE SEQUENCE seq_id_product_option_group START 1 MINVALUE 1;

CREATE TABLE products_options_groups
(
    pk_prdoptgrp_id INT4 DEFAULT
NEXTVAL('seq_id_product_option_group') NOT NULL,
    prdoptgrp_name VARCHAR(100) NOT NULL,
    prdoptgrp_description TEXT NOT NULL,
    prdoptgrp_internal_notes TEXT,
    CONSTRAINT products_options_groups_pk PRIMARY
KEY(pk_prdoptgrp_id)
);

/* PRODUCTS OPTIONS CLASSIFICATION : */
/* ------------------------------ */

CREATE TABLE products_options_classification
(
    fk_prdoptgrp_id INT4 NOT NULL,
    fk_prdopt_id INT4 NOT NULL,
    CONSTRAINT products_options_classification_pk PRIMARY
KEY(fk_prdoptgrp_id,fk_prdopt_id),
    CONSTRAINT products_options_classification_fk_prdoptgrp FOREIGN
KEY (fk_prdoptgrp_id) REFERENCES products_options_groups
(pk_prdoptgrp_id),
    CONSTRAINT products_options_classification_fk_prdopt FOREIGN KEY
(fk_prdopt_id) REFERENCES products_options (pk_prdopt_id)
);


I'm worrying about the performances of the queries that will the most
often dones, especially the select of the available options groups
('Rubber color','Box color' in my example) on one product (The box).

SELECT products_options_groups.pk_prdoptgrp_id,
products_options_groups.prdoptgrp_name
FROM products_options_groups
WHERE EXISTS
(
    SELECT *
    FROM products_options_classification
    WHERE products_options_classification =
products_options_groups.pk_prdoptgrp_id
    AND EXISTS
    (
        SELECT *
        FROM products_options
        WHERE products_options.pk_prdopt_id =
products_options_classification.fk_prdopt_id
        AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN
HERE BY MY APP]
    )
)
ORDER BY products_options_groups.prdoptgrp_name;


I will have to manage more or less 10.000 products with more or less 2-3
options by products and more or less 40 options-groups.

Do you think that this query will be hard for PostgreSQL (currently
7.2.1 but I will migrate to 7.3.2 when going in production environment)
?
How can I improve that query to be faster ?

Thanks really much for your advices about this ! :-)

---------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net


Re: Large querie with several EXISTS which will be often runned

From
Josh Berkus
Date:
Bruno,

> I will have to manage more or less 10.000 products with more or less 2-3
> options by products and more or less 40 options-groups.
>
> Do you think that this query will be hard for PostgreSQL (currently
> 7.2.1 but I will migrate to 7.3.2 when going in production environment)
> ?
> How can I improve that query to be faster ?

Collapse the inner EXISTS into a straight join in the outer EXISTS.  Since you
are merely checking for existence, there is no reason for the subquery
nesting.

--
Josh Berkus
Aglio Database Solutions
San Francisco

RE : Large querie with several EXISTS which will be often runned

From
"Bruno BAGUETTE"
Date:
Hello Josh,

> > I will have to manage more or less 10.000 products with
> more or less
> > 2-3 options by products and more or less 40 options-groups.
> >
> > Do you think that this query will be hard for PostgreSQL (currently
> > 7.2.1 but I will migrate to 7.3.2 when going in production
> > environment) ? How can I improve that query to be faster ?
>
> Collapse the inner EXISTS into a straight join in the outer
> EXISTS.  Since you
> are merely checking for existence, there is no reason for the
> subquery
> nesting.

Do you mean this query ?

SELECT
products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr
p_name
FROM products_options_groups
WHERE EXISTS
(
    SELECT *
    FROM products_options_classification
    INNER JOIN products_options ON products_options.pk_prdopt_id =
products_options_classification.fk_prdopt_id
    WHERE products_options_classification =
products_options_groups.pk_prdoptgrp_id
    AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY
MY APP]
)
ORDER BY products_options_groups.prdoptgrp_name;


An other question, do you think that my tables are OK or is there some
things I could change in order to have as much performance as possible
(without de-normalize it because I want to avoid redundancy in my
tables).

Thanks very much for your tips ! :-)

---------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net


Re: Large querie with several EXISTS which will be often runned

From
Shridhar Daithankar
Date:
On Saturday 28 June 2003 14:47, Bruno BAGUETTE wrote:
> Do you mean this query ?
>
> SELECT
> products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr
> p_name
> FROM products_options_groups
> WHERE EXISTS
> (
>     SELECT *
>     FROM products_options_classification
>     INNER JOIN products_options ON products_options.pk_prdopt_id =
> products_options_classification.fk_prdopt_id
>     WHERE products_options_classification =
> products_options_groups.pk_prdoptgrp_id
>     AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY
> MY APP]
> )
> ORDER BY products_options_groups.prdoptgrp_name;

You can try

 SELECT
 products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr
 p_name
 FROM products_options_groups
 WHERE
 (
     SELECT  count(*)
     FROM products_options_classification
     INNER JOIN products_options ON products_options.pk_prdopt_id =
 products_options_classification.fk_prdopt_id
     WHERE products_options_classification =
 products_options_groups.pk_prdoptgrp_id
     AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY
 MY APP]
 )>0
 ORDER BY products_options_groups.prdoptgrp_name;

The count(*) trick will make it just another subquery and hopefully any
performance issues with exists/in does not figure. Some of those issues are
fixed in 7.4/CVS head though.

 HTH

 Shridhar