Thread: Large querie with several EXISTS which will be often runned
Hello, I've a performance question that I would like to ask you : I have to design a DB that will manage products, and I'm adding the product's options management. A box can be red or yellow, or with black rubber or with white rubber, for example. So I have a product (the box) and two options groups (the box color and the rubber color) and four options (red,yellow,black,white). Here's my tables : /* PRODUCTS OPTIONS : */ /* ------------------ */ CREATE SEQUENCE seq_id_product_option START 1 MINVALUE 1; CREATE TABLE products_options ( pk_prdopt_id INT4 DEFAULT NEXTVAL('seq_id_product_option') NOT NULL, fk_prd_id INT4 NOT NULL, name VARCHAR(100) NOT NULL, description TEXT, price DOUBLE PRECISION NOT NULL, vat_rate NUMERIC(5,2) NOT NULL, internal_notes TEXT, CONSTRAINT products_options_pk PRIMARY KEY (pk_prdopt_id), CONSTRAINT products_options_fk_prdid FOREIGN KEY (fk_prd_id) REFERENCES products (pk_prd_id), CONSTRAINT products_options_vatrate_value CHECK (vat_rate BETWEEN 0 AND 100) ); /* PRODUCTS OPTIONS GROUP NAMES : */ /* ------------------------------ */ CREATE SEQUENCE seq_id_product_option_group START 1 MINVALUE 1; CREATE TABLE products_options_groups ( pk_prdoptgrp_id INT4 DEFAULT NEXTVAL('seq_id_product_option_group') NOT NULL, prdoptgrp_name VARCHAR(100) NOT NULL, prdoptgrp_description TEXT NOT NULL, prdoptgrp_internal_notes TEXT, CONSTRAINT products_options_groups_pk PRIMARY KEY(pk_prdoptgrp_id) ); /* PRODUCTS OPTIONS CLASSIFICATION : */ /* ------------------------------ */ CREATE TABLE products_options_classification ( fk_prdoptgrp_id INT4 NOT NULL, fk_prdopt_id INT4 NOT NULL, CONSTRAINT products_options_classification_pk PRIMARY KEY(fk_prdoptgrp_id,fk_prdopt_id), CONSTRAINT products_options_classification_fk_prdoptgrp FOREIGN KEY (fk_prdoptgrp_id) REFERENCES products_options_groups (pk_prdoptgrp_id), CONSTRAINT products_options_classification_fk_prdopt FOREIGN KEY (fk_prdopt_id) REFERENCES products_options (pk_prdopt_id) ); I'm worrying about the performances of the queries that will the most often dones, especially the select of the available options groups ('Rubber color','Box color' in my example) on one product (The box). SELECT products_options_groups.pk_prdoptgrp_id, products_options_groups.prdoptgrp_name FROM products_options_groups WHERE EXISTS ( SELECT * FROM products_options_classification WHERE products_options_classification = products_options_groups.pk_prdoptgrp_id AND EXISTS ( SELECT * FROM products_options WHERE products_options.pk_prdopt_id = products_options_classification.fk_prdopt_id AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY MY APP] ) ) ORDER BY products_options_groups.prdoptgrp_name; I will have to manage more or less 10.000 products with more or less 2-3 options by products and more or less 40 options-groups. Do you think that this query will be hard for PostgreSQL (currently 7.2.1 but I will migrate to 7.3.2 when going in production environment) ? How can I improve that query to be faster ? Thanks really much for your advices about this ! :-) --------------------------------------- Bruno BAGUETTE - pgsql-ml@baguette.net
Bruno, > I will have to manage more or less 10.000 products with more or less 2-3 > options by products and more or less 40 options-groups. > > Do you think that this query will be hard for PostgreSQL (currently > 7.2.1 but I will migrate to 7.3.2 when going in production environment) > ? > How can I improve that query to be faster ? Collapse the inner EXISTS into a straight join in the outer EXISTS. Since you are merely checking for existence, there is no reason for the subquery nesting. -- Josh Berkus Aglio Database Solutions San Francisco
Hello Josh, > > I will have to manage more or less 10.000 products with > more or less > > 2-3 options by products and more or less 40 options-groups. > > > > Do you think that this query will be hard for PostgreSQL (currently > > 7.2.1 but I will migrate to 7.3.2 when going in production > > environment) ? How can I improve that query to be faster ? > > Collapse the inner EXISTS into a straight join in the outer > EXISTS. Since you > are merely checking for existence, there is no reason for the > subquery > nesting. Do you mean this query ? SELECT products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr p_name FROM products_options_groups WHERE EXISTS ( SELECT * FROM products_options_classification INNER JOIN products_options ON products_options.pk_prdopt_id = products_options_classification.fk_prdopt_id WHERE products_options_classification = products_options_groups.pk_prdoptgrp_id AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY MY APP] ) ORDER BY products_options_groups.prdoptgrp_name; An other question, do you think that my tables are OK or is there some things I could change in order to have as much performance as possible (without de-normalize it because I want to avoid redundancy in my tables). Thanks very much for your tips ! :-) --------------------------------------- Bruno BAGUETTE - pgsql-ml@baguette.net
On Saturday 28 June 2003 14:47, Bruno BAGUETTE wrote: > Do you mean this query ? > > SELECT > products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr > p_name > FROM products_options_groups > WHERE EXISTS > ( > SELECT * > FROM products_options_classification > INNER JOIN products_options ON products_options.pk_prdopt_id = > products_options_classification.fk_prdopt_id > WHERE products_options_classification = > products_options_groups.pk_prdoptgrp_id > AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY > MY APP] > ) > ORDER BY products_options_groups.prdoptgrp_name; You can try SELECT products_options_groups.pk_prdoptgrp_id,products_options_groups.prdoptgr p_name FROM products_options_groups WHERE ( SELECT count(*) FROM products_options_classification INNER JOIN products_options ON products_options.pk_prdopt_id = products_options_classification.fk_prdopt_id WHERE products_options_classification = products_options_groups.pk_prdoptgrp_id AND products_options.fk_prd_id = [A PRODUCT ID WRITTEN HERE BY MY APP] )>0 ORDER BY products_options_groups.prdoptgrp_name; The count(*) trick will make it just another subquery and hopefully any performance issues with exists/in does not figure. Some of those issues are fixed in 7.4/CVS head though. HTH Shridhar