Thread: index questions

index questions

From
Franco Bruno Borghesi
Date:
Hi everyone.

I have created a simplified example of a real case, to show you what I'm tryng to do. I have
a table, like this:

CREATE TABLE sales (
   saleId SERIAL,
   clientId INTEGER,
   branchId INTEGER,
   productId INTEGER,
   employeeId INTEGER,
   saleDate DATE,
   price NUMERIC(12, 2),
   qty INTEGER,
   PRIMARY KEY(saleId)
);
CREATE INDEX sales_k1 ON sales(clientId, branchId, productId, employeeId, saleDate, price, qty);

This table will grow to *many* rows in the future.

I want to make a function that returns the FIRS saleId of the sale that matches some conditions. I will
always receive the Client Id, but not always the other arguments (sent as NULLs).

The fetched resultset shoud prioritize the passed arguments, and after that, the saleDate, price
and quantity.


/**
* Finds the first sale that matches the conditions received.
* @param $1 Client Id.
* @param $2 Preferred Branch Id.
* @param $3 Preferred Product Id.
* @param $4 Preferred Employee Id.
* @return Sale Id if found, NULL if not.
*/
CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER) RETURNS INTEGER AS '
DECLARE
   a_clientId ALIAS FOR $1;
   a_branchId ALIAS FOR $1;
   a_productId ALIAS FOR $1;
   a_employeeId ALIAS FOR $1;
   r_result INTEGER;
BEGIN
   SELECT
      INTO r_result employeeId
   FROM
      sales
   WHERE
      clientId=a_clientId AND
      branchId=coalesce(a_branchId, branchId) AND /*branchId is null? anything will be ok*/
      productId=coalesce(a_productId, productId) AND /*productId is null? anything will be ok*/
      employeeId=coalesce(a_employeeId, employeeId) /*employeeId is null? anything will be ok*/
   ORDER BY
      clientId, branchId, productId, employeeId, saleDate, price, qty
   LIMIT 1;

   RETURN r_result;
END;
' LANGUAGE 'plpgsql';


Will findSale() in the future, when I have *many* rows still use the index when only the first couple of
arguments are passed to the function?
If not, should I create more indexes (and functions) for each possible argument combination? (of course, with
the given order)

The thing here is that I don't understand how postgreSQL solves the query when the COALESCEs are used... it uses
the index now, with a few thowsand records, but what will happen in a few months?

Thanks in advance.
Attachment

Re: index questions

From
Ron Johnson
Date:
On Fri, 2003-07-25 at 11:52, Franco Bruno Borghesi wrote:
[snip]
>
>
> Will findSale() in the future, when I have *many* rows still use the
> index when only the first couple of
> arguments are passed to the function?
> If not, should I create more indexes (and functions) for each possible
> argument combination? (of course, with
> the given order)
>
> The thing here is that I don't understand how postgreSQL solves the
> query when the COALESCEs are used... it uses
> the index now, with a few thowsand records, but what will happen in a
> few months?

When faced with cases like this, I cobble together a script/program
that generates a few million rows of random data (within the confines
of FKs, of course) to populate these tables like "sales", and then I
see how things perform.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: index questions

From
Josh Berkus
Date:
Franco,

> CREATE INDEX sales_k1 ON sales(clientId, branchId, productId,
> employeeId, saleDate, price, qty);

A 7-column index is unlikely to be effective -- the index will be almost as
large as the table.   Try indexing only the first 3-4 columns instead.

> I want to make a function that returns the FIRS saleId of the sale that
> matches some conditions. I will
> always receive the Client Id, but not always the other arguments (sent
> as NULLs).

Well, keep in mind that your multi-column index will only be useful if all
columns are queried starting from the left.  That is, the index will be
ignored if you have a "where productId = x" without a "where branchid = y".

> CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER)
> RETURNS INTEGER AS '
> DECLARE
>    a_clientId ALIAS FOR $1;
>    a_branchId ALIAS FOR $1;
>    a_productId ALIAS FOR $1;
>    a_employeeId ALIAS FOR $1;

Your aliases are wrong here.

>       branchId=coalesce(a_branchId, branchId) AND /*branchId is null?
> anything will be ok*/
>       productId=coalesce(a_productId, productId) AND /*productId is
> null? anything will be ok*/

On a very large table this will be very inefficient.   you'll be comparing the
productid, for example, even if no productid is passed ... and the index
won't do you any good because the planner should figure out that 100% of rows
match the condition.

Instead, I recommend that you build up a dynamic query as a string and then
pass only the conditions sent by the user.  You can then EXECUTE the query
and loop through it for a result.

Of course, YMMV.   My approach will require you to create more indexes which
could be a problem if you have limited disk space.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: index questions

From
Franco Bruno Borghesi
Date:
what you say is that the index is not effective because of its size, but it would still be used *if* the conditions are right... In this case, I care about performance, not space.

But what you say about the index not being good because 100% of rows match the condition confirms what I suspected.

Thanks for your help.


On Fri, 2003-07-25 at 14:28, Josh Berkus wrote:
Franco,

> CREATE INDEX sales_k1 ON sales(clientId, branchId, productId,
> employeeId, saleDate, price, qty);

A 7-column index is unlikely to be effective -- the index will be almost as 
large as the table.   Try indexing only the first 3-4 columns instead.  

> I want to make a function that returns the FIRS saleId of the sale that
> matches some conditions. I will
> always receive the Client Id, but not always the other arguments (sent
> as NULLs).

Well, keep in mind that your multi-column index will only be useful if all 
columns are queried starting from the left.  That is, the index will be 
ignored if you have a "where productId = x" without a "where branchid = y".

> CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER)
> RETURNS INTEGER AS '
> DECLARE
>    a_clientId ALIAS FOR $1;
>    a_branchId ALIAS FOR $1;
>    a_productId ALIAS FOR $1;
>    a_employeeId ALIAS FOR $1;

Your aliases are wrong here.

>       branchId=coalesce(a_branchId, branchId) AND /*branchId is null?
> anything will be ok*/
>       productId=coalesce(a_productId, productId) AND /*productId is
> null? anything will be ok*/

On a very large table this will be very inefficient.   you'll be comparing the 
productid, for example, even if no productid is passed ... and the index 
won't do you any good because the planner should figure out that 100% of rows 
match the condition.  

Instead, I recommend that you build up a dynamic query as a string and then 
pass only the conditions sent by the user.  You can then EXECUTE the query 
and loop through it for a result.

Of course, YMMV.   My approach will require you to create more indexes which 
could be a problem if you have limited disk space.
Attachment