Thread: index questions
Hi everyone.
I have created a simplified example of a real case, to show you what I'm tryng to do. I have
a table, like this:
CREATE TABLE sales (
saleId SERIAL,
clientId INTEGER,
branchId INTEGER,
productId INTEGER,
employeeId INTEGER,
saleDate DATE,
price NUMERIC(12, 2),
qty INTEGER,
PRIMARY KEY(saleId)
);
CREATE INDEX sales_k1 ON sales(clientId, branchId, productId, employeeId, saleDate, price, qty);
This table will grow to *many* rows in the future.
I want to make a function that returns the FIRS saleId of the sale that matches some conditions. I will
always receive the Client Id, but not always the other arguments (sent as NULLs).
The fetched resultset shoud prioritize the passed arguments, and after that, the saleDate, price
and quantity.
/**
* Finds the first sale that matches the conditions received.
* @param $1 Client Id.
* @param $2 Preferred Branch Id.
* @param $3 Preferred Product Id.
* @param $4 Preferred Employee Id.
* @return Sale Id if found, NULL if not.
*/
CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER) RETURNS INTEGER AS '
DECLARE
a_clientId ALIAS FOR $1;
a_branchId ALIAS FOR $1;
a_productId ALIAS FOR $1;
a_employeeId ALIAS FOR $1;
r_result INTEGER;
BEGIN
SELECT
INTO r_result employeeId
FROM
sales
WHERE
clientId=a_clientId AND
branchId=coalesce(a_branchId, branchId) AND /*branchId is null? anything will be ok*/
productId=coalesce(a_productId, productId) AND /*productId is null? anything will be ok*/
employeeId=coalesce(a_employeeId, employeeId) /*employeeId is null? anything will be ok*/
ORDER BY
clientId, branchId, productId, employeeId, saleDate, price, qty
LIMIT 1;
RETURN r_result;
END;
' LANGUAGE 'plpgsql';
Will findSale() in the future, when I have *many* rows still use the index when only the first couple of
arguments are passed to the function?
If not, should I create more indexes (and functions) for each possible argument combination? (of course, with
the given order)
The thing here is that I don't understand how postgreSQL solves the query when the COALESCEs are used... it uses
the index now, with a few thowsand records, but what will happen in a few months?
Thanks in advance.
I have created a simplified example of a real case, to show you what I'm tryng to do. I have
a table, like this:
CREATE TABLE sales (
saleId SERIAL,
clientId INTEGER,
branchId INTEGER,
productId INTEGER,
employeeId INTEGER,
saleDate DATE,
price NUMERIC(12, 2),
qty INTEGER,
PRIMARY KEY(saleId)
);
CREATE INDEX sales_k1 ON sales(clientId, branchId, productId, employeeId, saleDate, price, qty);
This table will grow to *many* rows in the future.
I want to make a function that returns the FIRS saleId of the sale that matches some conditions. I will
always receive the Client Id, but not always the other arguments (sent as NULLs).
The fetched resultset shoud prioritize the passed arguments, and after that, the saleDate, price
and quantity.
/**
* Finds the first sale that matches the conditions received.
* @param $1 Client Id.
* @param $2 Preferred Branch Id.
* @param $3 Preferred Product Id.
* @param $4 Preferred Employee Id.
* @return Sale Id if found, NULL if not.
*/
CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER) RETURNS INTEGER AS '
DECLARE
a_clientId ALIAS FOR $1;
a_branchId ALIAS FOR $1;
a_productId ALIAS FOR $1;
a_employeeId ALIAS FOR $1;
r_result INTEGER;
BEGIN
SELECT
INTO r_result employeeId
FROM
sales
WHERE
clientId=a_clientId AND
branchId=coalesce(a_branchId, branchId) AND /*branchId is null? anything will be ok*/
productId=coalesce(a_productId, productId) AND /*productId is null? anything will be ok*/
employeeId=coalesce(a_employeeId, employeeId) /*employeeId is null? anything will be ok*/
ORDER BY
clientId, branchId, productId, employeeId, saleDate, price, qty
LIMIT 1;
RETURN r_result;
END;
' LANGUAGE 'plpgsql';
Will findSale() in the future, when I have *many* rows still use the index when only the first couple of
arguments are passed to the function?
If not, should I create more indexes (and functions) for each possible argument combination? (of course, with
the given order)
The thing here is that I don't understand how postgreSQL solves the query when the COALESCEs are used... it uses
the index now, with a few thowsand records, but what will happen in a few months?
Thanks in advance.
Attachment
On Fri, 2003-07-25 at 11:52, Franco Bruno Borghesi wrote: [snip] > > > Will findSale() in the future, when I have *many* rows still use the > index when only the first couple of > arguments are passed to the function? > If not, should I create more indexes (and functions) for each possible > argument combination? (of course, with > the given order) > > The thing here is that I don't understand how postgreSQL solves the > query when the COALESCEs are used... it uses > the index now, with a few thowsand records, but what will happen in a > few months? When faced with cases like this, I cobble together a script/program that generates a few million rows of random data (within the confines of FKs, of course) to populate these tables like "sales", and then I see how things perform. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
Franco, > CREATE INDEX sales_k1 ON sales(clientId, branchId, productId, > employeeId, saleDate, price, qty); A 7-column index is unlikely to be effective -- the index will be almost as large as the table. Try indexing only the first 3-4 columns instead. > I want to make a function that returns the FIRS saleId of the sale that > matches some conditions. I will > always receive the Client Id, but not always the other arguments (sent > as NULLs). Well, keep in mind that your multi-column index will only be useful if all columns are queried starting from the left. That is, the index will be ignored if you have a "where productId = x" without a "where branchid = y". > CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER) > RETURNS INTEGER AS ' > DECLARE > a_clientId ALIAS FOR $1; > a_branchId ALIAS FOR $1; > a_productId ALIAS FOR $1; > a_employeeId ALIAS FOR $1; Your aliases are wrong here. > branchId=coalesce(a_branchId, branchId) AND /*branchId is null? > anything will be ok*/ > productId=coalesce(a_productId, productId) AND /*productId is > null? anything will be ok*/ On a very large table this will be very inefficient. you'll be comparing the productid, for example, even if no productid is passed ... and the index won't do you any good because the planner should figure out that 100% of rows match the condition. Instead, I recommend that you build up a dynamic query as a string and then pass only the conditions sent by the user. You can then EXECUTE the query and loop through it for a result. Of course, YMMV. My approach will require you to create more indexes which could be a problem if you have limited disk space. -- -Josh Berkus Aglio Database Solutions San Francisco
what you say is that the index is not effective because of its size, but it would still be used *if* the conditions are right... In this case, I care about performance, not space.
But what you say about the index not being good because 100% of rows match the condition confirms what I suspected.
Thanks for your help.
On Fri, 2003-07-25 at 14:28, Josh Berkus wrote:
But what you say about the index not being good because 100% of rows match the condition confirms what I suspected.
Thanks for your help.
On Fri, 2003-07-25 at 14:28, Josh Berkus wrote:
Franco, > CREATE INDEX sales_k1 ON sales(clientId, branchId, productId, > employeeId, saleDate, price, qty); A 7-column index is unlikely to be effective -- the index will be almost as large as the table. Try indexing only the first 3-4 columns instead. > I want to make a function that returns the FIRS saleId of the sale that > matches some conditions. I will > always receive the Client Id, but not always the other arguments (sent > as NULLs). Well, keep in mind that your multi-column index will only be useful if all columns are queried starting from the left. That is, the index will be ignored if you have a "where productId = x" without a "where branchid = y". > CREATE OR REPLACE FUNCTION findSale(INTEGER, INTEGER, INTEGER, INTEGER) > RETURNS INTEGER AS ' > DECLARE > a_clientId ALIAS FOR $1; > a_branchId ALIAS FOR $1; > a_productId ALIAS FOR $1; > a_employeeId ALIAS FOR $1; Your aliases are wrong here. > branchId=coalesce(a_branchId, branchId) AND /*branchId is null? > anything will be ok*/ > productId=coalesce(a_productId, productId) AND /*productId is > null? anything will be ok*/ On a very large table this will be very inefficient. you'll be comparing the productid, for example, even if no productid is passed ... and the index won't do you any good because the planner should figure out that 100% of rows match the condition. Instead, I recommend that you build up a dynamic query as a string and then pass only the conditions sent by the user. You can then EXECUTE the query and loop through it for a result. Of course, YMMV. My approach will require you to create more indexes which could be a problem if you have limited disk space.