I have a situation that can be summarized to the following:
-- day in 20061215 format
Create table calendar (
day integer unique not null
);
Create table customers (
id serial unique not null,
name varchar,
address varchar,
);
Create table deliveries (
customers_id integer not null references customers(id),
calendar_day integer not null references calendar(day),
delivered bool not null default false,
unique(customers_id, calendar_id)
);
Imagine tens of thousands of customers, a few million deliveries. A query
that's structurally similar to the following query is rather slow. It's
taking over 30 seconds, climbing fast on reasonable HW. (SMP Opteron, 10k
SCSI, 4 GB RAM) If I remove the outer join, performance is < 1 second.
SELECT customers.id as customers_id,
customers.name AS customers_name,
calendar.day AS calendar_day,
CASE WHEN (deliveries.delivered IS NULL) THEN 'n/a'
WHEN (deliveries.delivered=TRUE) THEN 'yes'
ELSE 'no' END AS delivered
FROM customers
JOIN calendars ON
(
-- GIVE A CALENDAR OF POSSIBLE DAYS FOR DELIVERIES
calendar.day < 20061201
AND calendar.day >= 20060101
)
LEFT OUTER JOIN deliveries ON
(
customers.id=deliveries.customers_id
AND deliveries.calendar_day=calendar.day
)
;
What can I do to improve the performance of this oft-used query? Is there a
better way to do this, or am I doomed to looping thru results and parsing the
results in code?
Thanks,
-Ben
--
"I kept looking around for somebody to solve the problem.
Then I realized I am somebody"
-Anonymous