Hi,
this is a more general question than about specific PostgreSQL
implementation, but may be PostgreSQL has some potential specific
answer, and hopefully you can give me some general hints.
We have a large data base of, let's call them "Acts" (in our case
it is clinical observations, but could just as well be purchase orders
or account transactions, etc.)
CREATE TABLE Act ( id OID NOT NULL PRIMARY KEY, subject_id OID NOT NULL FOREIGN KEY REFERENCES
Subject(id), type_code VARCHAR NOT NULL FOREIGN KEY REFERENCES Act_type(code), time TIMESTAMP NOT NULL, /*
detail... */
);
CREATE INDEX ON Act(subject_id, type_code, time);
Now let's have 200 million of these acts on 2 million subjects in our
test database. The production version will have about 100 times that
amount.
A very frequent decision support query we have to make is to get the
last act of a certain type that occurred before a certain cut-off
date. Worse, yet, the cutoff date is quite likely even a correlated
subquery, but that's for later. For now, I'm always struck by the
difficulty of asking the simple quesion of the most recent act of
a certain type before a cutoff time:
SELECT Act.* FROM (SELECT subject_id, type_code, MAX(time) AS time FROM Act WHERE Act.type_code =
$ACT_TYPE AND Act.time < $CUTOFF_TIME GROUP BY subject_id, type_code) last INNER JOIN Act
ON(Act.subject_id, Act.type_code, Act.time) =(last.subject_id, last.type_code, last.time);
I'm not so worried about the complexity of writing this query but
how inefficient it appears to be.
1. select the Acts of the specific type code
2. sort by subject_id, type_code and time
3. join the result again with the Act table, selecting the items again via random access
Wouldn't it be nice to somehow retain the sort that happened for the
GROUP BY operation, and just read the last item of each group with
same group-by key?
Can PostgreSQL do something like that? Is my desire for some
optimization flawed? Am I overlooking an existing, well-known
optimization one can do with plain SQL?
I would greatly appreciate your advice or ideas
-Gunther
PS: The a materialized view or summary table would not help in this
case because we're asking for the last value before a certain
cutoff time.
--
Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org