On Tue, Dec 16, 2003 at 12:11:59PM -0500, Nick Fankhauser wrote:
>
> I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.)
>
> actor is a table containing roughly 3 million rows with an index on
> actor_full_name_uppercase and a unique index on actor_id.
>
> actor_summary also contains roughly 3 million rows. Its PK is a unique
> combined index on (actor_id, county_id, case_disp_global_code).
...
> I'm unsure what is happening next. I notice that an index scan is occurring
> on actor_summary_pk, with an "actual time" of 9.15, but then it looks like a
> nested loop occurs at the next level to join these tables. Does this mean
> that each probe of the actor_summary index will take 9.15 msec, but the
> nested loop is going to do this once for each actor_id?
...
> Is there a more efficient means than a nested loop to handle such a join?
> Would a different method be chosen if there was exactly one row in
> actor_summary for every row in actor?
It seems that your basic problem is that you're fetching lots of rows
from two big ol' tables. The innermost estimation mistake the planner
is making is that it assumes the restriction on actor_full_name_uppercase
will be much more selective than it actually is: it expects 222 matching
actors, but in fact there are 3639. Being right about that wouldn't make
things much quicker, though, if it made them quicker at all; the index
scan that fetches those rows is taking about 15 seconds, and a sequential
scan of that table would presumably be at least in the same ballpark.
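If you want the estimate itself to be closer, the usual knob is the
per-column statistics target; something like the following (a sketch
using the table and column names from your post - 100 is just an
illustrative target) gives ANALYZE more detail to work with:

  -- Keep more detailed statistics for this column, then refresh them.
  ALTER TABLE actor ALTER COLUMN actor_full_name_uppercase SET STATISTICS 100;
  ANALYZE actor;

That should bring the 222 estimate closer to the real 3639, though as I
say it probably won't change the plan much.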
Once it's got those rows, it needs to look up matches for them in
actor_summary. Again, that's 3639 probes of actor_summary_pk into a
wide-ish table, so your interpretation of the 9.15 is correct: roughly
9 ms per probe * 3639 rows =~ 30 seconds.
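A nested loop whose inner side is an index scan is about what you'd
expect for this shape of query, but if you want to see what the planner
would pick otherwise, you can discourage nested loops for one session
and compare timings (enable_nestloop is a standard planner setting;
this is purely a diagnostic, not something to leave set):

  SET enable_nestloop TO off;
  -- re-run EXPLAIN ANALYZE on your query here and compare plan and times
  RESET enable_nestloop;

My guess is you'll get a hash or merge join over two big scans that
isn't any faster, which brings me to the next point.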
It doesn't seem to me that there would be a substantially better plan
for this query with your tables as they stand. If your data were more
normalised, your big scans might be quicker (their rows would be smaller,
so they would hit fewer disk pages), and the extra lookups in your detail
tables would only be done for the rows that actually end up being
returned - but that is hardly likely to make an order-of-magnitude
difference to your overall speed. A sketch of what I mean follows.
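This is roughly the shape I have in mind - a narrow table carrying only
the columns the big scans and joins need, with everything wide pushed
out to a detail table (the actor_core/actor_detail names and the address
column are made up for illustration):

  -- Narrow table: only what the scans and joins touch.
  CREATE TABLE actor_core (
      actor_id                  integer PRIMARY KEY,
      actor_full_name_uppercase text
  );
  CREATE INDEX actor_core_name_idx
      ON actor_core (actor_full_name_uppercase);

  -- Wide columns live here and are fetched only for rows you return.
  CREATE TABLE actor_detail (
      actor_id integer REFERENCES actor_core (actor_id),
      address  text
      -- ...plus whatever other wide columns actor currently carries
  );

The scans over actor_core then touch far fewer pages, and the detail
lookups happen only for the final result set.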
If it were my query and I really, really needed it to be considerably
faster, I'd think about hyper-normalising in the hope that my main
tables would shrink far enough that I could keep them in RAM effectively
all the time; the size check below gives an idea of where you'd be
starting from.

The answers to your direct questions are (1) yes, (2) no, not really,
and (3) no.
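The quick way to get those current sizes is pg_class (relpages is
counted in 8 kB pages, and both figures are only as fresh as your last
VACUUM or ANALYZE):

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('actor', 'actor_summary');

Compare that against the RAM you can give to shared buffers and the OS
cache and you'll know whether the hyper-normalising route is even
plausible.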
Richard