Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes - Mailing list pgsql-hackers

From Marti Raudsepp
Subject Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Date
Msg-id CABRT9RBP92ep4vzr3MHMOoMFP4GDmkTQ3fTmFnq-82X2o12rfA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Dec 20, 2011 at 07:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> it'd likely be better if this code ignored unrecognized qual expression
> types rather than Assert'ing they're not there.

The patch replaced that Assert with an elog(ERROR)

> Hmm.  I am reminded of how utterly unreadable "diff -u" format is for
> anything longer than single-line changes :-( ...

Sorry, the new patch is in context (-C) diff format proper. I also
moved around code a bit and removed an unused variable that was left
around from the refactoring.

> but I think I don't
> like this refactoring much.  Will take a closer look tomorrow.

I was afraid you'd say that, especially for a change that should be
backpatched. But I couldn't think of alternative ways to do it that
give non-bogus estimates.

----

While writing this patch, the largest dilemma was where to account for
the multiple array scans. Given that this code is mostly a heuristic
and I lack a deep understanding of GIN indexes, it's likely that I got
this part wrong.

Currently I'm doing this:
    partialEntriesInQuals *= array_scans;
    exactEntriesInQuals   *= array_scans;
    searchEntriesInQuals  *= array_scans;

Which seems to be the right thing as far as random disk accesses are
concerned (successive scans are more likely to hit the cache) and also
works well with queries that don't touch most of the index. But this
fails spectacularly when multiple full scans are performed e.g. LIKE
ANY ('{%,%,%}'). Because index_pages_fetched() ends up removing all of
the rescan costs.

Another approach is multiplying the total cost from the number of
scans. This overestimates random accesses from rescans, but fixes the
above case:
    *indexTotalCost = (*indexStartupCost + dataPagesFetched *
spc_random_page_cost) * array_scans;

Regards,
Marti

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Replication timeout units
Next
From: Noah Misch
Date:
Subject: Re: ALTER TABLE lock strength reduction patch is unsafe