Abbreviated keys for text cost model fix - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Abbreviated keys for text cost model fix
Date
Msg-id CAM3SWZQeCekUShZXgW8_nPuuF5s9eUVt+Fi7LkvBSHmXUUDcZg@mail.gmail.com
Whole thread Raw
Responses Re: Abbreviated keys for text cost model fix  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Over in the abbreviated keys for numeric thread, Tomas Vondra reports
a case where the ad-hoc cost model of abbreviated keys/sortsupport for
text is far too conservative, and aborts a case where abbreviated keys
can still greatly help.

Previously, I proposed additions to the cost model that dealt with all
this, by passing down a reltuples hint from the optimizer or a
relcache entry to tuplesort. Robert seemed to think that this was a
little bit much, and that it could be discussed at a later point. It's
clear that we need to do something about cases like the one reported
by Tomas, though.

Attached patch adds a decay to the threshold that (estimated)
abbreviated key cardinality must cross as a proportion of the
(estimated) cardinality of the original set of strings to be sorted,
while also quadrupling the initial required proportional cardinality
to 20% of full key cardinality (but for just the first 10,000 strings,
before this threshold is decayed aggressively). This seems
significantly less invasive than what I previously proposed, while
still being effective. I suggest that this be treated as a bugfix,
since Tomas reports a 5% -7% regression with his case. OTOH, with this
patch, I saw the same case have a ~300% improvement in performance.

The adjusted cost model strongly prefers to decide to abort early,
which seems desirable. In one way this makes it a little more
vulnerable to skewed sets, since early data matters more, but in
another more important way it's less vulnerable, since perhaps that
skew indicates a general skew in the data, a skew that makes the idea
of measuring the cardinality of abbreviated keys as a proxy for full
key cardinality less useful.

Thoughts?
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Next
From: Petr Jelinek
Date:
Subject: Re: proposal: searching in array function - array_position