Thread: [HACKERS] Poor cost estimate with interaction between table correlation andpartial indexes

Hi.

I'm looking to get started contributing code to Postgres. A small
issue I'm aware of that I think would make a good first contribution
is a poor cost estimate made due to an interaction between table
correlation and partial indexes. Currently the planner assumes that
when an index is perfectly correlated with a table and a range scan is
performed on the index, all of the table page reads performed by the
index scan except for the first one will be sequential reads. While
this assumption is correct for regular indexes, it is not true for
partial indexes.

The assumption holds for regular indexes because the rows
corresponding to two entries in a regular index that is perfectly
correlated with the table are guaranteed to be next to each other in
the table. On the other hand with a partial index perfectly correlated
with a table, there may be rows in the table in between the two rows
corresponding to two adjacent entries in the index that are not
included in the index because they do not satisfy the partial index
predicate.

To make the cost calculation for this case more accurate, I want to
apply the same estimate as the one currently used to estimate the cost
of a bitmap heap scan. The bitmap heap scan cost estimate applies in
this case because both cases involve reading pages from disk ordered
by the location in the table, but where the pages may not be
consecutive.

For the relevant functions, see cost_index and cost_index_heap_scan in
costsize.c.

Thanks,
Michael



Do you think this is a reasonable approach? Should I start working on
a patch based on the solution I described or is there some other
approach I should look into?



On Sat, Aug 26, 2017 at 05:50:26PM -0700, Michael Malis wrote:
> Do you think this is a reasonable approach? Should I start working
> on a patch based on the solution I described or is there some other
> approach I should look into?

You'll get more traction with a proof-of-concept patch accompanying
the plan than without.  Don't bother with any level of care past
proof-of-concept until you get positive feedback.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



(Sorry David. I initially replied only to you)

Ok. I've attached a patch of a proof-of-concept. I have a few
questions about tests.

What is typical workflow to add tests for changes to the planner? Also
I ran make check and it appears one of the existing tests is failing.
What is a typical way for going about discovering why the query plan
for a specific query changed? Also, how should I go about changing the
old test? Should I replace the old test output with the new test
output or modify the old test slightly to get it to produce the same
case as before?

Thanks,
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
Hmm... It seems the command I used for obtaining a patch I got from
here https://wiki.postgresql.org/wiki/Working_with_Git truncated part
of the patch. I've attached the file generated from git diff
--patience master improve-partial-index-correlation-calculation
--no-color > improve-correlated-partial-index-cost-v2.patch to this
email. What is the correct command for generating a context diff?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
Michael Malis wrote:
> Hmm... It seems the command I used for obtaining a patch I got from
> here https://wiki.postgresql.org/wiki/Working_with_Git truncated part
> of the patch. I've attached the file generated from git diff
> --patience master improve-partial-index-correlation-calculation
> --no-color > improve-correlated-partial-index-cost-v2.patch to this
> email. What is the correct command for generating a context diff?

Yeah, I've had patches truncated by that too and I've never cared enough
to see about getting it fixed.  I think it's a bug in the filterdiff
utility, but I got a stupid answer from the Debian maintainer when I
reported it and didn't care to follow up any further.  Eventually I gave
up on using context diffs because of this problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Sun, Aug 27, 2017 at 8:31 PM, Michael Malis <michaelmalis2@gmail.com> wrote:
> (Sorry David. I initially replied only to you)
>
> Ok. I've attached a patch of a proof-of-concept. I have a few
> questions about tests.
>
> What is typical workflow to add tests for changes to the planner?

Add submitted patches at commitfest.postgresql.org

> Also
> I ran make check and it appears one of the existing tests is failing.
> What is a typical way for going about discovering why the query plan
> for a specific query changed?

I don't have any magic answer on this point.

> Also, how should I go about changing the
> old test? Should I replace the old test output with the new test
> output or modify the old test slightly to get it to produce the same
> case as before?

That's a judgement call, based on what you think the point of the test was.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company