Thread: [PATCH] minor optimization for ineq_histogram_selectivity()

[PATCH] minor optimization for ineq_histogram_selectivity()

From
Frédéric Yhuel
Date:
Hello,

When studying the weird planner issue reported here [1], I came up with 
the attached patch. It reduces the probability of calling 
get_actual_variable_range().

The patch applies to the master branch.

How to test :

CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off);
INSERT INTO foo SELECT i%213, md5(i::text) from 
generate_series(1,1000000) i;
VACUUM ANALYZE foo;
SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx
CREATE INDEX ON foo(a);
DELETE FROM foo WHERE a = 212;
EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208;

Without this patch, you will observe at least 4694 shared hits (which 
are mostly heap fetches). If you apply the patch, you will observe very 
few of them.

You should run the EXPLAIN on a standby, if you want to observe the heap 
fetches more than one time (because of killed index tuples being ignored).

Best regards,
Frédéric

[1] 
https://www.postgresql.org/message-id/flat/CAECtzeVPM4Oi6dTdqVQmjoLkDBVChNj7ed3hNs1RGrBbwCJ7Cw%40mail.gmail.com
Attachment

Re: [PATCH] minor optimization for ineq_histogram_selectivity()

From
Frédéric Yhuel
Date:

On 10/24/22 17:26, Frédéric Yhuel wrote:
> Hello,
> 
> When studying the weird planner issue reported here [1], I came up with 
> the attached patch. It reduces the probability of calling 
> get_actual_variable_range().
> 
> The patch applies to the master branch.
> 
> How to test :
> 
> CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off);
> INSERT INTO foo SELECT i%213, md5(i::text) from 
> generate_series(1,1000000) i;
> VACUUM ANALYZE foo;
> SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx
> CREATE INDEX ON foo(a);
> DELETE FROM foo WHERE a = 212;
> EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208;
> 

With the above example, the variables "lobound", "hibound", and "probe" 
would vary like this :

without patch :

lobound         hibound          probe
---------------------------------------
0               101              50
51              101              76
77              101              89
90              101              95
96              101              98
99              101              100
99              100              99
99              99


with patch :

lobound         hibound          probe
---------------------------------------
0               101              50
51              101              75
76              101              88
89              101              94
95              101              97
98              101              99
98              99               98
99              99

So we find the correct right end of the histogram bin (99) in both 
cases, but "probe" doesn't reach 100 in the latter one, and
get_actual_variable_range() is never called.

Now, if we'd run the query SELECT count(a) FROM foo WHERE a > 211 :

without patch :

lobound         hibound          probe
---------------------------------------
0               101              50
51              101              76
77              101              89
90              101              95
96              101              98
99              101              100
99              100              99
100             100

with patch :

lobound         hibound          probe
---------------------------------------
0               101              50
51              101              75
76              101              88
89              101              94
95              101              97
98              101              99
100             101              100
100             100


Here, the correct right end of the histogram bin (100) is also found is 
both cases.

I'm well aware that an example doesn't prove the correctness of an 
algorithm, though.

Best regards,
Frédéric



Re: [PATCH] minor optimization for ineq_histogram_selectivity()

From
Frédéric Yhuel
Date:

On 10/24/22 17:26, Frédéric Yhuel wrote:
> Hello,
> 
> When studying the weird planner issue reported here [1], I came up with 
> the attached patch. It reduces the probability of calling 
> get_actual_variable_range().

This isn't very useful anymore thanks to this patch: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c

Unless we want to save a hundred page reads in rare cases.



Re: [PATCH] minor optimization for ineq_histogram_selectivity()

From
Tom Lane
Date:
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
> On 10/24/22 17:26, Frédéric Yhuel wrote:
>> When studying the weird planner issue reported here [1], I came up with
>> the attached patch. It reduces the probability of calling
>> get_actual_variable_range().

> This isn't very useful anymore thanks to this patch:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c

I hadn't looked at this patch before, but now that I have, I'm inclined
to reject it anyway.  It just moves the problem around: now, instead of
possibly doing an unnecessary index probe at the right end, you're
possibly doing an unnecessary index probe at the left end.  It also
looks quite weird compared to the normal coding of binary search.

I wonder if there'd be something to be said for leaving the initial
probe calculation alone and doing this:

                else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
+               {
+                   /* Don't probe the endpoint until we have to. */
+                   if (probe > lobound)
+                       probe--;
+                   else
                    have_end = get_actual_variable_range(root,
                                                         vardata,
                                                         sslot.staop,
                                                         collation,
                                                         NULL,
                                                         &sslot.values[probe]);
+               }

On the whole though, it seems like a wart.

            regards, tom lane



Re: [PATCH] minor optimization for ineq_histogram_selectivity()

From
Frédéric Yhuel
Date:

On 11/23/22 16:59, Tom Lane wrote:
> =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
>> On 10/24/22 17:26, Frédéric Yhuel wrote:
>>> When studying the weird planner issue reported here [1], I came up with
>>> the attached patch. It reduces the probability of calling
>>> get_actual_variable_range().
> 
>> This isn't very useful anymore thanks to this patch:
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c
> 
> I hadn't looked at this patch before, but now that I have, I'm inclined
> to reject it anyway.  It just moves the problem around: now, instead of
> possibly doing an unnecessary index probe at the right end, you're
> possibly doing an unnecessary index probe at the left end.

Indeed... it seemed to me that both versions would do an unnecessary 
index probe at the left end, but I wasn't careful enough :-/

> It also
> looks quite weird compared to the normal coding of binary search.
> 

That's right.

> I wonder if there'd be something to be said for leaving the initial
> probe calculation alone and doing this:
> 
>                  else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
> +               {
> +                   /* Don't probe the endpoint until we have to. */
> +                   if (probe > lobound)
> +                       probe--;
> +                   else
>                      have_end = get_actual_variable_range(root,
>                                                           vardata,
>                                                           sslot.staop,
>                                                           collation,
>                                                           NULL,
>                                                           &sslot.values[probe]);
> +               }
> 
> On the whole though, it seems like a wart.
> 
>

Yeah... it's probably wiser not risking introducing a bug, only to save 
an index probe in rare cases (and only 100 reads, thanks to 9c6ad5ea).

Thank you for having had a look at it.

Best regards,
Frédéric