Thread: [PATCH] minor optimization for ineq_histogram_selectivity()
Hello, When studying the weird planner issue reported here [1], I came up with the attached patch. It reduces the probability of calling get_actual_variable_range(). The patch applies to the master branch. How to test : CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off); INSERT INTO foo SELECT i%213, md5(i::text) from generate_series(1,1000000) i; VACUUM ANALYZE foo; SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx CREATE INDEX ON foo(a); DELETE FROM foo WHERE a = 212; EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208; Without this patch, you will observe at least 4694 shared hits (which are mostly heap fetches). If you apply the patch, you will observe very few of them. You should run the EXPLAIN on a standby, if you want to observe the heap fetches more than one time (because of killed index tuples being ignored). Best regards, Frédéric [1] https://www.postgresql.org/message-id/flat/CAECtzeVPM4Oi6dTdqVQmjoLkDBVChNj7ed3hNs1RGrBbwCJ7Cw%40mail.gmail.com
Attachment
On 10/24/22 17:26, Frédéric Yhuel wrote: > Hello, > > When studying the weird planner issue reported here [1], I came up with > the attached patch. It reduces the probability of calling > get_actual_variable_range(). > > The patch applies to the master branch. > > How to test : > > CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off); > INSERT INTO foo SELECT i%213, md5(i::text) from > generate_series(1,1000000) i; > VACUUM ANALYZE foo; > SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx > CREATE INDEX ON foo(a); > DELETE FROM foo WHERE a = 212; > EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208; > With the above example, the variables "lobound", "hibound", and "probe" would vary like this : without patch : lobound hibound probe --------------------------------------- 0 101 50 51 101 76 77 101 89 90 101 95 96 101 98 99 101 100 99 100 99 99 99 with patch : lobound hibound probe --------------------------------------- 0 101 50 51 101 75 76 101 88 89 101 94 95 101 97 98 101 99 98 99 98 99 99 So we find the correct right end of the histogram bin (99) in both cases, but "probe" doesn't reach 100 in the latter one, and get_actual_variable_range() is never called. Now, if we'd run the query SELECT count(a) FROM foo WHERE a > 211 : without patch : lobound hibound probe --------------------------------------- 0 101 50 51 101 76 77 101 89 90 101 95 96 101 98 99 101 100 99 100 99 100 100 with patch : lobound hibound probe --------------------------------------- 0 101 50 51 101 75 76 101 88 89 101 94 95 101 97 98 101 99 100 101 100 100 100 Here, the correct right end of the histogram bin (100) is also found is both cases. I'm well aware that an example doesn't prove the correctness of an algorithm, though. Best regards, Frédéric
On 10/24/22 17:26, Frédéric Yhuel wrote: > Hello, > > When studying the weird planner issue reported here [1], I came up with > the attached patch. It reduces the probability of calling > get_actual_variable_range(). This isn't very useful anymore thanks to this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c Unless we want to save a hundred page reads in rare cases.
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: > On 10/24/22 17:26, Frédéric Yhuel wrote: >> When studying the weird planner issue reported here [1], I came up with >> the attached patch. It reduces the probability of calling >> get_actual_variable_range(). > This isn't very useful anymore thanks to this patch: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c I hadn't looked at this patch before, but now that I have, I'm inclined to reject it anyway. It just moves the problem around: now, instead of possibly doing an unnecessary index probe at the right end, you're possibly doing an unnecessary index probe at the left end. It also looks quite weird compared to the normal coding of binary search. I wonder if there'd be something to be said for leaving the initial probe calculation alone and doing this: else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2) + { + /* Don't probe the endpoint until we have to. */ + if (probe > lobound) + probe--; + else have_end = get_actual_variable_range(root, vardata, sslot.staop, collation, NULL, &sslot.values[probe]); + } On the whole though, it seems like a wart. regards, tom lane
On 11/23/22 16:59, Tom Lane wrote: > =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: >> On 10/24/22 17:26, Frédéric Yhuel wrote: >>> When studying the weird planner issue reported here [1], I came up with >>> the attached patch. It reduces the probability of calling >>> get_actual_variable_range(). > >> This isn't very useful anymore thanks to this patch: >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c > > I hadn't looked at this patch before, but now that I have, I'm inclined > to reject it anyway. It just moves the problem around: now, instead of > possibly doing an unnecessary index probe at the right end, you're > possibly doing an unnecessary index probe at the left end. Indeed... it seemed to me that both versions would do an unnecessary index probe at the left end, but I wasn't careful enough :-/ > It also > looks quite weird compared to the normal coding of binary search. > That's right. > I wonder if there'd be something to be said for leaving the initial > probe calculation alone and doing this: > > else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2) > + { > + /* Don't probe the endpoint until we have to. */ > + if (probe > lobound) > + probe--; > + else > have_end = get_actual_variable_range(root, > vardata, > sslot.staop, > collation, > NULL, > &sslot.values[probe]); > + } > > On the whole though, it seems like a wart. > > Yeah... it's probably wiser not risking introducing a bug, only to save an index probe in rare cases (and only 100 reads, thanks to 9c6ad5ea). Thank you for having had a look at it. Best regards, Frédéric