Thread: adjust_limit_rows_costs algorithm

adjust_limit_rows_costs algorithm

From: wenhui qiu
Hi Hackers,
    Many people have probably run into the problem of ORDER BY col LIMIT 1 queries being slow when the filter condition is not covered by the ordering index. Here is an example from my test (the second query uses ORDER BY userid+0 to keep the planner from using the primary-key index for ordering):

sysbench=# create extension pg_trgm ;
CREATE EXTENSION
sysbench=# CREATE TABLE public.dba_users (
sysbench(#     userid integer primary key,
sysbench(#     username character varying(64),
sysbench(#     password character varying(64)
sysbench(# );
CREATE TABLE
sysbench=# insert into dba_users select generate_series(1,6000000),md5(random()::varchar(64)),md5(random()::varchar(64));
INSERT 0 6000000
sysbench=# CREATE INDEX dba_users_username_idx ON public.dba_users USING gin (username gin_trgm_ops);
CREATE INDEX
sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid limit 1;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470 rows=0 loops=1)
   ->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244892.51 rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
         Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
         Rows Removed by Filter: 6000000
 Planning Time: 0.384 ms
 Execution Time: 1433.489 ms
(6 rows)

sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2300.03..2300.03 rows=1 width=8) (actual time=54.562..54.563 rows=0 loops=1)
   ->  Sort  (cost=2300.03..2301.53 rows=600 width=8) (actual time=54.560..54.561 rows=0 loops=1)
         Sort Key: ((userid + 0))
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on dba_users  (cost=57.22..2297.03 rows=600 width=8) (actual time=54.526..54.527 rows=0 loops=1)
               Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
               Rows Removed by Index Recheck: 41310
               Heap Blocks: exact=31810
               ->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..57.07 rows=600 width=0) (actual time=7.307..7.307 rows=41310 loops=1)
                     Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
 Planning Time: 0.238 ms
 Execution Time: 54.625 ms
(12 rows)



/*
 * adjust_limit_rows_costs
 *      Adjust the size and cost estimates for a LimitPath node according to
 *      the offset/limit.
 *
 * This is only a cosmetic issue if we are at top level, but if we are
 * building a subquery then it's important to report correct info to the
 * outer planner.
 *
 * When the offset or count couldn't be estimated, use 10% of the estimated
 * number of rows emitted from the subpath.
 *
 * XXX we don't bother to add eval costs of the offset/limit expressions
 * themselves to the path costs.  In theory we should, but in most cases
 * those expressions are trivial and it's just not worth the trouble.
 */
void
adjust_limit_rows_costs(double *rows,           /* in/out parameter */
                        Cost *startup_cost,     /* in/out parameter */
                        Cost *total_cost,       /* in/out parameter */
                        int64 offset_est,
                        int64 count_est)
{
    double      input_rows = *rows;
    Cost        input_startup_cost = *startup_cost;
    Cost        input_total_cost = *total_cost;

    if (offset_est != 0)
    {
        double      offset_rows;

        if (offset_est > 0)
            offset_rows = (double) offset_est;
        else
            offset_rows = clamp_row_est(input_rows * 0.10);
        if (offset_rows > *rows)
            offset_rows = *rows;
        if (input_rows > 0)
            *startup_cost +=
                (input_total_cost - input_startup_cost)
                * offset_rows / input_rows;
        *rows -= offset_rows;
        if (*rows < 1)
            *rows = 1;
    }

    if (count_est != 0)
    {
        double      count_rows;

        if (count_est > 0)
            count_rows = (double) count_est;
        else
            count_rows = clamp_row_est(input_rows * 0.10);
        if (count_rows > *rows)
            count_rows = *rows;
        if (input_rows > 0)
            *total_cost = *startup_cost +
                (input_total_cost - input_startup_cost)
                * count_rows / input_rows;
        *rows = count_rows;
        if (*rows < 1)
            *rows = 1;
    }
}
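
To connect this code to the plans above: in the first plan, the Index Scan subpath has startup_cost = 0.43, total_cost = 244892.51 and an estimated 600 rows, and count_est = 1, so the linear formula prices the Limit node at

    total_cost = 0.43 + (244892.51 - 0.43) * 1 / 600 ≈ 408.59

which matches the cost=0.43..408.59 shown on the Limit node: the planner assumes it only needs to run 1/600 of the index scan before the LIMIT is satisfied, whereas in reality no row matches and the whole table is scanned.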


I don't think the total_cost calculation can be linear, because real data distributions are not linear. I tried changing it to a logarithmic formula:

$ git diff
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 4f74caf..2ab59e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4074,7 +4074,7 @@ adjust_limit_rows_costs(double *rows,     /* in/out parameter */
                if (input_rows > 0)
                        *total_cost = *startup_cost +
                                (input_total_cost - input_startup_cost)
-                               * count_rows / input_rows;
+                               * log2(1+count_rows) / log2(1+input_rows);
                *rows = count_rows;
                if (*rows < 1)
                        *rows = 1;
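
With this change, the index-scan plan from the first EXPLAIN would be priced (using the same subpath numbers) at roughly

    total_cost = 0.43 + (244892.51 - 0.43) * log2(2) / log2(601)
               ≈ 0.43 + 244892.08 * 0.108 ≈ 26500

instead of 408.59, so it no longer looks cheapest and the planner switches to the bitmap scan + sort plan, as the runs below show. The Limit cost printed there, 2298.53..2298.69, follows the same formula: 2298.53 + (2300.03 - 2298.53) * log2(2) / log2(601) ≈ 2298.69.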

$ source env_pg.sh 5466 pg18limit pgdatalimit
postgres@db-benchmark-master-> pg_ctl -D /data/pgsql/pgdatalimit -m fast stop
waiting for server to shut down..... done
server stopped
postgres@db-benchmark-master-> /opt/app/pg18limit/bin/pg_ctl -D /data/pgsql/pgdatalimit start
waiting for server to start....2024-12-24 14:01:28.653 +08 [276263] LOG:  redirecting log output to logging collector process
2024-12-24 14:01:28.653 +08 [276263] HINT:  Future log output will appear in directory "log".
 done
server started
postgres@db-benchmark-master-> psql
psql (18devel)
Type "help" for help.

postgres=# \c sysbench
You are now connected to database "sysbench" as user "postgres".



sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2298.53..2298.69 rows=1 width=4) (actual time=200.202..200.204 rows=0 loops=1)
   Buffers: shared hit=12958 read=18851
   ->  Sort  (cost=2298.53..2300.03 rows=600 width=4) (actual time=200.200..200.202 rows=0 loops=1)
         Sort Key: userid
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=12958 read=18851
         ->  Bitmap Heap Scan on dba_users  (cost=57.22..2295.53 rows=600 width=4) (actual time=200.195..200.196 rows=0 loops=1)
               Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
               Rows Removed by Index Recheck: 41242
               Heap Blocks: exact=31795
               Buffers: shared hit=12958 read=18851
               ->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..57.07 rows=600 width=0) (actual time=7.992..7.993 rows=41242 loops=1)
                     Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
                     Buffers: shared hit=1 read=13
 Planning:
   Buffers: shared hit=24 read=5
 Planning Time: 0.569 ms
 Execution Time: 200.283 ms
(18 rows)

sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2300.03..2300.19 rows=1 width=8) (actual time=55.165..55.166 rows=0 loops=1)
   Buffers: shared hit=31809
   ->  Sort  (cost=2300.03..2301.53 rows=600 width=8) (actual time=55.164..55.165 rows=0 loops=1)
         Sort Key: ((userid + 0))
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=31809
         ->  Bitmap Heap Scan on dba_users  (cost=57.22..2297.03 rows=600 width=8) (actual time=55.160..55.160 rows=0 loops=1)
               Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
               Rows Removed by Index Recheck: 41242
               Heap Blocks: exact=31795
               Buffers: shared hit=31809
               ->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..57.07 rows=600 width=0) (actual time=7.080..7.080 rows=41242 loops=1)
                     Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
                     Buffers: shared hit=14
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.119 ms
 Execution Time: 55.187 ms
(18 rows)

sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2298.53..2298.69 rows=1 width=4) (actual time=56.483..56.484 rows=0 loops=1)
   Buffers: shared hit=31809
   ->  Sort  (cost=2298.53..2300.03 rows=600 width=4) (actual time=56.481..56.483 rows=0 loops=1)
         Sort Key: userid
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=31809
         ->  Bitmap Heap Scan on dba_users  (cost=57.22..2295.53 rows=600 width=4) (actual time=56.476..56.477 rows=0 loops=1)
               Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
               Rows Removed by Index Recheck: 41242
               Heap Blocks: exact=31795
               Buffers: shared hit=31809
               ->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..57.07 rows=600 width=0) (actual time=7.284..7.285 rows=41242 loops=1)
                     Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
                     Buffers: shared hit=14
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.145 ms
 Execution Time: 56.531 ms
(18 rows)

The reality may be more complicated. What I mean is: is there a better way for the community leaders to solve this problem? The community will never say yes to hints, and there are so many problems like this.

Thanks

Re: adjust_limit_rows_costs algorithm

From: Heikki Linnakangas
On 24/12/2024 08:20, wenhui qiu wrote:
> sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid limit 1;
>                                                                  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470 rows=0 loops=1)
>    ->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244892.51 rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
>          Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
>          Rows Removed by Filter: 6000000
>  Planning Time: 0.384 ms
>  Execution Time: 1433.489 ms
> (6 rows)

The problem seems to be that the planner estimates the LIKE qual to 
match 600 rows, but in reality it matches 0.

> I don't think the total_cost calculation can be linear, because real
> data distributions are not linear. I tried changing it to a
> logarithmic formula

Why? I don't see anything wrong with the LIMIT estimation here.

One factor here is that the execution time of the Index Scan + Limit
plan depends heavily on whether there are 0 matches, or more. With 0
matches, the Index Scan needs to scan the whole table. With 1 matching
row, it needs to scan just half of the table on average before finding
the match. With 10 rows, just 10% of the table, and so forth. So the
execution time is indeed highly non-linear depending on whether there
are any matches or not. Is that what you meant?
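
To put rough numbers on that (assuming matches are spread uniformly through the table): with t matching rows among the 6,000,000, the scan stops after reading about 6,000,000 / (t + 1) rows. The estimated 600 matches would mean only ~10,000 rows read; 0 matches means all 6,000,000, a 600x difference that no single precomputed cost can capture.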

It'd be nice to somehow take that non-linearity into account in the 
estimation. But I don't know how, and I don't think that change in LIMIT 
estimation is the right way.

> The reality may be more complicated. What I mean is: is there a better
> way for the community leaders to solve this problem? The community
> will never say yes to hints, and there are so many problems like this.

It's a difficult topic because "hints" mean different things to
different people. Meanwhile, we have added things like "ALTER FUNCTION
... ROWS <rows>" and support functions which can do more accurate
estimation of functions. That's a kind of hint. If someone comes up
with more ways to help the planner with selectivity estimation, it's
worth considering.
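
For example, the ROWS clause overrides the planner's default row estimate for a set-returning function (a minimal sketch; the function below is made up for illustration):

CREATE FUNCTION get_matching_users(pattern text)
    RETURNS SETOF dba_users
    LANGUAGE sql STABLE
    AS $$ SELECT * FROM dba_users WHERE username LIKE pattern $$;

-- By default the planner assumes 1000 result rows; this tells it to expect 5.
ALTER FUNCTION get_matching_users(text) ROWS 5;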

Or maybe you could improve the selectivity estimator of the LIKE 
operator to be more accurate to begin with.

-- 
Heikki Linnakangas
Neon (https://neon.tech)