adjust_limit_rows_costs algorithm - Mailing list pgsql-hackers
From | wenhui qiu |
---|---|
Subject | adjust_limit_rows_costs algorithm |
Date | |
Msg-id | CAGjGUALFnohOs203ZCK68sifVvPBLsjQ7-EpjtJtawkonSnnMg@mail.gmail.com Whole thread Raw |
Responses |
Re: adjust_limit_rows_costs algorithm
|
List | pgsql-hackers |
Hi Hackers
I think the total_cost calculation method cannot be linear, because the data distribution is not linear in reality. I try to change it to a logarithmic
There should be many people who have encountered the problem of order by col limit 1 without index filtering,Here is an example of my test
create extension pg_trgm ;
CREATE TABLE public.dba_users (
userid integer primary key,
username character varying(64),
password character varying(64)
);
sysbench=# create extension pg_trgm ;
CREATE EXTENSION
sysbench=# CREATE TABLE public.dba_users (
sysbench(#
sysbench(# userid integer primary key,
sysbench(#
sysbench(# username character varying(64),
sysbench(#
sysbench(# password character varying(64)
sysbench(#
sysbench(# );
CREATE TABLE
sysbench=# insert into dba_users select generate_series(1,6000000),md5(random()::varchar(64)),md5(random()::varchar(64));
INSERT 0 6000000
sysbench=# CREATE INDEX dba_users_username_idx ON public.dba_users USING gin (username gin_trgm_ops);
CREATE INDEX
sysbench=#
sysbench=#
sysbench=#
sysbench=#
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470 rows=0 loops=1)
-> Index Scan using dba_users_pkey on dba_users (cost=0.43..244892.51 rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Filter: 6000000
Planning Time: 0.384 ms
Execution Time: 1433.489 ms
(6 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2300.03..2300.03 rows=1 width=8) (actual time=54.562..54.563 rows=0 loops=1)
-> Sort (cost=2300.03..2301.53 rows=600 width=8) (actual time=54.560..54.561 rows=0 loops=1)
Sort Key: ((userid + 0))
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on dba_users (cost=57.22..2297.03 rows=600 width=8) (actual time=54.526..54.527 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41310
Heap Blocks: exact=31810
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.307..7.307 rows=41310 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Planning Time: 0.238 ms
Execution Time: 54.625 ms
(12 rows)
CREATE TABLE public.dba_users (
userid integer primary key,
username character varying(64),
password character varying(64)
);
sysbench=# create extension pg_trgm ;
CREATE EXTENSION
sysbench=# CREATE TABLE public.dba_users (
sysbench(#
sysbench(# userid integer primary key,
sysbench(#
sysbench(# username character varying(64),
sysbench(#
sysbench(# password character varying(64)
sysbench(#
sysbench(# );
CREATE TABLE
sysbench=# insert into dba_users select generate_series(1,6000000),md5(random()::varchar(64)),md5(random()::varchar(64));
INSERT 0 6000000
sysbench=# CREATE INDEX dba_users_username_idx ON public.dba_users USING gin (username gin_trgm_ops);
CREATE INDEX
sysbench=#
sysbench=#
sysbench=#
sysbench=#
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470 rows=0 loops=1)
-> Index Scan using dba_users_pkey on dba_users (cost=0.43..244892.51 rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Filter: 6000000
Planning Time: 0.384 ms
Execution Time: 1433.489 ms
(6 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2300.03..2300.03 rows=1 width=8) (actual time=54.562..54.563 rows=0 loops=1)
-> Sort (cost=2300.03..2301.53 rows=600 width=8) (actual time=54.560..54.561 rows=0 loops=1)
Sort Key: ((userid + 0))
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on dba_users (cost=57.22..2297.03 rows=600 width=8) (actual time=54.526..54.527 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41310
Heap Blocks: exact=31810
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.307..7.307 rows=41310 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Planning Time: 0.238 ms
Execution Time: 54.625 ms
(12 rows)
/*
* adjust_limit_rows_costs
* Adjust the size and cost estimates for a LimitPath node according to the
* offset/limit.
*
* This is only a cosmetic issue if we are at top level, but if we are
* building a subquery then it's important to report correct info to the outer
* planner.
*
* When the offset or count couldn't be estimated, use 10% of the estimated
* number of rows emitted from the subpath.
*
* XXX we don't bother to add eval costs of the offset/limit expressions
* themselves to the path costs. In theory we should, but in most cases those
* expressions are trivial and it's just not worth the trouble.
*/
void
adjust_limit_rows_costs(double *rows, /* in/out parameter */
Cost *startup_cost, /* in/out parameter */
Cost *total_cost, /* in/out parameter */
int64 offset_est,
int64 count_est)
{
double input_rows = *rows;
Cost input_startup_cost = *startup_cost;
Cost input_total_cost = *total_cost;
if (offset_est != 0)
{
double offset_rows;
if (offset_est > 0)
offset_rows = (double) offset_est;
else
offset_rows = clamp_row_est(input_rows * 0.10);
if (offset_rows > *rows)
offset_rows = *rows;
if (input_rows > 0)
*startup_cost +=
(input_total_cost - input_startup_cost)
* offset_rows / input_rows;
*rows -= offset_rows;
if (*rows < 1)
*rows = 1;
}
if (count_est != 0)
{
double count_rows;
if (count_est > 0)
count_rows = (double) count_est;
else
count_rows = clamp_row_est(input_rows * 0.10);
if (count_rows > *rows)
count_rows = *rows;
if (input_rows > 0)
*total_cost = *startup_cost +
(input_total_cost - input_startup_cost)
* count_rows / input_rows;
*rows = count_rows;
if (*rows < 1)
*rows = 1;
}
}
$ git diff
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 4f74caf..2ab59e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4074,7 +4074,7 @@ adjust_limit_rows_costs(double *rows, /* in/out parameter */
if (input_rows > 0)
*total_cost = *startup_cost +
(input_total_cost - input_startup_cost)
- * count_rows / input_rows;
+ * log2(1+count_rows) / log2(1+input_rows);
*rows = count_rows;
if (*rows < 1)
*rows = 1;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 4f74caf..2ab59e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4074,7 +4074,7 @@ adjust_limit_rows_costs(double *rows, /* in/out parameter */
if (input_rows > 0)
*total_cost = *startup_cost +
(input_total_cost - input_startup_cost)
- * count_rows / input_rows;
+ * log2(1+count_rows) / log2(1+input_rows);
*rows = count_rows;
if (*rows < 1)
*rows = 1;
$ source env_pg.sh 5466 pg18limit pgdatalimit
postgres@db-benchmark-master-> pg_ctl -D /data/pgsql/pgdatalimit -m fast stop
waiting for server to shut down..... done
server stopped
postgres@db-benchmark-master-> /opt/app/pg18limit/bin/pg_ctl -D /data/pgsql/pgdatalimit start
waiting for server to start....2024-12-24 14:01:28.653 +08 [276263] LOG: redirecting log output to logging collector process
2024-12-24 14:01:28.653 +08 [276263] HINT: Future log output will appear in directory "log".
done
server started
postgres@db-benchmark-master-> psql
psql (18devel)
Type "help" for help.
postgres=# \c sysbench
You are now connected to database "sysbench" as user "postgres".
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2298.53..2298.69 rows=1 width=4) (actual time=200.202..200.204 rows=0 loops=1)
Buffers: shared hit=12958 read=18851
-> Sort (cost=2298.53..2300.03 rows=600 width=4) (actual time=200.200..200.202 rows=0 loops=1)
Sort Key: userid
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=12958 read=18851
-> Bitmap Heap Scan on dba_users (cost=57.22..2295.53 rows=600 width=4) (actual time=200.195..200.196 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=12958 read=18851
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.992..7.993 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=1 read=13
Planning:
Buffers: shared hit=24 read=5
Planning Time: 0.569 ms
Execution Time: 200.283 ms
(18 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2300.03..2300.19 rows=1 width=8) (actual time=55.165..55.166 rows=0 loops=1)
Buffers: shared hit=31809
-> Sort (cost=2300.03..2301.53 rows=600 width=8) (actual time=55.164..55.165 rows=0 loops=1)
Sort Key: ((userid + 0))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31809
-> Bitmap Heap Scan on dba_users (cost=57.22..2297.03 rows=600 width=8) (actual time=55.160..55.160 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=31809
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.080..7.080 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=14
Planning:
Buffers: shared hit=1
Planning Time: 0.119 ms
Execution Time: 55.187 ms
(18 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2298.53..2298.69 rows=1 width=4) (actual time=56.483..56.484 rows=0 loops=1)
Buffers: shared hit=31809
-> Sort (cost=2298.53..2300.03 rows=600 width=4) (actual time=56.481..56.483 rows=0 loops=1)
Sort Key: userid
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31809
-> Bitmap Heap Scan on dba_users (cost=57.22..2295.53 rows=600 width=4) (actual time=56.476..56.477 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=31809
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.284..7.285 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=14
Planning:
Buffers: shared hit=1
Planning Time: 0.145 ms
Execution Time: 56.531 ms
(18 rows)
#
postgres@db-benchmark-master-> pg_ctl -D /data/pgsql/pgdatalimit -m fast stop
waiting for server to shut down..... done
server stopped
postgres@db-benchmark-master-> /opt/app/pg18limit/bin/pg_ctl -D /data/pgsql/pgdatalimit start
waiting for server to start....2024-12-24 14:01:28.653 +08 [276263] LOG: redirecting log output to logging collector process
2024-12-24 14:01:28.653 +08 [276263] HINT: Future log output will appear in directory "log".
done
server started
postgres@db-benchmark-master-> psql
psql (18devel)
Type "help" for help.
postgres=# \c sysbench
You are now connected to database "sysbench" as user "postgres".
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2298.53..2298.69 rows=1 width=4) (actual time=200.202..200.204 rows=0 loops=1)
Buffers: shared hit=12958 read=18851
-> Sort (cost=2298.53..2300.03 rows=600 width=4) (actual time=200.200..200.202 rows=0 loops=1)
Sort Key: userid
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=12958 read=18851
-> Bitmap Heap Scan on dba_users (cost=57.22..2295.53 rows=600 width=4) (actual time=200.195..200.196 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=12958 read=18851
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.992..7.993 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=1 read=13
Planning:
Buffers: shared hit=24 read=5
Planning Time: 0.569 ms
Execution Time: 200.283 ms
(18 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid+0 limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2300.03..2300.19 rows=1 width=8) (actual time=55.165..55.166 rows=0 loops=1)
Buffers: shared hit=31809
-> Sort (cost=2300.03..2301.53 rows=600 width=8) (actual time=55.164..55.165 rows=0 loops=1)
Sort Key: ((userid + 0))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31809
-> Bitmap Heap Scan on dba_users (cost=57.22..2297.03 rows=600 width=8) (actual time=55.160..55.160 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=31809
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.080..7.080 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=14
Planning:
Buffers: shared hit=1
Planning Time: 0.119 ms
Execution Time: 55.187 ms
(18 rows)
sysbench=# explain analyze select userid from dba_users where username like '%aaaaaaaaaaaa%' order by userid limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2298.53..2298.69 rows=1 width=4) (actual time=56.483..56.484 rows=0 loops=1)
Buffers: shared hit=31809
-> Sort (cost=2298.53..2300.03 rows=600 width=4) (actual time=56.481..56.483 rows=0 loops=1)
Sort Key: userid
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31809
-> Bitmap Heap Scan on dba_users (cost=57.22..2295.53 rows=600 width=4) (actual time=56.476..56.477 rows=0 loops=1)
Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Rows Removed by Index Recheck: 41242
Heap Blocks: exact=31795
Buffers: shared hit=31809
-> Bitmap Index Scan on dba_users_username_idx (cost=0.00..57.07 rows=600 width=0) (actual time=7.284..7.285 rows=41242 loops=1)
Index Cond: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
Buffers: shared hit=14
Planning:
Buffers: shared hit=1
Planning Time: 0.145 ms
Execution Time: 56.531 ms
(18 rows)
#
The reality may be more complicated, I mean there is no better way for the community leaders to solve this problem, because the community will never say no to the hint, and there are so many such problems
Thanks
pgsql-hackers by date: