Re: Tid scan improvements - Mailing list pgsql-hackers

From David Rowley
Subject Re: Tid scan improvements
Date
Msg-id CAKJS1f9f+1CEyT3_mv9kN-Le=npqS-0iyZgqPDgCrFYKEAACKA@mail.gmail.com
Whole thread Raw
In response to Re: Tid scan improvements  (Edmund Horner <ejrh00@gmail.com>)
Responses Re: Tid scan improvements
List pgsql-hackers
On Mon, 4 Feb 2019 at 18:37, Edmund Horner <ejrh00@gmail.com> wrote:
> 1. v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch

I think 0001 is good to go. It's a clear improvement over what we do today.

(t1 = 1 million row table with a single int column.)

Patched:
# explain (analyze, timing off) select * from t1 where ctid < '(1, 90)';
 Seq Scan on t1  (cost=0.00..16925.00 rows=315 width=4) (actual
rows=315 loops=1)

# explain (analyze, timing off) select * from t1 where ctid <= '(1, 90)';
 Seq Scan on t1  (cost=0.00..16925.00 rows=316 width=4) (actual
rows=316 loops=1)

Master:
# explain (analyze, timing off) select * from t1 where ctid < '(1, 90)';
 Seq Scan on t1  (cost=0.00..16925.00 rows=333333 width=4) (actual
rows=315 loops=1)

# explain (analyze, timing off) select * from t1 where ctid <= '(1, 90)';
 Seq Scan on t1  (cost=0.00..16925.00 rows=333333 width=4) (actual
rows=316 loops=1)

The only possible risk I can foresee is that it may be more likely we
underestimate the selectivity and that causes something like a nested
loop join due to the estimation being, say 1 row.

It could happen in a case like:

SELECT * FROM bloated_table WHERE ctid >= <last ctid that would exist
without bloat>

but I don't think we should keep using DEFAULT_INEQ_SEL just in case
this happens. We could probably fix 90% of those cases by returning 2
rows instead of 1.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: "Tsunakawa, Takayuki"
Date:
Subject: RE: Timeout parameters
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Special role for subscriptions