Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: Using BRIN indexes for sorted output
Date
Msg-id 23a61b2e-e6cb-0c07-0316-447487fc313e@enterprisedb.com
Whole thread Raw
In response to Re: PATCH: Using BRIN indexes for sorted output  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 2/18/23 19:51, Justin Pryzby wrote:
> Are (any of) these patches targetting v16 ?
> 

Probably not. Maybe if there's more feedback / scrutiny, but I'm not
sure one commitfest is enough to polish the patch (especially
considering I haven't done much on the costing yet).

> typos:
> ar we - we are?
> morestly - mostly
> interstect - intersect
> 
>> + * XXX We don't sort the bins, so just do binary sort. For large number of values
>> + * this might be an issue, for small number of values a linear search is fine.
> 
> "binary sort" is wrong?
> 
>> + * only half of there ranges, thus 1/2. This can be extended to randomly
> 
> half of *these* ranges ?
> 

Thanks, I'll fix those.

>> From 7b3307c27b35ece119feab4891f03749250e454b Mon Sep 17 00:00:00 2001
>> From: Tomas Vondra <tomas.vondra@postgresql.org>
>> Date: Mon, 17 Oct 2022 18:39:28 +0200
>> Subject: [PATCH 01/11] Allow index AMs to build and use custom statistics
> 
> I think the idea can also apply to btree - currently, correlation is
> considered to be a property of a column, but not an index.  But that
> fails to distinguish between a freshly built index, and an index with
> out of order heap references, which can cause an index scan to be a lot
> more expensive.
> 
> I implemented per-index correlation stats way back when:
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com
> 
> See also:
> https://www.postgresql.org/message-id/14438.1512499811@sss.pgh.pa.us
> 
> With my old test case:
> 
> Index scan is 3x slower than bitmap scan, but index scan is costed as
> being cheaper:
> 
> postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55;
>  Index Scan using t_i_idx on t  (cost=0.43..21153.74 rows=130912 width=8) (actual time=0.107..222.737 rows=128914
loops=1)
> 
> postgres=# SET enable_indexscan =no;
> postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55;
>  Bitmap Heap Scan on t  (cost=2834.28..26895.96 rows=130912 width=8) (actual time=16.830..69.860 rows=128914
loops=1)
> 
> If it's clustered, then the index scan is almost twice as fast, and the
> costs are more consistent with the associated time.  The planner assumes
> that the indexes are freshly built...
> 
> postgres=# CLUSTER t USING t_i_idx ;
> postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55;
>  Index Scan using t_i_idx on t  (cost=0.43..20121.74 rows=130912 width=8) (actual time=0.084..117.549 rows=128914
loops=1)
> 

Yeah, the concept of indexam statistics certainly applies to other index
types, and for btree we might collect information about correlation etc.
I haven't looked at the 2017 patch, but it seems reasonable.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Next
From: Andres Freund
Date:
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED