Re: Tid scan improvements - Mailing list pgsql-hackers

From: David Rowley
Subject: Re: Tid scan improvements
Msg-id: CAKJS1f8Ap6cFKeBbTnHXpOcgq5iXV1UhwA7ZiP4fxxx1LArbwQ@mail.gmail.com
In response to: Re: Tid scan improvements (Edmund Horner <ejrh00@gmail.com>)
Responses: Re: Tid scan improvements (Edmund Horner <ejrh00@gmail.com>)
List: pgsql-hackers

On 28 September 2018 at 18:13, Edmund Horner <ejrh00@gmail.com> wrote:
> On Fri, 28 Sep 2018 at 17:02, Edmund Horner <ejrh00@gmail.com> wrote:
>> I did run pgindent over it though. :)
>
> But I didn't check if it still applied to master.  Sigh.  Here's one that does.

I know commit fest is over, but I made a pass of this to hopefully
provide a bit of guidance so that it's closer for the November 'fest.

I've only done light testing on the patch and it does seem to work,
but there are a few things that I think should be changed. Most
importantly, I think #11 below needs to be done. That change may
supersede some of the items that come before it in the list, since
you'll likely have to pull out some of the code I mention due to
changing #11. I've kept those items anyway, just in case some of the
code remains.

1. Could wrap for tables > 16TB. Please use double. See index_pages_fetched()

int nrandompages;
int nseqpages;

2. Should multiply by nseqpages, not add.

run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost + nseqpages;

3. Should be double:

BlockNumber pages = selectivity * baserel->pages;
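
Putting #1-#3 together, roughly this (just a sketch based on the
quoted fragments; I haven't re-checked it against the rest of the
patch):

double		nrandompages;	/* double, as in index_pages_fetched() */
double		nseqpages;

run_cost += spc_random_page_cost * nrandompages +
			spc_seq_page_cost * nseqpages;

double		pages = selectivity * baserel->pages;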

4. The comment needs to be updated to mention what the new code does in
heapgettup() and heapgettup_pagemode()

+
  /* start from last page of the scan */
- if (scan->rs_startblock > 0)
- page = scan->rs_startblock - 1;
+ if (scan->rs_numblocks == InvalidBlockNumber)
+ {
+ if (scan->rs_startblock > 0)
+ page = scan->rs_startblock - 1;
+ else
+ page = scan->rs_nblocks - 1;
+ }
  else
- page = scan->rs_nblocks - 1;
+ page = scan->rs_startblock + scan->rs_numblocks - 1;
+
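
Maybe something along these lines (my wording, from how I read the new
logic; please check it against what the code actually does):

/*
 * Start from the last page of the scan.  When scanning the whole
 * relation (rs_numblocks == InvalidBlockNumber), that's the page just
 * before rs_startblock, wrapping around to the final page when
 * rs_startblock is 0.  When the scan is limited to rs_numblocks
 * blocks, it's the last block of that range instead.
 */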

5. Variables should be called "inclusive". We use "strict" to indicate
an operator comparison cannot match NULL values.

+ bool strict; /* Indicates < rather than <=, or > rather */
+ bool strict2; /* than >= */

Don't break the comment like that. If you need more space, don't end
the comment; continue it on a new line and tab that line out so its *
lines up with the * on the first line.
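
For example (the names and the exact comment wording are only my
suggestion):

bool		inclusive;		/* indicates <= rather than <, or >=
							 * rather than > */
bool		inclusive2;		/* ditto, for the second comparison */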

6. Why not pass the TidExpr into MakeTidOpExprState() and have it set
the type, instead of repeating the code?

7. It's not very obvious why the following Assert() can't fail.

+ bool invert;
+ bool invert2;
+
+ Assert(list_length(((BoolExpr *) expr)->args) == 2);

I had to hunt around quite a bit to see that
TidQualFromBaseRestrictinfo could only ever make the list have 2
elements, and we'd not form a BoolExpr with just 1. (but see #11)

8. Many instances of the word "strict" are used to mean "inclusive".
Can you please change all of them.

9. Confusing comment:

+ * If the expression was non-strict (<=) and the offset is 0, then just
+ * pretend it was strict, because offset 0 doesn't exist and we may as
+ * well exclude that block.

Shouldn't this be, "If the operator is non-inclusive, then since TID
offsets are 1-based, for simplicity, we can just class the expression
as inclusive.", or something along those lines.

10. The comment talks about the LHS, but the first OpExpr in a list of
two OpExprs has nothing to do with the left hand side.  You could use
LHS if you were talking about the first arg of an OpExpr, but that's
not the case here.

/* If the LHS is not the lower bound, swap them. */

You could probably just ensure that the >= / > op is always first in
the list inside TidQualFromBaseRestrictinfo(), but you'd need to
clearly comment that this is the case in both locations. Perhaps use
lcons() for the lower end and lappend() for the upper end, but see #11.
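
i.e. roughly this, where lowerExpr and upperExpr are just placeholders
for whatever expressions the patch builds for each bound:

/* keep the lower-bound (>= / >) qual first ... */
quals = lcons(lowerExpr, quals);
/* ... and the upper-bound (<= / <) qual last */
quals = lappend(quals, upperExpr);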

11. I think the qual matching code needs an overhaul.  Really you
should attempt to find the smallest and largest ctid for your
implicitly ANDed ranges.  This would require getting rid of the
BETWEEN-type clause you're trying to build in
TidQualFromBaseRestrictinfo() and instead including all of the quals;
don't ignore the other quals just because you've already found a
complete pair of range bounds.

The problem with doing it the way that you're doing it now is in cases like:

create table t1(a int);
insert into t1 select generate_Series(1,10000000);
create index on t1 (a);
select ctid,a from t1 order by a desc limit 1; -- find the max ctid.
    ctid     |    a
-------------+----------
 (44247,178) | 10000000
(1 row)

set max_parallel_workers_per_gather=0;
explain analyze select ctid,* from t1 where ctid > '(0,0)' and ctid <=
'(44247,178)' and ctid <= '(0,1)';
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Tid Scan on t1  (cost=0.01..169248.78 rows=1 width=10) (actual time=0.042..2123.432 rows=1 loops=1)
   TID Cond: ((ctid > '(0,0)'::tid) AND (ctid <= '(44247,178)'::tid))
   Filter: (ctid <= '(0,1)'::tid)
   Rows Removed by Filter: 9999999
 Planning Time: 4.049 ms
 Execution Time: 2123.464 ms
(6 rows)

Due to how you've coded TidQualFromBaseRestrictinfo(), the ctid <=
'(0,1)' qual does not make it into the range. It's left as a filter in
the Tid Scan.
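
In other words, the qual collection would look more like this rough
sketch, where extract_tid_bound() is just a placeholder for pulling the
constant TID and its direction out of a single qual (none of these
names come from the patch):

ItemPointerData lowerBound;	/* largest lower bound seen so far */
ItemPointerData upperBound;	/* smallest upper bound seen so far */
bool		have_lower = false;
bool		have_upper = false;
ListCell   *lc;

foreach(lc, quals)
{
	ItemPointerData bound;
	bool		is_lower;

	/* placeholder: get the constant TID and which side it bounds */
	if (!extract_tid_bound(lfirst(lc), &bound, &is_lower))
		continue;		/* not a TID range qual; leave it as a filter */

	if (is_lower)
	{
		/* keep only the largest of the lower bounds */
		if (!have_lower || ItemPointerCompare(&bound, &lowerBound) > 0)
			lowerBound = bound;
		have_lower = true;
	}
	else
	{
		/* keep only the smallest of the upper bounds */
		if (!have_upper || ItemPointerCompare(&bound, &upperBound) < 0)
			upperBound = bound;
		have_upper = true;
	}
}

That way the ctid <= '(0,1)' qual above would tighten the upper bound
rather than being left behind as a filter.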

I think I'm going to stop here, as changing this is going to cause
quite a bit of churn.

but one more...

12. I think the changes to selfuncs.c to get the selectivity estimate
are a fairly credible idea, but I think they also need to account for
offsets. You should be able to work out the average number of items
per page with rel->tuples / rel->pages and factor that in to get a
better estimate for cases like:

postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Tid Scan on t1  (cost=0.00..5.00 rows=1 width=10) (actual time=0.025..0.065 rows=200 loops=1)
   TID Cond: (ctid <= '(0,200)'::tid)
 Planning Time: 0.081 ms
 Execution Time: 0.088 ms
(4 rows)

You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to
the selectivity and get a fairly accurate estimate... at least when
there are no dead tuples in the heap.
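
Roughly something like this (a sketch only; itemptr stands for the TID
constant from the qual, and it assumes rel->pages and rel->tuples are
both non-zero and ignores dead tuples):

double		avg_tuples_per_page = rel->tuples / rel->pages;
double		offset = (double) ItemPointerGetOffsetNumber(itemptr);

/* fraction of the final page covered, as a fraction of the whole rel */
selec += (offset / avg_tuples_per_page) / rel->pages;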

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

