Thread: bitmaps and correlation

bitmaps and correlation

From
Justin Pryzby
Date:
It's a new year and I'm getting reflective, so I'm resuming a portion of a
conversation we had here:
https://www.postgresql.org/message-id/CAMkU%3D1yVbwEAugaCmKWxjaX15ZduWee45%2B_DqCw--d_3N_O_%3DQ%40mail.gmail.com

Find attached patch which implements use of correlation statistic in costing
for bitmap scans.

An open question in my mind is how to combine the correlation statistic with
the existing cost_per_page:

 . sqrt(a^2+b^2)
 . MIN()
 . multiply existing computation by new correlation component

On Wed, Dec 20, 2017 at 09:55:40PM -0800, Jeff Janes wrote:
> On Tue, Dec 19, 2017 at 7:25 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> 
> > I started playing with this weeks ago (probably during Vitaliy's problem
> > report).  Is there any reason cost_bitmap_heap_scan shouldn't interpolate
> > based on correlation from seq_page_cost to rand_page_cost, same as
> > cost_index ?
> 
> I think that doing something like that is a good idea in general, but someone
> has to implement the code, and so far no one seems enthused to do so.  You
> seem pretty interested in the topic, so....

I tested the patch using CDR data which was causing issues for us years ago:
https://www.postgresql.org/message-id/20160524173914.GA11880%40telsasoft.com

Note: since that original problem report:
 . the SAN is backed by SSD rather than rotational storage;
 . we're using relkind=p partitioned tables;
 . PG12 uses pread() rather than lseek()+read(), so the overhead of the extra
   lseek() is avoided (but probably wasn't a substantial component of the problem);

Unpatched, I'm unable to get a bitmap scan without disabling both indexscan and seqscan:
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  (cost=55764.07..1974230.46 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp with time zone))
|   ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 rows=2295379 width=0)
|         Index Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp with time zone))

Patched, I get a bitmap scan when random_page_cost is reduced enough that the
startup cost (the index scan component) is not prohibitive.  But for simplicity,
this just forces a bitmap scan by setting enable_indexscan=off:
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  (cost=55764.07..527057.45 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp with time zone))
|   ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 rows=2295379 width=0)
|         Index Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp with time zone))

That addresses the issue we had in part.  A remaining problem is that
correlation fails to distinguish between a "fresh" index and a fragmented
index, and so heap access of a correlated index may look cheaper than it is.
That is why I still have to set random_page_cost to get a bitmap scan.

Justin


Re: bitmaps and correlation

From
Justin Pryzby
Date:
Attached for real.

Attachment

Re: bitmaps and correlation

From
Justin Pryzby
Date:
Attached is a fixed and rebasified patch for cfbot.
Included inline for conceptual review.


From f3055a5696924427dda280da702c41d2d2796a24 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 1 Jan 2019 16:17:28 -0600
Subject: [PATCH v2] Use correlation statistic in costing bitmap scans..

Same as for an index scan, an uncorrelated bitmap (like one on a modulus) which
accesses a certain number of pages across the entire length of a table should
have a cost estimate heavily weighted by random access, compared to a bitmap
scan which accesses the same number of pages across a small portion of the table.

Note, Tom points out that there are cases where a column could be
tightly-clumped without being highly-ordered.  Since we have correlation
already, we use that for now, and if someone creates a statistic for
clumpiness, we'll re-evaluate at some later date.
---
 src/backend/optimizer/path/costsize.c | 84 ++++++++++++++++++++++++++++-------
 src/backend/optimizer/path/indxpath.c |  8 ++--
 src/include/nodes/pathnodes.h         |  3 ++
 src/include/optimizer/cost.h          |  2 +-
 4 files changed, 77 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c5f6593..aaac29a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -549,11 +549,12 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
     /*
      * Save amcostestimate's results for possible use in bitmap scan planning.
-     * We don't bother to save indexStartupCost or indexCorrelation, because a
-     * bitmap scan doesn't care about either.
+     * We don't bother to save indexStartupCost, because a
+     * bitmap scan doesn't care.
      */
     path->indextotalcost = indexTotalCost;
     path->indexselectivity = indexSelectivity;
+    path->indexCorrelation = indexCorrelation;
 
     /* all costs for touching index itself included here */
     startup_cost += indexStartupCost;
@@ -986,12 +987,33 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
      * appropriate to charge spc_seq_page_cost apiece.  The effect is
      * nonlinear, too. For lack of a better idea, interpolate like this to
      * determine the cost per page.
+     * Note this works at PAGE granularity, so even if we read 1% of a
+     * table's tuples, if we have to read nearly every page, it should be
+     * considered sequential.
      */
-    if (pages_fetched >= 2.0)
+    if (pages_fetched >= 2.0) {
+        double correlation, cost_per_page2;
         cost_per_page = spc_random_page_cost -
             (spc_random_page_cost - spc_seq_page_cost)
             * sqrt(pages_fetched / T);
-    else
+
+        // XXX: interpolate based on correlation from seq_page_cost to rand_page_cost?
+        // a highly correlated bitmap scan 1) likely reads fewer pages; and,
+        // 2) at higher "density" (more sequential).
+        // double correlation = get_indexpath_correlation(root, bitmapqual);
+        correlation = ((IndexPath *)bitmapqual)->indexCorrelation;
+        cost_per_page2 = spc_seq_page_cost +
+            (1-correlation*correlation)*(spc_random_page_cost - spc_seq_page_cost);  // XXX: *sqrt(pages_fetched/T) ?
+        // There are two variables: fraction of pages(T) and correlation.
+        // If T is high, giving sequential reads, we want low cost_per_page regardless of correlation.
+        // If correlation is high, we (probably) want low cost per page.
+        // ...the exception is if someone does an =ANY() query on a list of non-consecutive values.
+        // Something like start_time=ANY('2017-01-01', '2017-02-01',...)
+        // which reads small number of rows from pages all across the length of the table.
+        // But index scan doesn't seem to address that at all, so leave it alone for now.
+        cost_per_page=Min(cost_per_page, cost_per_page2);
+        // cost_per_page=sqrt(cost_per_page*cost_per_page + cost_per_page2*cost_per_page2);
+    } else
         cost_per_page = spc_random_page_cost;
 
     run_cost += pages_fetched * cost_per_page;
@@ -1035,15 +1057,16 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 
 /*
  * cost_bitmap_tree_node
- *        Extract cost and selectivity from a bitmap tree node (index/and/or)
+ *        Extract cost, selectivity, and correlation from a bitmap tree node (index/and/or)
  */
 void
-cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec)
+cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec, double *correlation)
 {
     if (IsA(path, IndexPath))
     {
         *cost = ((IndexPath *) path)->indextotalcost;
         *selec = ((IndexPath *) path)->indexselectivity;
+        if (correlation) *correlation = ((IndexPath *) path)->indexCorrelation;
 
         /*
          * Charge a small amount per retrieved tuple to reflect the costs of
@@ -1057,11 +1080,13 @@ cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec)
     {
         *cost = path->total_cost;
         *selec = ((BitmapAndPath *) path)->bitmapselectivity;
+        if (correlation) *correlation = ((BitmapAndPath *) path)->bitmapcorrelation;
     }
     else if (IsA(path, BitmapOrPath))
     {
         *cost = path->total_cost;
         *selec = ((BitmapOrPath *) path)->bitmapselectivity;
+        if (correlation) *correlation = ((BitmapOrPath *) path)->bitmapcorrelation;
     }
     else
     {
@@ -1084,8 +1109,9 @@ void
 cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root)
 {
     Cost        totalCost;
-    Selectivity selec;
+    Selectivity selec, minsubselec;
     ListCell   *l;
+    double        correlation;
 
     /*
      * We estimate AND selectivity on the assumption that the inputs are
@@ -1097,22 +1123,32 @@ cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root)
      * definitely too simplistic?
      */
     totalCost = 0.0;
-    selec = 1.0;
+    minsubselec = selec = 1.0;
+    correlation = 0;
     foreach(l, path->bitmapquals)
     {
         Path       *subpath = (Path *) lfirst(l);
         Cost        subCost;
         Selectivity subselec;
+        double        subcorrelation;
 
-        cost_bitmap_tree_node(subpath, &subCost, &subselec);
+        cost_bitmap_tree_node(subpath, &subCost, &subselec, &subcorrelation);
 
         selec *= subselec;
 
+        /* For an AND node, use the correlation of its most-selective subpath */
+        if (subselec<=minsubselec) {
+                correlation = subcorrelation;
+                minsubselec = subselec;
+        }
+
         totalCost += subCost;
         if (l != list_head(path->bitmapquals))
+            // ??? XXX && !IsA(subpath, IndexPath))
             totalCost += 100.0 * cpu_operator_cost;
     }
     path->bitmapselectivity = selec;
+    path->bitmapcorrelation = correlation;
     path->path.rows = 0;        /* per above, not used */
     path->path.startup_cost = totalCost;
     path->path.total_cost = totalCost;
@@ -1128,8 +1164,9 @@ void
 cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root)
 {
     Cost        totalCost;
-    Selectivity selec;
+    Selectivity selec, maxsubselec;
     ListCell   *l;
+    double        correlation;
 
     /*
      * We estimate OR selectivity on the assumption that the inputs are
@@ -1142,23 +1179,32 @@ cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root)
      * optimized out when the inputs are BitmapIndexScans.
      */
     totalCost = 0.0;
-    selec = 0.0;
+    maxsubselec = selec = 0.0;
+    correlation = 0;
     foreach(l, path->bitmapquals)
     {
         Path       *subpath = (Path *) lfirst(l);
         Cost        subCost;
         Selectivity subselec;
+        double        subcorrelation;
 
-        cost_bitmap_tree_node(subpath, &subCost, &subselec);
+        cost_bitmap_tree_node(subpath, &subCost, &subselec, &subcorrelation);
 
         selec += subselec;
 
+        /* For an OR node, use the correlation of its least-selective subpath */
+        if (subselec>=maxsubselec) {
+                correlation = subcorrelation;
+                maxsubselec = subselec;
+        }
+
         totalCost += subCost;
         if (l != list_head(path->bitmapquals) &&
             !IsA(subpath, IndexPath))
             totalCost += 100.0 * cpu_operator_cost;
     }
     path->bitmapselectivity = Min(selec, 1.0);
+    path->bitmapcorrelation = correlation;
     path->path.rows = 0;        /* per above, not used */
     path->path.startup_cost = totalCost;
     path->path.total_cost = totalCost;
@@ -5510,8 +5556,11 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
 {
     Cost        indexTotalCost;
     Selectivity indexSelectivity;
+    double        indexCorrelation;
     double        T;
-    double        pages_fetched;
+    double        pages_fetched,
+                pages_fetchedMIN,
+                pages_fetchedMAX;
     double        tuples_fetched;
     double        heap_pages;
     long        maxentries;
@@ -5520,7 +5569,7 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
      * Fetch total cost of obtaining the bitmap, as well as its total
      * selectivity.
      */
-    cost_bitmap_tree_node(bitmapqual, &indexTotalCost, &indexSelectivity);
+    cost_bitmap_tree_node(bitmapqual, &indexTotalCost, &indexSelectivity, &indexCorrelation);
 
     /*
      * Estimate number of main-table pages fetched.
@@ -5534,7 +5583,12 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
      * the same as the Mackert and Lohman formula for the case T <= b (ie, no
      * re-reads needed).
      */
-    pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
+    pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
+
+    /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */
+    pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages);
+
+    pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);
 
     /*
      * Calculate the number of pages fetched from the heap.  Then based on
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257c..2a3db34 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1467,8 +1467,8 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
             Selectivity nselec;
             Selectivity oselec;
 
-            cost_bitmap_tree_node(pathinfo->path, &ncost, &nselec);
-            cost_bitmap_tree_node(pathinfoarray[i]->path, &ocost, &oselec);
+            cost_bitmap_tree_node(pathinfo->path, &ncost, &nselec, NULL);
+            cost_bitmap_tree_node(pathinfoarray[i]->path, &ocost, &oselec, NULL);
             if (ncost < ocost)
                 pathinfoarray[i] = pathinfo;
         }
@@ -1580,8 +1580,8 @@ path_usage_comparator(const void *a, const void *b)
     Selectivity aselec;
     Selectivity bselec;
 
-    cost_bitmap_tree_node(pa->path, &acost, &aselec);
-    cost_bitmap_tree_node(pb->path, &bcost, &bselec);
+    cost_bitmap_tree_node(pa->path, &acost, &aselec, NULL);
+    cost_bitmap_tree_node(pb->path, &bcost, &bselec, NULL);
 
     /*
      * If costs are the same, sort by selectivity.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d7..beaac03 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1181,6 +1181,7 @@ typedef struct IndexPath
     ScanDirection indexscandir;
     Cost        indextotalcost;
     Selectivity indexselectivity;
+    double        indexCorrelation;
 } IndexPath;
 
 /*
@@ -1261,6 +1262,7 @@ typedef struct BitmapAndPath
     Path        path;
     List       *bitmapquals;    /* IndexPaths and BitmapOrPaths */
     Selectivity bitmapselectivity;
+    double        bitmapcorrelation;
 } BitmapAndPath;
 
 /*
@@ -1274,6 +1276,7 @@ typedef struct BitmapOrPath
     Path        path;
     List       *bitmapquals;    /* IndexPaths and BitmapAndPaths */
     Selectivity bitmapselectivity;
+    double        bitmapcorrelation;
 } BitmapOrPath;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b3d0b4f..9a28665 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -79,7 +79,7 @@ extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *bas
                                   Path *bitmapqual, double loop_count);
 extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
 extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
-extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
+extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec, double *correlation);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
                          RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
-- 
2.7.4


Attachment

Re: bitmaps and correlation

From
Michael Paquier
Date:
On Sat, Nov 02, 2019 at 03:26:17PM -0500, Justin Pryzby wrote:
> Attached is a fixed and rebasified patch for cfbot.
> Included inline for conceptual review.

Your patch still causes two regression tests to fail per Mr Robot's
report: join and select.  Could you look at those problems?  I have
moved the patch to next CF, waiting on author.
--
Michael

Attachment

Re: bitmaps and correlation

From
Justin Pryzby
Date:
On Sun, Dec 01, 2019 at 12:34:37PM +0900, Michael Paquier wrote:
> On Sat, Nov 02, 2019 at 03:26:17PM -0500, Justin Pryzby wrote:
> > Attached is a fixed and rebasified patch for cfbot.
> > Included inline for conceptual review.
> 
> Your patch still causes two regression tests to fail per Mr Robot's
> report: join and select.  Could you look at those problems?  I have
> moved the patch to next CF, waiting on author.

The regression failures seem to be due to deliberate, expected plan changes.

I think the patch still needs some discussion, but find attached rebasified
patch including regression diffs.

Justin

Attachment

Re: bitmaps and correlation

From
Justin Pryzby
Date:
Find attached a cleaned-up patch.
For now, I updated regress/expected/, but I think the test may have to be
updated to do what it was originally written to do.

Attachment

Re: bitmaps and correlation

From
Dilip Kumar
Date:
On Tue, Jan 7, 2020 at 1:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Find attached cleaned up patch.
> For now, I updated the regress/expected/, but I think the test maybe has to be
> updated to do what it was written to do.

I have noticed that in cost_index() the indexCorrelation is used for
computing the run_cost, not the number of pages, whereas in your
patch you have used it for computing the number of pages.  Any reason
for that?

cost_index
{
..
/*
* Now interpolate based on estimated index order correlation to get total
* disk I/O cost for main table accesses.
*/
csquared = indexCorrelation * indexCorrelation;
run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
}

Patch
- pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
+ pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
+
+ /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */
+ pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages);
+
+ pages_fetched = pages_fetchedMAX +
indexCorrelation*indexCorrelation*(pages_fetchedMIN -
pages_fetchedMAX);

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: bitmaps and correlation

From
Justin Pryzby
Date:
On Tue, Jan 07, 2020 at 09:21:03AM +0530, Dilip Kumar wrote:
> On Tue, Jan 7, 2020 at 1:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Find attached cleaned up patch.
> > For now, I updated the regress/expected/, but I think the test maybe has to be
> > updated to do what it was written to do.
> 
> I have noticed that in "cost_index" we have used the indexCorrelation
> for computing the run_cost, not the number of pages whereas in your
> patch you have used it for computing the number of pages.  Any reason
> for the same?

As Jeff has pointed out, high correlation has two effects in cost_index():
1) fewer pages will be read;
2) the pages will be read more sequentially.

cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,
but does essentially:

                /* max_IO_cost is for the perfectly uncorrelated case (csquared=0) */
                pages_fetched(MAX) = index_pages_fetched(tuples_fetched,
                                                    baserel->pages,
                                                    (double) index->pages,
                                                    root);
                max_IO_cost = pages_fetchedMAX * spc_random_page_cost;

                /* min_IO_cost is for the perfectly correlated case (csquared=1) */
                pages_fetched(MIN) = ceil(indexSelectivity * (double) baserel->pages);
                min_IO_cost = (pages_fetchedMIN - 1) * spc_seq_page_cost;


My patch 1) changes compute_bitmap_pages() to interpolate pages_fetched using
the correlation; pages_fetchedMIN is new:

> Patch
> - pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> + pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> +
> + /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */
> + pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages);
> +
> + pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

And, 2) also computes cost_per_page by interpolation between seq_page and
random_page cost:

+               cost_per_page_corr = spc_random_page_cost -
+                       (spc_random_page_cost - spc_seq_page_cost)
+                       * (1-correlation*correlation);

Thanks for looking.  I'll update the name of pages_fetchedMIN/MAX in my patch
for consistency with cost_index.

Justin



Re: bitmaps and correlation

From
Justin Pryzby
Date:
On Mon, Jan 06, 2020 at 11:26:06PM -0600, Justin Pryzby wrote:
> As Jeff has pointed out, high correlation has two effects in cost_index():
> 1) fewer pages will be read;
> 2) the pages will be read more sequentially.
> 
> cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,

I tried to make this more clear in 0001

> +               cost_per_page_corr = spc_random_page_cost -
> +                       (spc_random_page_cost - spc_seq_page_cost)
> +                       * (1-correlation*correlation);

And fixed bug: this should be c*c not 1-c*c.

Attachment

Re: bitmaps and correlation

From
Justin Pryzby
Date:
There were no comments last month, so rebased, fixed tests, and kicked to next
CF.

-- 
Justin

Attachment

Re: bitmaps and correlation

From
Anastasia Lubennikova
Date:
Status update for a commitfest entry

According to cfbot, the patch fails to apply.  Could you please send a rebased version?

I wonder why this patch has hung so long without a review.  Maybe it will help
to move the discussion forward if you provide more examples of queries that can
benefit from this improvement?

The first patch is simply a refactoring and I don't see any possible objections against it.
The second patch also looks fine to me. The logic is understandable and the code is neat.

It wouldn't hurt to add a comment for this computation, though.
+    pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

The new status of this patch is: Waiting on Author

Re: bitmaps and correlation

From
Justin Pryzby
Date:
On Fri, Nov 06, 2020 at 01:51:26PM +0000, Anastasia Lubennikova wrote:
> I wonder why this patch has hung so long without a review.  Maybe it will help
> to move the discussion forward if you provide more examples of queries that can
> benefit from this improvement?

Thanks for looking.

The explanation is that the planner currently gives index scans a cost
"discount" for correlation.  Jeff Janes has pointed out that there are two
discounts: 1) fewer pages are read; and, 2) lower cost-per-page.  This patch
aims to give bitmap scans the same benefits.  A "dense" bitmap will read fewer
pages, more sequentially.

Tom pointed out that the "correlation" isn't a perfect metric for this, since
the index might be "clumpy" without being well-ordered, which doesn't matter
for bitmap scans, which access in physical order anyway.  In those cases, the
correlation logic would fail to reduce the estimated cost of bitmap scan, even
though the actual cost is less (same as now).  This is true, but my goal is to
give bitmap scans at least the same benefit as index scans, even if there are
additional "discounts" which aren't yet being considered.

This was an issue for me in the past when the planner chose to scan an index,
but it was slower than projected (for reasons unrelated to this patch).  Making
bitmap cost account for high correlation was one step towards addressing that.
Since then, we switched to BRIN indexes, which force bitmap scans.
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com

Here's an example.

CREATE TABLE t AS SELECT a,b FROM generate_series(1,999) a, generate_series(1,999) b ORDER BY a+b/9.9;
CREATE INDEX ON t(a);

postgres=# SELECT attname, correlation FROM pg_stats WHERE tablename ='t';
 a       |   0.9951819
 b       |  0.10415093

postgres=# explain analyze SELECT * FROM t WHERE a BETWEEN 55 AND 77;
 Index Scan using t_a_idx on t  (cost=0.42..810.89 rows=22683 width=8) (actual time=0.292..66.657 rows=22977 loops=1)

vs (without my patch, with SET enable_indexscan=off);
 Bitmap Heap Scan on t  (cost=316.93..5073.17 rows=22683 width=8) (actual time=10.810..26.633 rows=22977 loops=1)

vs (with my patch, with SET enable_indexscan=off):
postgres=# explain analyze SELECT * FROM t WHERE a BETWEEN 55 AND 77;
 Bitmap Heap Scan on t  (cost=316.93..823.84 rows=22683 width=8) (actual time=10.742..33.279 rows=22977 loops=1)

So the bitmap scan is cheaper, but its cost estimate is a lot higher.  My patch
improves but doesn't completely "fix" that - the bitmap scan is still costed as
more expensive, even though it's actually cheaper.  This is probably not even a
particularly good example, as it's a small table cached in RAM.  There will
always be cases like this, certainly near the costs where the plan changes
"shape".  I think a cost difference of 10 here is very reasonable
(cpu_operator_cost, probably), but a cost difference of 5x is not.

Not many regression tests changed.  Probably partly because bitmap scans have
an overhead (the heap scan cannot start until the index scan finishes), and the
regression tests avoid large tables.

If there's no interest in the patch, I guess we should just close it rather
than letting it rot.

> The first patch is simply a refactoring and I don't see any possible objections against it.
> The second patch also looks fine to me. The logic is understandable and the code is neat.
> 
> It wouldn't hurt to add a comment for this computation, though.
> +    pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

You're right.  It's like this:
// interpolate between c==0: pages_fetched=max and c==1: pages_fetched=min
pages_fetched = min + (max-min)*(1-c**2) 
pages_fetched = min + max*(1-c**2) - min*(1-c**2)
pages_fetched = max*(1-c**2) + min - min*(1-c**2)
pages_fetched = max*(1-c**2) + min*(c**2)
pages_fetched = max - max*c**2 + min*(c**2)
pages_fetched = max + min*(c**2) - max*c**2
pages_fetched = max + c**2 * (min-max)

I'm not sure if there's a reason why it's written like that, but (min-max)
looks odd, so I wrote it like:
pages_fetched = max - c**2 * (max-min)

> The new status of this patch is: Waiting on Author

Attachment

Re: bitmaps and correlation

From
Heikki Linnakangas
Date:
On 06/11/2020 19:57, Justin Pryzby wrote:
> On Fri, Nov 06, 2020 at 01:51:26PM +0000, Anastasia Lubennikova wrote:
>> The first patch is simply a refactoring and I don't see any possible objections against it.
>> The second patch also looks fine to me. The logic is understandable and the code is neat.

+1

>> It wouldn't hurt to add a comment for this computation, though.
>> +    pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);
> 
> You're right.  It's like this:
> // interpolate between c==0: pages_fetched=max and c==1: pages_fetched=min
> pages_fetched = min + (max-min)*(1-c**2)
> pages_fetched = min + max*(1-c**2) - min*(1-c**2)
> pages_fetched = max*(1-c**2) + min - min*(1-c**2)
> pages_fetched = max*(1-c**2) + min*(c**2)
> pages_fetched = max - max*c**2 + min*(c**2)
> pages_fetched = max + min*(c**2) - max*c**2
> pages_fetched = max + c**2 * (min-max)
> 
> I'm not sure if there's a reason why it's written like that, but (min-max)
> looks odd, so I wrote it like:
> pages_fetched = max - c**2 * (max-min)

I agree min-max looks odd. max - c**2 * (max-min) looks a bit odd too, 
though. Whatever we do here, though, I'd suggest that we keep it 
consistent with cost_index().

Other than that, and a quick pgindent run, this seems ready to me. I'll 
mark it as Ready for Committer.

- Heikki



Re: bitmaps and correlation

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Other than that, and a quick pgindent run, this seems ready to me. I'll 
> mark it as Ready for Committer.

I dunno, this seems largely misguided to me.

It's already the case that index correlation is just not the right
stat for this purpose, since it doesn't give you much of a toehold
on whether a particular scan is going to be accessing tightly-clumped
data.  For specific kinds of index conditions, such as a range query
on a btree index, maybe you could draw that conclusion ... but this
patch isn't paying any attention to the index condition in use.

And then the rules for bitmap AND and OR correlations, if not just
plucked out of the air, still seem *far* too optimistic.  As an
example, even if my individual indexes are perfectly correlated and
so a probe would touch only one page, OR'ing ten such probes together
is likely going to touch ten different pages.  But unless I'm
misreading the patch, it's going to report back an OR correlation
that corresponds to touching one page.

Even if we assume that the correlation is nonetheless predictive of
how big a part of the table we'll be examining, I don't see a lot
of basis for deciding that the equations the patch adds to
cost_bitmap_heap_scan are the right ones.

I'd have expected this thread to focus a whole lot more on actual
examples than it has done, so that we could have some confidence
that these equations have something to do with reality.

            regards, tom lane



Re: bitmaps and correlation

From
Masahiko Sawada
Date:
On Sat, Nov 28, 2020 at 5:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > Other than that, and a quick pgindent run, this seems ready to me. I'll
> > mark it as Ready for Committer.
>
> I dunno, this seems largely misguided to me.
>
> It's already the case that index correlation is just not the right
> stat for this purpose, since it doesn't give you much of a toehold
> on whether a particular scan is going to be accessing tightly-clumped
> data.  For specific kinds of index conditions, such as a range query
> on a btree index, maybe you could draw that conclusion ... but this
> patch isn't paying any attention to the index condition in use.
>
> And then the rules for bitmap AND and OR correlations, if not just
> plucked out of the air, still seem *far* too optimistic.  As an
> example, even if my individual indexes are perfectly correlated and
> so a probe would touch only one page, OR'ing ten such probes together
> is likely going to touch ten different pages.  But unless I'm
> misreading the patch, it's going to report back an OR correlation
> that corresponds to touching one page.
>
> Even if we assume that the correlation is nonetheless predictive of
> how big a part of the table we'll be examining, I don't see a lot
> of basis for deciding that the equations the patch adds to
> cost_bitmap_heap_scan are the right ones.
>
> I'd have expected this thread to focus a whole lot more on actual
> examples than it has done, so that we could have some confidence
> that these equations have something to do with reality.
>

Status update for a commitfest entry.

The discussion has been inactive since the end of the last CF.  It
seems to me that we need some discussion on the point Tom mentioned.
It looks like either "Needs Review" or "Ready for Committer" status
would fit, but Justin set it to "Waiting on Author" on 2020-12-03
himself.  Are you working on this, Justin?

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: bitmaps and correlation

From
Masahiko Sawada
Date:
On Thu, Jan 28, 2021 at 9:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Nov 28, 2020 at 5:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > > Other than that, and a quick pgindent run, this seems ready to me. I'll
> > > mark it as Ready for Committer.
> >
> > I dunno, this seems largely misguided to me.
> >
> > It's already the case that index correlation is just not the right
> > stat for this purpose, since it doesn't give you much of a toehold
> > on whether a particular scan is going to be accessing tightly-clumped
> > data.  For specific kinds of index conditions, such as a range query
> > on a btree index, maybe you could draw that conclusion ... but this
> > patch isn't paying any attention to the index condition in use.
> >
> > And then the rules for bitmap AND and OR correlations, if not just
> > plucked out of the air, still seem *far* too optimistic.  As an
> > example, even if my individual indexes are perfectly correlated and
> > so a probe would touch only one page, OR'ing ten such probes together
> > is likely going to touch ten different pages.  But unless I'm
> > misreading the patch, it's going to report back an OR correlation
> > that corresponds to touching one page.
> >
> > Even if we assume that the correlation is nonetheless predictive of
> > how big a part of the table we'll be examining, I don't see a lot
> > of basis for deciding that the equations the patch adds to
> > cost_bitmap_heap_scan are the right ones.
> >
> > I'd have expected this thread to focus a whole lot more on actual
> > examples than it has done, so that we could have some confidence
> > that these equations have something to do with reality.
> >
>
> Status update for a commitfest entry.
>
> The discussion has been inactive since the end of the last CF. It
> seems to me that we need some discussion on the point Tom mentioned.
> It looks either "Needs Review" or "Ready for Committer" status but
> Justin set it to "Waiting on Author" on 2020-12-03 by himself. Are you
> working on this, Justin?
>

Status update for a commitfest entry.

This patch, which you submitted to this CommitFest, has been awaiting
your attention for more than one month. As such, we have moved it to
"Returned with Feedback" and removed it from the reviewing queue.
Depending on timing, this may be reversible, so let us know if there
are extenuating circumstances. In any case, you are welcome to address
the feedback you have received, and resubmit the patch to the next
CommitFest.

Thank you for contributing to PostgreSQL.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/