Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
Date
Msg-id CA+TgmoboNGVJxxea8wfpWhsfxQ1-qPWJ-5eOhZaf9y_GeJoC2A@mail.gmail.com
In response to Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Tue, Aug 29, 2017 at 1:08 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> (Time in ms)
> Query    head          patch
>
> 6           23819        14571
> 14         13514        11183
> 15         49980         32400
> 20        204441       188978

These are cool results, but this patch is obviously not ready for
prime time as-is, since there are various other references that will
need to be updated:
    * Since we are called as soon as nentries exceeds maxentries, we should
    * push nentries down to significantly less than maxentries, or else we'll
    * just end up doing this again very soon.  We shoot for maxentries/2.

   /*
    * With a big bitmap and small work_mem, it's possible that we cannot get
    * under maxentries.  Again, if that happens, we'd end up uselessly
    * calling tbm_lossify over and over.  To prevent this from becoming a
    * performance sink, force maxentries up to at least double the current
    * number of entries.  (In essence, we're admitting inability to fit
    * within work_mem when we do this.)  Note that this test will not fire if
    * we broke out of the loop early; and if we didn't, the current number of
    * entries is simply not reducible any further.
    */
   if (tbm->nentries > tbm->maxentries / 2)
       tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;

I suggest defining a TBM_FILLFACTOR constant instead of repeating the
value 0.9 in a bunch of places.  I think it would also be good to try
to find the sweet spot for that constant.  Making it bigger reduces
the number of lossy entries created, but making it smaller reduces
the number of times we have to walk the bitmap.  So if, for example,
0.75 is sufficient to produce almost all of the gain, then I think we
would want to prefer 0.75 to 0.9.  But if 0.9 is better, then we can
stick with that.

Note that a value higher than 0.9375 wouldn't be sane without some
additional safety precautions because maxentries could be as low as
16.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


