Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Date
Msg-id CAD21AoBazzozsJ_HxMSSwzPa3m6-uRFOoReT50431ZSdvRn23w@mail.gmail.com
In response to Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Fri, Apr 21, 2017 at 6:24 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, Apr 12, 2017 at 4:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Apr 11, 2017 at 4:38 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> In essence, the patch as it is proposed, doesn't *need* a binary
>>> search, because the segment list can only grow up to 15 segments at
>>> its biggest, and that's a size small enough that linear search will
>>> outperform (or at least perform as well as) binary search. Reducing
>>> the initial segment size wouldn't change that. If the 12GB limit is
>>> lifted, or the maximum segment size reduced (from 1GB to 128MB for
>>> example), however, that would change.
>>>
>>> I'd be more in favor of lifting the 12GB limit than of reducing the
>>> maximum segment size, for the reasons above. Raising the 12GB limit
>>> has concrete and readily apparent benefits, whereas using bigger (or
>>> smaller) segments is far more debatable. Yes, that will need a binary
>>> search. But, I was hoping that could be a second (or third) patch, to
>>> keep things simple, and benefits measurable.
>>
>> To me, it seems a bit short-sighted to say, OK, let's use a linear
>> search because there's this 12GB limit so we can limit ourselves to 15
>> segments.  Because somebody will want to remove that 12GB limit, and
>> then we'll have to revisit the whole thing anyway.  I think, anyway.
>
> Ok, attached an updated patch that implements the binary search
>
>> What's not clear to me is how sensitive the performance of vacuum is
>> to the number of cycles used here.  For a large index, the number of
>> searches will presumably be quite large, so it does seem worth
>> worrying about performance.  But if we just always used a binary
>> search, would that lose enough performance with small numbers of
>> segments that anyone would care?  If so, maybe we need to use linear
>> search for small numbers of segments and switch to binary search with
>> larger numbers of segments.
>
> I just went and tested.
>
> I implemented the hybrid binary search attached, and ran a few tests
> with and without the sequential code enabled, at small scales.
>
> The difference is statistically significant, but small (less than 3%).
> With proper optimization of the binary search, however, the difference
> flips:
>
> claudiofreire@klaumpp:~/src/postgresql.vacuum> fgrep shufp80
> fullbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.20 s, system: 1.42 s,
> elapsed: 18.34 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.44 s, system: 1.40 s,
> elapsed: 19.75 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.28 s, system: 1.41 s,
> elapsed: 18.48 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.39 s, system: 1.51 s,
> elapsed: 20.60 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.26 s, system: 1.42 s,
> elapsed: 19.16 s.
>
> claudiofreire@klaumpp:~/src/postgresql.vacuum> fgrep shufp80
> hybridbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.49 s, system: 1.39 s,
> elapsed: 19.15 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.36 s, system: 1.33 s,
> elapsed: 18.40 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.36 s, system: 1.31 s,
> elapsed: 18.87 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.59 s, system: 1.35 s,
> elapsed: 26.43 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.54 s, system: 1.28 s,
> elapsed: 20.02 s.
>
> That's after inlining the compare on both the linear and sequential
> code, and it seems it lets the compiler optimize the binary search to
> the point where it outperforms the sequential search.
>
> That's not the case when the compare isn't inlined.
>
> That seems in line with [1], which shows the impact of various
> optimizations on both algorithms. It's clearly a close enough race
> that optimizations play a huge role.
>
> Since we're not likely to go and implement SSE2-optimized versions, I
> believe I'll leave the binary search only. That's the attached patch
> set.
>
> I'm running the full test suite, but that takes a very long while.
> I'll post the results when they're done.
>
> [1] https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/
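The hybrid search discussed above (linear scan below a small cutoff, binary search above it) can be sketched as follows; the cutoff value and all names here are illustrative, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

#define LINEAR_CUTOFF 8         /* illustrative; the patch's cutoff may differ */

/* Hybrid lower-bound search: linear scan for tiny arrays (better branch
 * prediction, no pipeline stalls), binary search otherwise. Returns the
 * index of the first element >= key, or n if there is none. */
static size_t
hybrid_lower_bound(const int *arr, size_t n, int key)
{
    if (n <= LINEAR_CUTOFF)
    {
        size_t  i;

        for (i = 0; i < n; i++)
            if (arr[i] >= key)
                return i;
        return n;
    }
    else
    {
        size_t  lo = 0,
                hi = n;

        while (lo < hi)
        {
            size_t  mid = lo + (hi - lo) / 2;

            if (arr[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }
}
```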

Thank you for updating the patch.

I've read this patch again and here are some review comments.

+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity (2 log N to be
+ * precise).

IIUC we now do a binary search even over the list of segments, so this
comment seems to be outdated.
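The two-level lookup that comment describes can be sketched standalone as follows (all names and types here are illustrative stand-ins, not the patch's actual symbols):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the patch's segment structure. Each segment
 * holds a sorted array of dead tuple ids; segment capacities grow
 * exponentially, so there are O(log N) segments. */
typedef struct
{
    int     last_dead_tuple;    /* highest id stored in this segment */
    int     num_dead_tuples;
    int    *dt_tids;            /* sorted ascending */
} Segment;

/* First binary-search the segment whose last_dead_tuple covers the key,
 * then binary-search within that segment: O(log N) lookups overall.
 * Returns 1 if key is present, 0 otherwise. */
static int
lookup_dead_tuple(Segment *segs, int nsegs, int key)
{
    int     lo = 0,
            hi = nsegs;

    while (lo < hi)             /* lower bound over the segment list */
    {
        int     mid = lo + (hi - lo) / 2;

        if (segs[mid].last_dead_tuple < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == nsegs)
        return 0;               /* key is beyond all segments */

    {
        Segment *seg = &segs[lo];
        int     l = 0,
                r = seg->num_dead_tuples;

        while (l < r)           /* binary search inside the segment */
        {
            int     m = l + (r - l) / 2;

            if (seg->dt_tids[m] < key)
                l = m + 1;
            else
                r = m;
        }
        return l < seg->num_dead_tuples && seg->dt_tids[l] == key;
    }
}
```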

-----

We often fetch a particular dead tuple segment. How about providing a
macro to make that easier to read?
For example:

#define GetDeadTuplesSegment(lvrelstats, seg) \
        (&(lvrelstats)->dead_tuples.dt_segments[(seg)])
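A minimal sketch of how such an accessor macro could look and be used (the struct layout here is illustrative, reduced to what the macro touches):

```c
#include <assert.h>

/* Illustrative, stripped-down stand-ins for the patch's structures. */
typedef struct
{
    int     last_dead_tuple;
} DeadTuplesSegment;

typedef struct
{
    DeadTuplesSegment dt_segments[4];
} DeadTuplesStore;

typedef struct
{
    DeadTuplesStore dead_tuples;
} LVRelStats;

/* Accessor macro as suggested in the review above. */
#define GetDeadTuplesSegment(lvrelstats, seg) \
        (&(lvrelstats)->dead_tuples.dt_segments[(seg)])
```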

-----

+       if (vacrelstats->dead_tuples.num_segs == 0)
+               return;
+

+       /* If uninitialized, we have no tuples to delete from the indexes */
+       if (vacrelstats->dead_tuples.num_segs == 0)
+       {
+               return;
+       }

+       if (vacrelstats->dead_tuples.num_segs == 0)
+               return false;
+

As listed above, there are several places in the actual vacuum code
that check whether the dead tuple store is initialized.
I guess it should never happen that we attempt to vacuum a table/index
page while not having any dead tuples. Would it be better to have an
Assert or ereport instead?
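Such an invariant check could look like the following sketch, with plain assert() standing in for PostgreSQL's Assert (which is compiled in under --enable-cassert); the function and struct names are illustrative:

```c
#include <assert.h>

/* Illustrative stand-in for the patch's dead tuple store. */
typedef struct
{
    int     num_segs;
} DeadTuplesStore;

/* Sketch: assert the invariant instead of silently returning early.
 * plain assert() stands in for PostgreSQL's Assert here. */
static void
lazy_vacuum_page_sketch(const DeadTuplesStore *dt)
{
    assert(dt->num_segs > 0);   /* must not vacuum pages with no dead tuples */
    /* ... actual page vacuuming would follow ... */
}
```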

-----

@@ -1915,2 +2002,2 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
-                       BlockNumber     prefetchStart;
-                       BlockNumber     pblkno;
+                       BlockNumber prefetchStart;
+                       BlockNumber pblkno;

I think that's an unnecessary whitespace-only change.

-----

+       /* Search for the segment likely to contain the item pointer */
+       iseg = vac_itemptr_binsrch(
+               (void *) itemptr,
+               (void *) &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
+               vacrelstats->dead_tuples.last_seg + 1,
+               sizeof(DeadTuplesSegment));
+

I think that we can change the above to:

+       /* Search for the segment likely to contain the item pointer */
+       iseg = vac_itemptr_binsrch(
+               (void *) itemptr,
+               (void *) &(seg->last_dead_tuple),
+               vacrelstats->dead_tuples.last_seg + 1,
+               sizeof(DeadTuplesSegment));

We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.
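For illustration, a strided binary search in the spirit of vac_itemptr_binsrch can be sketched standalone like this (simplified to int keys in place of ItemPointers; names are illustrative): the key lives at some offset inside each element, and `size` is the stride between elements, so one routine can search an array of large structs by a single member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lower-bound binary search over an array of structs: `base` points at
 * the key member of the first element, `size` is the stride between
 * elements. Returns the index of the first element whose key >= key. */
static size_t
strided_binsrch(int key, const void *base, size_t nelem, size_t size)
{
    size_t  lo = 0,
            hi = nelem;

    while (lo < hi)
    {
        size_t  mid = lo + (hi - lo) / 2;
        int     val;

        /* fetch the key member of element `mid` via the stride */
        memcpy(&val, (const char *) base + mid * size, sizeof(int));
        if (val < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Passing `&(seg->last_dead_tuple)` as `base` with `sizeof(DeadTuplesSegment)` as the stride, as in the patch's call site, matches this pattern.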

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


