Thread: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

08 September 2015, 02:44:25

I noticed that the TID type is cataloged as typbyval = f, despite the
fact that it is 6 bytes, and so could be made typbyval = t on 64-bit
platforms (i.e. typbyval = FLOAT8PASSBYVAL) with a little work.

This matters because a major cost during CREATE INDEX CONCURRENTLY is
a TID-based datum sort (this is probably most of the cost over and
above a conventional CREATE INDEX). Based on prior experience, I'd
guess that making the type pass-by-value on 64-bit machines would make
that sort about twice as fast. This would give CREATE INDEX
CONCURRENTLY a nice overall performance boost. SortSupport would also
help, but I would not bother with abbreviation to get some benefit on
32-bit platforms -- that would prevent 64-bit platforms from using
tuplesort's onlyKey optimization, which matters quite a bit. Given the
increasing rarity of 32-bit platforms these days, basic SortSupport
seems best.

I'm more than a little busy at the moment, so I would be happy for
someone else to write the patch. It seems worthwhile.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Tom Lane

Date:

08 September 2015, 04:03:14

Peter Geoghegan <pg@heroku.com> writes:
> I noticed that the TID type is cataloged as typbyval = f, despite the
> fact that it is 6 bytes, and so could be made typbyval = t on 64-bit
> platforms (i.e. typbyval = FLOAT8PASSBYVAL) with a little work.

I'm not sure that it would be just "a little work" --- I suspect that
the idea that pass-by-val types are 1, 2, 4, or 8 bytes is embedded in
a fair number of places, including alignment macros in which any added
complexity would have a large distributed cost.

> This matters because a major cost during CREATE INDEX CONCURRENTLY is
> a TID-based datum sort (this is probably most of the cost over and
> above a conventional CREATE INDEX).

Might be better to hack a special case right there (ie, embed TIDs into
int8s and sort the int8s) rather than try to change the type's SQL
declaration.
        regards, tom lane

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

08 September 2015, 04:20:35

On Mon, Sep 7, 2015 at 9:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure that it would be just "a little work" --- I suspect that
> the idea that pass-by-val types are 1, 2, 4, or 8 bytes is embedded in
> a fair number of places, including alignment macros in which any added
> complexity would have a large distributed cost.

I didn't immediately consider that.

>> This matters because a major cost during CREATE INDEX CONCURRENTLY is
>> a TID-based datum sort (this is probably most of the cost over and
>> above a conventional CREATE INDEX).
>
> Might be better to hack a special case right there (ie, embed TIDs into
> int8s and sort the int8s) rather than try to change the type's SQL
> declaration.

That sounds at least as effective as what I originally sketched. It
would also be better to have a simple comparator, rather than the more
complex TID comparator. Baking the details into the int8
representation, rather than having the comparator tease it apart will
be faster. Ordinarily, abbreviation does that kind of thing, but there
are upsides to not doing that when tie-breaker comparisons are not
actually required, and this really only needs to happen in one place.

I hope someone picks this up soon.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

17 November 2015, 00:52:40

On Mon, Sep 7, 2015 at 9:20 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> This matters because a major cost during CREATE INDEX CONCURRENTLY is
>>> a TID-based datum sort (this is probably most of the cost over and
>>> above a conventional CREATE INDEX).
>>
>> Might be better to hack a special case right there (ie, embed TIDs into
>> int8s and sort the int8s) rather than try to change the type's SQL
>> declaration.
>
> That sounds at least as effective as what I originally sketched.

> I hope someone picks this up soon.

I suggested to someone else that he take a look at this as a project,
but I guess he was busy too. I decided to just do it myself. Patch is
attached. This has the bulkdelete callback that gathers TIDs from the
index during CREATE INDEX CONCURRENTLY encode them as int8 values
ahead of the sort, while sorting the values as int8 values. They're
later decoded as needed.

This turns out to be a relatively simple patch. Testing shows it
removes *most* of the overhead of CREATE INDEX CONCURRENTLY over
CREATE INDEX. In absolute terms, a test case involving a CREATE INDEX
CONCURRENTLY on a single integer column takes about 30% - 40% less
time with the patch applied. The TID sort itself is about 3 times
faster, and that's without the benefit of the patch making the TID
sort an internal sort where it would otherwise have been an external
sort. Sorting item pointers as TIDs naively currently wastes a lot of
memory (not just memory bandwidth) -- a pass-by-value datum sort only
ever allocates memory for SortTuples, avoiding allocating any memory
for a "tuple proper". Clearly just having the sort be pass-by-value is
*much* faster. As comments above process_ordered_aggregate_single()
put it:

 * The one-input case is handled separately from the multi-input case
 * for performance reasons: for single by-value inputs, such as the
 * common case of count(distinct id), the tuplesort_getdatum code path
 * is around 300% faster.  (The speedup for by-reference types is less
 * but still noticeable.)

Another way of stating how things are improved here is: my CREATE
INDEX CONCURRENTLY test case had about a 2.1x overhead over an
equivalent CREATE INDEX on master, but with the patch applied the
CREATE INDEX CONCURRENTLY has an overhead of only about 1.3x. The
extra logical I/O for CONCURRENTLY's second scan, and for the
physical-order index scan (to "bulk delete", to gather TIDs to sort)
is a surprisingly small cost.

I'll add the patch to the open commitfest. Think of all these patches
of mine as giving reviewers a choice of what to review. This patch
does seem like a slam dunk, even if I do say so myself, so at least
it's relatively easy to review. The external sort work remains my most
interesting pending work, though.

--
Peter Geoghegan

Attachment

0001-Speed-up-CREATE-INDEX-CONCURRENTLY-s-TID-sort.patch

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Simon Riggs

Date:

17 November 2015, 08:54:05

On 17 November 2015 at 00:52, Peter Geoghegan <pg@heroku.com> wrote:

This patch does seem like a slam dunk, even if I do say so myself

Short and sweet! Looks good.

I would be inclined to add more comments to explain it, these things have a habit of being forgotten.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Tom Lane

Date:

17 November 2015, 15:54:59

Peter Geoghegan <pg@heroku.com> writes:
>>> Might be better to hack a special case right there (ie, embed TIDs into
>>> int8s and sort the int8s) rather than try to change the type's SQL
>>> declaration.

> I suggested to someone else that he take a look at this as a project,
> but I guess he was busy too. I decided to just do it myself. Patch is
> attached.

I think this might do the wrong thing with block numbers above 0x80000000
and/or offset numbers above 0x8000.  I'd be more comfortable about it if
+    encoded = ((int64) block << 16) | offset;
were
+    encoded = ((uint64) block << 16) | (uint16) offset;
so that there's no risk of the compiler deciding to sign-extend rather
than zero-fill either input.

Also, I think it'd be a good idea to explicitly set indexcursor = NULL
in the tuplesort_empty case; the previous coding was effectively doing
that.  It's true that the code shouldn't attempt to touch the value,
but it's better not to have dangling pointers lying around.
        regards, tom lane

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

17 November 2015, 18:59:15

On Tue, Nov 17, 2015 at 7:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think this might do the wrong thing with block numbers above 0x80000000
> and/or offset numbers above 0x8000.  I'd be more comfortable about it if
> +       encoded = ((int64) block << 16) | offset;
> were
> +       encoded = ((uint64) block << 16) | (uint16) offset;
> so that there's no risk of the compiler deciding to sign-extend rather
> than zero-fill either input.

I don't have a problem with your alternative, but I don't see any risk
with the original. It is recommended by various coding standards to
only use bitwise operators on unsigned operands, so that's a good
enough reason, I suppose.

> Also, I think it'd be a good idea to explicitly set indexcursor = NULL
> in the tuplesort_empty case; the previous coding was effectively doing
> that.  It's true that the code shouldn't attempt to touch the value,
> but it's better not to have dangling pointers lying around.

The code started that way, but I removed the "indexcursor = NULL"
because the previous coding was *not* doing that. tuplesort_getdatum()
was not setting the passed Datum pointer (which points to indexcursor
here) in the event of returning false. Maybe I should have left in the
code setting indexcursor = NULL all the same.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

17 November 2015, 23:50:44

On Tue, Nov 17, 2015 at 12:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Short and sweet! Looks good.

Thanks.

> I would be inclined to add more comments to explain it, these things have a
> habit of being forgotten.

I'm not sure what additional detail I can add.

I seem to be able to produce these sorting patches at a much greater
rate than they can be committed, in part because Robert is the only
one that ever reviews them, and he is only one person. Since you think
the patch is good work, perhaps you can find the time to formally
review it.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Michael Paquier

Date:

18 November 2015, 00:28:51

On Wed, Nov 18, 2015 at 8:50 AM, Peter Geoghegan wrote:
> I seem to be able to produce these sorting patches at a much greater
> rate than they can be committed, in part because Robert is the only
> one that ever reviews them, and he is only one person. Since you think
> the patch is good work, perhaps you can find the time to formally
> review it.

Finding reviewing volunteers is a good thing, particularly on these
times where we tend to have more patches than reviews, however I would
suggest prioritizing the older items by beginning in what is in the
current CF (47 items waiting for review at I write this message), 3
patches for the sorting work.

FWIW, I think that this series of patches is interesting and have high
value because particularly I have seen clear improvements particularly
with raw dumps on schemas with many indexes (disclaimer: I am the guy
Peter talked to regarding this patch though this was not on the top
head nor of my TODOs).
-- 
Michael

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Corey Huinker

Date:

18 November 2015, 03:33:57

<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Tue, Nov 17, 2015 at 7:28 PM, Michael
Paquier<span dir="ltr"><<a href="mailto:michael.paquier@gmail.com"
target="_blank">michael.paquier@gmail.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0px
0px0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span
class="">OnWed, Nov 18, 2015 at 8:50 AM, Peter Geoghegan wrote:<br /> > I seem to be able to produce these sorting
patchesat a much greater<br /> > rate than they can be committed, in part because Robert is the only<br /> > one
thatever reviews them, and he is only one person. Since you think<br /> > the patch is good work, perhaps you can
findthe time to formally<br /> > review it.<br /><br /></span>Finding reviewing volunteers is a good thing,
particularlyon these<br /> times where we tend to have more patches than reviews, however I would<br /> suggest
prioritizingthe older items by beginning in what is in the<br /> current CF (47 items waiting for review at I write
thismessage), 3<br /> patches for the sorting work.<br /><br /> FWIW, I think that this series of patches is
interestingand have high<br /> value because particularly I have seen clear improvements particularly<br /> with raw
dumpson schemas with many indexes (disclaimer: I am the guy<br /> Peter talked to regarding this patch though this was
noton the top<br /> head nor of my TODOs).<br /><span class=""><font color="#888888">--<br /> Michael<br
/></font></span><divclass=""><div class="h5"><br /><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br
/></div><divclass="gmail_extra"><span style="font-size:12.8px">I'm willing, but I'm too new to the codebase to be an
effectivereviewer (without guidance). The one thing I can offer in the mean time is this: my company/client nearly
alwayshas a few spare AWS machines on the largish side where I can compile uncommitted patches and benchmark stuff for
y'all.</span><br/></div><div class="gmail_extra"><br /></div></div>

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

10 December 2015, 01:16:52

On Tue, Nov 17, 2015 at 7:33 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
> I'm willing, but I'm too new to the codebase to be an effective reviewer
> (without guidance). The one thing I can offer in the mean time is this: my
> company/client nearly always has a few spare AWS machines on the largish
> side where I can compile uncommitted patches and benchmark stuff for y'all.

I think that this particular patch is close to being a slam-dunk, so I
don't think it's particularly needed here. But thanks.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Robert Haas

Date:

11 December 2015, 17:13:12

On Wed, Dec 9, 2015 at 8:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Nov 17, 2015 at 7:33 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
>> I'm willing, but I'm too new to the codebase to be an effective reviewer
>> (without guidance). The one thing I can offer in the mean time is this: my
>> company/client nearly always has a few spare AWS machines on the largish
>> side where I can compile uncommitted patches and benchmark stuff for y'all.
>
> I think that this particular patch is close to being a slam-dunk, so I
> don't think it's particularly needed here. But thanks.

It never hurts to have a few extra performance test results - I'm all
in favor of Corey doing some testing.

Also, I'd be in favor of you updating the patch to reflect the
comments from Tom and Simon on November 17th.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Corey Huinker

Date:

11 December 2015, 22:27:05

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Dec 11, 2015 at 12:13 PM, Robert Haas <span
dir="ltr"><<ahref="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">OnWed, Dec 9, 2015 at 8:16 PM, Peter Geoghegan <<a href="mailto:pg@heroku.com">pg@heroku.com</a>>
wrote:<br/> > On Tue, Nov 17, 2015 at 7:33 PM, Corey Huinker <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>wrote:<br /> >> I'm willing, but I'm too new
tothe codebase to be an effective reviewer<br /> >> (without guidance). The one thing I can offer in the mean
timeis this: my<br /> >> company/client nearly always has a few spare AWS machines on the largish<br /> >>
sidewhere I can compile uncommitted patches and benchmark stuff for y'all.<br /> ><br /> > I think that this
particularpatch is close to being a slam-dunk, so I<br /> > don't think it's particularly needed here. But
thanks.<br/><br /></span>It never hurts to have a few extra performance test results - I'm all<br /> in favor of Corey
doingsome testing.<br /><br /> Also, I'd be in favor of you updating the patch to reflect the<br /> comments from Tom
andSimon on November 17th.<br /><span class="HOEnZb"><font color="#888888"><br /> --<br /> Robert Haas<br />
EnterpriseDB:<a href="http://www.enterprisedb.com" rel="noreferrer" target="_blank">http://www.enterprisedb.com</a><br
/>The Enterprise PostgreSQL Company<br /></font></span></blockquote></div><br /></div><div class="gmail_extra">Sure,
themachine we called "ninefivealpha", which incidentally, failed to find a single bug in alpha2 thru beta2, is
currentlyidle, and concurrent index creation times are a bugbear around these parts. Can somebody, either in this
threador privately, outline what sort of a test they'd like to see?</div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div></div>

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

11 December 2015, 22:35:09

On Fri, Dec 11, 2015 at 2:26 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
> Sure, the machine we called "ninefivealpha", which incidentally, failed to
> find a single bug in alpha2 thru beta2, is currently idle, and concurrent
> index creation times are a bugbear around these parts. Can somebody, either
> in this thread or privately, outline what sort of a test they'd like to see?

Any kind of CREATE INDEX CONCURRENTLY test, before and after.

I looked at a simple, random int4 column. That seems like a good case
to focus on, since there isn't too much other overhead.  I think I
performed my test on an unlogged table, to make sure other overhead
was even further minimized.

-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Corey Huinker

Date:

12 December 2015, 17:49:02

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Dec 11, 2015 at 5:35 PM, Peter Geoghegan <span
dir="ltr"><<ahref="mailto:pg@heroku.com" target="_blank">pg@heroku.com</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Fri, Dec 11,
2015at 2:26 PM, Corey Huinker <<a href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>> wrote:<br />
>Sure, the machine we called "ninefivealpha", which incidentally, failed to<br /> > find a single bug in alpha2
thrubeta2, is currently idle, and concurrent<br /> > index creation times are a bugbear around these parts. Can
somebody,either<br /> > in this thread or privately, outline what sort of a test they'd like to see?<br /><br
/></span>Anykind of CREATE INDEX CONCURRENTLY test, before and after.<br /><br /> I looked at a simple, random int4
column.That seems like a good case<br /> to focus on, since there isn't too much other overhead.  I think I<br />
performedmy test on an unlogged table, to make sure other overhead<br /> was even further minimized.<br /><span
class="HOEnZb"><fontcolor="#888888"><br /> --<br /> Peter Geoghegan<br /></font></span></blockquote></div><br
/></div><divclass="gmail_extra">What, if any, other load should be placed on the underlying table during the
test?</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">I ask because CIC statements that run in
secondson our staging machine can take many hours on our production machine, when most of the access is just reads,
thoughthose reads may have been part of a larger transaction that did updates elsewhere.</div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div></div>

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

12 December 2015, 19:50:06

On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
> I ask because CIC statements that run in seconds on our staging machine can
> take many hours on our production machine, when most of the access is just
> reads, though those reads may have been part of a larger transaction that
> did updates elsewhere.

I don't think it's necessary to simulate any other load.


-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

12 December 2015, 23:39:49

On Fri, Dec 11, 2015 at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Also, I'd be in favor of you updating the patch to reflect the
> comments from Tom and Simon on November 17th.

Attached revision:

* Has more worked out comments on encoding, per Simon's request.

* Uses Tom's preferred formulation for encoding TIDs (which involves
unsigned integer casts).

* Sets indexcursor = NULL when the tuplesort is empty, just to be
tidy, per Tom's request.

--
Peter Geoghegan

Attachment

0001-Speed-up-CREATE-INDEX-CONCURRENTLY-s-TID-sort.patch

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Jeff Janes

Date:

13 December 2015, 05:33:24

On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
>
>
> What, if any, other load should be placed on the underlying table during the
> test?
>
> I ask because CIC statements that run in seconds on our staging machine can
> take many hours on our production machine, when most of the access is just
> reads, though those reads may have been part of a larger transaction that
> did updates elsewhere.

That sounds like the CIC is just blocked waiting for long-lived
transactions to go away.  There isn't much that changes to the sorting
algorithm can do about that.

Cheers,

Jeff

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Robert Haas

Date:

16 December 2015, 20:28:40

On Tue, Nov 17, 2015 at 6:50 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Nov 17, 2015 at 12:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Short and sweet! Looks good.
>
> Thanks.
>
>> I would be inclined to add more comments to explain it, these things have a
>> habit of being forgotten.
>
> I'm not sure what additional detail I can add.
>
> I seem to be able to produce these sorting patches at a much greater
> rate than they can be committed, in part because Robert is the only
> one that ever reviews them, and he is only one person.

I object to that vicious slander.  I am at least three people, if not more!

Meanwhile, I did some simple benchmarking on your latest patch on my
MacBook Pro.  I did pgbench -i -s 100 and then tried:

create index x on pgbench_accounts (aid);
create index concurrently x on pgbench_accounts (aid);

The first took about 6.9 seconds.  The second took about 11.3 seconds
patched versus 14.6 seconds unpatched.  That's certainly a healthy
improvement.

I then rebuilt with --disable-float8-byval, because I was worried that
case might be harmed by this change.  It turns out we win in that case
too, just not by as much.  The time for the first case doesn't change
much, and neither does the unpatched time.  The patched time is about
12.9 seconds.

I have also reviewed the code, and it looks OK to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

16 December 2015, 21:24:44

On Wed, Dec 16, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I seem to be able to produce these sorting patches at a much greater
>> rate than they can be committed, in part because Robert is the only
>> one that ever reviews them, and he is only one person.
>
> I object to that vicious slander.  I am at least three people, if not more!

I was referring only to the Robert that reviews my sorting patches.  :-)

> Meanwhile, I did some simple benchmarking on your latest patch on my
> MacBook Pro.  I did pgbench -i -s 100 and then tried:
>
> create index x on pgbench_accounts (aid);
> create index concurrently x on pgbench_accounts (aid);
>
> The first took about 6.9 seconds.  The second took about 11.3 seconds
> patched versus 14.6 seconds unpatched.  That's certainly a healthy
> improvement.

That seems pretty good. It probably doesn't matter, but FWIW it's
likely that your numbers are not as good as mine because this ends up
with a perfect logical/physical correlation, which the quicksort
precheck [1] does very well on when sorting the TIDs (since input is
*perfectly* correlated, as opposed to 99.99% correlated, a case that
does poorly [2]).

> I have also reviewed the code, and it looks OK to me, so committed.

Thanks!

[1] Commit a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649
[2] http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com
-- 
Peter Geoghegan

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Corey Huinker

Date:

17 December 2015, 17:29:38

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Dec 16, 2015 at 4:24 PM, Peter Geoghegan <span
dir="ltr"><<ahref="mailto:pg@heroku.com" target="_blank">pg@heroku.com</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, Dec 16,
2015at 12:28 PM, Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> wrote:<br />
>>I seem to be able to produce these sorting patches at a much greater<br /> >> rate than they can be
committed,in part because Robert is the only<br /> >> one that ever reviews them, and he is only one person.<br
/>><br /> > I object to that vicious slander.  I am at least three people, if not more!<br /><br /></span>I was
referringonly to the Robert that reviews my sorting patches.  :-)<br /><span class=""><br /> > Meanwhile, I did some
simplebenchmarking on your latest patch on my<br /> > MacBook Pro.  I did pgbench -i -s 100 and then tried:<br />
><br/> > create index x on pgbench_accounts (aid);<br /> > create index concurrently x on pgbench_accounts
(aid);<br/> ><br /> > The first took about 6.9 seconds.  The second took about 11.3 seconds<br /> > patched
versus14.6 seconds unpatched.  That's certainly a healthy<br /> > improvement.<br /><br /></span>That seems pretty
good.It probably doesn't matter, but FWIW it's<br /> likely that your numbers are not as good as mine because this ends
up<br/> with a perfect logical/physical correlation, which the quicksort<br /> precheck [1] does very well on when
sortingthe TIDs (since input is<br /> *perfectly* correlated, as opposed to 99.99% correlated, a case that<br /> does
poorly[2]).<br /><span class=""><br /> > I have also reviewed the code, and it looks OK to me, so committed.<br
/><br/></span>Thanks!<br /><br /> [1] Commit a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649<br /> [2] <a
href="http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com"rel="noreferrer"
target="_blank">http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com</a><br/><span
class="HOEnZb"><fontcolor="#888888">--<br /> Peter Geoghegan<br /></font></span><div class="HOEnZb"><div class="h5"><br
/><br/> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br
/></div><divclass="gmail_extra">My apologies to Peter and all the Roberts, I wasn't able to set up a test fast enough.
Gladit got committed.</div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div></div>

Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

From

Peter Geoghegan

Date:

17 December 2015, 19:42:34

On Thu, Dec 17, 2015 at 9:29 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
> My apologies to Peter and all the Roberts, I wasn't able to set up a test
> fast enough. Glad it got committed.

I don't use the term "slam-dunk" casually. :-)

This was the first time I ever referred to a patch of mine that way.
It doesn't happen very often, but it does happen.

-- 
Peter Geoghegan