Re: avoiding tuple copying in btree index builds - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: avoiding tuple copying in btree index builds
Msg-id: CAA4eK1LvYVjMntUt+Doxp9H5aGQi8RB-dVJNXZo8rD1rdC8_iw@mail.gmail.com
In response to: Re: avoiding tuple copying in btree index builds (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
> On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I also think it's possible to have a similar optimization for hash
> > indexes in case the build has to spool the tuples for sorting.
> >
> > In function hashbuildCallback(), when buildstate->spool is true, we
> > can avoid forming the index tuple. To check for nulls before calling
> > _h_spool(), we can traverse the isnull array.
>
> Hmm, that might work. Arguably it's less efficient, but on the other
> hand if it avoids forming the tuple sometimes it might be MORE
> efficient. And anyway the difference might not be enough to matter.
Considering the fact that, if this optimization were also done for hash
indexes, we could remove tuplesort_putindextuple() from the code
altogether. However, we can leave it as it is for now, as there doesn't
seem to be any gain by doing so.
You seem to have removed the puttuple_common() call from function
tuplesort_putindextuple(); is there any reason for doing so?

Apart from that, the patch looks good to me.
The performance data below, for various index sizes, shows that this
patch can improve performance by up to ~7%. The difference becomes
smaller as the index size grows; I think that is because all the data
has to be written out at the end of the operation, so for large indexes
the improvement is hidden by the I/O cost. The improvement is computed
from the median of 3 runs, with times measured using psql's \timing
option.
Performance Data
---------------------------
Configuration:
IBM POWER-7 16 cores, 64 hardware threads
RAM = 64GB
shared_buffers=8GB
pgbench_accounts
No. of records = 10,000,000
Index - on integer
Operation - Reindex
Master
16043.500 ms
16058.723 ms
15941.057 ms
Patch
15525.054 ms
15551.935 ms
15492.879 ms
Performance Improvement: 3.43%
pgbench_accounts
No. of records = 30,000,000
Index - on integer
Operation - Reindex
Master
51258.338 ms
50520.328 ms
50562.022 ms
Patch
49610.710 ms
49302.571 ms
49301.390 ms
Performance Improvement: 2.41%
table (c1 int, c2 char(10))
No. of records = 300,000
Index - on char(10)
Operation - Reindex
Master
443.584 ms
444.798 ms
452.888 ms
Patch
421.554 ms
430.528 ms
447.558 ms
Performance Improvement: 3.2%
table (c1 int, c2 char(10))
No. of records = 500,000
Index - on char(10)
Operation - Reindex
Master
663.621 ms
661.299 ms
657.754 ms
Patch
652.325 ms
644.782 ms
643.218 ms
Performance Improvement: 2.5%
table (c1 int, c2 char(10))
No. of records = 1,000,000
Index - on char(10)
Operation - Reindex
Master
16554.076 ms
16686.528 ms
16571.129 ms
Patch
16556.852 ms
16513.543 ms
16610.615 ms
Performance Improvement: less than 1%
table (c1 int, c2 char(20))
No. of records = 300,000
Index - on char(20)
Operation - Reindex
Master
429.670 ms
441.445 ms
411.539 ms
Patch
401.801 ms
412.716 ms
395.002 ms
Performance Improvement: 6.48%
table (c1 int, c2 char(20))
No. of records = 500,000
Index - on char(20)
Operation - Reindex
Master
724.541 ms
731.582 ms
704.934 ms
Patch
686.004 ms
677.361 ms
686.915 ms
Performance Improvement: 5.31%
table (c1 int, c2 char(20))
No. of records = 1,000,000
Index - on char(20)
Operation - Reindex
Master
20728.758 ms
20665.289 ms
20656.375 ms
Patch
20594.022 ms
20617.383 ms
20628.181 ms
Performance Improvement: less than 1%