On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
> On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
>
> I think it would be better if this was self-tuning. So that we don't
> allocate a bulkinsert state until we've done say 100 (?) rows
> inserted.
I made it an optional, non-default behavior in response to the legitimate
concern for performance regression for the cases where a loader needs to be as
fast as possible - as compared with our case, where we want instead to optimize
for our reports by making the loaders responsible for their own writes, rather
than leaving behind many dirty pages, and clobbering the cache, too.
Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
INSERT .. VALUES () .. ON CONFLICT. This would handle that case, which is
great, even though that wasn't a design goal. It could also be an integer GUC
to allow configuring the size of the ring buffer.
> You should also use table_multi_insert() since that will give further
> performance gains by reducing block access overheads. Switching from
> single row to multi-row should also only happen once we've loaded a
> few rows, so we don't introduce overahads for smaller SQL statements.
Good idea...multi_insert (which reduces the overhead of individual inserts) is
mostly independent from BulkInsert state (which uses a ring-buffer to avoid
dirtying the cache). I made this 0002.
This makes INSERT SELECT several times faster, and not clobber the cache too.
Time: 4700.606 ms (00:04.701)
123 | 1
37 | 2
20 | 3
11 | 4
4537 | 5
11656 |
Time: 1125.302 ms (00:01.125)
2171 | 1
37 | 2
20 | 3
11 | 4
111 | 5
14034 |
When enabled, this passes nearly all regression tests, and all but 2 of the
changes are easily understood. The 2nd patch still needs work.
--
Justin