Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Date
Msg-id 20180719203212.qso3vgljwns75oho@alap3.anarazel.de
In response to Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)  (Robert Haas <robertmhaas@gmail.com>)
Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2018-07-18 14:34:34 -0400, Robert Haas wrote:
> On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund <andres@anarazel.de> wrote:
> > FWIW, here's a rebased version of this patch. Could probably be polished
> > further. One might argue that we should do a bit more wide ranging
> > changes, to convert scanint8 and pg_atoi to be also unified. But it
> > might also just be worthwhile to apply without those, given the
> > performance benefit.
> 
> Wouldn't hurt to do that one too, but might be OK to just do this
> much.  Questions:
> 
> 1. Why the error message changes?  If there's a good reason, it should
> be done as a separate commit, or at least well-documented in the
> commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a
separate version that inlines the first %s.  But you're right, it'd be
worthwhile to point that out in the commit message.
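
To illustrate the translation point, here is a minimal standalone sketch
(not code from the patch). The helper name format_syntax_error is
hypothetical, and plain snprintf() stands in for the server's error
reporting; the only thing taken from the message above is the format
string itself. Because the type name is a parameter, every input
function can share one translatable template.

```c
#include <stdio.h>

/*
 * Hypothetical helper for illustration only. It formats the shared
 * message template into buf and returns snprintf()'s result.
 */
static int
format_syntax_error(char *buf, size_t buflen,
                    const char *type_name, const char *input)
{
    /* One template for all types: the type name is the first %s. */
    return snprintf(buf, buflen,
                    "invalid input syntax for type %s: \"%s\"",
                    type_name, input);
}
```

With a per-type string such as "invalid input syntax for integer: ..."
each type would need its own translated message; with the shared
template, only the one string above needs translating.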


> 2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.
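
For readers unfamiliar with these hints, the following is a small
standalone sketch of how likely()/unlikely() are commonly defined on top
of GCC's __builtin_expect, with a plain fallback for other compilers.
The function all_digits is a made-up example, not part of the patch; it
only shows the hints marking error paths as cold.

```c
#include <stdbool.h>

/* Branch-prediction hints; compile to plain tests without GCC/Clang. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define likely(x)   ((x) != 0)
#define unlikely(x) ((x) != 0)
#endif

/* Hypothetical example: true if s is a non-empty run of ASCII digits. */
static bool
all_digits(const char *s)
{
    /* Empty input is the rare case. */
    if (unlikely(*s == '\0'))
        return false;

    while (*s != '\0')
    {
        /* Non-digit characters are expected to be rare. */
        if (unlikely(*s < '0' || *s > '9'))
            return false;
        s++;
    }
    return true;
}
```

The hints do not change results; they only tell the compiler which
branch to lay out as the fall-through path.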


> 3. If this is a drop-in replacement for pg_atoi, why not just recode
> pg_atoi this way -- or have it call this -- and leave the callers
> unchanged?

Because pg_atoi supports a variable 'terminator'. Supporting that would
make the code a bit slower, without being particularly useful.  I think
there's only a single in-core caller left after the patch
(int2vectorin). There's a fair argument that that should just be
open-coded to handle the weird space parsing, but given there are
probably external pg_atoi() callers, I'm not sure it's worth doing so?

I don't think it's a good idea to continue to have pg_atoi as a wrapper
- it takes a size argument, which makes efficient code hard.
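
As a rough illustration of why a fixed-width, fixed-terminator parser
can be simpler than a general one, here is a standalone sketch. It is
not the patch: the name parse_int32 is hypothetical, it returns false
instead of raising an error, and it assumes no whitespace handling. The
target width (int32) and the terminator ('\0') are hard-coded, so there
is no size switch and no terminator comparison in the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: parse a NUL-terminated decimal string into an
 * int32. Returns false on syntax error or overflow.
 */
static bool
parse_int32(const char *s, int32_t *result)
{
    const char *p = s;
    int64_t     acc = 0;
    bool        neg = false;

    /* Optional sign. */
    if (*p == '-')
    {
        neg = true;
        p++;
    }
    else if (*p == '+')
        p++;

    /* At least one digit is required. */
    if (*p < '0' || *p > '9')
        return false;

    while (*p >= '0' && *p <= '9')
    {
        acc = acc * 10 + (*p - '0');
        /* Bail out early; acc never grows beyond what int64 can hold. */
        if (acc > (int64_t) INT32_MAX + 1)
            return false;
        p++;
    }

    /* Fixed terminator: anything after the digits is a syntax error. */
    if (*p != '\0')
        return false;

    /* 2147483648 is only representable when negated. */
    if (!neg && acc > INT32_MAX)
        return false;

    *result = (int32_t) (neg ? -acc : acc);
    return true;
}
```

A wrapper that also accepts a size and a terminator at run time would
have to add branches on both to this loop, which is the cost described
above.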


> 4. Are we sure this is faster on all platforms, or could it work out
> the other way on, say, BSD?

I'd be *VERY* surprised if any platform's implementation were faster.
It's not easy to write a faster implementation than what I've proposed,
especially not if you use strtol() as the API (variable bases, a bit of
locale support).

Greetings,

Andres Freund

