Re: Improving on MAX_CONVERSION_GROWTH - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Improving on MAX_CONVERSION_GROWTH
Date
Msg-id 20159.1569612302@sss.pgh.pa.us
In response to Re: Improving on MAX_CONVERSION_GROWTH  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Improving on MAX_CONVERSION_GROWTH
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Sep 27, 2019 at 11:40 AM Andres Freund <andres@anarazel.de> wrote:
>> Note that one of the additional reasons for the 1GB limit is that it
>> protects against int overflows. I'm somewhat unconvinced that that's a
>> sensible approach, but ...

> It's not crazy. People use 'int' rather casually, just as they use
> 'palloc' rather casually, without necessarily thinking about what
> could go wrong at the edges. I don't have any beef with that as a
> general strategy; I just think we should be trying to do better in the
> cases where it negatively affects the user experience.

A small problem with doing anything very interesting here is that the
int-is-enough-for-a-string-length approach is baked into the wire
protocol (read the DataRow message format spec and weep).
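
To make the DataRow constraint concrete, here is a minimal sketch of the length arithmetic. It assumes the documented v3 layout: Byte1('D'), an Int32 message length that counts itself but not the type byte, an Int16 column count, then per column an Int32 value length (-1 for NULL) followed by the value bytes. The function name datarow_length_word is hypothetical and is not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): compute the Int32 length word of
 * a DataRow message holding ncols non-NULL column values with the given byte
 * lengths.  Returns -1 when either a per-column length or the whole-message
 * length cannot be represented in a signed 32-bit field.
 */
static int64_t
datarow_length_word(const uint64_t *col_lens, size_t ncols)
{
    uint64_t total = 4 + 2;         /* length word itself + Int16 column count */

    assert(ncols <= INT16_MAX);     /* the column count is an Int16 */

    for (size_t i = 0; i < ncols; i++)
    {
        if (col_lens[i] > INT32_MAX)
            return -1;              /* per-column Int32 length cannot hold it */

        total += 4 + col_lens[i];   /* Int32 length prefix + value bytes */

        if (total > INT32_MAX)
            return -1;              /* message-level Int32 length cannot hold it */
    }
    return (int64_t) total;
}
```

Under these assumptions a single 3 GB value is rejected by the per-column field, and two 1.5 GB values are rejected by the message-level field even though each column on its own would fit.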

We could probably bend the COPY protocol enough to support multi-gig row
values --- dropping the rule that the backend doesn't split rows across
CopyData messages wouldn't break too many clients, hopefully.  That would
at least dodge some problems in dump/restore scenarios.

In the meantime, I still think we should commit what I proposed in the
other thread (<974.1569356381@sss.pgh.pa.us>), or something close to it.
Andres' proposal would perhaps be an improvement on that, but I don't
think it'll be ready anytime soon; and for sure we wouldn't risk
back-patching it, while I think we could back-patch what I suggested.
In any case, that patch is small enough that dropping it would be no big
loss if a better solution comes along.

Also, as far as the immediate subject of this thread is concerned,
I'm inclined to get rid of MAX_CONVERSION_GROWTH in favor of using
the target encoding's max char length.  The one use (in printtup.c)
where we don't know the target encoding could use MAX_MULTIBYTE_CHAR_LEN
instead.  Being smarter than that could help in some cases (mostly,
conversion of ISO encodings to UTF8), but it's not that big a win.
(I did some checks and found that some ISO encodings could provide a
max growth of 2x, but many are max 3x, so 4x isn't that far out of
line.)  If Andres' ideas don't pan out we could come back and work
harder on this, but for now something simple and back-patchable
seems like a useful stopgap improvement.
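
A minimal sketch of why the growth factor matters, under stated assumptions: the conversion buffer is sized as source length times a worst-case growth factor plus a terminating NUL, and the result must fit within a 1 GB allocation limit. The constant SKETCH_MAX_ALLOC_SIZE (0x3fffffff) and the helper conversion_buf_size are hypothetical stand-ins chosen to mirror that limit; they are not the actual PostgreSQL definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed allocation ceiling for this sketch: 1 GB - 1. */
#define SKETCH_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper (illustration only): worst-case output buffer size in
 * bytes, including a terminating NUL, for converting src_len bytes when each
 * input byte can expand to at most 'growth' output bytes.  Returns 0 if the
 * buffer would exceed the assumed allocation ceiling.
 */
static size_t
conversion_buf_size(size_t src_len, int growth)
{
    assert(growth >= 1);

    /* Divide rather than multiply first, so the check cannot overflow. */
    if (src_len > (SKETCH_MAX_ALLOC_SIZE - 1) / (size_t) growth)
        return 0;

    return src_len * (size_t) growth + 1;
}
```

With a blanket factor of 4, a 300 MB input is refused because the worst case exceeds the ceiling. With a factor of 3 the same input needs 943718401 bytes and fits, and a single-byte target encoding (factor 1) needs only the input length plus one.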

            regards, tom lane


