Re: OOM in libpq and infinite loop with getCopyStart() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: OOM in libpq and infinite loop with getCopyStart()
Date
Msg-id 20734.1459472783@sss.pgh.pa.us
Whole thread Raw
In response to Re: OOM in libpq and infinite loop with getCopyStart()  (Aleksander Alekseev <a.alekseev@postgrespro.ru>)
Responses Re: OOM in libpq and infinite loop with getCopyStart()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
> +1, same here. Lets see whats committer's opinion.

I fooled around with this patch quite a bit but could not bring myself
to pull the trigger, because I think it fundamentally breaks applications
that follow the "repeat PQgetResult until NULL" rule.  The only reason
that psql manages not to fail is that it doesn't do that, it just calls
PQexec; and you've hacked PQexecFinish so that it falls out without
pumping PQgetResult till dry.  But that's not a good solution, it's just
a hack that makes the behavior unprincipled and unlike direct use of
PQgetResult.  The key problem is that, assuming that the changes in
getCopyStart() succeed in returning a PGRES_FATAL_ERROR PGresult, the
application may just follow the rule of doing nothing with it unless it's
the last non-null result from PQgetResult.  And it won't be, because
you've switched libpq's asyncStatus into one or another COPY status, which
will cause PQgetResult to continually create and return PGRES_COPY_XXX
results, which is what it's supposed to do in that situation (cf the last
step in getCopyResult).

Now, this will accidentally fail to fail if PQgetResult's attempt to
manufacture a PGRES_COPY_XXX result fails and returns null, which is
certainly possible if we're up against an OOM situation.  But what if
it doesn't fail --- which is also possible, because a PGRES_COPY_XXX
with no attached data will not need as much space as a PGRES_FATAL_ERROR
with attached error message.  The app probably throws away the
PGRES_FATAL_ERROR and tries to use the PGRES_COPY_XXX result, which
is almost okay, except that it will lack the copy format information
which will be fatal if the application is relying on that.

So AFAICT, this patch kinda sorta works for psql but it is not going
to make things better for other apps.

The other problem is that I don't have a lot of faith in the theory
that getCopyStart is going to be able to make an error PGresult when
it couldn't make a COPY PGresult.  The existing message-receipt routines
that this is modeled on are dealing with PGresults that are expected to
grow large, so throwing away the accumulated PGresult is highly likely
to free enough memory to let us allocate an error PGresult.  Not so
here.

I have to run, but the bottom line is I don't feel comfortable with
this.  It's trying to fix what's a very corner-case problem (since
COPY PGresults aren't large) but it's introducing new corner case
behaviors of its own.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Recovery test failure for recovery_min_apply_delay on hamster
Next
From: David Rowley
Date:
Subject: Incorrect format in error message