Thread: Re: [PERFORM] psql -A (unaligned format) eats too much memory

Re: [PERFORM] psql -A (unaligned format) eats too much memory

From
"Jim C. Nasby"
Date:
Moving to -hackers

On Mon, Jun 05, 2006 at 12:32:38AM +0200, Zoltan Boszormenyi wrote:
> >I just noticed that psql's unformatted output uses too much
> >memory. Is it normal? It seems that psql draws all records
> >of a query off the server before it displays or writes the output.
> >I would expect this only with formatted output.
> >
> >Problem is, I have an export that produces 500'000+ records
> >which changes frequently. Several (20+) sites run this query
> >nightly with different parameters and download it. The SELECTs
> >that run in psql -A -t -c '...' may overlap and the query that runs
> >in less than 1.5 minutes if it's the only one at the time may take
> >3+ hours if ten such queries overlap. The time is mostly spent
> >in swapping, all psql processes take up 300+ MB, so the 1GB
> >server is brought to its knees quickly, peek swap usage is 1.8 GB.
> >I watched the progress in top and the postmaster processes finished
> >their work in about half an hour (that would still be acceptable)
> >then the psql processes started eating up memory as they read
> >the records.
> >
> >PostgreSQL 8.1.4 was used on RHEL3.
> >
> >Is there a way to convince psql to use less memory in unformatted
> >mode? I know COPY will be able to use arbitrary SELECTs
> >but until then I am still stuck with redirecting psql's output.
> 
> The answer it to use SELECT INTO TEMP and then COPY.
> Psql will use much less memory that way. But still...

I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM
largetable' > /dev/null results in psql consuming vast quantities of
memory. Why is this? ISTM this is a bug...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: [PERFORM] psql -A (unaligned format) eats too much memory

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM
> largetable' > /dev/null results in psql consuming vast quantities of
> memory. Why is this?

Is it different without the -A?

I'm reading this as just another uninformed complaint about libpq's
habit of buffering the whole query result.  It's possible that there's
a memory leak in the -A path specifically, but nothing said so far
provided any evidence for that.
        regards, tom lane


Re: [PERFORM] psql -A (unaligned format) eats too much memory

From
"Jim C. Nasby"
Date:
On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM
> > largetable' > /dev/null results in psql consuming vast quantities of
> > memory. Why is this?
> 
> Is it different without the -A?

Nope.

> I'm reading this as just another uninformed complaint about libpq's
> habit of buffering the whole query result.  It's possible that there's
> a memory leak in the -A path specifically, but nothing said so far
> provided any evidence for that.

Certainly seems like it. It seems like it would be good to allow for
libpq not to buffer, since there's cases where it's not needed...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: [PERFORM] psql -A (unaligned format) eats too much memory

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
>> I'm reading this as just another uninformed complaint about libpq's
>> habit of buffering the whole query result.  It's possible that there's
>> a memory leak in the -A path specifically, but nothing said so far
>> provided any evidence for that.

> Certainly seems like it. It seems like it would be good to allow for
> libpq not to buffer, since there's cases where it's not needed...

See past discussions.  The problem is that libpq's API says that when it
hands you back the completed query result, the command is complete and
guaranteed not to fail later.  A streaming interface could not make that
guarantee, so it's not a transparent substitution.

I wouldn't have any strong objection to providing a separate API that
operates in a streaming fashion, but defining it is something no one's
bothered to do yet.  In practice, if you have to code to a variant API,
it's not that much more trouble to use a cursor...
        regards, tom lane


Re: [PERFORM] psql -A (unaligned format) eats too much

From
"Mark Woodward"
Date:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
>>> I'm reading this as just another uninformed complaint about libpq's
>>> habit of buffering the whole query result.  It's possible that there's
>>> a memory leak in the -A path specifically, but nothing said so far
>>> provided any evidence for that.
>
>> Certainly seems like it. It seems like it would be good to allow for
>> libpq not to buffer, since there's cases where it's not needed...
>
> See past discussions.  The problem is that libpq's API says that when it
> hands you back the completed query result, the command is complete and
> guaranteed not to fail later.  A streaming interface could not make that
> guarantee, so it's not a transparent substitution.
>
> I wouldn't have any strong objection to providing a separate API that
> operates in a streaming fashion, but defining it is something no one's
> bothered to do yet.  In practice, if you have to code to a variant API,
> it's not that much more trouble to use a cursor...
>

Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve
this for free?


Re: [PERFORM] psql -A (unaligned format) eats too much

From
Andrew Dunstan
Date:
Mark Woodward wrote:
>> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>>     
>>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
>>>       
>>>> I'm reading this as just another uninformed complaint about libpq's
>>>> habit of buffering the whole query result.  It's possible that there's
>>>> a memory leak in the -A path specifically, but nothing said so far
>>>> provided any evidence for that.
>>>>         
>>> Certainly seems like it. It seems like it would be good to allow for
>>> libpq not to buffer, since there's cases where it's not needed...
>>>       
>> See past discussions.  The problem is that libpq's API says that when it
>> hands you back the completed query result, the command is complete and
>> guaranteed not to fail later.  A streaming interface could not make that
>> guarantee, so it's not a transparent substitution.
>>
>> I wouldn't have any strong objection to providing a separate API that
>> operates in a streaming fashion, but defining it is something no one's
>> bothered to do yet.  In practice, if you have to code to a variant API,
>> it's not that much more trouble to use a cursor...
>>
>>     
>
> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve
> this for free?
>
>
>   

It won't solve it in the general case for clients that expect a result 
set. ISTM that "use a cursor" is a perfectly reasonable answer, though.

cheers

andrew



Re: [PERFORM] psql -A (unaligned format) eats too much

From
Zoltan Boszormenyi
Date:
Hi!

Tom Lane írta:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>   
>> I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM
>> largetable' > /dev/null results in psql consuming vast quantities of
>> memory. Why is this?
>>     
>
> Is it different without the -A?
>
> I'm reading this as just another uninformed complaint about libpq's
> habit of buffering the whole query result.  It's possible that there's
> a memory leak in the -A path specifically, but nothing said so far
> provided any evidence for that.
>
>             regards, tom lane
>   

So, is libpq always buffering the result? Thanks.
I thought psql buffers only because in its formatted output mode
it has to know the widest value for all the columns.

Then the SELECT INTO TEMP ; COPY TO STDOUT solution
I found is _the_ solution.

I guess then the libpq-based ODBC driver suffers
from the same problem? It certainly explains the
performance problems I observed: the server
finishes the query, the ODBC driver (or libpq underneath)
fetches all the records and the application receives
the first record after all these. Nice.

Best regards,
Zoltán Böszörményi



Re: [PERFORM] psql -A (unaligned format) eats too much

From
"Mark Woodward"
Date:
> Mark Woodward wrote:
>>> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>>>
>>>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
>>>>
>>>>> I'm reading this as just another uninformed complaint about libpq's
>>>>> habit of buffering the whole query result.  It's possible that
>>>>> there's
>>>>> a memory leak in the -A path specifically, but nothing said so far
>>>>> provided any evidence for that.
>>>>>
>>>> Certainly seems like it. It seems like it would be good to allow for
>>>> libpq not to buffer, since there's cases where it's not needed...
>>>>
>>> See past discussions.  The problem is that libpq's API says that when
>>> it
>>> hands you back the completed query result, the command is complete and
>>> guaranteed not to fail later.  A streaming interface could not make
>>> that
>>> guarantee, so it's not a transparent substitution.
>>>
>>> I wouldn't have any strong objection to providing a separate API that
>>> operates in a streaming fashion, but defining it is something no one's
>>> bothered to do yet.  In practice, if you have to code to a variant API,
>>> it's not that much more trouble to use a cursor...
>>>
>>>
>>
>> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve
>> this for free?
>>
>>
>>
>
> It won't solve it in the general case for clients that expect a result
> set. ISTM that "use a cursor" is a perfectly reasonable answer, though.

I'm not sure I agree -- surprise!

psql is often used as a command line tool and using a cursor is not
acceptable.

Granted, with an unaligned output, perhaps psql should not buffer the
WHOLE result at once, but without rewriting that behavior, a COPY from
query may be close enough.


Re: [PERFORM] psql -A (unaligned format) eats too much

From
Andrew Dunstan
Date:
Mark Woodward wrote:
>>>>
>>>>
>>>>         
>>> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve
>>> this for free?
>>>
>>>
>>>
>>>       
>> It won't solve it in the general case for clients that expect a result
>> set. ISTM that "use a cursor" is a perfectly reasonable answer, though.
>>     
>
> I'm not sure I agree -- surprise!
>
> psql is often used as a command line tool and using a cursor is not
> acceptable.
>
> Granted, with an unaligned output, perhaps psql should not buffer the
> WHOLE result at once, but without rewriting that behavior, a COPY from
> query may be close enough.
>
>   

You have missed my point. Surprise!

I didn't say it wasn't OK in the psql case, I said it wasn't helpful in 
the case of *other* libpq clients.

Expecting clients generally to split and interpret COPY output is not 
reasonable, but if they want large result sets they should use a cursor.

cheers

andrew



Re: [PERFORM] psql -A (unaligned format) eats too much

From
Zoltan Boszormenyi
Date:
Andrew Dunstan írta:
> Mark Woodward wrote:
>>> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>>>    
>>>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
>>>>      
>>>>> I'm reading this as just another uninformed complaint about libpq's
>>>>> habit of buffering the whole query result.  It's possible that 
>>>>> there's
>>>>> a memory leak in the -A path specifically, but nothing said so far
>>>>> provided any evidence for that.
>>>>>         
>>>> Certainly seems like it. It seems like it would be good to allow for
>>>> libpq not to buffer, since there's cases where it's not needed...
>>>>       
>>> See past discussions.  The problem is that libpq's API says that 
>>> when it
>>> hands you back the completed query result, the command is complete and
>>> guaranteed not to fail later.  A streaming interface could not make 
>>> that
>>> guarantee, so it's not a transparent substitution.
>>>
>>> I wouldn't have any strong objection to providing a separate API that
>>> operates in a streaming fashion, but defining it is something no one's
>>> bothered to do yet.  In practice, if you have to code to a variant API,
>>> it's not that much more trouble to use a cursor...
>>>
>>>     
>>
>> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve
>> this for free? 

Yes, it would for me.
> It won't solve it in the general case for clients that expect a result 
> set. ISTM that "use a cursor" is a perfectly reasonable answer, though.

The general case cannot be applied for all particular cases.
E.g. you cannot use cursors from shell scripts and just for
producing an "export file" it's not too reasonable either.
Redirecting psql's output or COPY is enough.

Best regards,
Zoltán Böszörényi



Re: [PERFORM] psql -A (unaligned format) eats too much

From
Neil Conway
Date:
On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote:
> The general case cannot be applied for all particular cases.
> E.g. you cannot use cursors from shell scripts

This could be fixed by adding an option to psql to transparently produce
SELECT result sets via a cursor.

-Neil




Re: [PERFORM] psql -A (unaligned format) eats too much

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote:
>> The general case cannot be applied for all particular cases.
>> E.g. you cannot use cursors from shell scripts

> This could be fixed by adding an option to psql to transparently produce
> SELECT result sets via a cursor.

Note of course that such a thing would push the incomplete-result
problem further upstream.   For instance in (hypothetical --cursor
switch)psql --cursor -c "select ..." | myprogram
there would be no very good way for myprogram to find out that it'd
been sent an incomplete result due to error partway through the SELECT.
        regards, tom lane


Re: [PERFORM] psql -A (unaligned format) eats too much

From
Hannu Krosing
Date:
Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane:
> Neil Conway <neilc@samurai.com> writes:
> > On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote:
> >> The general case cannot be applied for all particular cases.
> >> E.g. you cannot use cursors from shell scripts
> 
> > This could be fixed by adding an option to psql to transparently produce
> > SELECT result sets via a cursor.

I think this is an excellent idea. 

psql --cursor --fetchby 10000 -c "select ..." | myprogram

> Note of course that such a thing would push the incomplete-result
> problem further upstream.   For instance in (hypothetical --cursor
> switch)
>     psql --cursor -c "select ..." | myprogram
> there would be no very good way for myprogram to find out that it'd
> been sent an incomplete result due to error partway through the SELECT.

would it not learn about it at the point of error ?

even without --cursor there is still no very good way to find out when
something else goes wrong, like the result inside libpq taking up all
memory and so psql runs out of memory on formatting some longer lines.


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [PERFORM] psql -A (unaligned format) eats too much

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane:
>> Note of course that such a thing would push the incomplete-result
>> problem further upstream.   For instance in (hypothetical --cursor
>> switch)
>>    psql --cursor -c "select ..." | myprogram
>> there would be no very good way for myprogram to find out that it'd
>> been sent an incomplete result due to error partway through the SELECT.

> would it not learn about it at the point of error ?

No, it would merely see EOF after some number of result rows.  (I'm
assuming you're also using -A -t so that the output is unadorned.)
        regards, tom lane


Re: [PERFORM] psql -A (unaligned format) eats too much

From
"Jim C. Nasby"
Date:
On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane:
> >> Note of course that such a thing would push the incomplete-result
> >> problem further upstream.   For instance in (hypothetical --cursor
> >> switch)
> >>    psql --cursor -c "select ..." | myprogram
> >> there would be no very good way for myprogram to find out that it'd
> >> been sent an incomplete result due to error partway through the SELECT.
> 
> > would it not learn about it at the point of error ?
> 
> No, it would merely see EOF after some number of result rows.  (I'm
> assuming you're also using -A -t so that the output is unadorned.)

So if an error occurs partway through reading a cursor, no error message
is generated? That certainly sounds like a bug to me...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: [PERFORM] psql -A (unaligned format) eats too much

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote:
>>> psql --cursor -c "select ..." | myprogram
>>> there would be no very good way for myprogram to find out that it'd
>>> been sent an incomplete result due to error partway through the SELECT.

> So if an error occurs partway through reading a cursor, no error message
> is generated? That certainly sounds like a bug to me...

Sure an error is generated.  But it goes to stderr.  The guy at the
downstream end of the stdout pipe cannot see either the error message,
or the nonzero status that psql will (hopefully) exit with.

You can theoretically deal with this by having the shell script calling
this combination check psql exit status and discard the results of
myprogram on failure, but it's not easy or simple.
        regards, tom lane