Thread: Re: [PERFORM] psql -A (unaligned format) eats too much memory
Moving to -hackers On Mon, Jun 05, 2006 at 12:32:38AM +0200, Zoltan Boszormenyi wrote: > >I just noticed that psql's unformatted output uses too much > >memory. Is it normal? It seems that psql draws all records > >of a query off the server before it displays or writes the output. > >I would expect this only with formatted output. > > > >Problem is, I have an export that produces 500'000+ records > >which changes frequently. Several (20+) sites run this query > >nightly with different parameters and download it. The SELECTs > >that run in psql -A -t -c '...' may overlap and the query that runs > >in less than 1.5 minutes if it's the only one at the time may take > >3+ hours if ten such queries overlap. The time is mostly spent > >in swapping, all psql processes take up 300+ MB, so the 1GB > >server is brought to its knees quickly, peek swap usage is 1.8 GB. > >I watched the progress in top and the postmaster processes finished > >their work in about half an hour (that would still be acceptable) > >then the psql processes started eating up memory as they read > >the records. > > > >PostgreSQL 8.1.4 was used on RHEL3. > > > >Is there a way to convince psql to use less memory in unformatted > >mode? I know COPY will be able to use arbitrary SELECTs > >but until then I am still stuck with redirecting psql's output. > > The answer it to use SELECT INTO TEMP and then COPY. > Psql will use much less memory that way. But still... I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM largetable' > /dev/null results in psql consuming vast quantities of memory. Why is this? ISTM this is a bug... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM > largetable' > /dev/null results in psql consuming vast quantities of > memory. Why is this? Is it different without the -A? I'm reading this as just another uninformed complaint about libpq's habit of buffering the whole query result. It's possible that there's a memory leak in the -A path specifically, but nothing said so far provided any evidence for that. regards, tom lane
On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: > "Jim C. Nasby" <jnasby@pervasive.com> writes: > > I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM > > largetable' > /dev/null results in psql consuming vast quantities of > > memory. Why is this? > > Is it different without the -A? Nope. > I'm reading this as just another uninformed complaint about libpq's > habit of buffering the whole query result. It's possible that there's > a memory leak in the -A path specifically, but nothing said so far > provided any evidence for that. Certainly seems like it. It seems like it would be good to allow for libpq not to buffer, since there's cases where it's not needed... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: >> I'm reading this as just another uninformed complaint about libpq's >> habit of buffering the whole query result. It's possible that there's >> a memory leak in the -A path specifically, but nothing said so far >> provided any evidence for that. > Certainly seems like it. It seems like it would be good to allow for > libpq not to buffer, since there's cases where it's not needed... See past discussions. The problem is that libpq's API says that when it hands you back the completed query result, the command is complete and guaranteed not to fail later. A streaming interface could not make that guarantee, so it's not a transparent substitution. I wouldn't have any strong objection to providing a separate API that operates in a streaming fashion, but defining it is something no one's bothered to do yet. In practice, if you have to code to a variant API, it's not that much more trouble to use a cursor... regards, tom lane
> "Jim C. Nasby" <jnasby@pervasive.com> writes: >> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: >>> I'm reading this as just another uninformed complaint about libpq's >>> habit of buffering the whole query result. It's possible that there's >>> a memory leak in the -A path specifically, but nothing said so far >>> provided any evidence for that. > >> Certainly seems like it. It seems like it would be good to allow for >> libpq not to buffer, since there's cases where it's not needed... > > See past discussions. The problem is that libpq's API says that when it > hands you back the completed query result, the command is complete and > guaranteed not to fail later. A streaming interface could not make that > guarantee, so it's not a transparent substitution. > > I wouldn't have any strong objection to providing a separate API that > operates in a streaming fashion, but defining it is something no one's > bothered to do yet. In practice, if you have to code to a variant API, > it's not that much more trouble to use a cursor... > Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve this for free?
Mark Woodward wrote: >> "Jim C. Nasby" <jnasby@pervasive.com> writes: >> >>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: >>> >>>> I'm reading this as just another uninformed complaint about libpq's >>>> habit of buffering the whole query result. It's possible that there's >>>> a memory leak in the -A path specifically, but nothing said so far >>>> provided any evidence for that. >>>> >>> Certainly seems like it. It seems like it would be good to allow for >>> libpq not to buffer, since there's cases where it's not needed... >>> >> See past discussions. The problem is that libpq's API says that when it >> hands you back the completed query result, the command is complete and >> guaranteed not to fail later. A streaming interface could not make that >> guarantee, so it's not a transparent substitution. >> >> I wouldn't have any strong objection to providing a separate API that >> operates in a streaming fashion, but defining it is something no one's >> bothered to do yet. In practice, if you have to code to a variant API, >> it's not that much more trouble to use a cursor... >> >> > > Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve > this for free? > > > It won't solve it in the general case for clients that expect a result set. ISTM that "use a cursor" is a perfectly reasonable answer, though. cheers andrew
Hi! Tom Lane írta: > "Jim C. Nasby" <jnasby@pervasive.com> writes: > >> I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM >> largetable' > /dev/null results in psql consuming vast quantities of >> memory. Why is this? >> > > Is it different without the -A? > > I'm reading this as just another uninformed complaint about libpq's > habit of buffering the whole query result. It's possible that there's > a memory leak in the -A path specifically, but nothing said so far > provided any evidence for that. > > regards, tom lane > So, is libpq always buffering the result? Thanks. I thought psql buffers only because in its formatted output mode it has to know the widest value for all the columns. Then the SELECT INTO TEMP ; COPY TO STDOUT solution I found is _the_ solution. I guess then the libpq-based ODBC driver suffers from the same problem? It certainly explains the performance problems I observed: the server finishes the query, the ODBC driver (or libpq underneath) fetches all the records and the application receives the first record after all these. Nice. Best regards, Zoltán Böszörményi
> Mark Woodward wrote: >>> "Jim C. Nasby" <jnasby@pervasive.com> writes: >>> >>>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: >>>> >>>>> I'm reading this as just another uninformed complaint about libpq's >>>>> habit of buffering the whole query result. It's possible that >>>>> there's >>>>> a memory leak in the -A path specifically, but nothing said so far >>>>> provided any evidence for that. >>>>> >>>> Certainly seems like it. It seems like it would be good to allow for >>>> libpq not to buffer, since there's cases where it's not needed... >>>> >>> See past discussions. The problem is that libpq's API says that when >>> it >>> hands you back the completed query result, the command is complete and >>> guaranteed not to fail later. A streaming interface could not make >>> that >>> guarantee, so it's not a transparent substitution. >>> >>> I wouldn't have any strong objection to providing a separate API that >>> operates in a streaming fashion, but defining it is something no one's >>> bothered to do yet. In practice, if you have to code to a variant API, >>> it's not that much more trouble to use a cursor... >>> >>> >> >> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve >> this for free? >> >> >> > > It won't solve it in the general case for clients that expect a result > set. ISTM that "use a cursor" is a perfectly reasonable answer, though. I'm not sure I agree -- surprise! psql is often used as a command line tool and using a cursor is not acceptable. Granted, with an unaligned output, perhaps psql should not buffer the WHOLE result at once, but without rewriting that behavior, a COPY from query may be close enough.
Mark Woodward wrote: >>>> >>>> >>>> >>> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve >>> this for free? >>> >>> >>> >>> >> It won't solve it in the general case for clients that expect a result >> set. ISTM that "use a cursor" is a perfectly reasonable answer, though. >> > > I'm not sure I agree -- surprise! > > psql is often used as a command line tool and using a cursor is not > acceptable. > > Granted, with an unaligned output, perhaps psql should not buffer the > WHOLE result at once, but without rewriting that behavior, a COPY from > query may be close enough. > > You have missed my point. Surprise! I didn't say it wasn't OK in the psql case, I said it wasn't helpful in the case of *other* libpq clients. Expecting clients generally to split and interpret COPY output is not reasonable, but if they want large result sets they should use a cursor. cheers andrew
Andrew Dunstan írta: > Mark Woodward wrote: >>> "Jim C. Nasby" <jnasby@pervasive.com> writes: >>> >>>> On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: >>>> >>>>> I'm reading this as just another uninformed complaint about libpq's >>>>> habit of buffering the whole query result. It's possible that >>>>> there's >>>>> a memory leak in the -A path specifically, but nothing said so far >>>>> provided any evidence for that. >>>>> >>>> Certainly seems like it. It seems like it would be good to allow for >>>> libpq not to buffer, since there's cases where it's not needed... >>>> >>> See past discussions. The problem is that libpq's API says that >>> when it >>> hands you back the completed query result, the command is complete and >>> guaranteed not to fail later. A streaming interface could not make >>> that >>> guarantee, so it's not a transparent substitution. >>> >>> I wouldn't have any strong objection to providing a separate API that >>> operates in a streaming fashion, but defining it is something no one's >>> bothered to do yet. In practice, if you have to code to a variant API, >>> it's not that much more trouble to use a cursor... >>> >>> >> >> Wouldn't the "COPY (select ...) TO STDOUT" format being discussed solve >> this for free? Yes, it would for me. > It won't solve it in the general case for clients that expect a result > set. ISTM that "use a cursor" is a perfectly reasonable answer, though. The general case cannot be applied for all particular cases. E.g. you cannot use cursors from shell scripts and just for producing an "export file" it's not too reasonable either. Redirecting psql's output or COPY is enough. Best regards, Zoltán Böszörényi
On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote: > The general case cannot be applied for all particular cases. > E.g. you cannot use cursors from shell scripts This could be fixed by adding an option to psql to transparently produce SELECT result sets via a cursor. -Neil
Neil Conway <neilc@samurai.com> writes: > On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote: >> The general case cannot be applied for all particular cases. >> E.g. you cannot use cursors from shell scripts > This could be fixed by adding an option to psql to transparently produce > SELECT result sets via a cursor. Note of course that such a thing would push the incomplete-result problem further upstream. For instance in (hypothetical --cursor switch)psql --cursor -c "select ..." | myprogram there would be no very good way for myprogram to find out that it'd been sent an incomplete result due to error partway through the SELECT. regards, tom lane
Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane: > Neil Conway <neilc@samurai.com> writes: > > On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote: > >> The general case cannot be applied for all particular cases. > >> E.g. you cannot use cursors from shell scripts > > > This could be fixed by adding an option to psql to transparently produce > > SELECT result sets via a cursor. I think this is an excellent idea. psql --cursor --fetchby 10000 -c "select ..." | myprogram > Note of course that such a thing would push the incomplete-result > problem further upstream. For instance in (hypothetical --cursor > switch) > psql --cursor -c "select ..." | myprogram > there would be no very good way for myprogram to find out that it'd > been sent an incomplete result due to error partway through the SELECT. would it not learn about it at the point of error ? even without --cursor there is still no very good way to find out when something else goes wrong, like the result inside libpq taking up all memory and so psql runs out of memory on formatting some longer lines. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hannu Krosing <hannu@skype.net> writes: > Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane: >> Note of course that such a thing would push the incomplete-result >> problem further upstream. For instance in (hypothetical --cursor >> switch) >> psql --cursor -c "select ..." | myprogram >> there would be no very good way for myprogram to find out that it'd >> been sent an incomplete result due to error partway through the SELECT. > would it not learn about it at the point of error ? No, it would merely see EOF after some number of result rows. (I'm assuming you're also using -A -t so that the output is unadorned.) regards, tom lane
On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > Ühel kenal päeval, E, 2006-06-05 kell 14:10, kirjutas Tom Lane: > >> Note of course that such a thing would push the incomplete-result > >> problem further upstream. For instance in (hypothetical --cursor > >> switch) > >> psql --cursor -c "select ..." | myprogram > >> there would be no very good way for myprogram to find out that it'd > >> been sent an incomplete result due to error partway through the SELECT. > > > would it not learn about it at the point of error ? > > No, it would merely see EOF after some number of result rows. (I'm > assuming you're also using -A -t so that the output is unadorned.) So if an error occurs partway through reading a cursor, no error message is generated? That certainly sounds like a bug to me... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote: >>> psql --cursor -c "select ..." | myprogram >>> there would be no very good way for myprogram to find out that it'd >>> been sent an incomplete result due to error partway through the SELECT. > So if an error occurs partway through reading a cursor, no error message > is generated? That certainly sounds like a bug to me... Sure an error is generated. But it goes to stderr. The guy at the downstream end of the stdout pipe cannot see either the error message, or the nonzero status that psql will (hopefully) exit with. You can theoretically deal with this by having the shell script calling this combination check psql exit status and discard the results of myprogram on failure, but it's not easy or simple. regards, tom lane