Thread: Very long SQL strings
I can't seem to find a definitive answer to this.
It looks like Postgres does not enforce a limit on the length of an SQL string. Great. However, is there some point at which a query string becomes ridiculously long and affects performance? Here's my particular case: consider an INSERT statement where you're using the new multi-row VALUES clause or SELECT ... UNION ALL to group together tuples. Is it always better to group as many together as possible?
For example, on a toy table with two columns, I noticed about a 20% speed increase when bulking together 1000 tuples in one INSERT statement as opposed to doing 1000 individual INSERTs. Would this be the same for 10000? 100000? Does it depend on the width of the tuples or the data types?
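For illustration (the table, column names, and values here are invented, not our real schema), the comparison is between statements like:

    -- Hypothetical two-column toy table, for illustration only.
    CREATE TABLE toy (id integer, val text);

    -- 1000 individual INSERTs:
    INSERT INTO toy (id, val) VALUES (1, 'aaa');
    INSERT INTO toy (id, val) VALUES (2, 'bbb');
    -- ... 998 more ...

    -- versus one multi-row VALUES INSERT:
    INSERT INTO toy (id, val) VALUES
        (1, 'aaa'),
        (2, 'bbb'),
        -- ... 997 more rows ...
        (1000, 'zzz');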
Are there any values A and B such that grouping A tuples and B tuples separately and running two statements will be faster than grouping A+B tuples in one statement?
Steve
Steven Flatt wrote:
> It looks like Postgres does not enforce a limit on the length of an SQL
> string. Great. However, is there some point at which a query string becomes
> ridiculously long and affects performance? Here's my particular case:
> consider an INSERT statement where you're using the new multi-row VALUES
> clause or SELECT ... UNION ALL to group together tuples. Is it always
> better to group as many together as possible?

I'm sure you'll reach a point of diminishing returns, and eventually a
ceiling where you run out of memory etc., but I don't know what the limit
would be. The most efficient way to do bulk inserts is to stream the data
with COPY.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
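A minimal sketch of the COPY approach suggested above, reusing the hypothetical toy table from the question (the data and file path are invented):

    -- Stream rows from the client; data lines are tab-delimited and end with "\."
    -- (with psql you would typically use \copy)
    COPY toy (id, val) FROM STDIN;
    1	aaa
    2	bbb
    \.

    -- Or load a server-side file:
    COPY toy (id, val) FROM '/path/to/data.csv' WITH CSV;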
"Steven Flatt" <steven.flatt@gmail.com> writes: > It looks like Postgres does not enforce a limit on the length of an SQL > string. Great. However is there some point at which a query string becomes > ridiculously too long and affects performance? Yes, but it'll depend a whole lot on context; I'd suggest experimentation if you want to derive a number for your particular situation. For starters, whether you are on 32- or 64-bit hardware is hugely relevant. FYI, when we developed multi-row-VALUES quite a bit of thought was put into maintaining performance with lots of rows, and IIRC we saw reasonable performance up into the tens of thousands of rows (depending on how wide the rows are). Other ways of making a query long, such as lots of WHERE clauses, might send performance into the tank a lot quicker. So the short answer is it all depends. regards, tom lane PS: for the record, there is a hard limit at 1GB of query text, owing to restrictions built into palloc. But I think you'd hit other memory limits or performance bottlenecks before that one.
Steven Flatt <steven.flatt@gmail.com> wrote:
> For example, on a toy table with two columns, I noticed about a 20% speed
> increase when bulking together 1000 tuples in one INSERT statement as
> opposed to doing 1000 individual INSERTs. Would this be the same for 10000?
> 100000? Does it depend on the width of the tuples or the data types?

I guess you can obtain the same if you pack all INSERTs into one
transaction. And, faster than INSERT: COPY.

Andreas
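A minimal sketch of the transaction-packing idea mentioned above (same hypothetical toy table as before): grouping the statements means one commit, and so one WAL flush, rather than one per INSERT.

    BEGIN;
    INSERT INTO toy (id, val) VALUES (1, 'aaa');
    INSERT INTO toy (id, val) VALUES (2, 'bbb');
    -- ... many more single-row INSERTs ...
    COMMIT;  -- a single commit instead of one per statement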
tgl@sss.pgh.pa.us (Tom Lane) writes:
> PS: for the record, there is a hard limit at 1GB of query text, owing
> to restrictions built into palloc. But I think you'd hit other
> memory limits or performance bottlenecks before that one.

It would be much funnier to set a hard limit of 640K of query text. The
reasoning should be obvious :-).

I once ran into the situation where Slony-I generated a query that made the
parser blow out (some sort of memory problem / running out of stack space
somewhere thing); it was just short of 640K long, and so we figured that
evidently it was wrong to conclude that "640K ought to be enough for
anybody."

Neil Conway was an observer; he was speculating that, with some (possibly
nontrivial) change to the parser, we should have been able to cope with it.

The query consisted mostly of a NOT IN clause where the list had some
atrocious number of entries in it (all integers).

(Aside: I wound up writing a "query compressor" (now in 1.2) which would
read that list and, if it was at all large, try to squeeze any sets of
consecutive integers into sets of "NOT BETWEEN" clauses. Usually, the lists,
of XIDs, were more or less consecutive, and frequently, in the cases where
the query got to MBs in size, there would be sets of hundreds or even
thousands of consecutive integers such that we'd be left with a tiny query
after this...)

--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://linuxfinances.info/info/linux.html
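To illustrate the kind of rewrite described in the aside above (the XID values here are invented), a predicate such as:

    WHERE xid NOT IN (1001, 1002, 1003, /* ... consecutive values ... */ 1850, 1852, 1999)

could be squeezed into something like:

    WHERE xid NOT BETWEEN 1001 AND 1850
      AND xid NOT IN (1852, 1999)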
Thanks, everyone, for your responses. I don't think it's realistic to change our application infrastructure to use COPY from a stream at this point. It's good to know that multi-row VALUES is good up into the thousands of rows (depending on various things, of course). That's a good enough answer for what I was looking for, and we can revisit this if performance does start to hurt.
On 6/21/07, Andreas Kretschmer <akretschmer@spamfence.net> wrote:
> I guess you can obtain the same if you pack all INSERTs into one
> transaction.
Well, the 20% gain I referred to was when all the individual INSERTs were within one transaction. When each INSERT does its own commit, it's significantly slower.
Steve
Chris Browne <cbbrowne@acm.org> writes:
> I once ran into the situation where Slony-I generated a query that
> made the parser blow out (some sort of memory problem / running out of
> stack space somewhere thing); it was just short of 640K long, and so
> we figured that evidently it was wrong to conclude that "640K ought to
> be enough for anybody."

> Neil Conway was an observer; he was speculating that, with some
> (possibly nontrivial) change to the parser, we should have been able
> to cope with it.

> The query consisted mostly of a NOT IN clause where the list had some
> atrocious number of entries in it (all integers).

FWIW, we do seem to have improved that as of 8.2. Assuming your entries
were 6-or-so-digit integers, that would have been on the order of 80K
entries, and we can manage it --- not amazingly fast, but it doesn't blow
out the stack anymore.

> (Aside: I wound up writing a "query compressor" (now in 1.2) which
> would read that list and, if it was at all large, try to squeeze any
> sets of consecutive integers into sets of "NOT BETWEEN" clauses.
> Usually, the lists, of XIDs, were more or less consecutive, and
> frequently, in the cases where the query got to MBs in size, there
> would be sets of hundreds or even thousands of consecutive integers
> such that we'd be left with a tiny query after this...)

Probably still a win.

			regards, tom lane