Thread: SELECT very slow
Hello,

I have a table with roughly 100,000 rows (four varchar(100) columns). This is
basically test data I generated for something else. I'm using JDBC to access
PG (but the behaviour is the same with psql).

The problem is that a SELECT * FROM foobar; takes ages (roughly 3 minutes) to
return the first row. I played around with fetchSize() to disable the result
set caching in the Java program first (before I tried psql), but that did not
change anything.

It seems that PG is actually building up the full result set in the
background before delivering the first row. But I cannot see any of the
processes (neither my Java process nor the PG processes) using a lot of
memory - which I would expect if a result set of that size were created.

I need to test a program which should process large result sets (without
loading them into memory), and waiting nearly three minutes before it
actually starts working is a bit annoying :)

A SELECT count(*) FROM foobar is quite fast (about 2 seconds).

I hate to say this, but the same SELECT returns the first row more or less
instantly with Firebird, SQL Server (MSDE) and HSQLDB.

Is there anything I can do to convince PG to return the first row more
quickly? I tried a VACUUM FULL - no change. I increased shared_buffers to
1000 - no improvement either.

The execution plan is not really surprising:

  Seq Scan on foobar  (cost=0.00..2510.04 rows=117504 width=63)

I'm using PG 8.0.3 on Win2K.

Thanks in advance
Thomas
Thomas Kellerer <spam_eater@gmx.net> writes:
> Is there anything I can do to convince PG to return the first row more
> quickly?

libpq's API for PQresult is such that it really doesn't have any choice but
to collect the full result set before it hands you back the PQresult. I
don't know JDBC very well but I think it has similar semantic constraints.

The solution is to use a cursor and FETCH a reasonably small number of rows
at a time.

			regards, tom lane
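(For reference, a minimal JDBC sketch of the cursor/FETCH approach Tom
describes. The cursor name, batch size and connection details are
illustrative placeholders, not from the thread; the table name is the one
the original post uses.)

  import java.sql.*;

  public class CursorFetch
  {
      public static void main(String[] args) throws SQLException
      {
          Connection con = DriverManager.getConnection(
              "jdbc:postgresql://localhost/test", "user", "password");
          // Cursors only live inside a transaction, so autocommit must be off.
          con.setAutoCommit(false);
          Statement stmt = con.createStatement();
          // Declare an explicit cursor over the full result set.
          stmt.execute("DECLARE foo_cur CURSOR FOR SELECT * FROM foobar");
          while (true)
          {
              // Pull the rows in small batches; the first batch arrives
              // immediately instead of after the whole table is materialized.
              ResultSet rs = stmt.executeQuery("FETCH 100 FROM foo_cur");
              int n = 0;
              while (rs.next())
                  n++;
              rs.close();
              if (n == 0)
                  break;
          }
          stmt.execute("CLOSE foo_cur");
          con.commit();
          con.close();
      }
  }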
Tom Lane escreveu:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>
>> Is there anything I can do to convince PG to return the first row more
>> quickly?

Are you looking for LIMIT?

  SELECT * FROM table LIMIT 1;

and when you want the rest of it:

  SELECT * FROM table OFFSET 1;

Alain
On 09.06.2005 03:13 Alain wrote:
> Are you looking for LIMIT?
>
>   SELECT * FROM table LIMIT 1;
>
> and when you want the rest of it:
>
>   SELECT * FROM table OFFSET 1;

No, I want the whole result.

Thomas
Hi,

On 6/9/05, Thomas Kellerer <spam_eater@gmx.net> wrote:
> No, I want the whole result.

As Tom underlined:

On 6/9/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The solution is to use a cursor and FETCH a reasonably
> small number of rows at a time.

AFAIK, query results are stored as arrays in PGresult structures. Storing a
huge result set in a single struct is therefore not very practical, and in
the long run you can even hit theoretical limits like MAX_INT. Moreover,
it's rare to see any practical use for retrieving that many rows at once.
If you're getting really large sets of data, you should try grouping them
with suitable statements.

IMHO, you should use a cursor to fetch a suitable number of rows from the
table and work through the result set batch by batch. (I think this is one
of the design goals of the FETCH mechanism.) Note that most API
implementations (C++, Perl, PHP, Python etc.) use libpq as the layer
between the API and the server, so you'll probably run into every
limitation of libpq from other programming languages too, not just from C.

Regards.
On 09.06.2005 02:06 Tom Lane wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> Is there anything I can do to convince PG to return the first row
>> more quickly?
>
> libpq's API for PQresult is such that it really doesn't have any choice
> but to collect the full result set before it hands you back the
> PQresult. I don't know JDBC very well but I think it has similar
> semantic constraints.
>
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.

My understanding was that setting fetchSize() to 1 in JDBC would force the
use of a cursor. I'll have to go through the JDBC docs again to check how I
can enforce this.

Thanks for the answer though
Thomas
On 09.06.2005 02:06 Tom Lane wrote:
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.

Thanks for all your answers.

It turned out that I did not read the JDBC documentation closely enough
(blush). Setting the fetchSize to 1 (or anything > 0) only uses a cursor if
autocommit is disabled. (Due to the annoying "transaction is aborted" error
when running DDL scripts that have errors, I usually turn autocommit on.)
With autocommit off, the rows are returned "immediately", so the driver is
indeed using a cursor to fetch the data.

Regards
Thomas
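(For reference, a minimal sketch of the working configuration described
above - autocommit off plus a non-zero fetch size. Connection details and
the fetch size of 50 are placeholders, not from the thread.)

  import java.sql.*;

  public class FetchSizeDemo
  {
      public static void main(String[] args) throws SQLException
      {
          Connection con = DriverManager.getConnection(
              "jdbc:postgresql://localhost/test", "user", "password");
          // The PG JDBC driver only switches to cursor-based fetching
          // when autocommit is off AND a non-zero fetch size is set.
          con.setAutoCommit(false);
          Statement stmt = con.createStatement();
          stmt.setFetchSize(50);
          ResultSet rs = stmt.executeQuery("SELECT * FROM foobar");
          while (rs.next())
          {
              // The first row arrives after a single 50-row batch has been
              // transferred, not after the whole table is materialized.
              String s = rs.getString(1);
          }
          rs.close();
          con.commit();
          con.close();
      }
  }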
> The problem is that a SELECT * FROM foobar; takes ages (roughly 3
> minutes) to return the first row. I played around with fetchSize()
> to disable the result set caching in the Java program first (before I
> tried psql), but that did not change anything.

Hello,

Yours seemed strange, so I tried this: created a table with 128K rows,
four TEXT columns containing about 70 chars each...

---------------------------------------------
\d bigtest;
 Colonne |  Type   |                      Modificateurs
---------+---------+----------------------------------------------------------
 id      | integer | not null default nextval('public.bigtest_id_seq'::text)
 data1   | text    |
 data2   | text    |
 data3   | text    |
 data4   | text    |
Index : «bigtest_pkey» PRIMARY KEY, btree (id)

---------------------------------------------
SELECT count(*) from bigtest;
 count
--------
 131072

---------------------------------------------
explain analyze select * from bigtest;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on bigtest  (cost=0.00..7001.72 rows=131072 width=308) (actual time=0.035..484.249 rows=131072 loops=1)
 Total runtime: 875.095 ms

So grabbing the data takes 0.875 seconds.

---------------------------------------------
SELECT avg(length(data1)), avg(length(data2)),
       avg(length(data3)), avg(length(data3)) from bigtest;
         avg         |         avg         |         avg         |         avg
---------------------+---------------------+---------------------+---------------------
 72.1629180908203125 | 72.2342376708984375 | 72.3680572509765625 | 72.3680572509765625

Here you see the average data sizes.
---------------------------------------------

Now I fire up Python, do a SELECT * from the table and retrieve all the
data as native objects... Hm, it takes about 1.3 seconds... on my Pentium-M
1600 laptop...

I was about to suggest you use a less slow and bloated language than Java,
but then on my machine psql takes about 5 seconds to display the results,
so it looks like it ain't Java. psql is slow because it has to format the
result and compute the column widths.

Don't you have a problem somewhere? Are you sure it's not swapping? Did you
check memory? Are you transferring all this data over the network? Might an
obscure cabling problem have reverted your connection to 10 Mbps?

I'm using pg 8.0.something on Linux.

Ouch. I saw you're on Windows, so I tried it on the Windows machine here
which has a postgres installed, over a 100 Mbps network, querying from my
Linux laptop. The Windows machine is a piece of crap, Pentium-II 300 and
256 MB RAM; it takes 7 seconds to retrieve the whole table into a Python
native object.

So...
PFC wrote on 14.06.2005 14:26:

[...]

> Now I fire up Python, do a SELECT * from the table and retrieve all
> the data as native objects... Hm, it takes about 1.3 seconds... on my
> Pentium-M 1600 laptop...
>
> Don't you have a problem somewhere? Are you sure it's not swapping? Did
> you check memory? Are you transferring all this data over the network?
> Might an obscure cabling problem have reverted your connection to 10
> Mbps?

I'm sure. Everything is running on the same machine, with about 350 MB of
free memory.

> Ouch. I saw you're on Windows, so I tried it on the Windows machine
> here which has a postgres installed, over a 100 Mbps network, querying
> from my Linux laptop. The Windows machine is a piece of crap, Pentium-II
> 300 and 256 MB RAM; it takes 7 seconds to retrieve the whole table into
> a Python native object.

It's not the program or Java. The same program takes about 20 seconds with
Firebird and the exact same data.

Thomas
> It's not the program or Java. The same program takes about 20 seconds
> with Firebird and the exact same data.

Hm, that's still very slow (it should do it in a couple of seconds like my
PC does... maybe the problem is common to postgres and firebird?).

Try eliminating disk IO by writing a set-returning function which returns
1,000,000 rows, something simple like just a sequence number and a text
value... If this is slow too... I don't know... Do you have an antivirus or
ZoneAlarm or something? Have you tried connecting from another machine?
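(For reference, one way to run the disk-IO-free test PFC suggests without
writing a function yourself: the built-in set-returning function
generate_series(), available since 8.0, can produce the rows. The query
text and timing harness below are illustrative, not from the thread.)

  import java.sql.*;

  public class SrfTiming
  {
      public static void main(String[] args) throws SQLException
      {
          Connection con = DriverManager.getConnection(
              "jdbc:postgresql://localhost/test", "user", "password");
          Statement stmt = con.createStatement();
          long start = System.currentTimeMillis();
          // generate_series() produces the rows on the fly, so the disk
          // is never touched; any remaining slowness is not IO-bound.
          ResultSet rs = stmt.executeQuery(
              "SELECT s, 'some filler text of realistic width' " +
              "FROM generate_series(1, 1000000) AS s");
          int rows = 0;
          while (rs.next())
              rows++;
          rs.close();
          System.out.println(rows + " rows in "
              + (System.currentTimeMillis() - start) + " ms");
          con.close();
      }
  }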
PFC wrote on 15.06.2005 22:04:
>> It's not the program or Java. The same program takes about 20 seconds
>> with Firebird and the exact same data.
>
> Hm, that's still very slow (it should do it in a couple of seconds like
> my PC does... maybe the problem is common to postgres and firebird?).
>
> Try eliminating disk IO by writing a set-returning function which
> returns 1,000,000 rows, something simple like just a sequence number
> and a text value... If this is slow too... I don't know...
>
> Do you have an antivirus or ZoneAlarm or something?

Wouldn't that affect all DB access, not only PG? And as I said, all other
databases on this machine behave normally.

The 20 seconds are OK. That includes processing of the data in the
application. If I simply loop over the result set and get each column's
value without further processing, it takes 4 seconds with Firebird.

Basically I'm doing the following:

  rs = stmt.executeQuery("select * from foo");
  while (rs.next())
  {
    for (int i=0; i < 4; i++)
    {
      Object o = rs.getObject(i+1);
    }
  }

As I said in my other post, the behaviour/performance in PG depends on the
autocommit setting for the connection. With autocommit set to false, the
above code takes about 3 seconds in PG, but with autocommit set to true, PG
takes 3 minutes!

It also seems to be very dependent on the fetchSize (apparently the number
of rows that are cached by the driver). Anything above 100 seems to slow
down the overall process.

Regards
Thomas
On Wed, 2005-06-15 at 17:08, Thomas Kellerer wrote:

[...]

> Basically I'm doing the following:
>
>   rs = stmt.executeQuery("select * from foo");
>   while (rs.next())
>   {
>     for (int i=0; i < 4; i++)
>     {
>       Object o = rs.getObject(i+1);
>     }
>   }
>
> As I said in my other post, the behaviour/performance in PG depends on
> the autocommit setting for the connection. With autocommit set to
> false, the above code takes about 3 seconds in PG, but with autocommit
> set to true, PG takes 3 minutes!

There's got to be more happening than what this is showing us. A select,
and looping through it, should involve no writes, and therefore no real
performance difference between autocommit and not. Is there some underlying
trigger on the view or something like that? Some kind of auditing function?
On 16.06.2005 16:00 Scott Marlowe wrote:
> There's got to be more happening than what this is showing us. A
> select, and looping through it, should involve no writes, and therefore
> no real performance difference between autocommit and not. Is there
> some underlying trigger on the view or something like that? Some kind
> of auditing function?

That's exactly the code that produced the mentioned timings, and this is -
according to the JDBC driver's documentation - the expected behaviour. The
driver can be set to use cursor-based fetching, but *only* if autocommit is
false. If autocommit is on (or the fetch size is zero), the driver will
build the whole result set before returning to the caller.

http://jdbc.postgresql.org/documentation/80/query.html#query-with-cursor

Thomas
> If autocommit is on (or the fetch size is zero), the driver will build
> the whole result set before returning to the caller.

Sure, but that is not your problem: even building the whole result set
should not take longer than a few seconds (I gave you test timings in a
previous message). So... what?

What does the task manager say? CPU at 100%? How much kernel time?
On 16.06.2005 17:29 PFC wrote:
>> If autocommit is on (or the fetch size is zero), the driver will build
>> the whole result set before returning to the caller.
>
> Sure, but that is not your problem: even building the whole result set
> should not take longer than a few seconds (I gave you test timings in a
> previous message). So... what?
>
> What does the task manager say? CPU at 100%? How much kernel time?

Task Manager (I'm on Windows) does not show any high CPU usage. And (as I
wrote in the initial post) I cannot see any memory increase in the PG
process either - which I would expect with a result set of that size being
built up in memory.

Thomas