Thread: SELECT very slow

SELECT very slow

From
Thomas Kellerer
Date:
Hello,

I have a table with roughly 100,000 rows (four varchar(100) columns). This 
is basically test data I generated for something else. I'm using JDBC to 
access PG (but the behaviour is the same with psql).

The problem is that a SELECT * FROM foobar; takes ages (roughly 3 minutes) 
to return the first row. I played around with the fetchSize() to disable 
the result set caching in the Java program first (before I tried psql) but 
that did not change anything.

It seems that PG is actually building up the full result set in the 
background before delivering the first row. But I cannot see any of the 
processes (neither my Java process nor the PG processes) using a lot of 
memory - which I would expect if a result set of that size is created.

I need to test a program which should process large result sets (without 
loading them into memory), and waiting nearly three minutes before it 
actually starts working is a bit annoying :)
A SELECT count(*) FROM foobar is quite fast (about 2 seconds).

I hate to say this, but the same SELECT returns the first row more or less 
instantly with Firebird, SQL Server (MSDE) and HSQLDB.

Is there anything I can do, to convince PG to return the first row more 
quickly?

I tried a VACUUM FULL, no change. I increased the shared_buffers to 1000, 
no improvement either.

The execution plan is not really surprising:

Seq Scan on foobar  (cost=0.00..2510.04 rows=117504 width=63)


I'm using PG 8.0.3 on Win2K.

Thanks in advance
Thomas



Re: SELECT very slow

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Is there anything I can do, to convince PG to return the first row more 
> quickly?

libpq's API for PQresult is such that it really doesn't have any choice
but to collect the full result set before it hands you back the
PQresult.  I don't know JDBC very well but I think it has similar
semantic constraints.

The solution is to use a cursor and FETCH a reasonably small number of
rows at a time.
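
In SQL, the cursor approach looks roughly like this (a sketch; the cursor 
name and batch size are illustrative, and it needs a live server):

```sql
-- Cursors only exist inside a transaction block.
BEGIN;

-- DECLARE is cheap: no result set is materialized up front.
DECLARE foo_cur CURSOR FOR SELECT * FROM foobar;

-- Pull the rows in small batches; the first batch returns almost immediately.
FETCH 100 FROM foo_cur;
FETCH 100 FROM foo_cur;
-- ... repeat until FETCH returns zero rows ...

CLOSE foo_cur;
COMMIT;
```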
        regards, tom lane


Re: SELECT very slow

From
Alain
Date:

Tom Lane wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
> 
>>Is there anything I can do, to convince PG to return the first row more 
>>quickly?

Are you now looking for the LIMIT ?

SELECT * FROM table LIMIT 1;

and when you want the rest of it:

SELECT * FROM table OFFSET 1;

Alain


Re: SELECT very slow

From
Thomas Kellerer
Date:
On 09.06.2005 03:13 Alain wrote:

> 
> 
> Tom Lane wrote:
> 
>> Thomas Kellerer <spam_eater@gmx.net> writes:
>>
>>> Is there anything I can do, to convince PG to return the first row
>>> more quickly?
> 
> 
> Are you now looking for the LIMIT ?
> 
> SELECT * FROM table LIMIT 1;
> 
> and when you want the rest of it:
> 
> SELECT * FROM table OFFSET 1;
> 

No I want the whole result.

Thomas



Re: SELECT very slow

From
Volkan YAZICI
Date:
Hi,

On 6/9/05, Thomas Kellerer <spam_eater@gmx.net> wrote:
> No I want the whole result.

As Tom underlined:

On 6/9/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The solution is to use a cursor and FETCH a reasonably
> small number of rows at a time.

AFAIK, query results are stored as arrays in PGresult structures. Thus, 
storing huge result sets in a single struct is not very feasible, and in 
the long run you can even run into theoretical limits like MAX_INT. 
Moreover, it's rare to see any practical use for retrieving thousands of 
rows at once. If you're getting huge sets of data, you should try 
grouping them with suitable statements.

IMHO, you should use cursors to fetch a suitable number of rows from the 
related table and work through the set batch by batch. (Furthermore, I 
think this is one of the design goals of the FETCH mechanism.)

Also, as far as I can see, most API implementations (C++, Perl, PHP, 
Python etc.) use libpq as the layer between the API and the server. 
Therefore, you'll probably run into every limitation of libpq in 
programming languages other than C, too.

Regards.


Re: SELECT very slow

From
Thomas Kellerer
Date:
On 09.06.2005 02:06 Tom Lane wrote:

> Thomas Kellerer <spam_eater@gmx.net> writes:
> 
>>Is there anything I can do, to convince PG to return the first row more 
>>quickly?
> 
> 
> libpq's API for PQresult is such that it really doesn't have any choice
> but to collect the full result set before it hands you back the
> PQresult.  I don't know JDBC very well but I think it has similar
> semantic constraints.
> 
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.
> 

My understanding was that setting the fetchSize() to 1 in JDBC would force the
use of a cursor.

I'll have to go through the JDBC docs again to check how I can enforce this.

Thanks for the answer though
Thomas



Re: SELECT very slow

From
Thomas Kellerer
Date:
On 09.06.2005 02:06 Tom Lane wrote:

> Thomas Kellerer <spam_eater@gmx.net> writes:
> 
>>Is there anything I can do, to convince PG to return the first row more 
>>quickly?
> 
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.

Thanks for all your answers.

It turned out that I did not read the JDBC documentation closely enough (blush).

Setting the fetchSize to 1 (or anything > 0) only uses a cursor if autocommit 
is disabled (because of the annoying "transaction is aborted" error when 
running DDL scripts that contain errors, I usually turn autocommit on).

With autocommit off, the rows are returned "immediately" (so the driver is 
using a cursor to fetch the data).
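
The pattern that ends up using a cursor under the hood looks roughly like 
this (a sketch with hypothetical connection details; it needs a running 
PostgreSQL server plus the JDBC driver, so it is not runnable standalone):

```java
import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical URL and credentials.
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");

        con.setAutoCommit(false);      // required: cursors need a transaction
        Statement stmt = con.createStatement();
        stmt.setFetchSize(100);        // required: > 0 enables batched fetching

        ResultSet rs = stmt.executeQuery("SELECT * FROM foobar");
        while (rs.next()) {
            // The first row is available after the first 100-row batch,
            // not after the whole result set has been built.
            Object o = rs.getObject(1);
        }
        rs.close();
        stmt.close();
        con.commit();
        con.close();
    }
}
```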

Regards
Thomas



Re: SELECT very slow

From
PFC
Date:
> The problem is, that a SELECT * FROM foobar; takes ages (roughly 3  
> minutes) to return the first row. I played around with the fetchSize()  
> to disable the result set caching in the Java program first (before I  
> tried psql) but that did not change anything.
Hello,
Your results seemed strange, so I tried this: I created a table with 128K 
rows, 4 TEXT columns containing about 70 chars each...

---------------------------------------------
\d bigtest
 Column  |  Type   |                        Modifiers
---------+---------+----------------------------------------------------------
 id      | integer | not null default nextval('public.bigtest_id_seq'::text)
 data1   | text    |
 data2   | text    |
 data3   | text    |
 data4   | text    |
Indexes:
    "bigtest_pkey" PRIMARY KEY, btree (id)

---------------------------------------------

SELECT count(*) from bigtest;
 count
--------
 131072

---------------------------------------------

explain analyze select * from bigtest;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on bigtest  (cost=0.00..7001.72 rows=131072 width=308) (actual time=0.035..484.249 rows=131072 loops=1)
 Total runtime: 875.095 ms

So grabbing the data takes 0.875 seconds.

---------------------------------------------

SELECT avg(length(data1)), avg(length(data2)), avg(length(data3)), avg(length(data3)) from bigtest;
         avg         |         avg         |         avg         |         avg
---------------------+---------------------+---------------------+---------------------
 72.1629180908203125 | 72.2342376708984375 | 72.3680572509765625 | 72.3680572509765625

Here you see the average data sizes.

---------------------------------------------
Now I fire up Python, do a SELECT * from the table and retrieve all the 
data as native objects... Hm, it takes about 1.3 seconds... on my 
Pentium-M 1600 laptop...

I was about to suggest you use a less slow and bloated language than 
Java, but then on my machine psql takes about 5 seconds to display the 
results, so it looks like it isn't Java. psql is slow because it has to 
format the result and compute the column widths.

Don't you have a problem somewhere? Are you sure it's not swapping? Did 
you check memory? Are you transferring all this data over the network? 
Might an obscure cabling problem have reverted your connection to 10 Mbps?

I'm using pg 8.0.something on Linux.

Ouch. I saw you're on Windows, so I tried it on the Windows machine there 
which has a postgres installed, over a 100 Mbps network, querying from my 
Linux laptop. The Windows machine is a piece of crap, a Pentium-II 300 
with 256 MB RAM, and it takes 7 seconds to retrieve the whole table into 
a Python native object.

So...



Re: SELECT very slow

From
Thomas Kellerer
Date:

PFC wrote on 14.06.2005 14:26:
[...]
> Now I fire up Python, do a SELECT * from the table and retrieve all
> the data as native objects... Hm, it takes about 1.3 seconds... on my
> Pentium-M 1600 laptop...
> 
> Don't you have a problem somewhere? Are you sure it's not swapping?
> Did you check memory? Are you transferring all this data over the
> network? Might an obscure cabling problem have reverted your
> connection to 10 Mbps?

I'm sure. Everything is running on the same machine, about 350 MB free memory.

> Ouch. I saw you're on Windows, so I tried it on the Windows machine
> there which has a postgres installed, over a 100 Mbps network, querying
> from my Linux laptop. The Windows machine is a piece of crap, a
> Pentium-II 300 with 256 MB RAM, and it takes 7 seconds to retrieve the
> whole table into a Python native object.
> 
It's not the program or Java. The same program takes about 20 seconds with 
Firebird and exactly the same data.

Thomas




Re: SELECT very slow

From
PFC
Date:
> It's not the program or Java. The same program takes about 20 seconds
> with Firebird and exactly the same data.

Hm, that's still very slow (it should do it in a couple of seconds like my 
PC does... maybe the problem is common to postgres and firebird?)

Try eliminating disk IO by writing a set-returning function which returns 
1000000 rows, something simple like just a sequence number and a text 
value... if this is slow too... I don't know... do you have an antivirus 
or ZoneAlarm or something?

Have you tried connecting from another machine?
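
One way to build such a test set without a hand-written function (a sketch; 
generate_series() is available as of PG 8.0, and the row count and padding 
are arbitrary):

```sql
-- One million synthetic rows, produced without touching a base table.
SELECT i AS id, repeat('x', 70) AS data1
FROM generate_series(1, 1000000) AS s(i);
```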

>
> Thomas




Re: SELECT very slow

From
Thomas Kellerer
Date:
PFC wrote on 15.06.2005 22:04:

> 
>> It's not the program or Java. The same program takes about 20 seconds  
>> with Firebird and the exactly same data.
> 
> 
>     Hm, that's still very slow (it should do it in a couple seconds like 
> my  PC does... maybe the problem is common to postgres and firebird ?)
> 
>     Try eliminating disk IO by writing a set returning function which 
> returns  1000000 rows, something simple like just a sequence number and 
> a text  value... if this is slow too... i don't know... 
> do you have an antivirus  or zonealarm or something ?
Wouldn't that affect all DB access, not only PG? And as I said, all other 
databases return the data quickly.

The 20 seconds are ok. This includes processing of the data in the 
application. If I simply loop over the result set and get each column's 
value without further processing it takes 4 seconds with Firebird.

Basically I'm doing the following:

rs = stmt.executeQuery("select * from foo");
while (rs.next())
{
   for (int i=0; i < 4; i++)
   {
     Object o = rs.getObject(i+1);
   }
}

As I said in my other post, the behaviour/performance in PG is dependent on 
the autocommit setting for the connection.

With autocommit set to false the above code takes about 3 seconds in PG, 
but with autocommit set to true, PG takes 3 minutes! It seems that it is 
also very dependent on the fetchSize (apparently the number of rows that 
are cached by the driver). Anything above 100 seems to slow down the 
overall process.



Regards
Thomas




Re: SELECT very slow

From
Scott Marlowe
Date:
On Wed, 2005-06-15 at 17:08, Thomas Kellerer wrote:
> PFC wrote on 15.06.2005 22:04:
> 
> > 
> >> It's not the program or Java. The same program takes about 20 seconds  
> >> with Firebird and the exactly same data.
> > 
> > 
> >     Hm, that's still very slow (it should do it in a couple seconds like 
> > my  PC does... maybe the problem is common to postgres and firebird ?)
> > 
> >     Try eliminating disk IO by writing a set returning function which 
> > returns  1000000 rows, something simple like just a sequence number and 
> > a text  value... if this is slow too... i don't know... 
> 
>  > do you have an antivirus  or zonealarm or something ?
> Wouldn't that affect all DB access not only PG? And as I said, all other
> 
> The 20 seconds are ok. This includes processing of the data in the 
> application. If I simply loop over the result set and get each column's 
> value without further processing it takes 4 seconds with Firebird.
> 
> Basically I'm doing the following:
> 
> rs = stmt.executeQuery("select * from foo");
> while (rs.next())
> {
>    for (int i=0; i < 4; i++)
>    {
>      Object o = rs.getObject(i+1);
>    }
> }
> 
> As I said in my other post, the behaviour/performance in PG is dependent on 
> the autocommit setting for the connection.
> 
> With autocommit set to false the above code takes about 3 seconds in PG 
> but with autocommit set to true, PG takes 3 minutes! It seems that it also 
> is very dependent on the fetchsize (apparently the number of rows that are 
> cached by the driver). Anything above 100 seems to slow down the overall 
> process.

There's got to be more happening than what this is showing us.  A
select, and looping through it, should involve no writes, and therefore
no real performance difference from autocommit versus not.  Is there
some underlying trigger on the view or something like that?  Some kind
of auditing function?


Re: SELECT very slow

From
Thomas Kellerer
Date:
On 16.06.2005 16:00 Scott Marlowe wrote:

> There's got to be more happening than what this is showing us.  A
> select, and looping through it, should involve no writes, and therefore
> no real performance difference from autocommit versus not.  Is there
> some underlying trigger on the view or something like that?  Some kind
> of auditing function?

That's exactly the code that produced the mentioned timings. This is - according
to the JDBC driver's documentation - the expected behaviour. The driver can be
set to use cursor based fetching but *only* if autocommit is false.

If autocommit is on (or fetch size is zero) then the driver will build the whole
result set before returning to the caller.

http://jdbc.postgresql.org/documentation/80/query.html#query-with-cursor

Thomas



Re: SELECT very slow

From
PFC
Date:

> If autocommit is on (or fetch size is zero) then the driver will build 
> the whole result set before returning to the caller.

Sure, but that is not your problem: even building the whole result set 
should not take longer than a few seconds (I gave you test timings in a 
previous message).

So... what?

What does the taskman say? CPU at 100%? How much kernel time?

>
> http://jdbc.postgresql.org/documentation/80/query.html#query-with-cursor
>
> Thomas




Re: SELECT very slow

From
Thomas Kellerer
Date:
On 16.06.2005 17:29 PFC wrote:

>> If autocommit is on (or fetch size is zero) then the driver will build
>> the whole result set before returning to the caller.
> 
> 
>     Sure, but that is not your problem : even building the whole result
> set  should not take longer than a few seconds (I gave you test timings
> in a  previous message).
>     So... what ?
>     What does the taskman say ? CPU at 100% ? how much kernel time ?
> 

Taskmanager (I'm on Windows) does not show any high CPU usage. And (as I 
wrote in the initial post) I cannot see any memory increase in the PG 
process either (which I would expect with a result set of that size being 
built up in memory).

Thomas