Thread: Clarification reqeusted for "select * from a huge table"

Clarification reqeusted for "select * from a huge table"

From

"Gokulakannan Somasundaram"

Date:

12 November 2007, 09:22:05

Hi,   I had a chance to test one of the real world cases with Oracle and
PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM
machine) both in oracle and Postgresql. Just write a JDBC program for
a 'select *' on that table. With PostgreSQL as backend, java crashes
saying that it has met 'Out Of Memory'. With Oracle it doesn't.   Postgres tried to send all the results back to the
clientat one
 
shot, whereas in Oracle it works like a Cursor. Is this issue already
well known among hackers community? If known, why is it designed this
way?
   I also noticed that it doesn't crash with psql, but it takes a
long time to show the first set of records. It takes a long time, even
to quit after i pressed 'q'.  With oracle SQLPlus, it is quite instantaneous.

-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Re: Clarification reqeusted for "select * from a huge table"

From

Richard Huxton

Date:

12 November 2007, 10:56:19

Gokulakannan Somasundaram wrote:
> Hi,
>     I had a chance to test one of the real world cases with Oracle and
> PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM
> machine) both in oracle and Postgresql. Just write a JDBC program for
> a 'select *' on that table. With PostgreSQL as backend, java crashes
> saying that it has met 'Out Of Memory'. With Oracle it doesn't.
>     Postgres tried to send all the results back to the client at one
> shot, whereas in Oracle it works like a Cursor. Is this issue already
> well known among hackers community? If known, why is it designed this
> way?

It returns everything because you've asked for it. If you wanted 
something that looks like a cursor, PG assumes you'll request a cursor.

As to why, there are two reasons:
1. It's always been that way and changing it now would irritate most of 
the existing user-base.
2. Repeat your test with 5,10,50,100 clients all running different big 
queries and see which puts a greater load on the server. PG favours 
supporting lots of clients by pushing the load onto them.

>     I also noticed that it doesn't crash with psql, but it takes a
> long time to show the first set of records. It takes a long time, even
> to quit after i pressed 'q'.
>    With oracle SQLPlus, it is quite instantaneous.

Again, you're measuring different things. What is the time to the *last* 
row?

--   Richard Huxton  Archonet Ltd

Re: Clarification reqeusted for "select * from a huge table"

From

Heikki Linnakangas

Date:

12 November 2007, 11:31:27

Gokulakannan Somasundaram wrote:
>     I had a chance to test one of the real world cases with Oracle and
> PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM
> machine) both in oracle and Postgresql. Just write a JDBC program for
> a 'select *' on that table. With PostgreSQL as backend, java crashes
> saying that it has met 'Out Of Memory'. With Oracle it doesn't.
>     Postgres tried to send all the results back to the client at one
> shot, whereas in Oracle it works like a Cursor. Is this issue already
> well known among hackers community? If known, why is it designed this
> way?

http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Clarification reqeusted for "select * from a huge table"

From

"Gokulakannan Somasundaram"

Date:

12 November 2007, 11:35:28

On Nov 12, 2007 5:25 PM, Richard Huxton <dev@archonet.com> wrote:
> Gokulakannan Somasundaram wrote:
> > Hi,
> >     I had a chance to test one of the real world cases with Oracle and
> > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM
> > machine) both in oracle and Postgresql. Just write a JDBC program for
> > a 'select *' on that table. With PostgreSQL as backend, java crashes
> > saying that it has met 'Out Of Memory'. With Oracle it doesn't.
> >     Postgres tried to send all the results back to the client at one
> > shot, whereas in Oracle it works like a Cursor. Is this issue already
> > well known among hackers community? If known, why is it designed this
> > way?
>
> It returns everything because you've asked for it. If you wanted
> something that looks like a cursor, PG assumes you'll request a cursor.
>
> As to why, there are two reasons:
> 1. It's always been that way and changing it now would irritate most of
> the existing user-base.
JDBC API, as you might know has a rs.next() and rs.prev() to scan
backwards and forwards. The API looks more like a cursor.Currently,
all the results for Postgres is returned to the client during the
executeQuery(). This is more like the SAX / DOM argument. I just feel
JDBC APIs provide a feeling a SAX.

> 2. Repeat your test with 5,10,50,100 clients all running different big
> queries and see which puts a greater load on the server. PG favours
> supporting lots of clients by pushing the load onto them.
When we say Postgres pushes the load to the client, its a huge memory
overhead on the  client part. In oracle's scenario, the overhead is
just maintaining the state(say from which block the scan should be
continued). My point is that there are some operations, which are not
possible with postgres, whereas it is possible by an another database.
It would be better,, if we can support it. (There are some places
where DOM parsing is not possible and we prefer SAX)


>
> >     I also noticed that it doesn't crash with psql, but it takes a
> > long time to show the first set of records. It takes a long time, even
> > to quit after i pressed 'q'.
> >    With oracle SQLPlus, it is quite instantaneous.
>
> Again, you're measuring different things. What is the time to the *last*
> row?

I made this point, because people usually fire select * from table
query in the psql prompt to get a feel of the table. Ofcourse they can
fire select * from table limit 10; But i just feel its more key
strokes and it would be better, if it is interactive. I accept that it
is a debatable point and people can prefer otherwise to have more key
strokes.

Imagine, you need a large batch operation. In oracle we can fire the
SQL and we can be sure that the client won't crash, but with postgres
we have a region of uncertainity. There are some JDBC hints like
setFetchSize(), which actually affects Oracle's behaviour. But it
doesn't seem to do anything with postgres. But JDBC has declared these
commands as hint commands and has provided a warning to users, about
the fact that it may get ignored



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Re: Clarification reqeusted for "select * from a huge table"

From

"Gokulakannan Somasundaram"

Date:

12 November 2007, 11:43:48

> we have a region of uncertainity. There are some JDBC hints like
> setFetchSize(), which actually affects Oracle's behaviour. But it
> doesn't seem to do anything with postgres. But JDBC has declared these
> commands as hint commands and has provided a warning to users, about
> the fact that it may get ignored
This is a wrong statement. I would like to take it back...


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Re: Clarification reqeusted for "select * from a huge table"

From

"Gokulakannan Somasundaram"

Date:

12 November 2007, 11:43:50

On Nov 12, 2007 6:01 PM, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Gokulakannan Somasundaram wrote:
> >     I had a chance to test one of the real world cases with Oracle and
> > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM
> > machine) both in oracle and Postgresql. Just write a JDBC program for
> > a 'select *' on that table. With PostgreSQL as backend, java crashes
> > saying that it has met 'Out Of Memory'. With Oracle it doesn't.
> >     Postgres tried to send all the results back to the client at one
> > shot, whereas in Oracle it works like a Cursor. Is this issue already
> > well known among hackers community? If known, why is it designed this
> > way?
>
> http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
>

Thanks Heikki. That answered my question. Can you tell me, what is the
similar setting for psql?

-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

Re: Clarification reqeusted for "select * from a huge table"

From

Richard Huxton

Date:

12 November 2007, 11:46:54

Gokulakannan Somasundaram wrote:
>>>     I also noticed that it doesn't crash with psql, but it takes a
>>> long time to show the first set of records. It takes a long time, even
>>> to quit after i pressed 'q'.
>>>    With oracle SQLPlus, it is quite instantaneous.
>> Again, you're measuring different things. What is the time to the *last*
>> row?
> 
> I made this point, because people usually fire select * from table
> query in the psql prompt to get a feel of the table.

Well, I don't. If nothing else, it's difficult to fit a whole row across 
the screen with many tables.
> Ofcourse they can
> fire select * from table limit 10; But i just feel its more key
> strokes and it would be better, if it is interactive. I accept that it
> is a debatable point and people can prefer otherwise to have more key
> strokes.

If you want few keystrokes to see what columns a table has, how about
\d table

> Imagine, you need a large batch operation. In oracle we can fire the
> SQL and we can be sure that the client won't crash, but with postgres
> we have a region of uncertainity.

Well, if your client doesn't know if it can handle 1 million rows, maybe 
it shouldn't ask for them?
> There are some JDBC hints like
> setFetchSize(), which actually affects Oracle's behaviour. But it
> doesn't seem to do anything with postgres. But JDBC has declared these
> commands as hint commands and has provided a warning to users, about
> the fact that it may get ignored

I see you've checked and seen it does work. I'm not a java man, but I do 
know there are ways to control this. Perhaps try the -jdbc mailing list.

In any case, I think we've established it's nothing for the hackers list.

--   Richard Huxton  Archonet Ltd

Re: Clarification reqeusted for "select * from a huge table"

From

"Trevor Talbot"

Date:

12 November 2007, 12:18:51

On 11/12/07, Richard Huxton <dev@archonet.com> wrote:
> Gokulakannan Somasundaram wrote:

> >>>     I also noticed that it doesn't crash with psql, but it takes a
> >>> long time to show the first set of records. It takes a long time, even
> >>> to quit after i pressed 'q'.
> >>>    With oracle SQLPlus, it is quite instantaneous.

> > Imagine, you need a large batch operation. In oracle we can fire the
> > SQL and we can be sure that the client won't crash, but with postgres
> > we have a region of uncertainity.
>
> Well, if your client doesn't know if it can handle 1 million rows, maybe
> it shouldn't ask for them?

Isn't that exactly his point?  He's talking about the default behavior
of clients designed for postgres, one of which is psql.

Re: Clarification reqeusted for "select * from a huge table"

From

Bernd Helmle

Date:

12 November 2007, 12:20:09

--On Montag, November 12, 2007 18:10:12 +0530 Gokulakannan Somasundaram 
<gokul007@gmail.com> wrote:

>> http://jdbc.postgresql.org/documentation/head/query.html#query-with-curs
>> or
>>
>> --
>>    Heikki Linnakangas
>>    EnterpriseDB   http://www.enterprisedb.com
>>
>
> Thanks Heikki. That answered my question. Can you tell me, what is the
> similar setting for psql?

See the FETCH_COUNT psql variable in Version 8.2 and above.

--  Thanks
                   Bernd

Re: Clarification reqeusted for "select * from a huge table"

From

"Gokulakannan Somasundaram"

Date:

12 November 2007, 13:07:55

On Nov 12, 2007 6:48 PM, Bernd Helmle <mailings@oopsware.de> wrote:
> --On Montag, November 12, 2007 18:10:12 +0530 Gokulakannan Somasundaram
> <gokul007@gmail.com> wrote:
>
> >> http://jdbc.postgresql.org/documentation/head/query.html#query-with-curs
> >> or
> >>
> >> --
> >>    Heikki Linnakangas
> >>    EnterpriseDB   http://www.enterprisedb.com
> >>
> >
> > Thanks Heikki. That answered my question. Can you tell me, what is the
> > similar setting for psql?
>
> See the FETCH_COUNT psql variable in Version 8.2 and above.
>
> --
>   Thanks
>
>                     Bernd
>
Thanks a lot Brend. I apologize for posting this query in Hackers.



-- 
Thanks,
Gokul.

Re: Clarification reqeusted for "select * from a huge table"

From

Richard Huxton

Date:

12 November 2007, 13:16:52

Trevor Talbot wrote:
> On 11/12/07, Richard Huxton <dev@archonet.com> wrote:
>> Gokulakannan Somasundaram wrote:
> 
>>>>>     I also noticed that it doesn't crash with psql, but it takes a
>>>>> long time to show the first set of records. It takes a long time, even
>>>>> to quit after i pressed 'q'.
>>>>>    With oracle SQLPlus, it is quite instantaneous.
> 
>>> Imagine, you need a large batch operation. In oracle we can fire the
>>> SQL and we can be sure that the client won't crash, but with postgres
>>> we have a region of uncertainity.
>> Well, if your client doesn't know if it can handle 1 million rows, maybe
>> it shouldn't ask for them?
> 
> Isn't that exactly his point?  He's talking about the default behavior
> of clients designed for postgres, one of which is psql.

Psql isn't the client - you are.

--   Richard Huxton  Archonet Ltd