Thread: Clarification reqeusted for "select * from a huge table"
Hi, I had a chance to test one of the real world cases with Oracle and PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM machine) both in oracle and Postgresql. Just write a JDBC program for a 'select *' on that table. With PostgreSQL as backend, java crashes saying that it has met 'Out Of Memory'. With Oracle it doesn't. Postgres tried to send all the results back to the clientat one shot, whereas in Oracle it works like a Cursor. Is this issue already well known among hackers community? If known, why is it designed this way? I also noticed that it doesn't crash with psql, but it takes a long time to show the first set of records. It takes a long time, even to quit after i pressed 'q'. With oracle SQLPlus, it is quite instantaneous. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Gokulakannan Somasundaram wrote: > Hi, > I had a chance to test one of the real world cases with Oracle and > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM > machine) both in oracle and Postgresql. Just write a JDBC program for > a 'select *' on that table. With PostgreSQL as backend, java crashes > saying that it has met 'Out Of Memory'. With Oracle it doesn't. > Postgres tried to send all the results back to the client at one > shot, whereas in Oracle it works like a Cursor. Is this issue already > well known among hackers community? If known, why is it designed this > way? It returns everything because you've asked for it. If you wanted something that looks like a cursor, PG assumes you'll request a cursor. As to why, there are two reasons: 1. It's always been that way and changing it now would irritate most of the existing user-base. 2. Repeat your test with 5,10,50,100 clients all running different big queries and see which puts a greater load on the server. PG favours supporting lots of clients by pushing the load onto them. > I also noticed that it doesn't crash with psql, but it takes a > long time to show the first set of records. It takes a long time, even > to quit after i pressed 'q'. > With oracle SQLPlus, it is quite instantaneous. Again, you're measuring different things. What is the time to the *last* row? -- Richard Huxton Archonet Ltd
Gokulakannan Somasundaram wrote: > I had a chance to test one of the real world cases with Oracle and > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM > machine) both in oracle and Postgresql. Just write a JDBC program for > a 'select *' on that table. With PostgreSQL as backend, java crashes > saying that it has met 'Out Of Memory'. With Oracle it doesn't. > Postgres tried to send all the results back to the client at one > shot, whereas in Oracle it works like a Cursor. Is this issue already > well known among hackers community? If known, why is it designed this > way? http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Nov 12, 2007 5:25 PM, Richard Huxton <dev@archonet.com> wrote: > Gokulakannan Somasundaram wrote: > > Hi, > > I had a chance to test one of the real world cases with Oracle and > > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM > > machine) both in oracle and Postgresql. Just write a JDBC program for > > a 'select *' on that table. With PostgreSQL as backend, java crashes > > saying that it has met 'Out Of Memory'. With Oracle it doesn't. > > Postgres tried to send all the results back to the client at one > > shot, whereas in Oracle it works like a Cursor. Is this issue already > > well known among hackers community? If known, why is it designed this > > way? > > It returns everything because you've asked for it. If you wanted > something that looks like a cursor, PG assumes you'll request a cursor. > > As to why, there are two reasons: > 1. It's always been that way and changing it now would irritate most of > the existing user-base. JDBC API, as you might know has a rs.next() and rs.prev() to scan backwards and forwards. The API looks more like a cursor.Currently, all the results for Postgres is returned to the client during the executeQuery(). This is more like the SAX / DOM argument. I just feel JDBC APIs provide a feeling a SAX. > 2. Repeat your test with 5,10,50,100 clients all running different big > queries and see which puts a greater load on the server. PG favours > supporting lots of clients by pushing the load onto them. When we say Postgres pushes the load to the client, its a huge memory overhead on the client part. In oracle's scenario, the overhead is just maintaining the state(say from which block the scan should be continued). My point is that there are some operations, which are not possible with postgres, whereas it is possible by an another database. It would be better,, if we can support it. (There are some places where DOM parsing is not possible and we prefer SAX) > > > I also noticed that it doesn't crash with psql, but it takes a > > long time to show the first set of records. It takes a long time, even > > to quit after i pressed 'q'. > > With oracle SQLPlus, it is quite instantaneous. > > Again, you're measuring different things. What is the time to the *last* > row? I made this point, because people usually fire select * from table query in the psql prompt to get a feel of the table. Ofcourse they can fire select * from table limit 10; But i just feel its more key strokes and it would be better, if it is interactive. I accept that it is a debatable point and people can prefer otherwise to have more key strokes. Imagine, you need a large batch operation. In oracle we can fire the SQL and we can be sure that the client won't crash, but with postgres we have a region of uncertainity. There are some JDBC hints like setFetchSize(), which actually affects Oracle's behaviour. But it doesn't seem to do anything with postgres. But JDBC has declared these commands as hint commands and has provided a warning to users, about the fact that it may get ignored -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
> we have a region of uncertainity. There are some JDBC hints like > setFetchSize(), which actually affects Oracle's behaviour. But it > doesn't seem to do anything with postgres. But JDBC has declared these > commands as hint commands and has provided a warning to users, about > the fact that it may get ignored This is a wrong statement. I would like to take it back... -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
On Nov 12, 2007 6:01 PM, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Gokulakannan Somasundaram wrote: > > I had a chance to test one of the real world cases with Oracle and > > PostgreSQL. Create a Table with 10 million rows (i worked on a 1GB RAM > > machine) both in oracle and Postgresql. Just write a JDBC program for > > a 'select *' on that table. With PostgreSQL as backend, java crashes > > saying that it has met 'Out Of Memory'. With Oracle it doesn't. > > Postgres tried to send all the results back to the client at one > > shot, whereas in Oracle it works like a Cursor. Is this issue already > > well known among hackers community? If known, why is it designed this > > way? > > http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > Thanks Heikki. That answered my question. Can you tell me, what is the similar setting for psql? -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Gokulakannan Somasundaram wrote: >>> I also noticed that it doesn't crash with psql, but it takes a >>> long time to show the first set of records. It takes a long time, even >>> to quit after i pressed 'q'. >>> With oracle SQLPlus, it is quite instantaneous. >> Again, you're measuring different things. What is the time to the *last* >> row? > > I made this point, because people usually fire select * from table > query in the psql prompt to get a feel of the table. Well, I don't. If nothing else, it's difficult to fit a whole row across the screen with many tables. > Ofcourse they can > fire select * from table limit 10; But i just feel its more key > strokes and it would be better, if it is interactive. I accept that it > is a debatable point and people can prefer otherwise to have more key > strokes. If you want few keystrokes to see what columns a table has, how about \d table > Imagine, you need a large batch operation. In oracle we can fire the > SQL and we can be sure that the client won't crash, but with postgres > we have a region of uncertainity. Well, if your client doesn't know if it can handle 1 million rows, maybe it shouldn't ask for them? > There are some JDBC hints like > setFetchSize(), which actually affects Oracle's behaviour. But it > doesn't seem to do anything with postgres. But JDBC has declared these > commands as hint commands and has provided a warning to users, about > the fact that it may get ignored I see you've checked and seen it does work. I'm not a java man, but I do know there are ways to control this. Perhaps try the -jdbc mailing list. In any case, I think we've established it's nothing for the hackers list. -- Richard Huxton Archonet Ltd
On 11/12/07, Richard Huxton <dev@archonet.com> wrote: > Gokulakannan Somasundaram wrote: > >>> I also noticed that it doesn't crash with psql, but it takes a > >>> long time to show the first set of records. It takes a long time, even > >>> to quit after i pressed 'q'. > >>> With oracle SQLPlus, it is quite instantaneous. > > Imagine, you need a large batch operation. In oracle we can fire the > > SQL and we can be sure that the client won't crash, but with postgres > > we have a region of uncertainity. > > Well, if your client doesn't know if it can handle 1 million rows, maybe > it shouldn't ask for them? Isn't that exactly his point? He's talking about the default behavior of clients designed for postgres, one of which is psql.
--On Montag, November 12, 2007 18:10:12 +0530 Gokulakannan Somasundaram <gokul007@gmail.com> wrote: >> http://jdbc.postgresql.org/documentation/head/query.html#query-with-curs >> or >> >> -- >> Heikki Linnakangas >> EnterpriseDB http://www.enterprisedb.com >> > > Thanks Heikki. That answered my question. Can you tell me, what is the > similar setting for psql? See the FETCH_COUNT psql variable in Version 8.2 and above. -- Thanks Bernd
On Nov 12, 2007 6:48 PM, Bernd Helmle <mailings@oopsware.de> wrote: > --On Montag, November 12, 2007 18:10:12 +0530 Gokulakannan Somasundaram > <gokul007@gmail.com> wrote: > > >> http://jdbc.postgresql.org/documentation/head/query.html#query-with-curs > >> or > >> > >> -- > >> Heikki Linnakangas > >> EnterpriseDB http://www.enterprisedb.com > >> > > > > Thanks Heikki. That answered my question. Can you tell me, what is the > > similar setting for psql? > > See the FETCH_COUNT psql variable in Version 8.2 and above. > > -- > Thanks > > Bernd > Thanks a lot Brend. I apologize for posting this query in Hackers. -- Thanks, Gokul.
Trevor Talbot wrote: > On 11/12/07, Richard Huxton <dev@archonet.com> wrote: >> Gokulakannan Somasundaram wrote: > >>>>> I also noticed that it doesn't crash with psql, but it takes a >>>>> long time to show the first set of records. It takes a long time, even >>>>> to quit after i pressed 'q'. >>>>> With oracle SQLPlus, it is quite instantaneous. > >>> Imagine, you need a large batch operation. In oracle we can fire the >>> SQL and we can be sure that the client won't crash, but with postgres >>> we have a region of uncertainity. >> Well, if your client doesn't know if it can handle 1 million rows, maybe >> it shouldn't ask for them? > > Isn't that exactly his point? He's talking about the default behavior > of clients designed for postgres, one of which is psql. Psql isn't the client - you are. -- Richard Huxton Archonet Ltd