Thread: Large Tables(>1 Gb)
(I have a 500MHz PentIII, with 256 Mb RAM, UW SCSI, running Linux Kernel 2.2.9, with libc-2.1.2.
I am running Postgres 7.0, which I compiled myself.)

So, I created a database, a table, and started dumping data into it.  Then I added an index on the table.  Life was good.

After a few weeks, my table eclipsed approximately 1Gb, and when I looked at it in my PG_DATA/database directory, I noticed that there were two files: MYTABLE and MYTABLE.1.  I was curious why this happened, but I figured that Postgres must break up tables over 1Gb into multiple files. (Right?)

Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just sits there while the hard drive light blinks like crazy, pulling the table up into memory.  I have 256Mb of RAM, so this takes a while.  When I start up "top" and watch my process table, the postgres backend is sucking up the CPU time pulling the data and the psql frontend is sucking up the memory accepting the results.

So, I figured that psql must be piling everything up in a "less"-like pager.  So, I kill the current request, do a "\pset pager", and toggle the pager off.  I re-run the select *, and the same thing happens.

This time, however, I let everything run until the memory taken up by the psql process goes over 256Mb, which means that my system RAM is all used up.  Then my whole machine kinda locks up.  My load average hits 5(!) and psql starts taking up well over 300Mb.  I am also running X.  As best I can figure, my poor machine is getting hammered by physical memory being swapped to disk while simultaneously trying to pull up a 1Gb database.  I barely have enough CPU power left over to telnet in from another box and kill psql!

(1) I don't know what psql thinks it is doing, or why my kernel is letting it do it, but...
(2) I figure I can fix things... so:

I look around at some backend configuration parameters to see if I can get Postgres to do some neat memory stuff (but later realize that it was the front end, not the backend, that was eating up memory... I tried pg_dump on the database/table, and stuff started spooling right away).

Rather than trying to fix the problem, I decided to subvert it by breaking my table into a bunch of little tables, each one smaller than my RAM, so that I would never dig into swap space on a select *... (all of you who are laughing at me can just quit reading right now).  Then I planned to re-join all of the tables in a VIEW by doing a CREATE VIEW AS SELECT * UNION SELECT * UNION... etc.  Then I find out that UNIONs and VIEWs aren't implemented together... (I don't see this explicitly stated on the to-do list either).

Then I started digging into the source code, trying to see if the query parser was the reason that this wasn't implemented... perhaps I could help.  I don't quite see where it is.

Anyway, I just wanted to see if all my assumptions are correct, or if anyone has a better explanation for my observations, and/or some solutions.

Fred
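[Editor's note: a minimal sketch of the split-table-plus-view approach Fred describes, using hypothetical monthly table names and columns.  PostgreSQL 7.0 rejects the CREATE VIEW because views over UNION were not yet supported; UNION ALL is shown because it skips the duplicate-elimination pass that a plain UNION would add.]

    -- hypothetical monthly pieces of the original table
    create table mytable_2000_05 (stamp timestamp, value float8);
    create table mytable_2000_06 (stamp timestamp, value float8);

    -- the view Fred wanted; 7.0 refuses this statement,
    -- later releases accept it
    create view mytable_all as
        select * from mytable_2000_05
        union all
        select * from mytable_2000_06;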
On Thu, 29 Jun 2000 Fred_Zellinger@seagate.com wrote:

This doesn't directly answer your question... but do you actually have a need to select all of a 1GB table?  I've got about 1.5GB of data in a table, but I can't think of an application that would need to pull it all out in one query.

# [Fred's original message quoted in full]
--
dustin sallings    http://2852210114/~dustin/
The world is watching America, and America is watching TV.
On Thu, 29 Jun 2000 Fred_Zellinger@seagate.com wrote:

> I look around at some backend configuration parameters to see if I can get
> Postgres to do some neat memory stuff (but later realize that it was the
> front-end and not the backend that was eating up memory... I tried pg_dump
> on the database/table, and stuff started spooling right away)

> Rather than trying to fix the problem, I decided to subvert it by breaking
> my table into a bunch of little tables, each one less than my RAM size, so
> that I would never dig into SWAP space on a select *... (all of you who are
> laughing at me can just quit reading right now).

*stops laughing* ;-)

> Anyway, just wanted to see if all my assumptions are correct, or if anyone
> has a better explanation for my observation, and/or some solutions.

If you want to SELECT 1GB of data into RAM, you ought to have over 1GB of RAM, don't you think?  What exactly is the problem you're trying to fix?

- Andrew
Hello,

> After a few weeks, my table eclipsed approximately 1Gb, and when I looked
> at it in my PG_DATA/database directory, I noticed that there were two
> files: MYTABLE and MYTABLE.1.  I was curious why this happened, but I
> figured that Postgres must break up tables over 1Gb into multiple
> files. (Right?)

Yeps.

> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes a while.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.

It's OK.  The problem is that Postgres tries to assemble the full answer to your request in memory, and for sure that is a little bit hard for it.  You just don't have enough memory for such games...  But if you would like to do such things, you can.  You should consider using either of 2 possible solutions:

1. Use cursors.  This is the most natural way to do it.  You just create a cursor and then fetch the data some number of tuples at a time.  Something like:

   begin;
   declare my_cursor cursor for select * from big_table;
   fetch 1000 from my_cursor;
   fetch 1000 from my_cursor;
   ...
   close my_cursor;
   end;

2. Use the limit & offset capability of Postgres:

   select * from big_table limit 1000 offset 0;
   select * from big_table limit 1000 offset 1000;
   ...

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Fred_Zellinger@seagate.com writes:

> After a few weeks, my table eclipsed approximately 1Gb, and when I looked
> at it in my PG_DATA/database directory, I noticed that there were two
> files: MYTABLE and MYTABLE.1.  I was curious why this happened, but I
> figured that Postgres must break up tables over 1Gb into multiple
> files. (Right?)

Check.  It's to work around OSes that don't handle large files.

> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes a while.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.

Yeah.  libpq has this nifty little API that provides random access to a query result set --- so it wants to suck the entire result set into the client application's RAM before it will let the app have any of it.  Actually, there are error-handling reasons for doing it that way too.  But anyway the point is that that client-side API is not well designed for huge result sets.  It's not a backend problem.

The usual workaround is to use DECLARE CURSOR and FETCH to grab the result in bite-size chunks, like a few hundred or thousand rows at a time.

Sooner or later someone will probably extend libpq to offer some kind of "streaming" API for scanning through large result sets without buffering them in client RAM.  Doesn't seem to have gotten to the top of anyone's TODO list yet though... the FETCH solution works well enough to keep the annoyance level down...

			regards, tom lane
Fred_Zellinger@seagate.com wrote:

> (I have a 500MHz PentIII, with 256 Mb RAM, UW SCSI, running Linux Kernel
> 2.2.9, with libc-2.1.2
> I am running Postgres 7.0 which I compiled myself.)
>
> So, I created a database, a table, and started dumping data into it.  Then
> I added an index on the table.  Life was good.
>
> After a few weeks, my table eclipsed approximately 1Gb, and when I looked
> at it in my PG_DATA/database directory, I noticed that there were two
> files: MYTABLE and MYTABLE.1.  I was curious why this happened, but I
> figured that Postgres must break up tables over 1Gb into multiple
> files. (Right?)
>
> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes a while.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.
>
> Fred

Okay, I didn't laugh the entire time...

I suggest you take a look at cursors.  I have the same thing.  There are times I will need to select my entire >2Gig table, but instead of doing:

    SELECT * FROM table ;

I do

    DECLARE tmp CURSOR FOR SELECT * FROM table ;

    do {
        FETCH FORWARD 100 FROM tmp ;
    } while there are rows left.

This only pulls 100 (or whatever number you specify) into memory at a time.

Jeff
You could also use LIMIT and OFFSET.. That's what I do (though my database isn't to a gigabyte yet)..

Maybe using a CURSOR is better, I'm not sure...

-Mitch

----- Original Message -----
From: Jeffery Collins <collins@onyx-technologies.com>
To: <Fred_Zellinger@seagate.com>
Cc: <pgsql-general@hub.org>
Sent: Friday, June 30, 2000 8:47 AM
Subject: Re: [GENERAL] Large Tables(>1 Gb)

> [Jeff's message quoted in full]
we use cursors and they perform well for us for selects.  our largest table is just over 7.5g containing 38mil+ rows... but we have a lot of tables over 1 gig...

mikeo

At 10:19 AM 6/30/00 -0400, Mitch Vincent wrote:
> You could also use LIMIT and OFFSET.. That's what I do (though my database
> isn't to a gigabyte yet)..
>
> Maybe using a CURSOR is better, I'm not sure...
>
> -Mitch
>
> [Jeff's message and the original post quoted in full]
Denis Perchine <dyp@perchine.com> writes:

> 2. Use limit & offset capability of postgres.
>    select * from big_table limit 1000 offset 0;
>    select * from big_table limit 1000 offset 1000;

This is a risky way to do it --- the Postgres optimizer considers limit/offset when choosing a plan, and is quite capable of choosing different plans that yield different tuple orderings depending on the size of the offset+limit.  For a plain SELECT as above you'd probably be safe enough, but in more complex cases, such as having potentially-indexable WHERE clauses, you'll very likely get bitten, unless you have an ORDER BY clause to guarantee a unique tuple ordering.

Another advantage of FETCH is that you get a consistent result set even if other backends are modifying the table, since it all happens within one transaction.

			regards, tom lane
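[Editor's note: a minimal sketch of the ORDER BY guard Tom describes, assuming a hypothetical unique key column named id on big_table.  Ordering on a unique key pins down one tuple ordering, so successive LIMIT/OFFSET queries page through the same sequence of rows whatever plan the optimizer picks.]

    -- without the ORDER BY, nothing guarantees that rows 1000-1999 of the
    -- second query actually follow rows 0-999 of the first
    select * from big_table order by id limit 1000 offset 0;
    select * from big_table order by id limit 1000 offset 1000;
    select * from big_table order by id limit 1000 offset 2000;

Note that each of these queries still scans and discards everything before its OFFSET, so the cost grows as you page deeper into the table; a cursor keeps its position and does not pay that penalty.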
Thanks for all the great responses on this (doing select * from large tables and hanging psql).  Here is what I have:

--- psql uses libpq, which tries to load everything into memory before spooling it.
--- Use cursors to FETCH a chosen number of rows at a time and then spool those.
--- Use "select * from big_table limit 1000 offset 0;" for simple queries.  Sometimes you want to do a simple select * from mytable just to get a look at the data, but you don't care which data.

I am about to go take my multiple broken-up tables and dump them back into one table (and then shut off all those BASH shell scripts I wrote which checked the system date and created new monthly tables as needed... good scripting practice, but a waste of time).

However, there is still something bugging me.  Even though many people related stories of 7.5Gb+ DBs, I still can't make that little voice in me quit repeating its "breaking things into smaller chunks means faster work" theory.  There must exist a relationship between file sizes and DB performance.  This relationship can be broken into 3 parts:

1. How the hardware is arranged to pull in large files (fragmentation, partitions, etc.)
2. How the underlying OS deals with large files
3. How Postgres deals with (or is affected by) large files

I imagine that the first two are the dominant factors in the relationship, but does anyone have any experience with how small a factor the internals of Postgres are?  Are there any internal coding concerns that have had to deal with this (like the one mentioned about files being split at about 1Gb)?

(Curious) Fred
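[Editor's note: a minimal sketch of the re-merge Fred mentions, reusing the hypothetical monthly table names from the earlier sketch.  INSERT ... SELECT copies each month back into the single table; once the copy is verified the per-month tables can be dropped, and a VACUUM ANALYZE afterward keeps the planner's statistics current.]

    begin;
    insert into mytable select * from mytable_2000_05;
    insert into mytable select * from mytable_2000_06;
    commit;

    -- after checking row counts, the monthly tables can go
    drop table mytable_2000_05;
    drop table mytable_2000_06;

    vacuum analyze mytable;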
Fred_Zellinger@seagate.com wrote:

> However, there is still something bugging me.  Even though many people
> related stories of 7.5Gb+ DBs, I still can't make that little voice in me
> quit repeating its "breaking things into smaller chunks means faster work"
> theory.
>
> There must exist a relationship between file sizes and DB performance.

If your data doesn't completely fit into main memory, at least some of it will have to be saved off-line somewhere.  Your question is: should the off-line portion be split into more than one file to speed performance?

I won't try to be precise here.  There are good textbooks on the subject if you're interested.  I've just been reading one, actually, but it's at home and I don't remember the name :(  Knuth would of course be good reading on the subject.

Maybe think of it this way: what's the difference between one file and two, really?  You've basically just got a bunch of bits on a block device, either way.  By saving your data to a single file, you have more control of the data layout, so you can organize it in the manner most appropriate to your needs.

________________________
Ron Peterson
rpeterson@yellowbank.com
You should probably be looking into cursors if you're attempting to grab a 1Gb result set; otherwise the system is going to try to pass the entire result set to the front end in one big lump, which is what you're probably seeing.  I haven't played with them really, but probably something like...

    begin;
    declare testcursor cursor for select * from MYTABLE;
    fetch 100 in testcursor;
    <fetch repeated until you stop getting results>
    close testcursor;
    end;

might work better.

Stephan Szabo
sszabo@bigpanda.com

On Thu, 29 Jun 2000 Fred_Zellinger@seagate.com wrote:
> [the original message, quoted in full]