Thread: Large Tables(>1 Gb)

Large Tables(>1 Gb)

From
Fred_Zellinger@seagate.com
Date:
(I have a 500MHz PentIII, with 256 Mb RAM, UW SCSI, running Linux Kernel
2.2.9, with libc-2.1.2.
I am running Postgres 7.0, which I compiled myself.)

So, I created a database, a table, and started dumping data into it.  Then
I added an index on the table.  Life was good.

After a few weeks, my table eclipsed approximately 1Gb, and when I looked
at it in my PG_DATA/database directory, I noticed that there were two
files:  MYTABLE and MYTABLE.1.  I was curious why this happened, but I
figured that Postgres must break up tables over 1Gb into multiple
files.  (Right?)

Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
sits there while the hard drive light blinks like crazy, pulling the table
up into memory.  I have 256Mb of RAM, so this takes a while.  When I start
up "top" and watch my process table, the postgres backend is sucking up the
CPU time pulling the data and the psql frontend is sucking up the memory
accepting the results.

So, I figured that psql must be piling everything up in a "less"-like
pager.  So, I kill the current request and do a "\pset pager" to toggle the
pager off.  I re-run the select *, and the same thing happens.

This time, however, I let everything run until the memory taken up by the
psql process goes over 256Mb, which means that my system RAM is all used
up.  Then, my whole machine kinda locks up.  My load average hits 5(!) and
psql starts taking up well over 300Mb.  I am also running X.  As best I can
figure, my poor machine is getting hammered by physical memory being
disk-swapped while simultaneously trying to pull up a 1Gb database.  I
barely have enough CPU power left over for me to telnet in from another box
and kill psql!

(1)  I don't know what psql thinks it is doing, or why my kernel is letting
it do it, but...
(2)  I figure I can fix things....so:

I look around at some backend configuration parameters to see if I can get
Postgres to do some neat memory stuff (but later realize that it was the
front-end and not the backend that was eating up memory...I tried pg_dump
on the database/table, and stuff started spooling right away).

Rather than trying to fix the problem, I decided to subvert it by breaking
my table into a bunch of little tables, each one smaller than my RAM size, so
that I would never dig into swap space on a select *....(all of you who are
laughing at me can just quit reading right now).  Then I planned to
re-join all of the tables in a VIEW by doing a CREATE VIEW AS SELECT *
UNION SELECT * UNION...etc.  Then I find out that UNIONs and VIEWs aren't
implemented together....(I don't see this explicitly stated on the to-do
list either).

Then I started digging into the source code, trying to see if the query
parser was the reason that this wasn't implemented...perhaps I could help.
I don't quite see where it is.


Anyway, just wanted to see if all my assumptions are correct, or if anyone
has a better explanation for my observation, and/or some solutions.


Fred





Re: Large Tables(>1 Gb)

From
Dustin Sallings
Date:
On Thu, 29 Jun 2000 Fred_Zellinger@seagate.com wrote:

    This doesn't directly answer your question...but do you actually
have a need to select all of a 1GB table?  I've got about 1.5GB of data in
a table, but I can't think of an application that would need to pull it
all out in one query.


--
dustin sallings                            The world is watching America,
http://2852210114/~dustin/                 and America is watching TV.


Re: Large Tables(>1 Gb)

From
Andrew Snow
Date:

On Thu, 29 Jun 2000 Fred_Zellinger@seagate.com wrote:

> I look around at some backend configuration parameters to see if I can get
> Postgres to do some neat memory stuff(but later realize that it was the
> front-end and not the backend that was eating up memory...I tried pg_dump
> on the database/table, and stuff started spooling right away)

> Rather than trying to fix the problem, I decided to subvert it by breaking
> my table into a bunch of little tables, each one less than my RAM size, so
> that I would never dig into SWAP space on a select *....(all of you who are
> laugh at me, you can just quit reading right now).

*stops laughing* ;-)

> Anyway, just wanted to see if all my assumptions are correct, or if anyone
> has a better explanation for my observation, and/or some solutions.

If you want to SELECT 1GB of data into RAM, you ought to have over 1GB of
RAM, don't you think?

What exactly is the problem you're trying to fix?


- Andrew



Re: Large Tables(>1 Gb)

From
Denis Perchine
Date:
Hello,

> After a few weeks, my table eclipsed approximately 1Gb, and when I looked
> at it in my PG_DATA/database directory, I noticed that there were two
> files:  MYTABLE and MYTABLE.1.  I was curious why this happened, but I
> figured that Postgres must break up tables over 1Gb into multiple
> files.(right?)

Yeps.

> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes awhile.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.

It's OK.  The problem is that Postgres tries to assemble the full answer to your
request in memory, and for sure that is a little bit hard for it.  You just don't
have enough memory for such games...

But if you would like to do such things, you can.  You should consider using
one of two possible solutions:
1. Use cursors.  This is the most natural way to do it.  You just create a cursor
and then fetch the data some number of tuples at a time (cursors have to run
inside a transaction block).  Something like:
begin;
declare my_cursor cursor for select * from big_table;
fetch 1000 from my_cursor;
fetch 1000 from my_cursor;
close my_cursor;
commit;

2. Use limit & offset capability of postgres.

select * from big_table limit 1000 offset 0;
select * from big_table limit 1000 offset 1000;
...


--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Re: Large Tables(>1 Gb)

From
Tom Lane
Date:
Fred_Zellinger@seagate.com writes:
> After a few weeks, my table eclipsed approximately 1Gb, and when I looked
> at it in my PG_DATA/database directory, I noticed that there were two
> files:  MYTABLE and MYTABLE.1.  I was curious why this happened, but I
> figured that Postgres must break up tables over 1Gb into multiple
> files.(right?)

Check.  It's to work around OSes that don't handle large files.

> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes awhile.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.

Yeah.  libpq has this nifty little API that provides random access to
a query result set --- so it wants to suck the entire result set into
the client application's RAM before it will let the app have any of it.
Actually, there are error-handling reasons for doing it that way too.
But anyway the point is that that client-side API is not well designed
for huge result sets.  It's not a backend problem.

The usual workaround is to use DECLARE CURSOR and FETCH to grab the
result in bite-size chunks, like a few hundred or thousand rows at
a time.

Sooner or later someone will probably extend libpq to offer some kind
of "streaming" API for scanning through large result sets without
buffering them in client RAM.  Doesn't seem to have gotten to the top
of anyone's TODO list yet though... the FETCH solution works well
enough to keep the annoyance level down...

            regards, tom lane

Re: Large Tables(>1 Gb)

From
Jeffery Collins
Date:
Fred_Zellinger@seagate.com wrote:

> Then, while running psql, I did a "select * from MYTABLE;"  Well, psql just
> sits there while the hard drive light blinks like crazy, pulling the table
> up into memory.  I have 256Mb of RAM, so this takes awhile.  When I start
> up "top" and watch my process table, the postgres backend is sucking up the
> CPU time pulling the data and the psql frontend is sucking up the memory
> accepting the results.
>
> Fred

Okay, I didn't laugh the entire time...

I suggest you take a look at cursors.  I have the same thing.  There are times
I will need to select my entire >2Gig table but instead of doing:

    SELECT * FROM table ;

I do

    BEGIN ;
    DECLARE tmp CURSOR FOR SELECT * FROM table ;

    do {
        FETCH FORWARD 100 FROM tmp ;
    } while there are rows left ;

    END ;

This only pulls 100 (or whatever number you specify) into memory at a time.

Jeff



Re: Large Tables(>1 Gb)

From
"Mitch Vincent"
Date:
You could also use LIMIT and OFFSET.. That's what I do (though my database
isn't to a gigabyte yet)..

Maybe using a CURSOR  is better, I'm not sure...

-Mitch



Re: Large Tables(>1 Gb)

From
mikeo
Date:
we use cursors and they perform well for us for
selects.
our largest table is just over 7.5g containing
38mil+ rows...but we have a lot of tables over
1 gig...

mikeo



Re: Large Tables(>1 Gb)

From
Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
> 2. Use limit & offset capability of postgres.

> select * from big_table limit 1000 offset 0;
> select * from big_table limit 1000 offset 1000;

This is a risky way to do it --- the Postgres optimizer considers
limit/offset when choosing a plan, and is quite capable of choosing
different plans that yield different tuple orderings depending on the
size of the offset+limit.  For a plain SELECT as above you'd probably
be safe enough, but in more complex cases such as having potentially-
indexable WHERE clauses you'll very likely get bitten, unless you have
an ORDER BY clause to guarantee a unique tuple ordering.
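
For instance, something like this should give a stable page order, assuming
big_table has a unique key column (I'll just call it "id" here for illustration):

select * from big_table order by id limit 1000 offset 0;
select * from big_table order by id limit 1000 offset 1000;
...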

Another advantage of FETCH is that you get a consistent result set
even if other backends are modifying the table, since it all happens
within one transaction.

            regards, tom lane

Re: Large Tables(>1 Gb)

From
Fred_Zellinger@seagate.com
Date:

Thanks for all the great responses on this (doing select * from large tables
and hanging psql).

Here is what I have:
--- psql uses libpq which tries to load everything into memory before
spooling it.
--- use cursors to FETCH a selected number of rows and then spool those.
--- use "select * from big_table limit 1000 offset 0;" for simple queries.

Sometimes you want to do a simple select * from mytable just to get a look
at the data, but you don't care which data.
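
(For a quick look like that, it seems a plain LIMIT with no OFFSET is enough,
e.g. something like:

select * from mytable limit 20;

which only pulls a handful of rows no matter how big the table is.)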

I am about to go take my multiple broken-up tables and dump them back into
one table (and then shut off all those BASH shell scripts I wrote which
checked the system date and created new monthly tables if needed...good
scripting practice, but a waste of time).

However, there is still something bugging me.  Even though many people
related stories of 7.5 Gb+ DBs, I still can't make that little voice in me
quit repeating its "breaking things into smaller chunks means faster work"
theory.

There must exist a relationship between file sizes and DB performance.
This relationship can be broken into 3 parts:
1.  How the hardware is arranged to pull in large files (fragmentation,
partitions, etc.)
2.  How the underlying OS deals with large files
3.  How Postgres deals with(or is affected by) large files.

I imagine that the first two are the dominant factors in the relationship,
but does anyone have any experience with how small/removed a factor the
Postgres internals are?  Are there any internal coding concerns that have
had to deal with this (like the one mentioned about files being split at
about 1 Gb)?
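
Side note: as far as I can tell, the pg_class system table keeps a page count per
relation, so you can at least see how big Postgres thinks a table is from inside SQL.
relpages seems to be refreshed only by VACUUM, and the arithmetic below assumes the
default 8k block size, so treat it as a rough estimate:

select relname, relpages, relpages * 8192 as approx_bytes
from pg_class where relname = 'MYTABLE';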

(Curious) Fred



Re: Large Tables(>1 Gb)

From
Ron Peterson
Date:
Fred_Zellinger@seagate.com wrote:

> However, there is still something bugging me.  Even though many people
> related stories of 7.5 Gb+ Dbs, I still can't make that little voice in me
> quit saying "breaking things into smaller chunks means faster work"
> theories.
>
> There must exist a relationship between file sizes and DB performance.

If your data doesn't completely fit into main memory, at least some of
it will have to be saved off-line somewhere.  Your question is: should
the off-line portion be split into more than one file to speed
performance?

I won't try to be precise here.  There are good textbooks on the subject
if you're interested.  I've just been reading one, actually, but it's at
home and I don't remember the name :(  Knuth would of course be good
reading on the subject.

Maybe think of it this way: what's the difference between one file and
two, really?  You've basically just got a bunch of bits on a block
device, either way.  By saving your data to a single file, you have more
control of the data layout, so you can organize it in the manner most
appropriate to your needs.

________________________
Ron Peterson
rpeterson@yellowbank.com

Re: Large Tables(>1 Gb)

From
Stephan Szabo
Date:
You should probably be looking into cursors if you're attempting to grab
a 1Gb result set, otherwise the system is going to try to pass the entire
result set to the front end in one big lump, which is what you're probably
seeing.

I haven't played with them really, but probably something like...
begin;
declare testcursor cursor for select * from MYTABLE;
fetch 100 in testcursor;
<fetch repeated until you stop getting results>
close testcursor;
end;

might work better.

Stephan Szabo
sszabo@bigpanda.com
