Thread: Large Result and Memory Limit

Large Result and Memory Limit

From
Mike Ginsburg
Date:
Hello,
  I am working on a personnel registry that has upwards of 50,000
registrants.  Currently I am working on an export module that will
create a CSV from multiple tables.  I have managed to keep the script
(PHP) under the memory limit when creating and inserting the CSV into
the database.  The problem comes when I try to query for the data and
export it.  Memory limit is a major concern, but the query for one row
returns a result set too large and PHP fails.

I've thought about storing the data in multiple rows and then querying
one-by-one and outputting, but was hoping there was a better way.

Thanks in advance for the help.

MG

Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015


Re: Large Result and Memory Limit

From
André Volpato
Date:
Mike Ginsburg wrote:
> Hello,
>  I am working on a personnel registry that has upwards of 50,000
> registrants.  Currently I am working on an export module that will
> create a CSV from multiple tables.  I have managed to keep the script
> (PHP) under the memory limit
okay... some info needed here.
1. memory on the DB server
2. memory_limit on php.ini
> when creating and inserting the CSV into the database.  The problem
> comes when I try to query for the data and export it.  Memory limit is
> a major concern, but the query for one row returns a result set too
> large and PHP fails.
a single row is enough to crash PHP?

>
> I've thought about storing the data in multiple rows and then querying
> one-by-one and outputting, but was hoping there was a better way.
if you can't raise memory_limit, I think it's the only way.

[]'s
ACV


Re: Large Result and Memory Limit

From
Mike Ginsburg
Date:
André Volpato wrote:
> Mike Ginsburg wrote:
>> Hello,
>>  I am working on a personnel registry that has upwards of 50,000
>> registrants.  Currently I am working on an export module that will
>> create a CSV from multiple tables.  I have managed to keep the script
>> (PHP) under the memory limit
> okay... some info needed here.
> 1. memory on the DB server
> 2. memory_limit on php.ini
PHP memory_limit is 16M.  We're running multiple installations on a
single webserver, so memory is a concern.  The DB server is separate
from the webserver.
>> when creating and inserting the CSV into the database.  The problem
>> comes when I try to query for the data and export it.  Memory limit
>> is a major concern, but the query for one row returns a result set
>> too large and PHP fails.
> a single row is enough to crash PHP?
Well the "data" field in the table (text) contains 50K lines.  It's over
30M in size for the full export.
>
>>
>> I've thought about storing the data in multiple rows and then
>> querying one-by-one and outputting, but was hoping there was a better
>> way.
> if you can't raise memory_limit, I think it's the only way.
I was afraid that would be the answer.
>
> []'s
> ACV

Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015



Re: Large Result and Memory Limit

From
Alvaro Herrera
Date:
Mike Ginsburg wrote:
> Hello,
>  I am working on a personnel registry that has upwards of 50,000
> registrants.  Currently I am working on an export module that will create a
> CSV from multiple tables.  I have managed to keep the script (PHP) under
> the memory limit when creating and inserting the CSV into the database.
> The problem comes when I try to query for the data and export it.  Memory
> limit is a major concern, but the query for one row returns a result set
> too large and PHP fails.

One row?  Wow, I didn't know PHP was that broken.

Try declaring a cursor and fetching a few rows at a time.
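
Something like this, roughly (an untested sketch; it assumes the pgsql
extension, an open connection, and that the export is stored one line
per row -- all names here are placeholders):

<?php
// Stream the export in batches through a server-side cursor so PHP
// never holds more than one batch in memory at a time.
$db = pg_connect("host=dbhost dbname=registry user=app");   // placeholder DSN

pg_query($db, "BEGIN");
pg_query($db, "DECLARE export_cur CURSOR FOR
               SELECT line FROM export_lines ORDER BY line_no");

while (true) {
    $res = pg_query($db, "FETCH 1000 FROM export_cur");
    if (pg_num_rows($res) == 0) {
        break;                     // no more rows
    }
    while ($row = pg_fetch_row($res)) {
        echo $row[0], "\n";        // send each CSV line straight to the client
    }
    pg_free_result($res);          // free this batch before fetching the next
}

pg_query($db, "CLOSE export_cur");
pg_query($db, "COMMIT");
?>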

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"No es bueno caminar con un hombre muerto"

Re: Large Result and Memory Limit

From
André Volpato
Date:
Mike Ginsburg wrote:
> André Volpato wrote:
>> Mike Ginsburg wrote:
>>> Hello,
>>>  I am working on a personnel registry that has upwards of 50,000
>>> registrants.  Currently I am working on an export module that will
>>> create a CSV from multiple tables.  I have managed to keep the
>>> script (PHP) under the memory limit
>> okay... some info needed here.
>> 1. memory on the DB server
>> 2. memory_limit on php.ini
> PHP memory_limit is 16M.  We're running multiple installations on a
> single webserver, so memory is a concern.  The DB server is separate
> from the webserver.
>>> when creating and inserting the CSV into the database.  The problem
>>> comes when I try to query for the data and export it.  Memory limit
>>> is a major concern, but the query for one row returns a result set
>>> too large and PHP fails.
>> a single row is enough to crash PHP?
> Well the "data" field in the table (text) contains 50K lines.  It's
> over 30M in size for the full export.

>>
>>>
>>> I've thought about storing the data in multiple rows and then
>>> querying one-by-one and outputting, but was hoping there was a
>>> better way.
>> if you can't raise memory_limit, I think it's the only way.
> I was afraid that would be the answer.


Well, you could try to retrieve the data with substr(), say 10k lines
at a time over 5 queries.
It's kinda ugly, but should work in this case...
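
Something along these lines (untested; the connection handle, table
name, id column and chunk size are assumptions -- only the "data"
column name comes from Mike's description):

// Pull the big text column back in fixed-size pieces so no single
// fetch comes anywhere near memory_limit.
$chunk  = 1024 * 1024;    // 1 MB per query (assumed chunk size)
$offset = 1;              // substr() in PostgreSQL is 1-based
while (true) {
    $res = pg_query_params($db,
        "SELECT substr(data, $1, $2) FROM exports WHERE export_id = $3",
        array($offset, $chunk, $export_id));
    $piece = pg_fetch_result($res, 0, 0);
    pg_free_result($res);
    if ($piece === false || $piece === null || $piece === '') {
        break;            // past the end of the stored text
    }
    echo $piece;          // stream this piece out
    $offset += $chunk;
}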

[]'s
ACV



Re: Large Result and Memory Limit

From
André Volpato
Date:
Alvaro Herrera wrote:
> Mike Ginsburg wrote:
>> Hello,
>>  I am working on a personnel registry that has upwards of 50,000
>> registrants.  Currently I am working on an export module that will create a
>> CSV from multiple tables.  I have managed to keep the script (PHP) under
>> the memory limit when creating and inserting the CSV into the database.
>> The problem comes when I try to query for the data and export it.  Memory
>> limit is a major concern, but the query for one row returns a result set
>> too large and PHP fails.
>
> One row?  Wow, I didn't know PHP was that broken.
>
> Try declaring a cursor and fetching a few rows at a time.

PHP is just respecting memory_limit when retrieving data.
In this case, a single row is about 30M, a lot more than the limit of 16M.
I think cursors wouldn't help anyway.

[]'s,
ACV

Re: Large Result and Memory Limit

From
"Scott Marlowe"
Date:
On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Mike Ginsburg wrote:
> > Hello,
> >  I am working on a personnel registry that has upwards of 50,000
> > registrants.  Currently I am working on an export module that will create a
> > CSV from multiple tables.  I have managed to keep the script (PHP) under
> > the memory limit when creating and inserting the CSV into the database.
> > The problem comes when I try to query for the data and export it.  Memory
> > limit is a major concern, but the query for one row returns a result set
> > too large and PHP fails.
>
> One row?  Wow, I didn't know PHP was that broken.

No, it's not PHP, it's his approach that's broken.

He was saying that a single db row has a text column with 50,000 lines.

The fact that PHP throws an error and stops rather than running your
machine out of memory would hardly qualify as broken.  It's got a
memory limit for a reason.  Even Java, I believe, has one.

Re: Large Result and Memory Limit

From
Alvaro Herrera
Date:
Scott Marlowe wrote:
> On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > Mike Ginsburg wrote:
> > > Hello,
> > >  I am working on a personnel registry that has upwards of 50,000
> > > registrants.  Currently I am working on an export module that will create a
> > > CSV from multiple tables.  I have managed to keep the script (PHP) under
> > > the memory limit when creating and inserting the CSV into the database.
> > > The problem comes when I try to query for the data and export it.  Memory
> > > limit is a major concern, but the query for one row returns a result set
> > > too large and PHP fails.
> >
> > One row?  Wow, I didn't know PHP was that broken.
>
> No, it's not php, it's his approach that's broken.

Oh, I didn't realize that's what he was saying.

Of course, you can get the thing out of the database using substring(),
but really this is a matter of poor design.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"No hay cielo posible sin hundir nuestras raíces
 en la profundidad de la tierra"                        (Malucha Pinto)

Re: Large Result and Memory Limit

From
"Scott Marlowe"
Date:
On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

> export it.  Memory limit is a major concern, but the query for one row
> returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory
limit just by returning it, you've done something very wrong in your
design.

You'd probably be better served using either a plain text file system
to store these things, or large objects in postgresql.

But if you're stuffing ~8 megs worth of csv text data into a single
row* you're probably not using a very relational layout of your data.
And you're losing all the advantages (checking your data for
consistency and such) that a relational db could give you.

* Note that I'm assuming a few things up there.  1: php uses about 2:1
memory to store data it's holding, roughly.  If you're set to 16 Meg
max, I'm assuming your return set is 8Meg or larger.

Re: Large Result and Memory Limit

From
Mike Ginsburg
Date:
Scott Marlowe wrote:
> On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:
>
>> export it.  Memory limit is a major concern, but the query for one row
>> returns a result set too large and PHP fails.
>
> If this is true, and one single db row makes php exceed its memory
> limit just by returning it, you've done something very wrong in your
> design.

This is for the export only.  Since it is an export of ~50,000
registrants, it takes some time to process.  We also have load balanced
web servers, so unless I want to create identical processes on all
webservers, or write some crazy script to scp it across the board,
storing it as a text file is not an option.  I realize that my way of
doing it is flawed, which is the reason I came here for advice.  The
CSV contains data from approximately 15 tables, several of which are
many-to-ones making joins a little tricky.  My thought was to do all of
the processing in the background, store the results in the DB, and
allow the requester to download it at their convenience.

Would it be a good idea to create a temporary table that stored all of
the export data in it broken out by rows and columns, and when download
time comes, query from there?

> You'd probably be better served using either a plain text file system
> to store these things, or large objects in postgresql.
>
> But if you're stuffing ~8 megs worth of csv text data into a single
> row* you're probably not using a very relational layout of your data.
> And you're losing all the advantages (checking your data for
> consistency and such) that a relational db could give you.
>
> * Note that I'm assuming a few things up there.  1: php uses about 2:1
> memory to store data it's holding, roughly.  If you're set to 16 Meg
> max, I'm assuming your return set is 8Meg or larger.


Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015

Re: Large Result and Memory Limit

From
"Scott Marlowe"
Date:
On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

>  This is for the export only.  Since it is an export of ~50,000 registrants,
> it takes some time to process.  We also have load balanced web servers, so
> unless I want to create identical processes on all webservers, or write some
> crazy script to scp it across the board, storing it as a text file is not an
> option.  I realize that my way of doing it is flawed, which is the reason I
> came here for advice.  The CSV contains data from approximately 15 tables,
> several of which are many-to-ones making joins a little tricky.  My thought
> was to do all of the processing in the background, store the results in the
> DB, and allow the requester to download it at their convenience.
>
>  Would it be a good idea to create a temporary table that stored all of the
> export data in it broken out by rows and columns, and when download time
> comes, query from there?

Yeah, I tend to think that would be better.  Then you could use a
cursor to retrieve them and serve them one line at a time and not have
to worry about overloading your PHP server.
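
A rough sketch of that approach (untested; the table, column and
variable names are made up for illustration, not taken from Mike's
schema):

// Background job: store the finished CSV one line per row so the
// download step can later pull it back in small batches.
pg_query($db, "BEGIN");
$line_no = 0;
foreach ($csv_lines as $line) {    // $csv_lines: the generated CSV, line by line
    pg_query_params($db,
        "INSERT INTO export_lines (export_id, line_no, line) VALUES ($1, $2, $3)",
        array($export_id, $line_no++, $line));
}
pg_query($db, "COMMIT");

// Download step: DECLARE a cursor over
//   SELECT line FROM export_lines WHERE export_id = ... ORDER BY line_no
// and FETCH a few hundred rows at a time, echoing each line as it comes,
// as in the cursor example earlier in the thread.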

Re: Large Result and Memory Limit

From
Bill Moran
Date:
In response to "Scott Marlowe" <scott.marlowe@gmail.com>:

> On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:
>
> > export it.  Memory limit is a major concern, but the query for one row
> > returns a result set too large and PHP fails.
>
> If this is true, and one single db row makes php exceed its memory
> limit just by returning it, you've done something very wrong in your
> design.

I work with Mike, and I'm not sure there's something "very wrong" in
the design.

The problem we're looking at is an export.  This big TEXT field is
nothing more than a staging area.  The export itself can take several
hours to run, so we background the process, and store the result in
a big TEXT field (it ends up being a CSV).  Later, the user can log
in and download the exported file via the web interface.  (But it's
unlikely that anyone is going to click "export", then wait 2 hours
for their browser to refresh :)

The process is, web interface -> request export -> get an estimate
on completion time -> come back later and check the status -> download
if complete.

It's difficult (maybe impossible) to materialize the data on a schedule,
since the user has a number of options for how to export the data
(filters, etc.), so we chose to do it on-demand.

> You'd probably be better served using either a plain text file system
> to store these things, or large objects in postgresql.

We have multiple web servers with a load balancer, so saving the
result to a file doesn't really work.  We could put shared storage
in place, but I'm still not convinced that's the best fix for this.

Large objects are one option we were considering.  The problem is
they don't work with Slony, and our redundancy relies on Slony.  Granted,
we might want to just make this a non-replicated table, since it's
only a staging area anyway, but we're looking for a better solution.

My (specific) question is whether or not anyone has experimented with
putting something like a CSV file in a table with one tuple for each
row, and compared performance, etc., to putting it in a large object?

> But if you're stuffing ~8 megs worth of csv text data into a single
> row* you're probably not using a very relational layout of your data.
> And you're losing all the advantages (checking your data for
> consistency and such) that a relational db could give you.

Actually, we _are_ using lots of tables with lots of relations and
foreign keys and triggers and stored procedures and all sorts of other
stuff.  That's why it takes multiple hours to "flatten" everything (is
there a better term to describe the process of turning relational data
into a single flat file?)

> * Note that I'm assuming a few things up there.  1: php uses about 2:1
> memory to store data it's holding, roughly.  If you're set to 16 Meg
> max, I'm assuming your return set is 8Meg or larger.

I'm not sure, but that's probably correct.

--
Bill Moran
http://www.potentialtech.com

Re: Large Result and Memory Limit

From
"Joshua D. Drake"
Date:

Bill Moran wrote:
> In response to "Scott Marlowe" <scott.marlowe@gmail.com>:

> The process is, web interface -> request export -> get an estimate
> on completion time -> come back later and check the status -> download
> if complete.
>
> It's difficult (maybe impossible) to materialize the data on a schedule,
> since the user has a number of options how how to export the data
> (filters, etc) so we chose to do it on-demand.

Export to large object, not text.
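
For example, a rough sketch with the pgsql large-object functions
(untested; $db, $csv_chunks and the 8 kB read size are assumptions):

// Build side: write the finished CSV into a large object.
pg_query($db, "BEGIN");                // lo_* calls must run inside a transaction
$oid = pg_lo_create($db);              // store this OID with the export job record
$lo  = pg_lo_open($db, $oid, "w");
foreach ($csv_chunks as $chunk) {      // $csv_chunks: the CSV produced piece by piece
    pg_lo_write($lo, $chunk);
}
pg_lo_close($lo);
pg_query($db, "COMMIT");

// Download side: stream it back in small reads, well under memory_limit.
pg_query($db, "BEGIN");
$lo = pg_lo_open($db, $oid, "r");
while (($buf = pg_lo_read($lo, 8192)) !== false && $buf !== '') {
    echo $buf;                         // 8 kB at a time
}
pg_lo_close($lo);
pg_query($db, "COMMIT");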

Joshua D. Drake

--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
