Thread: large resultset

large resultset

From
AI Rumman
Date:
How do I return a large resultset (almost 2 million records) in PHP?

Re: large resultset

From
Thom Brown
Date:
On 15 June 2010 09:31, AI Rumman <rummandba@gmail.com> wrote:
How do I return a large resultset (almost 2 million records) in PHP?

Presumably this is so people can download results rather than display them in a browser?

Here's a basic, raw version of what you can do:

$db = pg_connect($connection_string);

$results = pg_query($db, $query);

while ($result = pg_fetch_array($results))
{
    echo implode("|", $result) . "\n";
}

pg_close($db);

Obviously that wouldn't do by itself, but it's quite simple to loop over a result set.
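
A minimal sketch of that loop arranged to push output to the client as it
goes (the periodic flush() is an illustrative addition; as noted later in
this thread, pg_query() still buffers the whole result set in PHP's memory):

$db = pg_connect($connection_string);
$results = pg_query($db, $query);

$i = 0;
while ($result = pg_fetch_array($results, null, PGSQL_NUM))
{
    echo implode("|", $result) . "\n";
    if (++$i % 10000 == 0) {
        flush();   // send what has been produced so far to the client
    }
}

pg_close($db);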

Regards

Thom

Re: large resultset

From
vinny
Date:
On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown <thombrown@gmail.com>
wrote:
> On 15 June 2010 09:31, AI Rumman  wrote:
>  How to return large resutlset (almost 2 millions record) in php?
>
> Presumably this is so people can download results rather than display
> them in a browser?
>
> Here's a basic a raw version of what you can do:
>
> $db = pg_connect($connection_string);
>
> $results = pg_query($db, $query);
>
> while ($result = pg_fetch_array($results)
> {
>     echo implode("|", $result) . "n";
> }
>
> pg_close($db);
>
> Obviously that wouldn't do by itself, but it's quite simple to loop over
> a result set.
>
> Regards
>
> Thom
>

If that's the case surely you'd use some SQL to merge the data into one
long string and return it in a single record.
Looping over two million results is going to take a while, not to mention
quite a bit of memory.
v.


Re: large resultset

From
Thom Brown
Date:
On 15 June 2010 10:40, vinny <vinny@xs4all.nl> wrote:

On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown <thombrown@gmail.com>
wrote:
> On 15 June 2010 09:31, AI Rumman  wrote:
>  How to return large resutlset (almost 2 millions record) in php?
>
> Presumably this is so people can download results rather than display
> them in a browser?
>
> Here's a basic a raw version of what you can do:
>
> $db = pg_connect($connection_string);
>
> $results = pg_query($db, $query);
>
> while ($result = pg_fetch_array($results)
> {
>     echo implode("|", $result) . "\n"; //typo before
> }
>
> pg_close($db);
>
> Obviously that wouldn't do by itself, but it's quite simple to loop over
> a result set.
>
> Regards
>
> Thom
>

If that's the case surely you'd use some SQL to merge the data into one
long string and return it in a single record.
Looping over two million results is going to take a while, not to mention
quite a bit of memory.
v.


I don't understand.  Return all data as a single massive string?

Thom

Re: large resultset

From
AI Rumman
Date:
No. I need to send 2 million records. I want to know the best possible way to send these records.
How should I write a PL/pgSQL procedure to send the records one by one to improve the response time for the users?

On Tue, Jun 15, 2010 at 3:43 PM, Thom Brown <thombrown@gmail.com> wrote:
On 15 June 2010 10:40, vinny <vinny@xs4all.nl> wrote:

On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown <thombrown@gmail.com>
wrote:
> On 15 June 2010 09:31, AI Rumman  wrote:
>  How to return large resutlset (almost 2 millions record) in php?
>
> Presumably this is so people can download results rather than display
> them in a browser?
>
> Here's a basic a raw version of what you can do:
>
> $db = pg_connect($connection_string);
>
> $results = pg_query($db, $query);
>
> while ($result = pg_fetch_array($results)
> {
>     echo implode("|", $result) . "\n"; //typo before

> }
>
> pg_close($db);
>
> Obviously that wouldn't do by itself, but it's quite simple to loop over
> a result set.
>
> Regards
>
> Thom
>

If that's the case surely you'd use some SQL to merge the data into one
long string and return it in a single record.
Looping over two million results is going to take a while, not to mention
quite a bit of memory.
v.


I don't understand.  Return all data as a single massive string?

Thom

Re: large resultset

From
Andrew McMillan
Date:
On Tue, 2010-06-15 at 16:01 +0600, AI Rumman wrote:
> No. I need to send 2 million records. I want to know what is the best
> possible way to send these records?
> HOw should I write the plpgsql procedure to send record ony by one to
> improve the response time to the users?

I don't think you're providing enough information for us to help you.

Your problem with two million records might be:

* But it takes so long to loop through them...
* I run out of memory receiving the resultset from the far end.
* How do I optimise this SQL query that fetches 2 million records.
* Or (likely) something I haven't considered.

Your 'How' question might be:

* Should I be using a cursor to access these efficiently, by sending
data in several chunks?

* How can I write this so I don't waste my time if the person on the far
end gave up waiting?

Etc.


Fundamentally, sending 2 million of anything can get problematic pretty
darn quickly, unless the 'thing' is less than 100 bytes.


My personal favourite would be to write a record somewhere saying 'so
and so wants these 2 million records', and give the user a URL where
they can fetch them from.  Or e-mail them to the user, or... just about
anything, except try and generate them in-line with the page, in a
reasonable time for their browser to not give up, or their proxy to not
give up, or their ISP's transparent proxy to not give up.
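
A rough sketch of that idea, with a hypothetical export_requests table and
URL scheme (all names here are illustrative, not an existing schema):

$db = pg_connect($connection_string);

// Record the request; a cron job or background worker picks it up later,
// runs the big query, and writes the file.
$token = md5(uniqid(mt_rand(), true));   // identifier for this export request
pg_query_params($db,
    "INSERT INTO export_requests (token, requested_by, status) VALUES ($1, $2, 'pending')",
    array($token, $current_user));

// Hand the user a URL where the finished file will eventually appear.
echo "Your export is being prepared: http://example.com/exports/" . $token . ".csv";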

Why do they want 2 million records anyway?  2 million of what?  Will
another user drop by 10 seconds later and also want 2 million records?
The same 2 million?  Why does the user want 2 million records?  Is there
something that can be done to the 2 million records to make them a
smaller but more useful set of information?


Hopefully this stream of consciousness has some help buried in it
somewhere :-)


Cheers,
                    Andrew McMillan.


--
------------------------------------------------------------------------
http://andrew.mcmillan.net.nz/                     Porirua, New Zealand
Twitter: _karora                                  Phone: +64(272)DEBIAN
            Water, taken in moderation cannot hurt anybody.
                             -- Mark Twain

------------------------------------------------------------------------



Re: large resultset

From
Raymond O'Donnell
Date:
On 15/06/2010 11:01, AI Rumman wrote:
> No. I need to send 2 million records. I want to know what is the best
> possible way to send these records?
> HOw should I write the plpgsql procedure to send record ony by one to
> improve the response time to the users?

Are you really going to display all 2 million at once? Surely you'd be
better off displaying smaller pages at a time.
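
A paging sketch along those lines (the email_activity table, page size and
parameter name are hypothetical; OFFSET gets slower for very deep pages,
but each request stays small):

$page     = isset($_GET['page']) ? max(0, (int)$_GET['page']) : 0;
$per_page = 100;

$db = pg_connect($connection_string);
$results = pg_query_params($db,
    "SELECT * FROM email_activity ORDER BY id LIMIT $1 OFFSET $2",
    array($per_page, $page * $per_page));

while ($row = pg_fetch_assoc($results)) {
    // render one row of the current page
    echo htmlspecialchars(implode(" | ", $row)) . "<br>\n";
}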

In any case, you haven't been clear about what you are trying to do - if
you explain in more detail, we'll be better able to help.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: large resultset

From
Jasen Betts
Date:
On 2010-06-15, vinny <vinny@xs4all.nl> wrote:
>
> On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown <thombrown@gmail.com>
> wrote:
>> On 15 June 2010 09:31, AI Rumman  wrote:
>>  How to return large resutlset (almost 2 millions record) in php?
>>
>> Presumably this is so people can download results rather than display
>> them in a browser?
>>
>> Here's a basic a raw version of what you can do:
>>
>> $db = pg_connect($connection_string);
>>
>> $results = pg_query($db, $query);
>>
>> while ($result = pg_fetch_array($results)
>> {
>>     echo implode("|", $result) . "n";
>> }
>>
>> pg_close($db);
>>
>> Obviously that wouldn't do by itself, but it's quite simple to loop over
>> a result set.
>>
>> Regards
>>
>> Thom
>>
>
> If that's the case surely you'd use some SQL to merge the data into one
> long string and return it in a single record.
> Looping over two million results is going to take a while,

yeah, so is transmitting the bytes.

> not to mention quite a bit of memory.

Yeah, libpq is going to want to load the result set into memory;
you'll probably hit PHP's memory limit and it'll be game over.

The only pure PHP solution is to use a cursor (seeing as pg_get_line
is not available).
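
A minimal sketch of the cursor approach (the batch size is arbitrary; the
cursor keeps the full result set on the server and hands it to PHP a chunk
at a time):

$db = pg_connect($connection_string);

// Cursors only live inside a transaction.
pg_query($db, "BEGIN");
pg_query($db, "DECLARE big_cur NO SCROLL CURSOR FOR " . $query);

do {
    $chunk = pg_query($db, "FETCH 10000 FROM big_cur");
    $rows  = pg_num_rows($chunk);
    while ($row = pg_fetch_array($chunk, null, PGSQL_NUM)) {
        echo implode("|", $row) . "\n";
    }
    pg_free_result($chunk);
} while ($rows > 0);

pg_query($db, "CLOSE big_cur");
pg_query($db, "COMMIT");
pg_close($db);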

OTOH you can do copy like this...

  $c=escapeshellarg($connstr);
  $q=escapeshellarg($query);
  system ("psql $c -c 'copy ( '$q' ) to stdout with csv header'");

needs a *nix-based server and PostgreSQL >= 8.3 (I think).

You don't get to choose the format beyond what pg supports, but I
think you can tweak encoding on the command-line if needed.
OTOH you could use ICONV for that.
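
A variant of that trick, sketched with popen() so the output streams
through PHP instead of going straight to stdout (useful if you want to send
HTTP headers first); the same caveats about a *nix server and the
PostgreSQL version apply:

$c = escapeshellarg($connstr);
$q = escapeshellarg($query);

// Download headers could be sent here before any data goes out.

// Stream psql's COPY output to the client a block at a time.
$fp = popen("psql $c -c 'copy ( '$q' ) to stdout with csv header'", 'r');
while (!feof($fp)) {
    echo fread($fp, 8192);
    flush();
}
pclose($fp);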

Re: large resultset

From
AI Rumman
Date:
Thanks a lot.
Actually, I am new to PostgreSQL.
I am using PostgreSQL 8.1.
The application gives the administrator all their email activities; that's why there are 2 million records. It is a CRM application.

On Tue, Jun 15, 2010 at 4:37 PM, Andrew McMillan <andrew@morphoss.com> wrote:
On Tue, 2010-06-15 at 16:01 +0600, AI Rumman wrote:
> No. I need to send 2 million records. I want to know what is the best
> possible way to send these records?
> HOw should I write the plpgsql procedure to send record ony by one to
> improve the response time to the users?

I don't think you're providing enough information for us to help you.

Your problem with two million users might be:

* But it takes so long to loop through them...
* I run out of memory receiving the resultset from the far end.
* How do I optimise this SQL query that fetches 2 million records.
* Or (likely) something I haven't considered.

Your 'How' question might be:

* Should I be using a cursor to access these efficiently, by sending
data in several chunks?

* How can I write this so I don't waste my time if the person on the far
end gave up waiting?

Etc.


Fundamentally sending 2million of anything can get problematic pretty
darn quickly, unless the 'thing' is less than 100 bytes.


My personal favourite would be to write a record somewhere saying 'so
and so wants these 2 million records', and give the user a URL where
they can fetch them from.  Or e-mail them to the user, or... just about
anything, except try and generate them in-line with the page, in a
reasonable time for their browser to not give up, or their proxy to not
give up, or their ISP's transparent proxy to not give up.

Why do they want 2 million record anyway?  2 million of what?  Will
another user drop by 10 seconds later and also want 2 million records?
The same 2 million?  Why does the user want 2 million records?  Is there
something that can be done to the 2 million records to make them a
smaller but more useful set of information?


Hopefully this stream of consciousness has some help buried in it
somewhere :-)


Cheers,
                                       Andrew McMillan.


--
------------------------------------------------------------------------
http://andrew.mcmillan.net.nz/                     Porirua, New Zealand
Twitter: _karora                                  Phone: +64(272)DEBIAN
           Water, taken in moderation cannot hurt anybody.
                            -- Mark Twain

------------------------------------------------------------------------


Re: large resultset

From
Jasen Betts
Date:
On 2010-06-15, Andrew McMillan <andrew@morphoss.com> wrote:

> Fundamentally sending 2million of anything can get problematic pretty
> darn quickly, unless the 'thing' is less than 100 bytes.
>
> My personal favourite would be to write a record somewhere saying 'so
> and so wants these 2 million records', and give the user a URL where
> they can fetch them from.  Or e-mail them to the user, or... just about
> anything, except try and generate them in-line with the page, in a
> reasonable time for their browser to not give up, or their proxy to not
> give up, or their ISP's transparent proxy to not give up.

Email often fails for sizes over 10 MB.

> Why do they want 2 million record anyway?  2 million of what?  Will
> another user drop by 10 seconds later and also want 2 million records?
> The same 2 million?  Why does the user want 2 million records?  Is there
> something that can be done to the 2 million records to make them a
> smaller but more useful set of information?

/Nobody/ wants a web page with 2 million lines on it
(scrolling gets tricky when each pixel covers 2000 lines of data, plus
most browsers aren't designed to handle it well).


Still, if it's served with "Content-Disposition: attachment"
they'll get offered it as a download instead
(unless they use IE and you use cookies and SSL, in which case it
doesn't work).
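
In PHP terms that's just a couple of header() calls before any row data is
sent (the filename is illustrative):

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="email-activity.csv"');
// ...then emit the rows (plain loop, cursor, or COPY as discussed above).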

Re: large resultset

From
vinny
Date:
On Tue, 15 Jun 2010 18:03:24 +0600, AI Rumman <rummandba@gmail.com> wrote:
> Thanks a lot.
> Actualy I am new with Postgresql.
> I am using POstgresql 8.1.
> The application is for giving the administrator all their email
> activities. Thats why it is 2 million of records. It is an CRM
> application.
>

But do you want to display all 2 million rows on screen? Because that's
not going to work.
If you just want to offer the data as a download, you're OK if you do it in
SQL and do not send all the rows separately.

Re: large resultset

From
Andrew McMillan
Date:
On Tue, 2010-06-15 at 18:03 +0600, AI Rumman wrote:
> Thanks a lot.
> Actualy I am new with Postgresql.
> I am using POstgresql 8.1.
> The application is for giving the administrator all their email
> activities. Thats why it is 2 million of records. It is an CRM
> application.

OK, so it still seems to me that you should go back to your user and say
"What are you going to do with these records?".  They may well say
something like "I'm going to load them into $SPREADSHEET and do
$ANALYSIS of $SOMETHING" at which point you can shortcut their future
pain by pointing out "but you realise there are 2 million of them, and
we have them in a database already.  Why not tell me what analysis you
want to do now, and I will just present you with the analysis?".  It may
well be that they will only ever want subsets of the results, so you can
write your page to offer them narrower sets of records, and never all at
once.

Or if they can't really adequately explain what they are going to do
with these records, and still insist they will want the whole 2 million
at once, write the actual processing in a shell script and e-mail them
the results - just trigger it from the web request.
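
A sketch of triggering such a script from the web request without making
the browser wait (the script path and argument are hypothetical):

// Kick off the export in the background and return immediately.
$cmd = 'nohup php /path/to/generate_export.php ' . escapeshellarg($request_id)
     . ' > /dev/null 2>&1 &';
exec($cmd);

echo "Your report is being generated; you will receive it by e-mail.";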

My experience is that users only want ridiculous things if they think
that what they are requesting is reasonable.  So if their request seems
unreasonable then either you don't understand it well enough, or they
don't understand computers, and talking to them isn't ever going to make
it worse.

Cheers,
                    Andrew McMillan.

>
> On Tue, Jun 15, 2010 at 4:37 PM, Andrew McMillan <andrew@morphoss.com>
> wrote:
>         On Tue, 2010-06-15 at 16:01 +0600, AI Rumman wrote:
>         > No. I need to send 2 million records. I want to know what is
>         the best
>         > possible way to send these records?
>         > HOw should I write the plpgsql procedure to send record ony
>         by one to
>         > improve the response time to the users?
>
>
>         I don't think you're providing enough information for us to
>         help you.
>
>         Your problem with two million users might be:
>
>         * But it takes so long to loop through them...
>         * I run out of memory receiving the resultset from the far
>         end.
>         * How do I optimise this SQL query that fetches 2 million
>         records.
>         * Or (likely) something I haven't considered.
>
>         Your 'How' question might be:
>
>         * Should I be using a cursor to access these efficiently, by
>         sending
>         data in several chunks?
>
>         * How can I write this so I don't waste my time if the person
>         on the far
>         end gave up waiting?
>
>         Etc.
>
>
>         Fundamentally sending 2million of anything can get problematic
>         pretty
>         darn quickly, unless the 'thing' is less than 100 bytes.
>
>
>         My personal favourite would be to write a record somewhere
>         saying 'so
>         and so wants these 2 million records', and give the user a URL
>         where
>         they can fetch them from.  Or e-mail them to the user, or...
>         just about
>         anything, except try and generate them in-line with the page,
>         in a
>         reasonable time for their browser to not give up, or their
>         proxy to not
>         give up, or their ISP's transparent proxy to not give up.
>
>         Why do they want 2 million record anyway?  2 million of what?
>          Will
>         another user drop by 10 seconds later and also want 2 million
>         records?
>         The same 2 million?  Why does the user want 2 million
>         records?  Is there
>         something that can be done to the 2 million records to make
>         them a
>         smaller but more useful set of information?
>
>
>         Hopefully this stream of consciousness has some help buried in
>         it
>         somewhere :-)
>
>
>         Cheers,
>                                                Andrew McMillan.
>
>
>         --
>         ------------------------------------------------------------------------
>         http://andrew.mcmillan.net.nz/                     Porirua,
>         New Zealand
>         Twitter: _karora                                  Phone:
>         +64(272)DEBIAN
>                    Water, taken in moderation cannot hurt anybody.
>                                     -- Mark Twain
>
>         ------------------------------------------------------------------------
>
>

--
------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com                            +64(272)DEBIAN
Though a superhero, Bruce Schneier disdains the use of a mask or secret
               identity as 'security through obscurity'.

------------------------------------------------------------------------



Re: large resultset

From
Marco Dieckhoff
Date:
On 15.06.2010 11:40, vinny wrote:
> On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown<thombrown@gmail.com>
> wrote:
>
>> On 15 June 2010 09:31, AI Rumman  wrote:
>>   How to return large resutlset (almost 2 millions record) in php?
>>
>>
>> Obviously that wouldn't do by itself, but it's quite simple to loop over
>> a result set.
>>
> If that's the case surely you'd use some SQL to merge the data into one
> long string and return it in a single record.
> Looping over two million results is going to take a while, not to mention
> quite a bit of memory.
> v.
>

huh?

Returning one large record containing the information of two million
records will almost certainly fill up your memory:
first the memory of the SQL server, in order to build the combination
of two million records, and, if that even succeeds, the memory of
the server processing the PHP.

Looping may take "a little" longer, but as only one row at a time is
fetched, it should be the least possible memory use.

In most cases you don't want to process all two million rows in
PHP anyway.
You may want to aggregate the data, or use functions on the
SQL server instead.
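
For example, instead of shipping the raw rows, an aggregate run on the
server side stays small (the table and column names here are hypothetical):

$db = pg_connect($connection_string);
$results = pg_query($db,
    "SELECT user_id, date_trunc('day', sent_at) AS day, count(*) AS emails
       FROM email_activity
      GROUP BY user_id, date_trunc('day', sent_at)
      ORDER BY user_id, day");

while ($row = pg_fetch_assoc($results)) {
    echo $row['user_id'] . ' sent ' . $row['emails'] . ' e-mails on ' . $row['day'] . "\n";
}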


regards,
Marco


Re: large resultset

From
vinny
Date:
On Tue, 2010-06-15 at 11:49 +0200, Marco Dieckhoff wrote:
> On 15.06.2010 11:40, vinny wrote:
> > On Tue, 15 Jun 2010 10:33:05 +0100, Thom Brown<thombrown@gmail.com>
> > wrote:
> >
> >> On 15 June 2010 09:31, AI Rumman  wrote:
> >>   How to return large resutlset (almost 2 millions record) in php?
> >>
> >>
> >> Obviously that wouldn't do by itself, but it's quite simple to loop over
> >> a result set.
> >>
> > If that's the case surely you'd use some SQL to merge the data into one
> > long string and return it in a single record.
> > Looping over two million results is going to take a while, not to mention
> > quite a bit of memory.
> > v.
> >
>
> huh?
>
> Returning one large record containg the information of two million
> records will almost certainly fill up your memory.
> First the memory of the sql server, in order to process the combination
> of two million records, and if that's even successfull, the memory of
> the server processing php.

True, I was a little optimistic there :-)

> Looping may take "a little" longer, but as only one row at a time is
> fetched, it should be the least possible memory use.

As far as I know, PHP always downloads the entire query result at once
and makes it available locally. The database may have less trouble
sending two million separate records, but PHP will still need enough
memory to hold the entire resultset at once.

If you really want to reduce memory usage you'd have to execute several
queries that each fetch a few thousand rows at a time, so you really only
deal with a small portion of the records, both on the db server and in PHP.
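
A sketch of that chunked approach using keyset pagination on an indexed id
column (the table and column names are hypothetical; unlike OFFSET, this
stays fast even for late chunks):

$db = pg_connect($connection_string);

$last_id = 0;
$batch   = 5000;

do {
    $chunk = pg_query_params($db,
        "SELECT id, subject, sent_at FROM email_activity
          WHERE id > $1 ORDER BY id LIMIT $2",
        array($last_id, $batch));

    $rows = pg_num_rows($chunk);
    while ($row = pg_fetch_assoc($chunk)) {
        echo implode("|", $row) . "\n";
        $last_id = $row['id'];
    }
    pg_free_result($chunk);
} while ($rows == $batch);

pg_close($db);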