Thread: DBD::Pg/perl question, kind of...

DBD::Pg/perl question, kind of...

From: Neal Clark

Hi.

I am in the middle of moving a product from MySQL to PostgreSQL. One of
the tables is relatively big, with 100M+ rows and growing, each of
which has a column that usually contains between 1k and 500k of data
(the 'MYD' file is currently 94G).

Most of the software that interacts with this database is written in
Perl. The machine has 2G of memory, and a lot of the time I need to
process result sets that are bigger than that. Under normal
circumstances with the Perl DBI, say you have something going on like

my $sth = $dbh->prepare(qq{SOME_QUERY});
$sth->execute;
while (my $href = $sth->fetchrow_hashref) {
    # do stuff
}

right? The Perl DBI tries to stuff the entire result set into memory
and dies if it can't. Then the fetchrow_hashref calls on $sth yield
references to hashes that represent the rows, but they don't really
'fetch' at all; they just shift references to data that has already
been fetched. This is my understanding of things; feel free to correct
me anywhere.

So with MySQL, I can just say $dbh->{mysql_use_result} = 1, and
then it switches so that the fetchrow_hashref calls are actually
fetching from the database, and I only have one row in memory at a
time, unless I copy the reference and let it wander off somewhere
else, or something.
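
In other words, the whole pattern on the MySQL side is roughly this
(table and column names made up, connection parameters are placeholders):

use DBI;

my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

# stream rows from the server instead of buffering the whole result set
$dbh->{mysql_use_result} = 1;

my $sth = $dbh->prepare(q{SELECT account_id, data FROM big_table});
$sth->execute;
while (my $href = $sth->fetchrow_hashref) {
    # only the current row is held in client memory here
}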

So all I'm really asking is: how does PostgreSQL approach the use result/
store result issue? Can I easily process result sets that are larger
than memory? And if it handles it similarly to MySQL, does it also
cause the same table locking behaviour?

Thanks,
Neal Clark



Re: DBD::Pg/perl question, kind of...

From: Albe Laurenz
Neal Clark wrote:
> my $sth = $dbh->prepare(qq{SOME_QUERY});
> $sth->execute;
> while (my $href = $sth->fetchrow_hashref) {
>     # do stuff
> }
>
[...]
>
> So with MySQL, I can just say $dbh->{mysql_use_result} = 1, and
> then it switches so that the fetchrow_hashref calls are actually
> fetching from the database, and I only have one row in memory at a
> time, unless I copy the reference and let it wander off somewhere
> else, or something.
>
> So all I'm really asking is: how does PostgreSQL approach the use result/
> store result issue? Can I easily process result sets that are larger
> than memory? And if it handles it similarly to MySQL, does it also
> cause the same table locking behaviour?

The man page of DBD::Pg says, and my experiments with tcpdump confirm:

   RowCacheSize  (integer)
   Implemented by DBI, not used by this driver.

And more:

   Cursors

   Although PostgreSQL has a cursor concept, it has not been used in the
   current implementation. Cursors in PostgreSQL can only be used inside a
   transaction block. Because only one transaction block at a time is
   allowed, this would have implied the restriction not to use any nested
   "SELECT" statements. Hence the "execute" method fetches all data at
   once into data structures located in the front-end application. This
   approach must be considered when selecting large amounts of data!

So there is no automatic way of handling it.

You will probably have to handle it in your code and use SELECT
statements with a LIMIT clause.
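
Something along these lines, for example (table and column names are
invented; note that each batch re-runs the query with a larger OFFSET):

my $batch  = 1000;
my $offset = 0;

my $sth = $dbh->prepare(q{
    SELECT account_id, data
    FROM big_table
    ORDER BY account_id
    LIMIT ? OFFSET ?
});

while (1) {
    $sth->execute($batch, $offset);
    my $rows = $sth->fetchall_arrayref({});   # at most $batch rows in memory
    last unless @$rows;
    for my $row (@$rows) {
        # do stuff with $row->{account_id} and $row->{data}
    }
    $offset += $batch;
}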

Yours,
Laurenz Albe

Re: DBD::Pg/perl question, kind of...

From: Douglas McNaught
"Albe Laurenz" <all@adv.magwien.gv.at> writes:

> So there is no automatic way of handling it.
>
> You will probably have to consider it in your code and use
> SELECT-Statements
> with a LIMIT clause.

Either that, or explicitly DECLARE a CURSOR and use FETCH from that
cursor in batches.  You can do this in Perl even though the automatic
cursor support is not yet there.  You are restricted to staying in a
transaction while the cursor is open, so if you want to work outside
of transactions LIMIT/OFFSET is your only way.

-Doug

Re: DBD::Pg/perl question, kind of...

From: Michael Fuhr
On Mon, Mar 12, 2007 at 08:38:52AM -0400, Douglas McNaught wrote:
> You are restricted to staying in a transaction while the cursor is
> open, so if you want to work outside of transactions LIMIT/OFFSET
> is your only way.

http://www.postgresql.org/docs/8.2/interactive/sql-declare.html

"If WITH HOLD is specified and the transaction that created the
cursor successfully commits, the cursor can continue to be accessed
by subsequent transactions in the same session. (But if the creating
transaction is aborted, the cursor is removed.) A cursor created
with WITH HOLD is closed when an explicit CLOSE command is issued
on it, or the session ends. In the current implementation, the rows
represented by a held cursor are copied into a temporary file or
memory area so that they remain available for subsequent transactions."
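
With DBD::Pg that would look roughly like this (table name is made up,
and it assumes AutoCommit is on):

$dbh->begin_work;
$dbh->do(q{DECLARE held_cur CURSOR WITH HOLD FOR
           SELECT * FROM my_table ORDER BY account_id});
$dbh->commit;    # held_cur stays open because of WITH HOLD

my $sth = $dbh->prepare('FETCH FORWARD 1000 FROM held_cur');
$sth->execute;   # runs outside the creating transaction
while (my $href = $sth->fetchrow_hashref) {
    # process the first batch of up to 1000 rows
}

$dbh->do('CLOSE held_cur');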

--
Michael Fuhr

Re: DBD::Pg/perl question, kind of...

From: Neal Clark

Thanks for all the replies, everyone. Not really knowing what a cursor
is, I suppose I have some work to do. I can do the SELECT/LIMIT/
OFFSET approach, but that seems like kind of a headache, esp. when it's
hard to predict what number of rows will max out memory... I'd have to
keep that number pretty small, effectively making the same exact
query over and over, which sounds pretty slow.

I'm not really using pgsql yet, so a lot of this is beyond me; I'm
just thinking ahead as I start to migrate from mysql...

Thanks again,
Neal

On Mar 12, 2007, at 7:09 AM, Michael Fuhr wrote:

> On Mon, Mar 12, 2007 at 08:38:52AM -0400, Douglas McNaught wrote:
>> You are restricted to staying in a transaction while the cursor is
>> open, so if you want to work outside of transactions LIMIT/OFFSET
>> is your only way.
>
> http://www.postgresql.org/docs/8.2/interactive/sql-declare.html
>
> "If WITH HOLD is specified and the transaction that created the
> cursor successfully commits, the cursor can continue to be accessed
> by subsequent transactions in the same session. (But if the creating
> transaction is aborted, the cursor is removed.) A cursor created
> with WITH HOLD is closed when an explicit CLOSE command is issued
> on it, or the session ends. In the current implementation, the rows
> represented by a held cursor are copied into a temporary file or
> memory area so that they remain available for subsequent
> transactions."
>
> --
> Michael Fuhr
>


Re: DBD::Pg/perl question, kind of...

From: Neal Clark

Okay, I don't have any postgresql tables big enough to verify this is
doing what I think it is (namely, only keeping one row from my result
set in memory at a time), and I still don't really know much about
cursors or pg, but this appears to be doing what I want to do:

$dbh->do('BEGIN WORK;');
$dbh->do('DECLARE my_cur CURSOR FOR SELECT * FROM my_table ORDER BY account_id;');
my $sth = $dbh->prepare(qq{FETCH FORWARD 1 FROM my_cur});

$sth->execute;
while (my $href = $sth->fetchrow_hashref) {
     my $field1 = $href->{field1};
     my $account_id = $href->{account_id};

     ## do stuff

     $sth->execute;
}

$dbh->do("COMMIT WORK;");

really the only thing that doesn't look DBI-ish about it is calling
$sth->execute at the end of the while loop... it refills the statement
handle with the next row so fetchrow_hashref has something to return
the next time around.

comments?

On Mar 12, 2007, at 12:49 PM, A.M. wrote:

>
> On Mar 12, 2007, at 15:33 , Neal Clark wrote:
>
>> Thanks for all the replies everyone. Not really knowing what a
>> cursor is, I suppose I have some work to do. I can do the SELECT/
>> LIMIT/OFFSET approach but that seems like kind of a headache, esp.
>> when its hard to predict what # of rows will max out memory... I'd
>> have to keep that number pretty small, effectively making the same
>> exact query over and over, which sounds pretty slow.
>>
>> I'm not really using pgsql yet, so a lot of this is beyond me, I'm
>> just thinking ahead as I start to migrate from mysql...
>
> Don't use LIMIT/OFFSET. The cursor is exactly what you want. A
> cursor effectively caches the query results on the server side and
> feeds the results to the client on demand.
>
> Cheers,
> M
>


Re: DBD::Pg/perl question, kind of...

From: Douglas McNaught
Neal Clark <nclark@securescience.net> writes:

> comments?

Looks like the right idea.  If you have a lot of rows to process,
you'll benefit by fetching in batches, e.g.

my $sth = $dbh->prepare(qq{FETCH FORWARD 1000 FROM my_cur});

# iterate through the result set here....
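
Spelled out (untested, reusing the my_cur cursor from your example),
the iteration could be something like:

my $more = 1;
while ($more) {
    $sth->execute;              # grab the next batch of up to 1000 rows
    $more = 0;
    while (my $href = $sth->fetchrow_hashref) {
        $more = 1;              # got at least one row, so try another batch
        # do stuff with $href
    }
}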

-Doug