Thread: DBD::Pg/perl question, kind of...
Hi. I am in the middle of moving a product from MySQL to Postgres. One of
the tables is relatively big, with 100M+ rows and growing, each of which
has a column that usually contains between 1-500k of data (the 'MYD' file
is currently 94G). Most of the software that interacts with this database
is written in Perl. The machine has 2G of memory, and a lot of the time I
need to process result sets that are bigger than that.

Under normal circumstances using the Perl DBI, say you have something
going on like

    my $sth = $dbh->prepare(qq{SOME_QUERY});
    $sth->execute;
    while (my $href = $sth->fetchrow_hashref) {
        # do stuff
    }

right? The Perl DBI tries to stuff the entire result set into memory and
dies if it can't. The fetchrow_hashref calls on $sth then yield references
to hashes that represent the rows, but they don't really 'fetch' at all;
they just shift references to data that has already been fetched. This is
my understanding of things; feel free to correct me anywhere.

So with MySQL, I can just say $dbh->{mysql_use_result} = 1, and then it
switches so that the fetchrow_hashref calls are actually fetching from
the database, and I only have one row in memory at a time, unless I copy
the reference and let it wander off somewhere else, or something.

So all I'm really asking is, how does Postgres approach the
use_result/store_result issue? Can I easily process result sets that are
larger than memory? And if it handles it similarly to MySQL, does it also
cause the same table locking behaviour?

Thanks,
Neal Clark
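For reference, a minimal sketch of the mysql_use_result streaming pattern
described above; the DSN, credentials, and table/column names here are
hypothetical, and in DBD::mysql the attribute can also be passed
per-statement to prepare():

    use DBI;

    # Hypothetical DSN and credentials, for illustration only.
    my $dbh = DBI->connect('dbi:mysql:database=mydb;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # With mysql_use_result enabled, DBD::mysql streams rows from the
    # server instead of buffering the whole result set on the client.
    $dbh->{mysql_use_result} = 1;

    my $sth = $dbh->prepare('SELECT account_id, payload FROM my_table');
    $sth->execute;
    while (my $href = $sth->fetchrow_hashref) {
        # only the current row is held in client memory here
    }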
Neal Clark wrote:
> my $sth = $dbh->prepare(qq{SOME_QUERY});
> $sth->execute;
> while (my $href = $sth->fetchrow_hashref) {
>     # do stuff
> }
> [...]
>
> So with MySQL, I can just say $dbh->{mysql_use_result} = 1, and then it
> switches so that the fetchrow_hashref calls are actually fetching from
> the database, and I only have one row in memory at a time, unless I
> copy the reference and let it wander off somewhere else, or something.
>
> So all I'm really asking is, how does Postgres approach the
> use_result/store_result issue? Can I easily process result sets that
> are larger than memory? And if it handles it similarly to MySQL, does
> it also cause the same table locking behaviour?

The man page of DBD::Pg says, and my experiments with tcpdump confirm:

    RowCacheSize (integer)
        Implemented by DBI, not used by this driver.

and more:

    Cursors
        Although PostgreSQL has a cursor concept, it has not been used
        in the current implementation. Cursors in PostgreSQL can only be
        used inside a transaction block. Because only one transaction
        block at a time is allowed, this would have implied the
        restriction not to use any nested "SELECT" statements. Hence the
        "execute" method fetches all data at once into data structures
        located in the front-end application. This approach must be
        considered when selecting large amounts of data!

So there is no automatic way of handling it. You will probably have to
consider it in your code and use SELECT statements with a LIMIT clause.

Yours,
Laurenz Albe
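For illustration, a minimal sketch of that LIMIT-based approach; the
table and column names are hypothetical and the batch size is arbitrary.
Note that large OFFSET values get progressively slower, because the
server still has to work through all of the skipped rows:

    # Assumes $dbh is a connected DBD::Pg handle.
    my $batch  = 1000;
    my $offset = 0;

    my $sth = $dbh->prepare(
        'SELECT account_id, payload FROM my_table
         ORDER BY account_id LIMIT ? OFFSET ?');

    while (1) {
        $sth->execute($batch, $offset);
        my $seen = 0;
        while (my $href = $sth->fetchrow_hashref) {
            $seen++;
            # process one row; only the current batch is in client memory
        }
        last if $seen < $batch;    # a short batch means we are done
        $offset += $batch;
    }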
"Albe Laurenz" <all@adv.magwien.gv.at> writes: > So there is no automatic way of handling it. > > You will probably have to consider it in your code and use > SELECT-Statements > with a LIMIT clause. Either that, or explicitly DECLARE a CURSOR and use FETCH from that cursor in batches. You can do this in Perl even though the automatic cursor support is not yet there. You are restricted to staying in a transaction while the cursor is open, so if you want to work outside of transactions LIMIT/OFFSET is your only way. -Doug
On Mon, Mar 12, 2007 at 08:38:52AM -0400, Douglas McNaught wrote:
> You are restricted to staying in a transaction while the cursor is
> open, so if you want to work outside of transactions LIMIT/OFFSET is
> your only way.

http://www.postgresql.org/docs/8.2/interactive/sql-declare.html

"If WITH HOLD is specified and the transaction that created the cursor
successfully commits, the cursor can continue to be accessed by
subsequent transactions in the same session. (But if the creating
transaction is aborted, the cursor is removed.) A cursor created with
WITH HOLD is closed when an explicit CLOSE command is issued on it, or
the session ends. In the current implementation, the rows represented
by a held cursor are copied into a temporary file or memory area so
that they remain available for subsequent transactions."

--
Michael Fuhr
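A minimal sketch of that WITH HOLD variant from Perl follows; the cursor,
table, and column names are hypothetical, and it assumes AutoCommit is
enabled (the DBI default). Per the documentation quoted above, the held
rows are materialized into a server-side temporary file or memory area
when the creating transaction commits, which is worth keeping in mind
for very large result sets:

    $dbh->begin_work;
    $dbh->do(q{DECLARE held_cur CURSOR WITH HOLD FOR
               SELECT account_id, payload FROM my_table
               ORDER BY account_id});
    $dbh->commit;    # the WITH HOLD cursor survives the commit

    # Fetch outside any explicit transaction, in whatever batch size fits.
    my $sth = $dbh->prepare('FETCH FORWARD 1000 FROM held_cur');
    $sth->execute;
    while (my $href = $sth->fetchrow_hashref) {
        # process the first batch of rows here
    }

    $dbh->do('CLOSE held_cur');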
Thanks for all the replies, everyone. Not really knowing what a cursor
is, I suppose I have some work to do. I can do the SELECT/LIMIT/OFFSET
approach, but that seems like kind of a headache, esp. when it's hard to
predict what number of rows will max out memory... I'd have to keep that
number pretty small, effectively making the same exact query over and
over, which sounds pretty slow.

I'm not really using pgsql yet, so a lot of this is beyond me; I'm just
thinking ahead as I start to migrate from mysql...

Thanks again,
Neal

On Mar 12, 2007, at 7:09 AM, Michael Fuhr wrote:

> On Mon, Mar 12, 2007 at 08:38:52AM -0400, Douglas McNaught wrote:
>> You are restricted to staying in a transaction while the cursor is
>> open, so if you want to work outside of transactions LIMIT/OFFSET is
>> your only way.
>
> http://www.postgresql.org/docs/8.2/interactive/sql-declare.html
>
> "If WITH HOLD is specified and the transaction that created the cursor
> successfully commits, the cursor can continue to be accessed by
> subsequent transactions in the same session. (But if the creating
> transaction is aborted, the cursor is removed.) A cursor created with
> WITH HOLD is closed when an explicit CLOSE command is issued on it, or
> the session ends. In the current implementation, the rows represented
> by a held cursor are copied into a temporary file or memory area so
> that they remain available for subsequent transactions."
>
> --
> Michael Fuhr
Okay, I don't have any postgresql tables big enough to verify this is
doing what I think it is (namely, only keeping one row from my result
set in memory at a time), and I still don't really know much about
cursors or pg, but this appears to be doing what I want to do:

    $dbh->do('BEGIN WORK;');
    $dbh->do('DECLARE my_cur CURSOR FOR
              SELECT * FROM my_table ORDER BY account_id;');

    my $sth = $dbh->prepare(qq{FETCH FORWARD 1 FROM my_cur});
    $sth->execute;

    while (my $href = $sth->fetchrow_hashref) {
        my $field1     = $href->{field1};
        my $account_id = $href->{account_id};
        ## do stuff
        $sth->execute;
    }

    $dbh->do("COMMIT WORK;");

Really the only thing that doesn't look DBI-ish about it is calling
$sth->execute at the end of the while loop... like refilling the
statement handle with the data for fetchrow_hashref to read the next
time around.

comments?

On Mar 12, 2007, at 12:49 PM, A.M. wrote:

> On Mar 12, 2007, at 15:33 , Neal Clark wrote:
>
>> Thanks for all the replies, everyone. Not really knowing what a
>> cursor is, I suppose I have some work to do. I can do the
>> SELECT/LIMIT/OFFSET approach, but that seems like kind of a headache,
>> esp. when it's hard to predict what number of rows will max out
>> memory... I'd have to keep that number pretty small, effectively
>> making the same exact query over and over, which sounds pretty slow.
>>
>> I'm not really using pgsql yet, so a lot of this is beyond me; I'm
>> just thinking ahead as I start to migrate from mysql...
>
> Don't use LIMIT/OFFSET. The cursor is exactly what you want. A cursor
> effectively caches the query results on the server side and feeds the
> results to the client on demand.
>
> Cheers,
> M
Neal Clark <nclark@securescience.net> writes:

> comments?

Looks like the right idea. If you have a lot of rows to process, you'll
benefit by fetching in batches, e.g.

    my $sth = $dbh->prepare(qq{FETCH FORWARD 1000 FROM my_cur});
    # iterate through the result set here....

-Doug
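Putting those two messages together, a minimal sketch of the batched
version; the cursor, table, and column names are taken from the example
above, the batch size of 1000 is arbitrary, and it assumes AutoCommit is
enabled (the DBI default):

    $dbh->begin_work;
    $dbh->do('DECLARE my_cur CURSOR FOR
              SELECT * FROM my_table ORDER BY account_id');

    # Each execute of the FETCH pulls the next batch of rows through the
    # cursor, so only one batch is held in client memory at a time.
    my $sth = $dbh->prepare('FETCH FORWARD 1000 FROM my_cur');

    while (1) {
        $sth->execute;
        my $seen = 0;
        while (my $href = $sth->fetchrow_hashref) {
            $seen++;
            # do stuff with $href->{account_id}, $href->{field1}, ...
        }
        last unless $seen;    # an empty batch means the cursor is exhausted
    }

    $dbh->do('CLOSE my_cur');
    $dbh->commit;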