Thread: Perl/DBI vs Native
Hi,

I have run quite a few tests comparing how long a query takes to execute from Perl/DBI as compared to psql/libpq. No matter how many times I run the test, the results are always the same.

I run a SELECT all on a fairly big table and enabled the log_min_duration_statement option. With psql, postgres consistently logs half a second, while the exact same query executed with Perl/DBI consistently takes 2 seconds.

If I were timing the applications themselves I wouldn't have been surprised by these results; obviously, processing with Perl would be slower than in a native application. But it's the postmaster that reports these timings. Could it be that the DBI module is slower at assimilating the data?

Any light on the subject would be greatly appreciated.

Regards,
Val
Valentin Bogdanov wrote:
> Hi,
>
> I have run quite a few tests comparing how long a query takes to
> execute from Perl/DBI as compared to psql/libpq. No matter how many
> times I run the test, the results are always the same.
>
> I run a SELECT all on a fairly big table and enabled the
> log_min_duration_statement option. With psql, postgres consistently
> logs half a second, while the exact same query executed with Perl/DBI
> consistently takes 2 seconds.
>
> Any light on the subject would be greatly appreciated.

Random guess: Perl's DBI is using parameterized prepared statements, preventing the optimizer from using its knowledge about common values in the table to decide whether or not index use is appropriate. When you're writing the query in psql, you're not using prepared statements, so the optimizer can be cleverer.

Try comparing:

    SELECT statement

to:

    PREPARE test(params) AS statement;
    EXECUTE test(params);

e.g.:

    SELECT x + 44 FROM t;

vs:

    PREPARE test(int) AS SELECT x + $1 FROM t;
    EXECUTE test(44);

Use EXPLAIN ANALYZE to better understand the changes in the query plan.

--
Craig Ringer
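[Ed.: to make that comparison concrete, here is one way to run it from a psql session. This is only a sketch: "t" and its integer column "x" stand in for the real table and query, and the timings require a live server.]

    -- Literal form: planned with the actual value in place.
    EXPLAIN ANALYZE SELECT x + 44 FROM t;

    -- Prepared form: planned with a parameter placeholder.
    PREPARE test(int) AS SELECT x + $1 FROM t;
    EXPLAIN ANALYZE EXECUTE test(44);
    DEALLOCATE test;

If the two plans differ, the parameterized prepared statement is a likely culprit for the slowdown.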
On Jul 21, 2008, at 5:19 AM, Valentin Bogdanov wrote:
> I run a SELECT all on a fairly big table and enabled the
> log_min_duration_statement option. With psql, postgres consistently
> logs half a second, while the exact same query executed with Perl/DBI
> consistently takes 2 seconds.
>
> Could it be that the DBI module is slower at assimilating the data?

Hi Val,

Yes, DBI can be slower than the native C interface. The speed depends on how the data is being returned inside of Perl. Returning hashes is a slower method than returning arrays, from what I've found, due to the overhead of creating the objects in Perl. So:

    my $r = $dbh->selectall_arrayref("select * from table", { Columns => {} });

is slower than:

    my $r = $dbh->selectall_arrayref("select * from table", undef);

Secondly, if you're returning a lot of rows you may want to look into using a cursor, so that you can fetch the rows 1000 at a time in a tight loop and then discard them once you are done with them. This will hopefully prevent the system from having to continually allocate memory for all of your rows. For each field in each row, Perl allocates memory to store the value from Postgres, so if your table has many fields this can be a large number of allocations, depending on the number of rows.

Any userland profiling tool should help you debug what's going on here.

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com - http://www.gearbuyer.com - http://www.footwearbuyer.com
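[Ed.: the cursor approach described above might look roughly like this in DBI. A sketch only, assuming an already-connected handle $dbh and a table named "mytable"; it needs a live PostgreSQL connection to run.]

    # Cursors only live inside a transaction.
    $dbh->begin_work;
    $dbh->do("DECLARE csr CURSOR FOR SELECT * FROM mytable");
    while (1) {
        # Pull the next batch of 1000 rows into the client.
        my $rows = $dbh->selectall_arrayref("FETCH 1000 FROM csr");
        last unless @$rows;
        for my $row (@$rows) {
            # ... process @$row here ...
        }
        # $rows is reassigned on the next iteration, so each batch
        # can be freed before the next one is fetched.
    }
    $dbh->do("CLOSE csr");
    $dbh->commit;

Memory usage then stays bounded by the batch size rather than the full result set.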
Valentin Bogdanov wrote:
> I have run quite a few tests comparing how long a query takes to
> execute from Perl/DBI as compared to psql/libpq. No matter how many
> times I run the test, the results are always the same.
>
> I run a SELECT all on a fairly big table and enabled the
> log_min_duration_statement option. With psql, postgres consistently
> logs half a second, while the exact same query executed with Perl/DBI
> consistently takes 2 seconds.

The problem may be that your two tests are not equivalent. When Perl executes a statement, it copies the *entire* result set back to the client before it returns the first row. The following program might appear to just be fetching the first row:

    $sth = $dbh->prepare("select item from mytable");
    $sth->execute();
    $item = $sth->fetchrow_array();

But in fact, before Perl returns from the $sth->execute() statement, it has already run the query and copied all of the rows into a hidden, client-side cache. Each $sth->fetchrow_array() merely copies the data from the hidden cache into your local variable.

By contrast, psql executes the query and starts returning the data a page at a time. So it may appear to be much faster.

This also means that Perl has trouble with very large tables. If the "mytable" in the above example is very large, say a hundred billion rows, you simply can't execute this statement in Perl. It will try to copy 100 billion rows into memory before returning the first answer.

The reason for Perl's behind-the-scenes caching is that it allows multiple connections to a single database, and multiple statements on each database handle. By executing each statement completely, it gives the appearance that multiple concurrent queries are supported. The downside is that it can be a huge memory hog.

Craig
Craig James <craig_james@emolecules.com> writes:
> The problem may be that your two tests are not equivalent. When Perl
> executes a statement, it copies the *entire* result set back to the
> client before it returns the first row.

Sure, but so does psql (unless you've turned on the magic FETCH_COUNT setting). I think the theories about prepared versus literal statements were more promising; but I don't know DBI well enough to know exactly what it was sending to the server.

			regards, tom lane
Tom Lane wrote:
> Sure, but so does psql (unless you've turned on the magic FETCH_COUNT
> setting). I think the theories about prepared versus literal statements
> were more promising; but I don't know DBI well enough to know exactly
> what it was sending to the server.

Almost certainly a prepared statement, unless no placeholders were being used at all. Another way to test (from the DBI side) is to set $sth->{pg_server_prepare} = 0, which will send the SQL directly to the backend, just as if you'd typed it in at a command prompt.

You can also use the tracing mechanism of DBI to see what's going on behind the scenes. For example:

    $dbh->trace('SQL');
    $dbh->do("SELECT 1234 FROM pg_class WHERE relname = 'bob'");
    $dbh->do("SELECT 1234 FROM pg_class WHERE relname = ?", undef, 'mallory');
    $sth = $dbh->prepare("SELECT 4567 FROM pg_class WHERE relname = ?");
    $sth->execute('alice');
    $sth->{pg_server_prepare} = 0;
    $sth->execute('eve1');
    $sth->{pg_server_prepare} = 1;
    $sth->execute('eve2');
    $dbh->commit;

Outputs:

    ===
    begin;
    SELECT 1234 FROM pg_class WHERE relname = 'bob';
    EXECUTE SELECT 1234 FROM pg_class WHERE relname = $1 ( $1: mallory );
    PREPARE dbdpg_p22988_1 AS SELECT 4567 FROM pg_class WHERE relname = $1;
    EXECUTE dbdpg_p22988_1 ( $1: alice );
    SELECT 4567 FROM pg_class WHERE relname = 'eve1';
    EXECUTE dbdpg_p22988_1 ( $1: eve2 );
    commit;
    DEALLOCATE dbdpg_p22988_1;
    ===

You can even view exactly which libpq calls are being used at each point with:

    $dbh->trace('SQL,libpq');

To get back to the original poster's complaint, you may want to figure out why the difference is so great for a prepared plan. It may be that you need to cast the placeholder(s) to a specific type, for example.
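[Ed.: a sketch of that casting suggestion, with a hypothetical table "mytable" and integer column "id"; it needs a live connection to run. A placeholder can be given an explicit type either in the SQL itself or when binding:]

    # Cast in the SQL so the planner sees the intended type:
    my $sth = $dbh->prepare("SELECT * FROM mytable WHERE id = ?::int");
    $sth->execute(42);

    # Or tell DBD::Pg the type when binding the parameter:
    use DBD::Pg qw(:pg_types);
    $sth = $dbh->prepare("SELECT * FROM mytable WHERE id = ?");
    $sth->bind_param(1, 42, { pg_type => PG_INT4 });
    $sth->execute;

Without the cast, the parameter may be sent as text or "unknown", which can produce a different (and sometimes worse) plan than the literal query gets.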
--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200807211637
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Thanks to everyone who replied. There were some really good points.

However, I found what is causing the difference. The Perl program was connecting to the database via a TCP socket while the C version was using a Unix socket. I changed the connect in my Perl script so that it now uses Unix sockets as well, ran the tests again, and got identical results for both programs.

In case someone is wondering, the way to force DBI to use Unix sockets is by not specifying a host and port in the connect call.

Cheers,
Val
> In case someone is wondering, the way to force DBI to use unix
> sockets is by not specifying a host and port in the connect call.

Actually, the host defaults to the local socket. The port may still be needed: if you leave it out, the default value (5432) is used. Thus, for most purposes, just leaving the host out is enough to cause a socket connection on the default port.

For completeness in the archives, you can also specify the complete path to the unix socket directory in the host line, for those cases in which the socket is not where you expect it to be:

    $dbh = DBI->connect('dbi:Pg:dbname=test;host=/var/local/sockets',
                        $user, $pass, {AutoCommit=>0, RaiseError=>1});

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200807221248
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Tue, Jul 22, 2008 at 9:48 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
>> In case someone is wondering, the way to force DBI to use unix
>> sockets is by not specifying a host and port in the connect call.
>
> Actually, the host defaults to the local socket. The port may
> still be needed: if you leave it out, the default value (5432)
> is used. Thus, for most purposes, just leaving the host out is
> enough to cause a socket connection on the default port.

For the further illumination of the historical record, the best practice here is probably to use the pg_service.conf file, which may or may not live in /etc depending on your operating system. Then you can connect in DBI using dbi:Pg:service=whatever, and change the definition of "whatever" in pg_service.conf. This has the same semantics as PGSERVICE=whatever when using psql. It's a good idea to keep these connection details out of your program code.

-jwb
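[Ed.: a minimal sketch of that setup. The file location, service name, and connection values below are examples; on many systems the file lives at /etc/pg_service.conf, and PGSERVICEFILE can point libpq at a different location.]

    # pg_service.conf
    [whatever]
    host=/var/run/postgresql
    port=5432
    dbname=test
    user=val

and in the program:

    my $dbh = DBI->connect('dbi:Pg:service=whatever', undef, undef,
                           { AutoCommit => 0, RaiseError => 1 });

Changing the connection details then means editing the file, not the code.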
Thanks guys, this is really useful, especially the pg_service.conf tip. I have an app where the connection parameters have to be set in 3 different places. I was thinking of writing something myself, but now that I know of pg_service.conf, problem solved.

Regards,
Val

--- On Tue, 22/7/08, Jeffrey Baker <jwbaker@gmail.com> wrote:
> For the further illumination of the historical record, the best
> practice here is probably to use the pg_service.conf file, which may
> or may not live in /etc depending on your operating system. Then you
> can connect in DBI using dbi:Pg:service=whatever, and change the
> definition of "whatever" in pg_service.conf.