Re: overload - Mailing list pgsql-sql

From lists-pgsql@useunix.net
Subject Re: overload
Date
Msg-id 20110708194949.GM13621@slacker.ja10629.home
Whole thread Raw
In response to Re: overload  (Viktor Bojović <viktor.bojovic@gmail.com>)
Responses Re: overload  (Viktor Bojović <viktor.bojovic@gmail.com>)
List pgsql-sql
Hi Viktor,

I'm not sure what your requirements are in terms of performance and
stability of the your result set. See Pavel's response. A cursor issues
a single query and renders a single result set. The result set is
static, the cursor just gives you finer control/performance when
retrieving rows from the set. Using a transaction will also render better
performance when %patterns contains a large number of keys/values,
insert all of them in one transaction, the same one you opened for the
cursor.

Your method issues many queries and will take longer for each successive
query.  And the number of queries will increase as table size increases. 
It could also return duplicate rows and/or missed rows due to other
transactions completing between your select query.

If you can tolerate the above issues then so be it, if not you really
should look at cursors.

Also there might be a bug in your code if you delete entries from
'entry'. Your depending on $rowCountAll to remain static which is not the
case if you ever delete entries. You can fix this by skipping the
"select count(1)" step and just breaking your loop when less then
$windowSize entries are returned from the "select sequence.." query.

Wayne


On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojovi?? wrote:
> Thanx Wayne,
> at the end i did it that way and it works.
> The code is below.
> 
> CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
> varying
> LANGUAGE plperl
> AS $_X$
> my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> #my $rowCountAll = $rvCnt->{processed};
> my $row = $rvCnt->{rows}[0];
> my $rowCountAll = $row->{cnt};
> my $windowSize = 500000;
> my %patterns=();
> for (my $p=0;$p<$rowCountAll;$p+=$windowSize){
> my $sql="select sequence from entry limit $windowSize offset $p";
> 
> my $rv = spi_exec_query($sql);
> my $rowCount = $rv->{processed};
> my $patLen = $_[0];
> my $patt = '';
> 
> foreach my $rn (0 .. $rowCount -1){
> my $row = $rv->{rows}[$rn];
> my $seq = $row->{sequence};
> for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> $patt=substr($seq,$x,$patLen);
> if (! defined $patterns{$patt}) {
> $patterns{$patt}=1;
> }else{
> $patterns{$patt}++;
> }
> }
> }
> }
> 
> foreach $patt (keys %patterns){
> my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
> spi_exec_query($sql);
> }
> return $tmp;
> $_X$;
> 
> 
> On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql@useunix.net> wrote:
> 
> > I'm have the same situation with large tables.  Take a look at using a
> > cursor to fetch several thousand rows at a time.  I presume what's
> > happening is that perl is attempting to create a massive list/array in
> > memory.  If you use a cursor the list should only contain X number of
> > rows where X in the number specified at each fetch execution.  You'll
> > need to define the cursor inside a transaction block.
> >
> > - begin transaction
> > - define the cursor
> > - fetch rows from cursor
> > - while row count from previous step > 0, execute previous step
> > - terminate transaction
> >
> > Or you could use plpgsql instead of plperl, FOR loops over result sets in
> > plpgsql implicitly use cursors... it's just a little less code.
> >
> > Hope that helps,
> > Wayne
> >
> > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojovi?? wrote:
> > > Hi,
> > > while reading 20GB table through PL/PERL function , it constantly grows
> > in
> > > RAM.
> > > I wanted to ask you which is the best way to read table inside that
> > > function without such memory consumption.
> > > Thanks in advance
> > >
> > > Code is here:
> > >
> > > CREATE  FUNCTION pattern_counter("patLength" integer)
> > >   RETURNS varchar AS
> > > $BODY$
> > >     my $rv = spi_exec_query("select sequence from entry");
> > >     my $rowCount = $rv->{processed};
> > >     my $patLen = $_[0];
> > >     my $patt = '';
> > >     my %patterns=();
> > >     foreach my $rn (0 .. $rowCount -1){
> > >     my $row = $rv->{rows}[$rn];
> > >     my $seq = $row->{sequence};
> > >     for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> > >         $patt=substr($seq,$x,$patLen);
> > >         if (! defined $patterns{$patt}) {
> > >         $patterns{$patt}=1;
> > >         }else{
> > >         $patterns{$patt}++;
> > >         }
> > >     }
> > >     }
> > >     foreach $patt (keys %patterns){
> > >     my $sql="insert into patterns
> > values('".$patt."',".$patterns{$patt}.")";
> > >     spi_exec_query($sql);
> > >     }
> > > return '';
> > > $BODY$
> > >   LANGUAGE plperl VOLATILE
> > >   COST 100;
> > >
> > >
> > >
> > > --
> > > ---------------------------------------
> > > Viktor Bojovi??
> > > ---------------------------------------
> > > Wherever I go, Murphy goes with me
> >
> 
> 
> 
> -- 
> ---------------------------------------
> Viktor Bojovi??
> ---------------------------------------
> Wherever I go, Murphy goes with me


pgsql-sql by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: overload
Next
From: Uwe Bartels
Date:
Subject: using explain output within pgsql