Re: overload - Mailing list pgsql-sql

From Viktor Bojović
Subject Re: overload
Date
Msg-id CAJu1cLbPwGCFuFVUJJRDvKzVqUFZWscvaMCHjpczrdMu8D0T7A@mail.gmail.com
In response to Re: overload  (lists-pgsql@useunix.net)
Responses Re: overload  (Viktor Bojović <viktor.bojovic@gmail.com>)
List pgsql-sql
Thanx Wayne and Pavel,

I will try to make this work with cursors. Theoretically there can be more than 4 billion records in the result, but only 58 million were returned, which took 3 days.
Possibly I will have to create a temporary table, store the results from %patterns into it after each 500k records, and group them at the end.
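
A minimal sketch of that temp-table step in the same PL/Perl style (table and column names are illustrative):

spi_exec_query("create temp table pattern_chunk (pattern varchar, cnt bigint)");
# ... after each 500k-record window, flush the contents of %patterns into
# pattern_chunk and empty the hash ...
# then collapse the per-window counts once at the end:
spi_exec_query("insert into patterns select pattern, sum(cnt) from pattern_chunk group by pattern");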

I didn't manage to find an example where plpgsql uses hash arrays or where plperl uses cursors.
Links or examples for either of these methods would be helpful to me.
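
For reference, PL/Perl's closest equivalent to a cursor is the spi_query /
spi_fetchrow pair, which fetches one row per call instead of materializing the
whole result set. A minimal sketch, reusing the entry table and sequence
column from the code below (the function name is illustrative):

CREATE FUNCTION scan_sequences() RETURNS void
LANGUAGE plperl
AS $_X$
    # spi_query returns a cursor handle; spi_fetchrow pulls one row per call,
    # so memory use stays flat regardless of table size
    my $cursor = spi_query("select sequence from entry");
    while (defined (my $row = spi_fetchrow($cursor))) {
        my $seq = $row->{sequence};
        # process $seq here
    }
    # the cursor is freed automatically once spi_fetchrow returns undef
    return;
$_X$;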

Thanx in advance

On Fri, Jul 8, 2011 at 9:49 PM, <lists-pgsql@useunix.net> wrote:
Hi Viktor,

I'm not sure what your requirements are in terms of performance and
stability of your result set. See Pavel's response. A cursor issues
a single query and renders a single result set. The result set is
static; the cursor just gives you finer control and better performance when
retrieving rows from the set. Using a transaction will also give better
performance when %patterns contains a large number of keys/values:
insert all of them in one transaction, the same one you opened for the
cursor.
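
A minimal sketch of that pattern as a plain SQL session (the cursor name and
fetch size are illustrative):

BEGIN;
DECLARE seq_cur CURSOR FOR SELECT sequence FROM entry;
FETCH FORWARD 10000 FROM seq_cur;  -- repeat until it returns no rows
-- ... run the inserts here, inside the same transaction ...
CLOSE seq_cur;
COMMIT;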

Your method issues many queries, and each successive query takes longer
because the offset grows; the number of queries also increases as the
table grows. It could also return duplicate rows and/or miss rows when
other transactions commit between your select queries.

If you can tolerate the above issues then so be it; if not, you really
should look at cursors.

Also, there might be a bug in your code if you delete entries from
'entry'. You're depending on $rowCountAll to remain static, which is not the
case if you ever delete entries. You can fix this by skipping the
"select count(1)" step and just breaking your loop when fewer than
$windowSize entries are returned from the "select sequence.." query.
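
Restructured that way, the paging loop might look like this (a sketch built
from the code below):

my $windowSize = 500000;
my $offset = 0;
while (1) {
    my $rv = spi_exec_query("select sequence from entry limit $windowSize offset $offset");
    # ... scan $rv->{rows} for patterns as before ...
    last if $rv->{processed} < $windowSize;  # a short window means we hit the end
    $offset += $windowSize;
}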

Wayne


On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojović wrote:
> Thanx Wayne,
> in the end I did it that way and it works.
> The code is below.
>
> CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character varying
>     LANGUAGE plperl
> AS $_X$
>     # total row count, used to page through the table in fixed-size windows
>     my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
>     my $rowCountAll = $rvCnt->{rows}[0]{cnt};
>     my $windowSize = 500000;
>     my $patLen = $_[0];
>     my %patterns = ();
>     for (my $p = 0; $p < $rowCountAll; $p += $windowSize) {
>         my $rv = spi_exec_query("select sequence from entry limit $windowSize offset $p");
>         my $rowCount = $rv->{processed};
>         foreach my $rn (0 .. $rowCount - 1) {
>             my $seq = $rv->{rows}[$rn]{sequence};
>             # count every substring of length $patLen (substr offsets are 0-based)
>             for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
>                 $patterns{substr($seq, $x, $patLen)}++;
>             }
>         }
>     }
>     # write the accumulated counts out in a single pass
>     foreach my $patt (keys %patterns) {
>         spi_exec_query("insert into patterns values ('" . $patt . "', " . $patterns{$patt} . ")");
>     }
>     return '';
> $_X$;
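>
> If the pattern strings could ever contain quote characters, a prepared plan
> keeps that insert safe; spi_prepare / spi_exec_prepared are the stock
> PL/Perl calls for this (a sketch, not part of the original function):
>
> my $plan = spi_prepare('insert into patterns values ($1, $2)', 'varchar', 'integer');
> foreach my $patt (keys %patterns) {
>     spi_exec_prepared($plan, $patt, $patterns{$patt});
> }
> spi_freeplan($plan);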
>
>
> On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql@useunix.net> wrote:
>
> > I have the same situation with large tables.  Take a look at using a
> > cursor to fetch several thousand rows at a time.  I presume what's
> > happening is that Perl is attempting to create a massive list/array in
> > memory.  If you use a cursor the list should only contain X number of
> > rows, where X is the number specified at each fetch execution.  You'll
> > need to define the cursor inside a transaction block.
> >
> > - begin transaction
> > - define the cursor
> > - fetch rows from cursor
> > - while row count from previous step > 0, execute previous step
> > - terminate transaction
> >
> > Or you could use plpgsql instead of plperl; FOR loops over result sets in
> > plpgsql implicitly use cursors... it's just a little less code, as in the
> > sketch below.
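> >
> > A minimal plpgsql sketch (the function name is illustrative); the FOR loop
> > fetches rows incrementally rather than loading the whole result set:
> >
> > CREATE FUNCTION scan_entries() RETURNS void AS $$
> > DECLARE
> >     r record;
> > BEGIN
> >     FOR r IN SELECT sequence FROM entry LOOP
> >         -- process r.sequence here
> >     END LOOP;
> > END;
> > $$ LANGUAGE plpgsql;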
> >
> > Hope that helps,
> > Wayne
> >
> > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojović wrote:
> > > Hi,
> > > while reading a 20GB table through a PL/Perl function, it constantly
> > > grows in RAM.
> > > I wanted to ask you which is the best way to read a table inside that
> > > function without such memory consumption.
> > > Thanks in advance
> > >
> > > Code is here:
> > >
> > > CREATE FUNCTION pattern_counter("patLength" integer)
> > >   RETURNS varchar AS
> > > $BODY$
> > >     # this pulls the entire table into memory at once, which is
> > >     # what makes the function's RAM use grow with table size
> > >     my $rv = spi_exec_query("select sequence from entry");
> > >     my $rowCount = $rv->{processed};
> > >     my $patLen = $_[0];
> > >     my %patterns = ();
> > >     foreach my $rn (0 .. $rowCount - 1) {
> > >         my $seq = $rv->{rows}[$rn]{sequence};
> > >         # count every substring of length $patLen (substr offsets are 0-based)
> > >         for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
> > >             $patterns{substr($seq, $x, $patLen)}++;
> > >         }
> > >     }
> > >     foreach my $patt (keys %patterns) {
> > >         spi_exec_query("insert into patterns values ('" . $patt . "', " . $patterns{$patt} . ")");
> > >     }
> > >     return '';
> > > $BODY$
> > >   LANGUAGE plperl VOLATILE
> > >   COST 100;
> > >
> > >
> > >
> > > --
> > > ---------------------------------------
> > > Viktor Bojović
> > > ---------------------------------------
> > > Wherever I go, Murphy goes with me
> >
>
>
>
> --
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me



--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
