Thread: overload
Hi,
while reading a 20GB table through a PL/Perl function, it constantly grows in RAM.
I wanted to ask you which is the best way to read a table inside that function without such memory consumption.
Thanks in advance

Code is here:

CREATE FUNCTION pattern_counter("patLength" integer)
  RETURNS varchar AS
$BODY$
  my $rv = spi_exec_query("select sequence from entry");
  my $rowCount = $rv->{processed};
  my $patLen = $_[0];
  my $patt = '';
  my %patterns = ();
  foreach my $rn (0 .. $rowCount - 1){
    my $row = $rv->{rows}[$rn];
    my $seq = $row->{sequence};
    for (my $x = 1; $x <= length($seq) - $patLen; $x++){
      $patt = substr($seq, $x, $patLen);
      if (! defined $patterns{$patt}){
        $patterns{$patt} = 1;
      } else {
        $patterns{$patt}++;
      }
    }
  }
  foreach $patt (keys %patterns){
    my $sql = "insert into patterns values('".$patt."',".$patterns{$patt}.")";
    spi_exec_query($sql);
  }
  return '';
$BODY$
  LANGUAGE plperl VOLATILE
  COST 100;

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
I have the same situation with large tables. Take a look at using a cursor to fetch several thousand rows at a time. I presume what's happening is that Perl is attempting to create a massive list/array in memory. If you use a cursor, the list should only contain X rows, where X is the number specified at each fetch execution. You'll need to define the cursor inside a transaction block:

- begin transaction
- define the cursor
- fetch rows from cursor
- while row count from previous step > 0, execute previous step
- terminate transaction

Or you could use plpgsql instead of plperl; FOR loops over result sets in plpgsql implicitly use cursors... it's just a little less code.

Hope that helps,
Wayne
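Wayne's plpgsql alternative can be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes the same entry(sequence) and patterns tables, uses the implicit cursor behind a FOR ... IN SELECT loop, and substitutes a temporary table plus GROUP BY for the Perl hash.

```sql
-- Rough sketch only; names beyond entry/patterns are assumptions.
CREATE OR REPLACE FUNCTION pattern_counter_plpgsql(pat_len integer)
RETURNS void AS $$
DECLARE
    seq text;
    x   integer;
BEGIN
    -- temp table stands in for the %patterns hash
    CREATE TEMP TABLE tmp_patt(patt text) ON COMMIT DROP;

    -- FOR ... IN SELECT fetches through an implicit cursor, so only a
    -- handful of rows are materialized at any one time
    FOR seq IN SELECT sequence FROM entry LOOP
        FOR x IN 1 .. length(seq) - pat_len + 1 LOOP
            INSERT INTO tmp_patt VALUES (substr(seq, x, pat_len));
        END LOOP;
    END LOOP;

    -- aggregate once at the end, inside the same transaction
    INSERT INTO patterns
        SELECT patt, count(*) FROM tmp_patt GROUP BY patt;
END;
$$ LANGUAGE plpgsql;
```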
Thanks Wayne,
in the end I did it that way and it works.
The code is below.
CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character varying
    LANGUAGE plperl
    AS $_X$
  my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
  #my $rowCountAll = $rvCnt->{processed};
  my $row = $rvCnt->{rows}[0];
  my $rowCountAll = $row->{cnt};
  my $windowSize = 500000;
  my %patterns = ();
  for (my $p = 0; $p < $rowCountAll; $p += $windowSize){
    my $sql = "select sequence from entry limit $windowSize offset $p";
    my $rv = spi_exec_query($sql);
    my $rowCount = $rv->{processed};
    my $patLen = $_[0];
    my $patt = '';
    foreach my $rn (0 .. $rowCount - 1){
      my $row = $rv->{rows}[$rn];
      my $seq = $row->{sequence};
      for (my $x = 1; $x <= length($seq) - $patLen; $x++){
        $patt = substr($seq, $x, $patLen);
        if (! defined $patterns{$patt}){
          $patterns{$patt} = 1;
        } else {
          $patterns{$patt}++;
        }
      }
    }
  }
  foreach $patt (keys %patterns){
    my $sql = "insert into patterns values('".$patt."',".$patterns{$patt}.")";
    spi_exec_query($sql);
  }
  return '';  # $tmp was never declared, so return an empty string
$_X$;
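An aside on the LIMIT/OFFSET windowing above: each successive query re-scans and discards all the rows it skips, so later windows get progressively slower. If the entry table has an indexed integer key (call it id; this is an assumption, not stated in the thread), paging on the key avoids that cost:

```sql
-- Hypothetical keyset paging; 'id' is an assumed primary key on entry.
-- Each batch resumes where the previous one ended instead of using OFFSET.
SELECT id, sequence
FROM entry
WHERE id > $last_seen_id   -- highest id seen in the previous batch
ORDER BY id
LIMIT 500000;
```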
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
Hello,

Using a "window" implemented via LIMIT OFFSET is not good. It is a workaround on systems where cursors are not available, but it is a bad solution on PostgreSQL. Use a cursor instead: it is significantly more effective, with fewer memory requests.

Regards
Pavel Stehule
Hi Viktor,

I'm not sure what your requirements are in terms of performance and stability of your result set. See Pavel's response. A cursor issues a single query and renders a single result set. The result set is static; the cursor just gives you finer control/performance when retrieving rows from the set. Using a transaction will also render better performance when %patterns contains a large number of keys/values: insert all of them in one transaction, the same one you opened for the cursor.

Your method issues many queries and will take longer for each successive query, and the number of queries will increase as table size increases. It could also return duplicate rows and/or miss rows due to other transactions completing between your select queries.

If you can tolerate the above issues then so be it; if not, you really should look at cursors.

Also, there might be a bug in your code if you delete entries from 'entry'. You're depending on $rowCountAll to remain static, which is not the case if you ever delete entries. You can fix this by skipping the "select count(1)" step and just breaking your loop when fewer than $windowSize entries are returned from the "select sequence..." query.

Wayne
Thanks Wayne and Pavel,
I will try to make this work with cursors. Theoretically there can be more than 4 billion records in the result, but only 58 million were returned, which took 3 days.
Possibly I will have to create a temporary table, store results from %patterns into it after each 500k records, and group them at the end.
I didn't manage to find an example where plpgsql uses hash arrays or where plperl uses cursors.
Links or examples for either of these methods would be helpful to me.
Thanks in advance
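On the "hash arrays in plpgsql" question: plpgsql has no native hash type, but the hstore extension can play that role (a plain GROUP BY is usually the more idiomatic substitute for a counting hash). A minimal sketch, assuming CREATE EXTENSION hstore is available:

```sql
-- Counting occurrences with hstore as a makeshift hash (illustrative only).
DO $$
DECLARE
    h hstore := ''::hstore;
    k text;
BEGIN
    FOR k IN SELECT unnest(ARRAY['ab', 'cd', 'ab']) LOOP
        -- '||' overwrites the key, '->' looks it up (NULL if absent)
        h := h || hstore(k, (coalesce((h -> k)::int, 0) + 1)::text);
    END LOOP;
    RAISE NOTICE '%', h;  -- h now maps 'ab' to 2 and 'cd' to 1
END $$;
```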
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
I have found a cursor example in plperl. Now it works fine. The code is below.
CREATE OR REPLACE FUNCTION uniprot_frekvencije.pattern_counter2("patLength" integer)
  RETURNS character varying AS
$BODY$
  my $sth = spi_query("select sequence from entry");
  my $patLen = $_[0];
  my $patt = '';
  my $row;
  my %patterns = ();
  while (defined ($row = spi_fetchrow($sth))) {
    my $seq = $row->{sequence};
    for (my $x = 0; $x <= length($seq) - $patLen; $x++){
      $patt = substr($seq, $x, $patLen);
      if (! defined $patterns{$patt}) {
        $patterns{$patt} = 1;
      } else {
        $patterns{$patt}++;
      }
    }
  }
  foreach $patt (keys %patterns){
    my $sql = "insert into patterns values('".$patt."',".$patterns{$patt}.")";
    spi_exec_query($sql);
  }
  return '';
$BODY$
  LANGUAGE plperl VOLATILE
  COST 100;
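For completeness, a call would look something like this, assuming a patterns table with a text column and an integer count column already exists:

```sql
SELECT uniprot_frekvencije.pattern_counter2(3);   -- count all 3-character patterns
SELECT * FROM patterns ORDER BY 2 DESC LIMIT 10;  -- show the most frequent ones
```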
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me