Re: [GENERAL] postgres pg_bulkload c filter function in c programming - Mailing list pgsql-general

From Francisco Olarte
Subject Re: [GENERAL] postgres pg_bulkload c filter function in c programming
Date
Msg-id CA+bJJby3n6pb6MccRVDvMjE0QTOy46Kk+8vFz86vtQSW6hY5+w@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] postgres pg_bulkload c filter function in c programming  (Jan de Visser <jan@de-visser.net>)
Responses Re: [GENERAL] postgres pg_bulkload c filter function in c programming
List pgsql-general
Hi:

On Thu, Dec 29, 2016 at 3:01 PM, Jan de Visser <jan@de-visser.net> wrote:
> On Thursday, December 29, 2016 5:10:08 AM EST rajmhn wrote:
...
>> I'm new to C. Gone through this documentation. Not clear, how to start.
...
> It seems to me it would be much easier to load the data into a temporary
> table, and from there transform into the form you need it to be. If you're not
> experienced in C (and probably even if you are) you can import *a lot* of data
> in the time it would take you to develop that custom filter.
> Besides, your requirements don't appear to be that difficult.

for his type of requirements I would recommend going the perl ( any
similar language will do, but after using it for 25 years I find it is
the king for that ) + pipe route. I mean, write a script which spits
"copy xxx from stdin\n" plus the transformed rows, pipe it trough
psql.

Total time is difficult to beat, as debugging is very easy, write the
filtering function using <> - chomp - split - s/// - join - print and
debug it by feeding it some lines with head.

Untested code from which I remembter of the specs, could be something like:

print "Copy my_table from stdin;\n";
my $e = "The magic col-e default value";
while(<>) {
  chomp; # get rid of OPTIONAl line terminator
  # Get the columns and do the c-d column swap
  my ($a,$b,$d,$c)=split /\|/, $_;
  # Add dashed to the date in column b:
  $b =~ s/^(\d\d\d\d)(\d\d)(\d\d)$/$1-$2-$3/;
  # zap not numerics a to 0:
  ($a=~/^\d+$/) or ($a = 0);
  # And send it with the default separators ( scaping left as an
exercise to the reader, ask if needed, I've wrtten and tested it
several times ).
  print join("\t", $a,$b,$c,$d,$e),"\n";
}

Test it with "head the-datafile | perl the_script.pl |
my_favourite_pager" until correct, the beauty of this approache  is
you do not touch the db in debug, feed it to psql when done. In my
experience the perl script overhead is unnoticeable in any 2k+ machine
(and perl was specifically dessigned to be good at this kind of things
).

Francisco Olarte.


pgsql-general by date:

Previous
From: Gerhard Wiesinger
Date:
Subject: Re: [GENERAL] vacuumdb --analyze-only scans all pages?
Next
From: rajmhn
Date:
Subject: Re: [GENERAL] postgres pg_bulkload c filter function in c programming