Thread: WIP Patch: Selective binary conversion of CSV file foreign tables

WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
I would like to propose to improve parsing efficiency of contrib/file_fdw by
selective parsing proposed by Alagiannis et al.[1], which means that for a
CSV/TEXT file foreign table, file_fdw performs binary conversion only for
the columns needed for query processing.  Attached is a WIP patch
implementing the feature.

I evaluated the efficiency of the patch using SELECT count(*) on a CSV file
foreign table of 5,000,000 records, which had the same definition as the
pgbench history table.  The following run is done on a single core of a
3.00GHz Intel Xeon CPU with 8GB of RAM.  Configuration settings are all
default.

w/o the patch: 7255.898 ms
w/  the patch: 3363.297 ms

On reflection of [2], I think it would be better to disable this feature
when the validation option is set to 'true'; file_fdw converts all columns
to binary representation.  So, it verifies that each tuple meets all column
data types as well as all kinds of constraints.

I appreciate your comments.

Best regards,
Etsuro Fujita

[1] http://homepages.cwi.nl/~idreos/NoDBsigmod2012.pdf
[2] https://commitfest.postgresql.org/action/patch_view?id=822


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
David Fetter
Date:
On Tue, May 08, 2012 at 08:26:02PM +0900, Etsuro Fujita wrote:
> I would like to propose to improve parsing efficiency of contrib/file_fdw by
> selective parsing proposed by Alagiannis et al.[1],

Is the patch they used against 9.0 published somewhere?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
> -----Original Message-----
> From: David Fetter [mailto:david@fetter.org]
> Sent: Wednesday, May 09, 2012 9:25 AM
> To: Etsuro Fujita
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
> 
> On Tue, May 08, 2012 at 08:26:02PM +0900, Etsuro Fujita wrote:
> > I would like to propose to improve parsing efficiency of
> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> > al.[1],
> 
> Is the patch they used against 9.0 published somewhere?

It's not.  The patch's been created by myself.  I don't know their patch.

Best regards,
Etsuro Fujita

> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
> 
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
Robert Haas
Date:
On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I would like to propose to improve parsing efficiency of contrib/file_fdw by
> selective parsing proposed by Alagiannis et al.[1], which means that for a
> CSV/TEXT file foreign table, file_fdw performs binary conversion only for
> the columns needed for query processing.  Attached is a WIP patch
> implementing the feature.

Can you add this to the next CommitFest?  Looks interesting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Friday, May 11, 2012 1:36 AM
> To: Etsuro Fujita
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
>
> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>
> wrote:
> > I would like to propose to improve parsing efficiency of
> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw
> > performs binary conversion only for the columns needed for query
> > processing.  Attached is a WIP patch implementing the feature.
>
> Can you add this to the next CommitFest?  Looks interesting.

Done.

Best regards,
Etsuro Fujita

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
>




Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
Kohei KaiGai
Date:
Hi Fujita-san,

Could you rebase this patch towards the latest tree?
It was unavailable to apply the patch cleanly.

I looked over the patch, then noticed a few points.

At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
If so, cstate->convert_binary is not a suitable flag to check
redundant option. It seems to me cstate->convert_selectively
is more suitable flag to check it.

+       else if (strcmp(defel->defname, "convert_binary") == 0)
+       {
+           if (cstate->convert_binary)
+               ereport(ERROR,
+                       (errcode(ERRCODE_SYNTAX_ERROR),
+                        errmsg("conflicting or redundant options")));
+           cstate->convert_selectively = true;
+           if (defel->arg == NULL || IsA(defel->arg, List))
+               cstate->convert_binary = (List *) defel->arg;
+           else
+               ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                        errmsg("argument to option \"%s\" must be a
list of column names",
+                               defel->defname)));
+       }


At NextCopyFrom(), this routine computes default values if configured.
In case when these values are not referenced, it might be possible to
skip unnecessary calculations.
Is it unavailable to add logic to avoid to construct cstate->defmap on
unreferenced columns at  BeginCopyFrom()?

Thanks,

2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> -----Original Message-----
>> From: Robert Haas [mailto:robertmhaas@gmail.com]
>> Sent: Friday, May 11, 2012 1:36 AM
>> To: Etsuro Fujita
>> Cc: pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>
>> wrote:
>> > I would like to propose to improve parsing efficiency of
>> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw
>> > performs binary conversion only for the columns needed for query
>> > processing.  Attached is a WIP patch implementing the feature.
>>
>> Can you add this to the next CommitFest?  Looks interesting.
>
> Done.
>
> Best regards,
> Etsuro Fujita
>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
>> Company
>>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
Hi KaiGai-san,

Thank you for the review.

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
> Sent: Wednesday, June 20, 2012 1:26 AM
> To: Etsuro Fujita
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
>
> Hi Fujita-san,
>
> Could you rebase this patch towards the latest tree?
> It was unavailable to apply the patch cleanly.

Sorry, I updated the patch.  Please find attached an updated version of the
patch.

> I looked over the patch, then noticed a few points.
>
> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
> If so, cstate->convert_binary is not a suitable flag to check redundant
> option. It seems to me cstate->convert_selectively is more suitable flag
> to check it.
>
> +       else if (strcmp(defel->defname, "convert_binary") == 0)
> +       {
> +           if (cstate->convert_binary)
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_SYNTAX_ERROR),
> +                        errmsg("conflicting or redundant options")));
> +           cstate->convert_selectively = true;
> +           if (defel->arg == NULL || IsA(defel->arg, List))
> +               cstate->convert_binary = (List *) defel->arg;
> +           else
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                        errmsg("argument to option \"%s\" must be a
> list of column names",
> +                               defel->defname)));
> +       }

Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
columns needed to be converted to binary representation, which is NIL for
the case where no columns are needed to be converted.  For example,
defel->arg is NIL for SELECT COUNT(*).  In this case, while
cstate->convert_selectively is set to true, no columns are converted at
NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
represents whether to do selective binary conversion at NextCopyFrom(), and
cstate->convert_binary represents all the columns to be converted at
NextCopyFrom(), that can be NIL.

> At NextCopyFrom(), this routine computes default values if configured.
> In case when these values are not referenced, it might be possible to skip
> unnecessary calculations.
> Is it unavailable to add logic to avoid to construct cstate->defmap on
> unreferenced columns at  BeginCopyFrom()?

I think that we don't need to add the above logic because file_fdw does
BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
doesn't construct cstate->defmap at all.

I fixed a bug plus some minor optimization in check_binary_conversion() that
is renamed to check_selective_binary_conversion() in the updated version,
and now file_fdw gives up selective binary conversion for the following
cases:
 a) BINARY format b) CSV/TEXT format and whole row reference c) CSV/TEXT format and all the user attributes needed


Best regards,
Etsuro Fujita

> Thanks,
>
> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> >> -----Original Message-----
> >> From: Robert Haas [mailto:robertmhaas@gmail.com]
> >> Sent: Friday, May 11, 2012 1:36 AM
> >> To: Etsuro Fujita
> >> Cc: pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
> >> file foreign tables
> >>
> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp>
> >> wrote:
> >> > I would like to propose to improve parsing efficiency of
> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> >> > al.[1], which means that for a CSV/TEXT file foreign table,
> >> > file_fdw performs binary conversion only for the columns needed for
> >> > query processing. �Attached is a WIP patch implementing the feature.
> >>
> >> Can you add this to the next CommitFest? �Looks interesting.
> >
> > Done.
> >
> > Best regards,
> > Etsuro Fujita
> >
> >> --
> >> Robert Haas
> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> >> Company
> >>
> >
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
> > make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



begin 666 file_fdw_sel_bin_conv_v2.patch
M9&EF9B M+6=I="!A+V-O;G1R:6(O9FEL95]F9'<O9FEL95]F9'<N8R!B+V-O
M;G1R:6(O9FEL95]F9'<O9FEL95]F9'<N8PII;F1E>"!E,V(Y,C(S+BYF9#DT
M,3)F(#$P,#8T- HM+2T@82]C;VYT<FEB+V9I;&5?9F1W+V9I;&5?9F1W+F,*
M*RLK(&(O8V]N=')I8B]F:6QE7V9D=R]F:6QE7V9D=RYC"D! ("TQ-BPV("LQ
M-BPW($! "B C:6YC;'5D92 \=6YI<W1D+F@^"B *("-I;F-L=61E(")A8V-E
M<W,O<F5L;W!T:6]N<RYH(@HK(VEN8VQU9&4@(F%C8V5S<R]S>7-A='1R+F@B
M"B C:6YC;'5D92 B8V%T86QO9R]P9U]F;W)E:6=N7W1A8FQE+F@B"B C:6YC
M;'5D92 B8V]M;6%N9',O8V]P>2YH(@H@(VEN8VQU9&4@(F-O;6UA;F1S+V1E
M9G)E;2YH(@I 0" M,CDL-B K,S L-R! 0 H@(VEN8VQU9&4@(F]P=&EM:7IE
M<B]P871H;F]D92YH(@H@(VEN8VQU9&4@(F]P=&EM:7IE<B]P;&%N;6%I;BYH
M(@H@(VEN8VQU9&4@(F]P=&EM:7IE<B]R97-T<FEC=&EN9F\N:"(**R-I;F-L
M=61E(")O<'1I;6EZ97(O=F%R+F@B"B C:6YC;'5D92 B=71I;',O;65M=71I
M;',N:"(*("-I;F-L=61E(")U=&EL<R]R96PN:"(*( I 0" M,3,V+#8@*S$S
M."PY($! ('-T871I8R!B;V]L(&ES7W9A;&ED7V]P=&EO;BAC;VYS="!C:&%R
M("IO<'1I;VXL($]I9"!C;VYT97AT*3L*('-T871I8R!V;VED(&9I;&5'971/
M<'1I;VYS*$]I9"!F;W)E:6=N=&%B;&5I9"P*( D)"2 @(&-H87(@*BIF:6QE
M;F%M92P@3&ES=" J*F]T:&5R7V]P=&EO;G,I.PH@<W1A=&EC($QI<W0@*F=E
M=%]F:6QE7V9D=U]A='1R:6)U=&5?;W!T:6]N<RA/:60@<F5L:60I.PHK<W1A
M=&EC(&)O;VP@8VAE8VM?<V5L96-T:79E7V)I;F%R>5]C;VYV97)S:6]N*%)E
M;$]P=$EN9F\@*F)A<V5R96PL"BL)"0D)"0D)"0D)"2 @3VED(&9O<F5I9VYT
M86)L96ED+ HK"0D)"0D)"0D)"0D@($QI<W0@*BIC;VQU;6YS*3L*('-T871I
M8R!V;VED(&5S=&EM871E7W-I>F4H4&QA;FYE<DEN9F\@*G)O;W0L(%)E;$]P
M=$EN9F\@*F)A<V5R96PL"B )"0D@($9I;&5&9'=0;&%N4W1A=&4@*F9D=U]P
M<FEV871E*3L*('-T871I8R!V;VED(&5S=&EM871E7V-O<W1S*%!L86YN97))
M;F9O("IR;V]T+"!296Q/<'1);F9O("IB87-E<F5L+ I 0" M-#4W+#8@*S0V
M,BPQ-B! 0"!F:6QE1V5T1F]R96EG;E!A=&AS*%!L86YN97));F9O("IR;V]T
M+ H@"49I;&5&9'=0;&%N4W1A=&4@*F9D=U]P<FEV871E(#T@*$9I;&5&9'=0
M;&%N4W1A=&4@*BD@8F%S97)E;"T^9F1W7W!R:79A=&4["B )0V]S= D)<W1A
M<G1U<%]C;W-T.PH@"4-O<W0)"71O=&%L7V-O<W0["BL)8F]O; D)<F5S=6QT
M.PHK"4QI<W0)(" @*F-O;'5M;G,["BL)3&ES= D@(" J8V]P=&EO;B ]($Y)
M3#L**PHK"2\J($1E8VED92!W:&5T:&5R('1O('-E;&5C=&EV96QY('!E<F9O
M<FT@8FEN87)Y(&-O;G9E<G-I;VX@*B\**PER97-U;'0@/2!C:&5C:U]S96QE
M8W1I=F5?8FEN87)Y7V-O;G9E<G-I;VXH8F%S97)E;"P**PD)"0D)"0D)"0D)
M(" @9F]R96EG;G1A8FQE:60L"BL)"0D)"0D)"0D)"2 @("9C;VQU;6YS*3L*
M*PEI9B H<F5S=6QT*0HK"0EC;W!T:6]N(#T@;&ES=%]M86ME,2AM86ME1&5F
M16QE;2@B8V]N=F5R=%]B:6YA<GDB+" H3F]D92 J*2!C;VQU;6YS*2D["B *
M( DO*B!%<W1I;6%T92!C;W-T<R J+PH@"65S=&EM871E7V-O<W1S*')O;W0L
M(&)A<V5R96PL(&9D=U]P<FEV871E+ I 0" M-#<P+#<@*S0X-2PW($! (&9I
M;&5'971&;W)E:6=N4&%T:',H4&QA;FYE<DEN9F\@*G)O;W0L"B )"0D)"0D)
M"0D@=&]T86Q?8V]S="P*( D)"0D)"0D)"2!.24PL"0DO*B!N;R!P871H:V5Y
M<R J+PH@"0D)"0D)"0D)($Y53$PL"0DO*B!N;R!O=71E<B!R96P@96ET:&5R
M("HO"BT)"0D)"0D)"0D@3DE,*2D["0DO*B!N;R!F9'=?<')I=F%T92!D871A
M("HO"BL)"0D)"0D)"0D@8V]P=&EO;BDI.PH@"B )+RH*( D@*B!)9B!D871A
M(&9I;&4@=V%S('-O<G1E9"P@86YD('=E(&MN97<@:70@<V]M96AO=RP@=V4@
M8V]U;&0@:6YS97)T"D! ("TU,#<L-R K-3(R+#<@0$ @9FEL94=E=$9O<F5I
M9VY0;&%N*%!L86YN97));F9O("IR;V]T+ H@"0D)"0D)"7-C86Y?8VQA=7-E
M<RP*( D)"0D)"0ES8V%N7W)E;&ED+ H@"0D)"0D)"4Y)3"P)+RH@;F\@97AP
M<F5S<VEO;G,@=&\@979A;'5A=&4@*B\*+0D)"0D)"0E.24PI.PD)+RH@;F\@
M<')I=F%T92!S=&%T92!E:71H97(@*B\**PD)"0D)"0EB97-T7W!A=&@M/F9D
M=U]P<FEV871E*3L*('T*( H@+RH*0$ @+34T-"PV("LU-3DL-R! 0"!F:6QE
M17AP;&%I;D9O<F5I9VY38V%N*$9O<F5I9VY38V%N4W1A=&4@*FYO9&4L($5X
M<&QA:6Y3=&%T92 J97,I"B!S=&%T:6,@=F]I9 H@9FEL94)E9VEN1F]R96EG
M;E-C86XH1F]R96EG;E-C86Y3=&%T92 J;F]D92P@:6YT(&5F;&%G<RD*('L*
M*PE&;W)E:6=N4V-A;B J<&QA;B ]("A&;W)E:6=N4V-A;B J*2!N;V1E+3YS
M<RYP<RYP;&%N.PH@"6-H87()(" @*F9I;&5N86UE.PH@"4QI<W0)(" @*F]P
M=&EO;G,["B )0V]P>5-T871E"6-S=&%T93L*0$ @+34U.2PV("LU-S4L,3 @
M0$ @9FEL94)E9VEN1F]R96EG;E-C86XH1F]R96EG;E-C86Y3=&%T92 J;F]D
M92P@:6YT(&5F;&%G<RD*( EF:6QE1V5T3W!T:6]N<RA296QA=&EO;D=E=%)E
M;&ED*&YO9&4M/G-S+G-S7V-U<G)E;G1296QA=&EO;BDL"B )"0D)(" @)F9I
M;&5N86UE+" F;W!T:6]N<RD["B **PDO*B!!9&0@86X@;W!T:6]N(&9O<B!S
M96QE8W1I=F4@8FEN87)Y(&-O;G9E<G-I;VX@*B\**PEI9BAP;&%N+3YF9'=?
M<')I=F%T92 A/2!.24PI"BL)"6]P=&EO;G,@/2!L:7-T7V-O;F-A="AO<'1I
M;VYS+"!P;&%N+3YF9'=?<')I=F%T92D["BL*( DO*@H@"2 J($-R96%T92!#
M;W!Y4W1A=&4@9G)O;2!&1%<@;W!T:6]N<RX@(%=E(&%L=V%Y<R!A8W%U:7)E
M(&%L;"!C;VQU;6YS+"!S;PH@"2 J(&%S('1O(&UA=&-H('1H92!E>'!E8W1E
M9"!38V%N5'5P;&53;&]T('-I9VYA='5R92X*0$ @+38Y-2PV("LW,34L,3$R
M($! (&9I;&5!;F%L>7IE1F]R96EG;E1A8FQE*%)E;&%T:6]N(')E;&%T:6]N
M+ H@?0H@"B O*@HK("H@8VAE8VM?<V5L96-T:79E7V)I;F%R>5]C;VYV97)S
M:6]N"BL@*B\**W-T871I8R!B;V]L"BMC:&5C:U]S96QE8W1I=F5?8FEN87)Y
M7V-O;G9E<G-I;VXH4F5L3W!T26YF;R J8F%S97)E;"P**PD)"0D)"0D)("!/
M:60@9F]R96EG;G1A8FQE:60L"BL)"0D)"0D)"2 @3&ES=" J*F-O;'5M;G,I
M"BM["BL)1F]R96EG;E1A8FQE("IT86)L93L**PE,:7-T0V5L;" @("IL8SL*
M*PE296QA=&EO;@ER96P["BL)5'5P;&5$97-C"71U<&QE1&5S8SL**PE!='1R
M3G5M8F5R"6%T=&YU;3L**PE":71M87!S970@("IA='1R<U]U<V5D(#T@3E5,
M3#L**PE":71M87!S970@("IT;7!S970["BL):6YT"0D)8VYT.PHK"6EN= D)
M"6D["BL**PDJ8V]L=6UN<R ]($Y)3#L**PHK"2\J"BL)("H@17AA;6EN92!F
M;W)M870@;V8@=&AE(&9I;&4N("!)9B!B:6YA<GD@9F]R;6%T+"!W92!D;VXG
M="!N965D('1O(&-O;G9E<G0**PD@*B!A="!A;&PN"BL)("HO"BL)=&%B;&4@
M/2!'971&;W)E:6=N5&%B;&4H9F]R96EG;G1A8FQE:60I.PHK"69O<F5A8V@H
M;&,L('1A8FQE+3YO<'1I;VYS*0HK"7L**PD)1&5F16QE;2 @(" J9&5F(#T@
M*$1E9D5L96T@*BD@;&9I<G-T*&QC*3L**PHK"0EI9B H<W1R8VUP*&1E9BT^
M9&5F;F%M92P@(F9O<FUA="(I(#T](# I"BL)"7L**PD)"6-H87()(" @*F9O
M<FUA=" ](&1E9D=E=%-T<FEN9RAD968I.PHK"BL)"0EI9B H<W1R8VUP*&9O
M<FUA="P@(F)I;F%R>2(I(#T](# I"BL)"0D)<F5T=7)N(&9A;'-E.PHK"0D)
M8G)E86L["BL)"7T**PE]"BL**PDO*B!!9&0@86QL('1H92!A='1R:6)U=&5S
M(&YE961E9"!F;W(@:F]I;G,@;W(@9FEN86P@;W5T<'5T+B J+PHK"7!U;&Q?
M=F%R871T;F]S*"A.;V1E("HI(&)A<V5R96PM/G)E;'1A<F=E=&QI<W0L(&)A
M<V5R96PM/G)E;&ED+" F871T<G-?=7-E9"D["BL**PDO*B!!9&0@86QL('1H
M92!A='1R:6)U=&5S('5S960@8GD@<F5S=')I8W1I;VX@8VQA=7-E<RX@*B\*
M*PEF;W)E86-H*&QC+"!B87-E<F5L+3YB87-E<F5S=')I8W1I;F9O*0HK"7L*
M*PD)4F5S=')I8W1);F9O(" @*G)I;F9O(#T@*%)E<W1R:6-T26YF;R J*2!L
M9FER<W0H;&,I.PHK"BL)"7!U;&Q?=F%R871T;F]S*"A.;V1E("HI(')I;F9O
M+3YC;&%U<V4L(&)A<V5R96PM/G)E;&ED+" F871T<G-?=7-E9"D["BL)?0HK
M"BL)<F5L(#T@:&5A<%]O<&5N*&9O<F5I9VYT86)L96ED+"!!8V-E<W-3:&%R
M94QO8VLI.PHK"71U<&QE1&5S8R ](%)E;&%T:6]N1V5T1&5S8W(H<F5L*3L*
M*PHK"71M<'-E=" ](&)M<U]C;W!Y*&%T=')S7W5S960I.PHK"7=H:6QE("@H
M871T;G5M(#T@8FUS7V9I<G-T7VUE;6)E<BAT;7!S970I*2 ^/2 P*0HK"7L*
M*PD)+RH@061J=7-T(&9O<B!S>7-T96T@871T<FEB=71E<RX@*B\**PD)871T
M;G5M("L]($9I<G-T3&]W26YV86QI9$AE87!!='1R:6)U=&5.=6UB97(["BL*
M*PD)+RH@268@=VAO;&4M<F]W(')E9F5R96YC92P@9VEV92!U<"X@*B\**PD)
M:68@*&%T=&YU;2 ]/2 P*0HK"0E["BL)"0DJ8V]L=6UN<R ]($Y)3#L**PD)
M"7)E='5R;B!F86QS93L**PD)?0HK"BL)"2\J($EG;F]R92!S>7-T96T@871T
M<FEB=71E<RX@*B\**PD):68@*&%T=&YU;2 \(# I"BL)"0EC;VYT:6YU93L*
M*PHK"0DO*B!'970@=7-E<B!A='1R:6)U=&5S+B J+PHK"0EI9B H871T;G5M
M(#X@,"D**PD)>PHK"0D)1F]R;5]P9U]A='1R:6)U=&4@871T<B ]('1U<&QE
M1&5S8RT^871T<G-;871T;G5M("T@,5T["BL)"0EC:&%R"2 @("IA='1N86UE
M(#T@<'-T<F1U<"A.86UE4W1R*&%T='(M/F%T=&YA;64I*3L**PHK"0D)+RH@
M4VMI<"!D<F]P<&5D(&%T=')I8G5T97,N("HO"BL)"0EI9B H871T<BT^871T
M:7-D<F]P<&5D*0HK"0D)"6-O;G1I;G5E.PHK"0D)*F-O;'5M;G,@/2!L87!P
M96YD*"IC;VQU;6YS+"!M86ME4W1R:6YG*&%T=&YA;64I*3L**PD)?0HK"7T*
M*PEB;7-?9G)E92AT;7!S970I.PHK"BL):&5A<%]C;&]S92AR96PL($%C8V5S
M<U-H87)E3&]C:RD["BL**PDO*B!)9B!A;&P@=&AE('5S97(@871T<FEB=71E
M<R!N965D960L(&=I=F4@=7 N("HO"BL)8VYT(#T@,#L**PEF;W(@*&D@/2 P
M.R!I(#P@='5P;&5$97-C+3YN871T<SL@:2LK*0HK"7L**PD)1F]R;5]P9U]A
M='1R:6)U=&4@871T<B ]('1U<&QE1&5S8RT^871T<G-;:5T["BL**PD)+RH@
M4VMI<"!D<F]P<&5D(&%T=')I8G5T97,N("HO"BL)"6EF("AA='1R+3YA='1I
M<V1R;W!P960I"BL)"0EC;VYT:6YU93L**PD)8VYT*RL["BL)?0HK"6EF("AC
M;G0@/3T@;&ES=%]L96YG=&@H*F-O;'5M;G,I*0HK"7L**PD)*F-O;'5M;G,@
M/2!.24P["BL)"7)E='5R;B!F86QS93L**PE]"BL**PER971U<FX@=')U93L*
M*WT**PHK+RH*(" J($5S=&EM871E('-I>F4@;V8@82!F;W)E:6=N('1A8FQE
M+@H@("H*(" J(%1H92!M86EN(')E<W5L="!I<R!R971U<FYE9"!I;B!B87-E
M<F5L+3YR;W=S+B @5V4@86QS;R!S970*9&EF9B M+6=I="!A+W-R8R]B86-K
M96YD+V-O;6UA;F1S+V-O<'DN8R!B+W-R8R]B86-K96YD+V-O;6UA;F1S+V-O
M<'DN8PII;F1E>" Y.&)C8C)F+BXP,30S9#@Q(#$P,#8T- HM+2T@82]S<F,O
M8F%C:V5N9"]C;VUM86YD<R]C;W!Y+F,**RLK(&(O<W)C+V)A8VME;F0O8V]M
M;6%N9',O8V]P>2YC"D! ("TQ,C(L-B K,3(R+#$Q($! ('1Y<&5D968@<W1R
M=6-T($-O<'E3=&%T941A=&$*( E,:7-T"2 @("IF;W)C95]N;W1N=6QL.PDO
M*B!L:7-T(&]F(&-O;'5M;B!N86UE<R J+PH@"6)O;VP)(" @*F9O<F-E7VYO
M=&YU;&Q?9FQA9W,["2\J('!E<BUC;VQU;6X@0U-6($9.3B!F;&%G<R J+PH@
M"BL)+RH@<&%R86UE=&5R<R!F<F]M(&-O;G1R:6(O9FEL95]F9'<@*B\**PE,
M:7-T"2 @("IC;VYV97)T7V)I;F%R>3L)+RH@;&ES="!O9B!C;VQU;6X@;F%M
M97,@*B\**PEB;V]L"2 @("IC;VYV97)T7V)I;F%R>5]F;&%G<SL)+RH@<&5R
M+6-O;'5M;B!#4U8O5$585"!#0B!F;&%G<R J+PHK"6)O;VP)"6-O;G9E<G1?
M<V5L96-T:79E;'D["2\J('-E;&5C=&EV92!B:6YA<GD@8V]N=F5R<VEO;C\@
M*B\**PH@"2\J('1H97-E(&%R92!J=7-T(&9O<B!E<G)O<B!M97-S86=E<RP@
M<V5E($-O<'E&<F]M17)R;W)#86QL8F%C:R J+PH@"6-O;G-T(&-H87(@*F-U
M<E]R96QN86UE.PDO*B!T86)L92!N86UE(&9O<B!E<G)O<B!M97-S86=E<R J
M+PH@"6EN= D)"6-U<E]L:6YE;F\["0DO*B!L:6YE(&YU;6)E<B!F;W(@97)R
M;W(@;65S<V%G97,@*B\*0$ @+3DV,2PV("LY-C8L,C$@0$ @4')O8V5S<T-O
M<'E/<'1I;VYS*$-O<'E3=&%T92!C<W1A=&4L"B )"0D)"0D@97)R;7-G*")A
M<F=U;65N="!T;R!O<'1I;VX@7"(E<UPB(&UU<W0@8F4@82!L:7-T(&]F(&-O
M;'5M;B!N86UE<R(L"B )"0D)"0D)"61E9F5L+3YD969N86UE*2DI.PH@"0E]
M"BL)"65L<V4@:68@*'-T<F-M<"AD969E;"T^9&5F;F%M92P@(F-O;G9E<G1?
M8FEN87)Y(BD@/3T@,"D**PD)>PHK"0D):68@*&-S=&%T92T^8V]N=F5R=%]B
M:6YA<GDI"BL)"0D)97)E<&]R="A%4E)/4BP**PD)"0D)"2AE<G)C;V1E*$52
M4D-/1$5?4UE.5$%87T524D]2*2P**PD)"0D)"2!E<G)M<V<H(F-O;F9L:6-T
M:6YG(&]R(')E9'5N9&%N="!O<'1I;VYS(BDI*3L**PD)"6-S=&%T92T^8V]N
M=F5R=%]S96QE8W1I=F5L>2 ]('1R=64["BL)"0EI9B H9&5F96PM/F%R9R ]
M/2!.54Q,('Q\($ES02AD969E;"T^87)G+"!,:7-T*2D**PD)"0EC<W1A=&4M
M/F-O;G9E<G1?8FEN87)Y(#T@*$QI<W0@*BD@9&5F96PM/F%R9SL**PD)"65L
M<V4**PD)"0EE<F5P;W)T*$524D]2+ HK"0D)"0D)*&5R<F-O9&4H15)20T]$
M15])3E9!3$E$7U!!4D%-151%4E]604Q512DL"BL)"0D)"0D@97)R;7-G*")A
M<F=U;65N="!T;R!O<'1I;VX@7"(E<UPB(&UU<W0@8F4@82!L:7-T(&]F(&-O
M;'5M;B!N86UE<R(L"BL)"0D)"0D)"61E9F5L+3YD969N86UE*2DI.PHK"0E]
M"B )"65L<V4@:68@*'-T<F-M<"AD969E;"T^9&5F;F%M92P@(F5N8V]D:6YG
M(BD@/3T@,"D*( D)>PH@"0D):68@*&-S=&%T92T^9FEL95]E;F-O9&EN9R ^
M/2 P*0I 0" M,3,P-RPV("LQ,S(W+#(X($! ($)E9VEN0V]P>2AB;V]L(&ES
M7V9R;VTL"B )"7T*( E]"B **PDO*B!#;VYV97)T(&-O;G9E<G0@8FEN87)Y
M(&YA;64@;&ES="!T;R!P97(M8V]L=6UN(&9L86=S+"!C:&5C:R!V86QI9&ET
M>2 J+PHK"6-S=&%T92T^8V]N=F5R=%]B:6YA<GE?9FQA9W,@/2 H8F]O;" J
M*2!P86QL;V,P*&YU;5]P:'ES7V%T=')S("H@<VEZ96]F*&)O;VPI*3L**PEI
M9B H8W-T871E+3YC;VYV97)T7V)I;F%R>2D**PE["BL)"4QI<W0)(" @*F%T
M=&YU;7,["BL)"4QI<W1#96QL(" @*F-U<CL**PHK"0EA='1N=6US(#T@0V]P
M>4=E=$%T=&YU;7,H='5P1&5S8RP@8W-T871E+3YR96PL(&-S=&%T92T^8V]N
M=F5R=%]B:6YA<GDI.PHK"BL)"69O<F5A8V@H8W5R+"!A='1N=6US*0HK"0E[
M"BL)"0EI;G0)"0EA='1N=6T@/2!L9FER<W1?:6YT*&-U<BD["BL**PD)"6EF
M("@A;&ES=%]M96UB97)?:6YT*&-S=&%T92T^871T;G5M;&ES="P@871T;G5M
M*2D**PD)"0EE<F5P;W)T*$524D]2+ HK"0D)"0D)*&5R<F-O9&4H15)20T]$
M15])3E9!3$E$7T-/3%5-3E]2149%4D5.0T4I+ HK"0D)"65R<FUS9R@B<V5L
M96-T960@8V]L=6UN(%PB)7-<(B!N;W0@<F5F97)E;F-E9"!B>2!#3U!9(BP*
M*PD)"0D)(" @3F%M95-T<BAT=7!$97-C+3YA='1R<UMA='1N=6T@+2 Q72T^
M871T;F%M92DI*2D["BL)"0EC<W1A=&4M/F-O;G9E<G1?8FEN87)Y7V9L86=S
M6V%T=&YU;2 M(#%=(#T@=')U93L**PD)?0HK"7T**PH@"2\J(%5S92!C;&EE
M;G0@96YC;V1I;F<@=VAE;B!%3D-/1$E.1R!O<'1I;VX@:7,@;F]T('-P96-I
M9FEE9"X@*B\*( EI9B H8W-T871E+3YF:6QE7V5N8V]D:6YG(#P@,"D*( D)
M8W-T871E+3YF:6QE7V5N8V]D:6YG(#T@<&=?9V5T7V-L:65N=%]E;F-O9&EN
M9R@I.PI 0" M,C4V-2PR,R K,C8P-RPR-R! 0"!.97AT0V]P>49R;VTH0V]P
M>5-T871E(&-S=&%T92P@17AP<D-O;G1E>'0@*F5C;VYT97AT+ H@"0D)"0D)
M"0E.86UE4W1R*&%T=');;5TM/F%T=&YA;64I*2DI.PH@"0D)<W1R:6YG(#T@
M9FEE;&1?<W1R:6YG<UMF:65L9&YO*RM=.PH@"BT)"0EI9B H8W-T871E+3YC
M<W9?;6]D92 F)B!S=')I;F<@/3T@3E5,3" F)@HM"0D)"6-S=&%T92T^9F]R
M8V5?;F]T;G5L;%]F;&%G<UMM72D**PD)"6EF("@A8W-T871E+3YC;VYV97)T
M7W-E;&5C=&EV96QY('Q\"BL)"0D)8W-T871E+3YC;VYV97)T7V)I;F%R>5]F
M;&%G<UMM72D*( D)"7L*+0D)"0DO*B!';R!A:&5A9"!A;F0@<F5A9"!T:&4@
M3E5,3"!S=')I;F<@*B\*+0D)"0ES=')I;F<@/2!C<W1A=&4M/FYU;&Q?<')I
M;G0["BT)"0E]"BL)"0D):68@*&-S=&%T92T^8W-V7VUO9&4@)B8@<W1R:6YG
M(#T]($Y53$P@)B8**PD)"0D)8W-T871E+3YF;W)C95]N;W1N=6QL7V9L86=S
M6VU=*0HK"0D)"7L**PD)"0D)+RH@1V\@86AE860@86YD(')E860@=&AE($Y5
M3$P@<W1R:6YG("HO"BL)"0D)"7-T<FEN9R ](&-S=&%T92T^;G5L;%]P<FEN
M=#L**PD)"0E]"B *+0D)"6-S=&%T92T^8W5R7V%T=&YA;64@/2!.86UE4W1R
M*&%T=');;5TM/F%T=&YA;64I.PHM"0D)8W-T871E+3YC=7)?871T=F%L(#T@
M<W1R:6YG.PHM"0D)=F%L=65S6VU=(#T@26YP=71&=6YC=&EO;D-A;&PH)FEN
M7V9U;F-T:6]N<UMM72P*+0D)"0D)"0D)"0D@('-T<FEN9RP*+0D)"0D)"0D)
M"0D@('1Y<&EO<&%R86US6VU=+ HM"0D)"0D)"0D)"2 @871T<EMM72T^871T
M='EP;6]D*3L*+0D)"6EF("AS=')I;F<@(3T@3E5,3"D*+0D)"0EN=6QL<UMM
M72 ](&9A;'-E.PHM"0D)8W-T871E+3YC=7)?871T;F%M92 ]($Y53$P["BT)
M"0EC<W1A=&4M/F-U<E]A='1V86P@/2!.54Q,.PHK"0D)"6-S=&%T92T^8W5R
M7V%T=&YA;64@/2!.86UE4W1R*&%T=');;5TM/F%T=&YA;64I.PHK"0D)"6-S
M=&%T92T^8W5R7V%T='9A;" ]('-T<FEN9SL**PD)"0EV86QU97-;;5T@/2!)
M;G!U=$9U;F-T:6]N0V%L;"@F:6Y?9G5N8W1I;VYS6VU=+ HK"0D)"0D)"0D)
M"0D@('-T<FEN9RP**PD)"0D)"0D)"0D)("!T>7!I;W!A<F%M<UMM72P**PD)
M"0D)"0D)"0D)("!A='1R6VU=+3YA='1T>7!M;V0I.PHK"0D)"6EF("AS=')I
M;F<@(3T@3E5,3"D**PD)"0D);G5L;'-;;5T@/2!F86QS93L**PD)"0EC<W1A
M=&4M/F-U<E]A='1N86UE(#T@3E5,3#L**PD)"0EC<W1A=&4M/F-U<E]A='1V
M86P@/2!.54Q,.PHK"0D)?0H@"0E]"B *( D)07-S97)T*&9I96QD;F\@/3T@
*;F9I96QD<RD["@``
`
end



Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
Kohei KaiGai
Date:
Fujita-san,

The revised patch is almost good for me.
Only point I noticed is the check for redundant or duplicated options
I pointed out on the previous post.
So, if you agree with the attached patch, I'd like to hand over this
patch for committers.

All I changed is:
 --- a/src/backend/commands/copy.c
 +++ b/src/backend/commands/copy.c
 @@ -122,6 +122,11 @@ typedef struct CopyStateData
@@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
        }
 +      else if (strcmp(defel->defname, "convert_binary") == 0)
 +      {
-+          if (cstate->convert_binary)
++          if (cstate->convert_selectively)
 +              ereport(ERROR,
 +                      (errcode(ERRCODE_SYNTAX_ERROR),
 +                       errmsg("conflicting or redundant options")));

Thanks,

2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> Hi KaiGai-san,
>
> Thank you for the review.
>
>> -----Original Message-----
>> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
>> Sent: Wednesday, June 20, 2012 1:26 AM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Hi Fujita-san,
>>
>> Could you rebase this patch towards the latest tree?
>> It was unavailable to apply the patch cleanly.
>
> Sorry, I updated the patch.  Please find attached an updated version of the
> patch.
>
>> I looked over the patch, then noticed a few points.
>>
>> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
>> If so, cstate->convert_binary is not a suitable flag to check redundant
>> option. It seems to me cstate->convert_selectively is more suitable flag
>> to check it.
>>
>> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> +       {
>> +           if (cstate->convert_binary)
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> +                        errmsg("conflicting or redundant options")));
>> +           cstate->convert_selectively = true;
>> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> +               cstate->convert_binary = (List *) defel->arg;
>> +           else
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                        errmsg("argument to option \"%s\" must be a
>> list of column names",
>> +                               defel->defname)));
>> +       }
>
> Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
> columns needed to be converted to binary representation, which is NIL for
> the case where no columns are needed to be converted.  For example,
> defel->arg is NIL for SELECT COUNT(*).  In this case, while
> cstate->convert_selectively is set to true, no columns are converted at
> NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
> represents whether to do selective binary conversion at NextCopyFrom(), and
> cstate->convert_binary represents all the columns to be converted at
> NextCopyFrom(), that can be NIL.
>
>> At NextCopyFrom(), this routine computes default values if configured.
>> In case when these values are not referenced, it might be possible to skip
>> unnecessary calculations.
>> Is it unavailable to add logic to avoid to construct cstate->defmap on
>> unreferenced columns at  BeginCopyFrom()?
>
> I think that we don't need to add the above logic because file_fdw does
> BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
> doesn't construct cstate->defmap at all.
>
> I fixed a bug plus some minor optimization in check_binary_conversion() that
> is renamed to check_selective_binary_conversion() in the updated version,
> and now file_fdw gives up selective binary conversion for the following
> cases:
>
>  a) BINARY format
>  b) CSV/TEXT format and whole row reference
>  c) CSV/TEXT format and all the user attributes needed
>
>
> Best regards,
> Etsuro Fujita
>
>> Thanks,
>>
>> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> >> -----Original Message-----
>> >> From: Robert Haas [mailto:robertmhaas@gmail.com]
>> >> Sent: Friday, May 11, 2012 1:36 AM
>> >> To: Etsuro Fujita
>> >> Cc: pgsql-hackers@postgresql.org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
>> > <fujita.etsuro@lab.ntt.co.jp>
>> >> wrote:
>> >> > I would like to propose to improve parsing efficiency of
>> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> >> > al.[1], which means that for a CSV/TEXT file foreign table,
>> >> > file_fdw performs binary conversion only for the columns needed for
>> >> > query processing.  Attached is a WIP patch implementing the feature.
>> >>
>> >> Can you add this to the next CommitFest?  Looks interesting.
>> >
>> > Done.
>> >
>> > Best regards,
>> > Etsuro Fujita
>> >
>> >> --
>> >> Robert Haas
>> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
>> >> Company
>> >>
>> >
>> >
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
>> > make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
>> changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>
>
>



--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
Hi Kaigai-san,

> -----Original Message-----
> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> Sent: Monday, June 25, 2012 9:49 PM
> To: Etsuro Fujita
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
>
> Fujita-san,
>
> The revised patch is almost good for me.
> Only point I noticed is the check for redundant or duplicated options I
pointed
> out on the previous post.
> So, if you agree with the attached patch, I'd like to hand over this patch for
> committers.

OK I agree with you.  Thanks for the revision!

Besides the revision, I modified check_selective_binary_conversion() to run
heap_close() in the whole-row-reference case.  Attached is an updated version of
the patch.

Thanks.

Best regards,
Etsuro Fujita

> All I changed is:
>  --- a/src/backend/commands/copy.c
>  +++ b/src/backend/commands/copy.c
>  @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index
> 98bcb2f..0143d81 100644
>         }
>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
>  +      {
> -+          if (cstate->convert_binary)
> ++          if (cstate->convert_selectively)
>  +              ereport(ERROR,
>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
>  +                       errmsg("conflicting or redundant options")));
>
> Thanks,
>
> 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> > Hi KaiGai-san,
> >
> > Thank you for the review.
> >
> >> -----Original Message-----
> >> From: pgsql-hackers-owner@postgresql.org
> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
> >> Sent: Wednesday, June 20, 2012 1:26 AM
> >> To: Etsuro Fujita
> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
> >> file foreign tables
> >>
> >> Hi Fujita-san,
> >>
> >> Could you rebase this patch towards the latest tree?
> >> It was unavailable to apply the patch cleanly.
> >
> > Sorry, I updated the patch.  Please find attached an updated version
> > of the patch.
> >
> >> I looked over the patch, then noticed a few points.
> >>
> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
> >> If so, cstate->convert_binary is not a suitable flag to check
> >> redundant option. It seems to me cstate->convert_selectively is more
> >> suitable flag to check it.
> >>
> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
> >> +       {
> >> +           if (cstate->convert_binary)
> >> +               ereport(ERROR,
> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
> >> +                        errmsg("conflicting or redundant
> >> + options")));
> >> +           cstate->convert_selectively = true;
> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
> >> +               cstate->convert_binary = (List *) defel->arg;
> >> +           else
> >> +               ereport(ERROR,
> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> +                        errmsg("argument to option \"%s\" must be a
> >> list of column names",
> >> +                               defel->defname)));
> >> +       }
> >
> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
> > all the columns needed to be converted to binary representation, which
> > is NIL for the case where no columns are needed to be converted.  For
> > example,
> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
> > cstate->convert_selectively is set to true, no columns are converted
> > cstate->at
> > NextCopyFrom().  Most efficient case!  In short,
> > cstate->convert_selectively represents whether to do selective binary
> > conversion at NextCopyFrom(), and
> > cstate->convert_binary represents all the columns to be converted at
> > NextCopyFrom(), that can be NIL.
> >
> >> At NextCopyFrom(), this routine computes default values if configured.
> >> In case when these values are not referenced, it might be possible to
> >> skip unnecessary calculations.
> >> Is it unavailable to add logic to avoid to construct cstate->defmap
> >> on unreferenced columns at  BeginCopyFrom()?
> >
> > I think that we don't need to add the above logic because file_fdw
> > does
> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
> > doesn't construct cstate->defmap at all.
> >
> > I fixed a bug plus some minor optimization in
> > check_binary_conversion() that is renamed to
> > check_selective_binary_conversion() in the updated version, and now
> > file_fdw gives up selective binary conversion for the following
> > cases:
> >
> >  a) BINARY format
> >  b) CSV/TEXT format and whole row reference
> >  c) CSV/TEXT format and all the user attributes needed
> >
> >
> > Best regards,
> > Etsuro Fujita
> >
> >> Thanks,
> >>
> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> >> >> -----Original Message-----
> >> >> From: Robert Haas [mailto:robertmhaas@gmail.com]
> >> >> Sent: Friday, May 11, 2012 1:36 AM
> >> >> To: Etsuro Fujita
> >> >> Cc: pgsql-hackers@postgresql.org
> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of
> >> >> CSV file foreign tables
> >> >>
> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
> >> > <fujita.etsuro@lab.ntt.co.jp>
> >> >> wrote:
> >> >> > I would like to propose to improve parsing efficiency of
> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> >> >> > al.[1], which means that for a CSV/TEXT file foreign table,
> >> >> > file_fdw performs binary conversion only for the columns needed
> >> >> > for query processing.  Attached is a WIP patch implementing the
feature.
> >> >>
> >> >> Can you add this to the next CommitFest?  Looks interesting.
> >> >
> >> > Done.
> >> >
> >> > Best regards,
> >> > Etsuro Fujita
> >> >
> >> >> --
> >> >> Robert Haas
> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise
> >> >> PostgreSQL Company
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-hackers
> >>
> >>
> >>
> >> --
> >> KaiGai Kohei <kaigai@kaigai.gr.jp>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
> >> make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >>
> >
> >
> >
> >
>
>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>

Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
Kohei KaiGai
Date:
2012/6/26 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> Hi Kaigai-san,
>
>> -----Original Message-----
>> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
>> Sent: Monday, June 25, 2012 9:49 PM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Fujita-san,
>>
>> The revised patch is almost good for me.
>> Only point I noticed is the check for redundant or duplicated options I
> pointed
>> out on the previous post.
>> So, if you agree with the attached patch, I'd like to hand over this patch for
>> committers.
>
> OK I agree with you.  Thanks for the revision!
>
> Besides the revision, I modified check_selective_binary_conversion() to run
> heap_close() in the whole-row-reference case.  Attached is an updated version of
> the patch.
>
Sorry, I overlooked this code path. It seems to me fair enough.

So, I'd like to take over this patch for committers.

Thanks,

> Thanks.
>
> Best regards,
> Etsuro Fujita
>
>> All I changed is:
>>  --- a/src/backend/commands/copy.c
>>  +++ b/src/backend/commands/copy.c
>>  @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index
>> 98bcb2f..0143d81 100644
>>         }
>>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
>>  +      {
>> -+          if (cstate->convert_binary)
>> ++          if (cstate->convert_selectively)
>>  +              ereport(ERROR,
>>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
>>  +                       errmsg("conflicting or redundant options")));
>>
>> Thanks,
>>
>> 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> > Hi KaiGai-san,
>> >
>> > Thank you for the review.
>> >
>> >> -----Original Message-----
>> >> From: pgsql-hackers-owner@postgresql.org
>> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
>> >> Sent: Wednesday, June 20, 2012 1:26 AM
>> >> To: Etsuro Fujita
>> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> Hi Fujita-san,
>> >>
>> >> Could you rebase this patch towards the latest tree?
>> >> It was unavailable to apply the patch cleanly.
>> >
>> > Sorry, I updated the patch.  Please find attached an updated version
>> > of the patch.
>> >
>> >> I looked over the patch, then noticed a few points.
>> >>
>> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
>> >> If so, cstate->convert_binary is not a suitable flag to check
>> >> redundant option. It seems to me cstate->convert_selectively is more
>> >> suitable flag to check it.
>> >>
>> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> >> +       {
>> >> +           if (cstate->convert_binary)
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> >> +                        errmsg("conflicting or redundant
>> >> + options")));
>> >> +           cstate->convert_selectively = true;
>> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> >> +               cstate->convert_binary = (List *) defel->arg;
>> >> +           else
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> >> +                        errmsg("argument to option \"%s\" must be a
>> >> list of column names",
>> >> +                               defel->defname)));
>> >> +       }
>> >
>> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
>> > all the columns needed to be converted to binary representation, which
>> > is NIL for the case where no columns are needed to be converted.  For
>> > example,
>> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
>> > cstate->convert_selectively is set to true, no columns are converted
>> > cstate->at
>> > NextCopyFrom().  Most efficient case!  In short,
>> > cstate->convert_selectively represents whether to do selective binary
>> > conversion at NextCopyFrom(), and
>> > cstate->convert_binary represents all the columns to be converted at
>> > NextCopyFrom(), that can be NIL.
>> >
>> >> At NextCopyFrom(), this routine computes default values if configured.
>> >> In case when these values are not referenced, it might be possible to
>> >> skip unnecessary calculations.
>> >> Is it unavailable to add logic to avoid to construct cstate->defmap
>> >> on unreferenced columns at  BeginCopyFrom()?
>> >
>> > I think that we don't need to add the above logic because file_fdw
>> > does
>> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
>> > doesn't construct cstate->defmap at all.
>> >
>> > I fixed a bug plus some minor optimization in
>> > check_binary_conversion() that is renamed to
>> > check_selective_binary_conversion() in the updated version, and now
>> > file_fdw gives up selective binary conversion for the following
>> > cases:
>> >
>> >  a) BINARY format
>> >  b) CSV/TEXT format and whole row reference
>> >  c) CSV/TEXT format and all the user attributes needed
>> >
>> >
>> > Best regards,
>> > Etsuro Fujita
>> >
>> >> Thanks,
>> >>
>> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> >> >> -----Original Message-----
>> >> >> From: Robert Haas [mailto:robertmhaas@gmail.com]
>> >> >> Sent: Friday, May 11, 2012 1:36 AM
>> >> >> To: Etsuro Fujita
>> >> >> Cc: pgsql-hackers@postgresql.org
>> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of
>> >> >> CSV file foreign tables
>> >> >>
>> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
>> >> > <fujita.etsuro@lab.ntt.co.jp>
>> >> >> wrote:
>> >> >> > I would like to propose to improve parsing efficiency of
>> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> >> >> > al.[1], which means that for a CSV/TEXT file foreign table,
>> >> >> > file_fdw performs binary conversion only for the columns needed
>> >> >> > for query processing.  Attached is a WIP patch implementing the
> feature.
>> >> >>
>> >> >> Can you add this to the next CommitFest?  Looks interesting.
>> >> >
>> >> > Done.
>> >> >
>> >> > Best regards,
>> >> > Etsuro Fujita
>> >> >
>> >> >> --
>> >> >> Robert Haas
>> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise
>> >> >> PostgreSQL Company
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> >> > To make changes to your subscription:
>> >> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >>
>> >>
>> >>
>> >> --
>> >> KaiGai Kohei <kaigai@kaigai.gr.jp>
>> >>
>> >> --
>> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
>> >> make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>



--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
Hi Kaigai-san,

> -----Original Message-----
> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> Sent: Tuesday, June 26, 2012 11:05 PM
> To: Etsuro Fujita
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
>
> 2012/6/26 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> > Hi Kaigai-san,
> >
> >> -----Original Message-----
> >> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> >> Sent: Monday, June 25, 2012 9:49 PM
> >> To: Etsuro Fujita
> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> >> foreign tables
> >>
> >> Fujita-san,
> >>
> >> The revised patch is almost good for me.
> >> Only point I noticed is the check for redundant or duplicated options I
> > pointed
> >> out on the previous post.
> >> So, if you agree with the attached patch, I'd like to hand over this patch
> for
> >> committers.
> >
> > OK I agree with you.  Thanks for the revision!
> >
> > Besides the revision, I modified check_selective_binary_conversion() to run
> > heap_close() in the whole-row-reference case.  Attached is an updated
version
> of
> > the patch.
> >
> Sorry, I overlooked this code path.

No, It's my fault.

> So, I'd like to take over this patch for committers.

Thanks,

Best regards,
Etsuro Fujita

>
> Thanks,
>
> > Thanks.
> >
> > Best regards,
> > Etsuro Fujita
> >
> >> All I changed is:
> >>  --- a/src/backend/commands/copy.c
> >>  +++ b/src/backend/commands/copy.c
> >>  @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@
index
> >> 98bcb2f..0143d81 100644
> >>         }
> >>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
> >>  +      {
> >> -+          if (cstate->convert_binary)
> >> ++          if (cstate->convert_selectively)
> >>  +              ereport(ERROR,
> >>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
> >>  +                       errmsg("conflicting or redundant options")));
> >>
> >> Thanks,
> >>
> >> 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> >> > Hi KaiGai-san,
> >> >
> >> > Thank you for the review.
> >> >
> >> >> -----Original Message-----
> >> >> From: pgsql-hackers-owner@postgresql.org
> >> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
> >> >> Sent: Wednesday, June 20, 2012 1:26 AM
> >> >> To: Etsuro Fujita
> >> >> Cc: Robert Haas; pgsql-hackers@postgresql.org
> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
> >> >> file foreign tables
> >> >>
> >> >> Hi Fujita-san,
> >> >>
> >> >> Could you rebase this patch towards the latest tree?
> >> >> It was unavailable to apply the patch cleanly.
> >> >
> >> > Sorry, I updated the patch.  Please find attached an updated version
> >> > of the patch.
> >> >
> >> >> I looked over the patch, then noticed a few points.
> >> >>
> >> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
> >> >> If so, cstate->convert_binary is not a suitable flag to check
> >> >> redundant option. It seems to me cstate->convert_selectively is more
> >> >> suitable flag to check it.
> >> >>
> >> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
> >> >> +       {
> >> >> +           if (cstate->convert_binary)
> >> >> +               ereport(ERROR,
> >> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
> >> >> +                        errmsg("conflicting or redundant
> >> >> + options")));
> >> >> +           cstate->convert_selectively = true;
> >> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
> >> >> +               cstate->convert_binary = (List *) defel->arg;
> >> >> +           else
> >> >> +               ereport(ERROR,
> >> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> >> +                        errmsg("argument to option \"%s\" must be a
> >> >> list of column names",
> >> >> +                               defel->defname)));
> >> >> +       }
> >> >
> >> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
> >> > all the columns needed to be converted to binary representation, which
> >> > is NIL for the case where no columns are needed to be converted.  For
> >> > example,
> >> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
> >> > cstate->convert_selectively is set to true, no columns are converted
> >> > cstate->at
> >> > NextCopyFrom().  Most efficient case!  In short,
> >> > cstate->convert_selectively represents whether to do selective binary
> >> > conversion at NextCopyFrom(), and
> >> > cstate->convert_binary represents all the columns to be converted at
> >> > NextCopyFrom(), that can be NIL.
> >> >
> >> >> At NextCopyFrom(), this routine computes default values if configured.
> >> >> In case when these values are not referenced, it might be possible to
> >> >> skip unnecessary calculations.
> >> >> Is it unavailable to add logic to avoid to construct cstate->defmap
> >> >> on unreferenced columns at  BeginCopyFrom()?
> >> >
> >> > I think that we don't need to add the above logic because file_fdw
> >> > does
> >> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
> >> > doesn't construct cstate->defmap at all.
> >> >
> >> > I fixed a bug plus some minor optimization in
> >> > check_binary_conversion() that is renamed to
> >> > check_selective_binary_conversion() in the updated version, and now
> >> > file_fdw gives up selective binary conversion for the following
> >> > cases:
> >> >
> >> >  a) BINARY format
> >> >  b) CSV/TEXT format and whole row reference
> >> >  c) CSV/TEXT format and all the user attributes needed
> >> >
> >> >
> >> > Best regards,
> >> > Etsuro Fujita
> >> >
> >> >> Thanks,
> >> >>
> >> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> >> >> >> -----Original Message-----
> >> >> >> From: Robert Haas [mailto:robertmhaas@gmail.com]
> >> >> >> Sent: Friday, May 11, 2012 1:36 AM
> >> >> >> To: Etsuro Fujita
> >> >> >> Cc: pgsql-hackers@postgresql.org
> >> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of
> >> >> >> CSV file foreign tables
> >> >> >>
> >> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
> >> >> > <fujita.etsuro@lab.ntt.co.jp>
> >> >> >> wrote:
> >> >> >> > I would like to propose to improve parsing efficiency of
> >> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> >> >> >> > al.[1], which means that for a CSV/TEXT file foreign table,
> >> >> >> > file_fdw performs binary conversion only for the columns needed
> >> >> >> > for query processing.  Attached is a WIP patch implementing the
> > feature.
> >> >> >>
> >> >> >> Can you add this to the next CommitFest?  Looks interesting.
> >> >> >
> >> >> > Done.
> >> >> >
> >> >> > Best regards,
> >> >> > Etsuro Fujita
> >> >> >
> >> >> >> --
> >> >> >> Robert Haas
> >> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise
> >> >> >> PostgreSQL Company
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> >> > To make changes to your subscription:
> >> >> > http://www.postgresql.org/mailpref/pgsql-hackers
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> KaiGai Kohei <kaigai@kaigai.gr.jp>
> >> >>
> >> >> --
> >> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
> >> >> make changes to your subscription:
> >> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> KaiGai Kohei <kaigai@kaigai.gr.jp>
>
>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>
>




Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
Tom Lane
Date:
"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:
> Besides the revision, I modified check_selective_binary_conversion() to run
> heap_close() in the whole-row-reference case.  Attached is an updated version of
> the patch.

Applied with minor, mostly-cosmetic revisions.  I did fix
check_selective_binary_conversion to not continue touching the
relation's tupledesc after heap_close.  Also I thought
"convert_selectively" was a better name for the hidden COPY option.
        regards, tom lane


Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From
"Etsuro Fujita"
Date:
Thanks!

Best regards,
Etsuro Fujita

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 13, 2012 5:30 AM
> To: Etsuro Fujita
> Cc: 'Kohei KaiGai'; 'Robert Haas'; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
> 
> "Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:
> > Besides the revision, I modified check_selective_binary_conversion() to run
> > heap_close() in the whole-row-reference case.  Attached is an updated
version
> of
> > the patch.
> 
> Applied with minor, mostly-cosmetic revisions.  I did fix
> check_selective_binary_conversion to not continue touching the
> relation's tupledesc after heap_close.  Also I thought
> "convert_selectively" was a better name for the hidden COPY option.
> 
>             regards, tom lane