Thread: WIP Patch: Selective binary conversion of CSV file foreign tables
I would like to propose to improve parsing efficiency of contrib/file_fdw by selective parsing proposed by Alagiannis et al.[1], which means that for a CSV/TEXT file foreign table, file_fdw performs binary conversion only for the columns needed for query processing. Attached is a WIP patch implementing the feature. I evaluated the efficiency of the patch using SELECT count(*) on a CSV file foreign table of 5,000,000 records, which had the same definition as the pgbench history table. The following run is done on a single core of a 3.00GHz Intel Xeon CPU with 8GB of RAM. Configuration settings are all default. w/o the patch: 7255.898 ms w/ the patch: 3363.297 ms On reflection of [2], I think it would be better to disable this feature when the validation option is set to 'true'; file_fdw converts all columns to binary representation. So, it verifies that each tuple meets all column data types as well as all kinds of constraints. I appreciate your comments. Best regards, Etsuro Fujita [1] http://homepages.cwi.nl/~idreos/NoDBsigmod2012.pdf [2] https://commitfest.postgresql.org/action/patch_view?id=822
On Tue, May 08, 2012 at 08:26:02PM +0900, Etsuro Fujita wrote: > I would like to propose to improve parsing efficiency of contrib/file_fdw by > selective parsing proposed by Alagiannis et al.[1], Is the patch they used against 9.0 published somewhere? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
> -----Original Message----- > From: David Fetter [mailto:david@fetter.org] > Sent: Wednesday, May 09, 2012 9:25 AM > To: Etsuro Fujita > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > On Tue, May 08, 2012 at 08:26:02PM +0900, Etsuro Fujita wrote: > > I would like to propose to improve parsing efficiency of > > contrib/file_fdw by selective parsing proposed by Alagiannis et > > al.[1], > > Is the patch they used against 9.0 published somewhere? It's not. The patch's been created by myself. I don't know their patch. Best regards, Etsuro Fujita > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I would like to propose to improve parsing efficiency of contrib/file_fdw by > selective parsing proposed by Alagiannis et al.[1], which means that for a > CSV/TEXT file foreign table, file_fdw performs binary conversion only for > the columns needed for query processing. Attached is a WIP patch > implementing the feature. Can you add this to the next CommitFest? Looks interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> -----Original Message----- > From: Robert Haas [mailto:robertmhaas@gmail.com] > Sent: Friday, May 11, 2012 1:36 AM > To: Etsuro Fujita > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> > wrote: > > I would like to propose to improve parsing efficiency of > > contrib/file_fdw by selective parsing proposed by Alagiannis et > > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw > > performs binary conversion only for the columns needed for query > > processing. Attached is a WIP patch implementing the feature. > > Can you add this to the next CommitFest? Looks interesting. Done. Best regards, Etsuro Fujita > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > Company >
Hi Fujita-san, Could you rebase this patch towards the latest tree? It was unavailable to apply the patch cleanly. I looked over the patch, then noticed a few points. At ProcessCopyOptions(), defel->arg can be NIL, isn't it? If so, cstate->convert_binary is not a suitable flag to check redundant option. It seems to me cstate->convert_selectively is more suitable flag to check it. + else if (strcmp(defel->defname, "convert_binary") == 0) + { + if (cstate->convert_binary) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + cstate->convert_selectively = true; + if (defel->arg == NULL || IsA(defel->arg, List)) + cstate->convert_binary = (List *) defel->arg; + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("argument to option \"%s\" must be a list of column names", + defel->defname))); + } At NextCopyFrom(), this routine computes default values if configured. In case when these values are not referenced, it might be possible to skip unnecessary calculations. Is it unavailable to add logic to avoid to construct cstate->defmap on unreferenced columns at BeginCopyFrom()? Thanks, 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: >> -----Original Message----- >> From: Robert Haas [mailto:robertmhaas@gmail.com] >> Sent: Friday, May 11, 2012 1:36 AM >> To: Etsuro Fujita >> Cc: pgsql-hackers@postgresql.org >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file >> foreign tables >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> >> wrote: >> > I would like to propose to improve parsing efficiency of >> > contrib/file_fdw by selective parsing proposed by Alagiannis et >> > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw >> > performs binary conversion only for the columns needed for query >> > processing. Attached is a WIP patch implementing the feature. >> >> Can you add this to the next CommitFest? Looks interesting. > > Done. > > Best regards, > Etsuro Fujita > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL >> Company >> > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi KaiGai-san, Thank you for the review. > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai > Sent: Wednesday, June 20, 2012 1:26 AM > To: Etsuro Fujita > Cc: Robert Haas; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > Hi Fujita-san, > > Could you rebase this patch towards the latest tree? > It was unavailable to apply the patch cleanly. Sorry, I updated the patch. Please find attached an updated version of the patch. > I looked over the patch, then noticed a few points. > > At ProcessCopyOptions(), defel->arg can be NIL, isn't it? > If so, cstate->convert_binary is not a suitable flag to check redundant > option. It seems to me cstate->convert_selectively is more suitable flag > to check it. > > + else if (strcmp(defel->defname, "convert_binary") == 0) > + { > + if (cstate->convert_binary) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("conflicting or redundant options"))); > + cstate->convert_selectively = true; > + if (defel->arg == NULL || IsA(defel->arg, List)) > + cstate->convert_binary = (List *) defel->arg; > + else > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("argument to option \"%s\" must be a > list of column names", > + defel->defname))); > + } Yes, defel->arg can be NIL. defel->arg is a List structure listing all the columns needed to be converted to binary representation, which is NIL for the case where no columns are needed to be converted. For example, defel->arg is NIL for SELECT COUNT(*). In this case, while cstate->convert_selectively is set to true, no columns are converted at NextCopyFrom(). Most efficient case! In short, cstate->convert_selectively represents whether to do selective binary conversion at NextCopyFrom(), and cstate->convert_binary represents all the columns to be converted at NextCopyFrom(), that can be NIL. > At NextCopyFrom(), this routine computes default values if configured. > In case when these values are not referenced, it might be possible to skip > unnecessary calculations. > Is it unavailable to add logic to avoid to construct cstate->defmap on > unreferenced columns at BeginCopyFrom()? I think that we don't need to add the above logic because file_fdw does BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom() doesn't construct cstate->defmap at all. I fixed a bug plus some minor optimization in check_binary_conversion() that is renamed to check_selective_binary_conversion() in the updated version, and now file_fdw gives up selective binary conversion for the following cases: a) BINARY format b) CSV/TEXT format and whole row reference c) CSV/TEXT format and all the user attributes needed Best regards, Etsuro Fujita > Thanks, > > 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > >> -----Original Message----- > >> From: Robert Haas [mailto:robertmhaas@gmail.com] > >> Sent: Friday, May 11, 2012 1:36 AM > >> To: Etsuro Fujita > >> Cc: pgsql-hackers@postgresql.org > >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV > >> file foreign tables > >> > >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> > >> wrote: > >> > I would like to propose to improve parsing efficiency of > >> > contrib/file_fdw by selective parsing proposed by Alagiannis et > >> > al.[1], which means that for a CSV/TEXT file foreign table, > >> > file_fdw performs binary conversion only for the columns needed for > >> > query processing. �Attached is a WIP patch implementing the feature. > >> > >> Can you add this to the next CommitFest? �Looks interesting. > > > > Done. > > > > Best regards, > > Etsuro Fujita > > > >> -- > >> Robert Haas > >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > >> Company > >> > > > > > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To > > make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > begin 666 file_fdw_sel_bin_conv_v2.patch M9&EF9B M+6=I="!A+V-O;G1R:6(O9FEL95]F9'<O9FEL95]F9'<N8R!B+V-O M;G1R:6(O9FEL95]F9'<O9FEL95]F9'<N8PII;F1E>"!E,V(Y,C(S+BYF9#DT M,3)F(#$P,#8T- HM+2T@82]C;VYT<FEB+V9I;&5?9F1W+V9I;&5?9F1W+F,* M*RLK(&(O8V]N=')I8B]F:6QE7V9D=R]F:6QE7V9D=RYC"D! ("TQ-BPV("LQ M-BPW($! "B C:6YC;'5D92 \=6YI<W1D+F@^"B *("-I;F-L=61E(")A8V-E M<W,O<F5L;W!T:6]N<RYH(@HK(VEN8VQU9&4@(F%C8V5S<R]S>7-A='1R+F@B M"B C:6YC;'5D92 B8V%T86QO9R]P9U]F;W)E:6=N7W1A8FQE+F@B"B C:6YC M;'5D92 B8V]M;6%N9',O8V]P>2YH(@H@(VEN8VQU9&4@(F-O;6UA;F1S+V1E M9G)E;2YH(@I 0" M,CDL-B K,S L-R! 0 H@(VEN8VQU9&4@(F]P=&EM:7IE M<B]P871H;F]D92YH(@H@(VEN8VQU9&4@(F]P=&EM:7IE<B]P;&%N;6%I;BYH M(@H@(VEN8VQU9&4@(F]P=&EM:7IE<B]R97-T<FEC=&EN9F\N:"(**R-I;F-L M=61E(")O<'1I;6EZ97(O=F%R+F@B"B C:6YC;'5D92 B=71I;',O;65M=71I M;',N:"(*("-I;F-L=61E(")U=&EL<R]R96PN:"(*( I 0" M,3,V+#8@*S$S M."PY($! ('-T871I8R!B;V]L(&ES7W9A;&ED7V]P=&EO;BAC;VYS="!C:&%R M("IO<'1I;VXL($]I9"!C;VYT97AT*3L*('-T871I8R!V;VED(&9I;&5'971/ M<'1I;VYS*$]I9"!F;W)E:6=N=&%B;&5I9"P*( D)"2 @(&-H87(@*BIF:6QE M;F%M92P@3&ES=" J*F]T:&5R7V]P=&EO;G,I.PH@<W1A=&EC($QI<W0@*F=E M=%]F:6QE7V9D=U]A='1R:6)U=&5?;W!T:6]N<RA/:60@<F5L:60I.PHK<W1A M=&EC(&)O;VP@8VAE8VM?<V5L96-T:79E7V)I;F%R>5]C;VYV97)S:6]N*%)E M;$]P=$EN9F\@*F)A<V5R96PL"BL)"0D)"0D)"0D)"2 @3VED(&9O<F5I9VYT M86)L96ED+ HK"0D)"0D)"0D)"0D@($QI<W0@*BIC;VQU;6YS*3L*('-T871I M8R!V;VED(&5S=&EM871E7W-I>F4H4&QA;FYE<DEN9F\@*G)O;W0L(%)E;$]P M=$EN9F\@*F)A<V5R96PL"B )"0D@($9I;&5&9'=0;&%N4W1A=&4@*F9D=U]P M<FEV871E*3L*('-T871I8R!V;VED(&5S=&EM871E7V-O<W1S*%!L86YN97)) M;F9O("IR;V]T+"!296Q/<'1);F9O("IB87-E<F5L+ I 0" M-#4W+#8@*S0V M,BPQ-B! 0"!F:6QE1V5T1F]R96EG;E!A=&AS*%!L86YN97));F9O("IR;V]T M+ H@"49I;&5&9'=0;&%N4W1A=&4@*F9D=U]P<FEV871E(#T@*$9I;&5&9'=0 M;&%N4W1A=&4@*BD@8F%S97)E;"T^9F1W7W!R:79A=&4["B )0V]S= D)<W1A M<G1U<%]C;W-T.PH@"4-O<W0)"71O=&%L7V-O<W0["BL)8F]O; D)<F5S=6QT M.PHK"4QI<W0)(" @*F-O;'5M;G,["BL)3&ES= D@(" J8V]P=&EO;B ]($Y) M3#L**PHK"2\J($1E8VED92!W:&5T:&5R('1O('-E;&5C=&EV96QY('!E<F9O M<FT@8FEN87)Y(&-O;G9E<G-I;VX@*B\**PER97-U;'0@/2!C:&5C:U]S96QE M8W1I=F5?8FEN87)Y7V-O;G9E<G-I;VXH8F%S97)E;"P**PD)"0D)"0D)"0D) M(" @9F]R96EG;G1A8FQE:60L"BL)"0D)"0D)"0D)"2 @("9C;VQU;6YS*3L* M*PEI9B H<F5S=6QT*0HK"0EC;W!T:6]N(#T@;&ES=%]M86ME,2AM86ME1&5F M16QE;2@B8V]N=F5R=%]B:6YA<GDB+" H3F]D92 J*2!C;VQU;6YS*2D["B * M( DO*B!%<W1I;6%T92!C;W-T<R J+PH@"65S=&EM871E7V-O<W1S*')O;W0L M(&)A<V5R96PL(&9D=U]P<FEV871E+ I 0" M-#<P+#<@*S0X-2PW($! (&9I M;&5'971&;W)E:6=N4&%T:',H4&QA;FYE<DEN9F\@*G)O;W0L"B )"0D)"0D) M"0D@=&]T86Q?8V]S="P*( D)"0D)"0D)"2!.24PL"0DO*B!N;R!P871H:V5Y M<R J+PH@"0D)"0D)"0D)($Y53$PL"0DO*B!N;R!O=71E<B!R96P@96ET:&5R M("HO"BT)"0D)"0D)"0D@3DE,*2D["0DO*B!N;R!F9'=?<')I=F%T92!D871A M("HO"BL)"0D)"0D)"0D@8V]P=&EO;BDI.PH@"B )+RH*( D@*B!)9B!D871A M(&9I;&4@=V%S('-O<G1E9"P@86YD('=E(&MN97<@:70@<V]M96AO=RP@=V4@ M8V]U;&0@:6YS97)T"D! ("TU,#<L-R K-3(R+#<@0$ @9FEL94=E=$9O<F5I M9VY0;&%N*%!L86YN97));F9O("IR;V]T+ H@"0D)"0D)"7-C86Y?8VQA=7-E M<RP*( D)"0D)"0ES8V%N7W)E;&ED+ H@"0D)"0D)"4Y)3"P)+RH@;F\@97AP M<F5S<VEO;G,@=&\@979A;'5A=&4@*B\*+0D)"0D)"0E.24PI.PD)+RH@;F\@ M<')I=F%T92!S=&%T92!E:71H97(@*B\**PD)"0D)"0EB97-T7W!A=&@M/F9D M=U]P<FEV871E*3L*('T*( H@+RH*0$ @+34T-"PV("LU-3DL-R! 0"!F:6QE M17AP;&%I;D9O<F5I9VY38V%N*$9O<F5I9VY38V%N4W1A=&4@*FYO9&4L($5X M<&QA:6Y3=&%T92 J97,I"B!S=&%T:6,@=F]I9 H@9FEL94)E9VEN1F]R96EG M;E-C86XH1F]R96EG;E-C86Y3=&%T92 J;F]D92P@:6YT(&5F;&%G<RD*('L* M*PE&;W)E:6=N4V-A;B J<&QA;B ]("A&;W)E:6=N4V-A;B J*2!N;V1E+3YS M<RYP<RYP;&%N.PH@"6-H87()(" @*F9I;&5N86UE.PH@"4QI<W0)(" @*F]P M=&EO;G,["B )0V]P>5-T871E"6-S=&%T93L*0$ @+34U.2PV("LU-S4L,3 @ M0$ @9FEL94)E9VEN1F]R96EG;E-C86XH1F]R96EG;E-C86Y3=&%T92 J;F]D M92P@:6YT(&5F;&%G<RD*( EF:6QE1V5T3W!T:6]N<RA296QA=&EO;D=E=%)E M;&ED*&YO9&4M/G-S+G-S7V-U<G)E;G1296QA=&EO;BDL"B )"0D)(" @)F9I M;&5N86UE+" F;W!T:6]N<RD["B **PDO*B!!9&0@86X@;W!T:6]N(&9O<B!S M96QE8W1I=F4@8FEN87)Y(&-O;G9E<G-I;VX@*B\**PEI9BAP;&%N+3YF9'=? M<')I=F%T92 A/2!.24PI"BL)"6]P=&EO;G,@/2!L:7-T7V-O;F-A="AO<'1I M;VYS+"!P;&%N+3YF9'=?<')I=F%T92D["BL*( DO*@H@"2 J($-R96%T92!# M;W!Y4W1A=&4@9G)O;2!&1%<@;W!T:6]N<RX@(%=E(&%L=V%Y<R!A8W%U:7)E M(&%L;"!C;VQU;6YS+"!S;PH@"2 J(&%S('1O(&UA=&-H('1H92!E>'!E8W1E M9"!38V%N5'5P;&53;&]T('-I9VYA='5R92X*0$ @+38Y-2PV("LW,34L,3$R M($! (&9I;&5!;F%L>7IE1F]R96EG;E1A8FQE*%)E;&%T:6]N(')E;&%T:6]N M+ H@?0H@"B O*@HK("H@8VAE8VM?<V5L96-T:79E7V)I;F%R>5]C;VYV97)S M:6]N"BL@*B\**W-T871I8R!B;V]L"BMC:&5C:U]S96QE8W1I=F5?8FEN87)Y M7V-O;G9E<G-I;VXH4F5L3W!T26YF;R J8F%S97)E;"P**PD)"0D)"0D)("!/ M:60@9F]R96EG;G1A8FQE:60L"BL)"0D)"0D)"2 @3&ES=" J*F-O;'5M;G,I M"BM["BL)1F]R96EG;E1A8FQE("IT86)L93L**PE,:7-T0V5L;" @("IL8SL* M*PE296QA=&EO;@ER96P["BL)5'5P;&5$97-C"71U<&QE1&5S8SL**PE!='1R M3G5M8F5R"6%T=&YU;3L**PE":71M87!S970@("IA='1R<U]U<V5D(#T@3E5, M3#L**PE":71M87!S970@("IT;7!S970["BL):6YT"0D)8VYT.PHK"6EN= D) M"6D["BL**PDJ8V]L=6UN<R ]($Y)3#L**PHK"2\J"BL)("H@17AA;6EN92!F M;W)M870@;V8@=&AE(&9I;&4N("!)9B!B:6YA<GD@9F]R;6%T+"!W92!D;VXG M="!N965D('1O(&-O;G9E<G0**PD@*B!A="!A;&PN"BL)("HO"BL)=&%B;&4@ M/2!'971&;W)E:6=N5&%B;&4H9F]R96EG;G1A8FQE:60I.PHK"69O<F5A8V@H M;&,L('1A8FQE+3YO<'1I;VYS*0HK"7L**PD)1&5F16QE;2 @(" J9&5F(#T@ M*$1E9D5L96T@*BD@;&9I<G-T*&QC*3L**PHK"0EI9B H<W1R8VUP*&1E9BT^ M9&5F;F%M92P@(F9O<FUA="(I(#T](# I"BL)"7L**PD)"6-H87()(" @*F9O M<FUA=" ](&1E9D=E=%-T<FEN9RAD968I.PHK"BL)"0EI9B H<W1R8VUP*&9O M<FUA="P@(F)I;F%R>2(I(#T](# I"BL)"0D)<F5T=7)N(&9A;'-E.PHK"0D) M8G)E86L["BL)"7T**PE]"BL**PDO*B!!9&0@86QL('1H92!A='1R:6)U=&5S M(&YE961E9"!F;W(@:F]I;G,@;W(@9FEN86P@;W5T<'5T+B J+PHK"7!U;&Q? M=F%R871T;F]S*"A.;V1E("HI(&)A<V5R96PM/G)E;'1A<F=E=&QI<W0L(&)A M<V5R96PM/G)E;&ED+" F871T<G-?=7-E9"D["BL**PDO*B!!9&0@86QL('1H M92!A='1R:6)U=&5S('5S960@8GD@<F5S=')I8W1I;VX@8VQA=7-E<RX@*B\* M*PEF;W)E86-H*&QC+"!B87-E<F5L+3YB87-E<F5S=')I8W1I;F9O*0HK"7L* M*PD)4F5S=')I8W1);F9O(" @*G)I;F9O(#T@*%)E<W1R:6-T26YF;R J*2!L M9FER<W0H;&,I.PHK"BL)"7!U;&Q?=F%R871T;F]S*"A.;V1E("HI(')I;F9O M+3YC;&%U<V4L(&)A<V5R96PM/G)E;&ED+" F871T<G-?=7-E9"D["BL)?0HK M"BL)<F5L(#T@:&5A<%]O<&5N*&9O<F5I9VYT86)L96ED+"!!8V-E<W-3:&%R M94QO8VLI.PHK"71U<&QE1&5S8R ](%)E;&%T:6]N1V5T1&5S8W(H<F5L*3L* M*PHK"71M<'-E=" ](&)M<U]C;W!Y*&%T=')S7W5S960I.PHK"7=H:6QE("@H M871T;G5M(#T@8FUS7V9I<G-T7VUE;6)E<BAT;7!S970I*2 ^/2 P*0HK"7L* M*PD)+RH@061J=7-T(&9O<B!S>7-T96T@871T<FEB=71E<RX@*B\**PD)871T M;G5M("L]($9I<G-T3&]W26YV86QI9$AE87!!='1R:6)U=&5.=6UB97(["BL* M*PD)+RH@268@=VAO;&4M<F]W(')E9F5R96YC92P@9VEV92!U<"X@*B\**PD) M:68@*&%T=&YU;2 ]/2 P*0HK"0E["BL)"0DJ8V]L=6UN<R ]($Y)3#L**PD) M"7)E='5R;B!F86QS93L**PD)?0HK"BL)"2\J($EG;F]R92!S>7-T96T@871T M<FEB=71E<RX@*B\**PD):68@*&%T=&YU;2 \(# I"BL)"0EC;VYT:6YU93L* M*PHK"0DO*B!'970@=7-E<B!A='1R:6)U=&5S+B J+PHK"0EI9B H871T;G5M M(#X@,"D**PD)>PHK"0D)1F]R;5]P9U]A='1R:6)U=&4@871T<B ]('1U<&QE M1&5S8RT^871T<G-;871T;G5M("T@,5T["BL)"0EC:&%R"2 @("IA='1N86UE M(#T@<'-T<F1U<"A.86UE4W1R*&%T='(M/F%T=&YA;64I*3L**PHK"0D)+RH@ M4VMI<"!D<F]P<&5D(&%T=')I8G5T97,N("HO"BL)"0EI9B H871T<BT^871T M:7-D<F]P<&5D*0HK"0D)"6-O;G1I;G5E.PHK"0D)*F-O;'5M;G,@/2!L87!P M96YD*"IC;VQU;6YS+"!M86ME4W1R:6YG*&%T=&YA;64I*3L**PD)?0HK"7T* M*PEB;7-?9G)E92AT;7!S970I.PHK"BL):&5A<%]C;&]S92AR96PL($%C8V5S M<U-H87)E3&]C:RD["BL**PDO*B!)9B!A;&P@=&AE('5S97(@871T<FEB=71E M<R!N965D960L(&=I=F4@=7 N("HO"BL)8VYT(#T@,#L**PEF;W(@*&D@/2 P M.R!I(#P@='5P;&5$97-C+3YN871T<SL@:2LK*0HK"7L**PD)1F]R;5]P9U]A M='1R:6)U=&4@871T<B ]('1U<&QE1&5S8RT^871T<G-;:5T["BL**PD)+RH@ M4VMI<"!D<F]P<&5D(&%T=')I8G5T97,N("HO"BL)"6EF("AA='1R+3YA='1I M<V1R;W!P960I"BL)"0EC;VYT:6YU93L**PD)8VYT*RL["BL)?0HK"6EF("AC M;G0@/3T@;&ES=%]L96YG=&@H*F-O;'5M;G,I*0HK"7L**PD)*F-O;'5M;G,@ M/2!.24P["BL)"7)E='5R;B!F86QS93L**PE]"BL**PER971U<FX@=')U93L* M*WT**PHK+RH*(" J($5S=&EM871E('-I>F4@;V8@82!F;W)E:6=N('1A8FQE M+@H@("H*(" J(%1H92!M86EN(')E<W5L="!I<R!R971U<FYE9"!I;B!B87-E M<F5L+3YR;W=S+B @5V4@86QS;R!S970*9&EF9B M+6=I="!A+W-R8R]B86-K M96YD+V-O;6UA;F1S+V-O<'DN8R!B+W-R8R]B86-K96YD+V-O;6UA;F1S+V-O M<'DN8PII;F1E>" Y.&)C8C)F+BXP,30S9#@Q(#$P,#8T- HM+2T@82]S<F,O M8F%C:V5N9"]C;VUM86YD<R]C;W!Y+F,**RLK(&(O<W)C+V)A8VME;F0O8V]M M;6%N9',O8V]P>2YC"D! ("TQ,C(L-B K,3(R+#$Q($! ('1Y<&5D968@<W1R M=6-T($-O<'E3=&%T941A=&$*( E,:7-T"2 @("IF;W)C95]N;W1N=6QL.PDO M*B!L:7-T(&]F(&-O;'5M;B!N86UE<R J+PH@"6)O;VP)(" @*F9O<F-E7VYO M=&YU;&Q?9FQA9W,["2\J('!E<BUC;VQU;6X@0U-6($9.3B!F;&%G<R J+PH@ M"BL)+RH@<&%R86UE=&5R<R!F<F]M(&-O;G1R:6(O9FEL95]F9'<@*B\**PE, M:7-T"2 @("IC;VYV97)T7V)I;F%R>3L)+RH@;&ES="!O9B!C;VQU;6X@;F%M M97,@*B\**PEB;V]L"2 @("IC;VYV97)T7V)I;F%R>5]F;&%G<SL)+RH@<&5R M+6-O;'5M;B!#4U8O5$585"!#0B!F;&%G<R J+PHK"6)O;VP)"6-O;G9E<G1? M<V5L96-T:79E;'D["2\J('-E;&5C=&EV92!B:6YA<GD@8V]N=F5R<VEO;C\@ M*B\**PH@"2\J('1H97-E(&%R92!J=7-T(&9O<B!E<G)O<B!M97-S86=E<RP@ M<V5E($-O<'E&<F]M17)R;W)#86QL8F%C:R J+PH@"6-O;G-T(&-H87(@*F-U M<E]R96QN86UE.PDO*B!T86)L92!N86UE(&9O<B!E<G)O<B!M97-S86=E<R J M+PH@"6EN= D)"6-U<E]L:6YE;F\["0DO*B!L:6YE(&YU;6)E<B!F;W(@97)R M;W(@;65S<V%G97,@*B\*0$ @+3DV,2PV("LY-C8L,C$@0$ @4')O8V5S<T-O M<'E/<'1I;VYS*$-O<'E3=&%T92!C<W1A=&4L"B )"0D)"0D@97)R;7-G*")A M<F=U;65N="!T;R!O<'1I;VX@7"(E<UPB(&UU<W0@8F4@82!L:7-T(&]F(&-O M;'5M;B!N86UE<R(L"B )"0D)"0D)"61E9F5L+3YD969N86UE*2DI.PH@"0E] M"BL)"65L<V4@:68@*'-T<F-M<"AD969E;"T^9&5F;F%M92P@(F-O;G9E<G1? M8FEN87)Y(BD@/3T@,"D**PD)>PHK"0D):68@*&-S=&%T92T^8V]N=F5R=%]B M:6YA<GDI"BL)"0D)97)E<&]R="A%4E)/4BP**PD)"0D)"2AE<G)C;V1E*$52 M4D-/1$5?4UE.5$%87T524D]2*2P**PD)"0D)"2!E<G)M<V<H(F-O;F9L:6-T M:6YG(&]R(')E9'5N9&%N="!O<'1I;VYS(BDI*3L**PD)"6-S=&%T92T^8V]N M=F5R=%]S96QE8W1I=F5L>2 ]('1R=64["BL)"0EI9B H9&5F96PM/F%R9R ] M/2!.54Q,('Q\($ES02AD969E;"T^87)G+"!,:7-T*2D**PD)"0EC<W1A=&4M M/F-O;G9E<G1?8FEN87)Y(#T@*$QI<W0@*BD@9&5F96PM/F%R9SL**PD)"65L M<V4**PD)"0EE<F5P;W)T*$524D]2+ HK"0D)"0D)*&5R<F-O9&4H15)20T]$ M15])3E9!3$E$7U!!4D%-151%4E]604Q512DL"BL)"0D)"0D@97)R;7-G*")A M<F=U;65N="!T;R!O<'1I;VX@7"(E<UPB(&UU<W0@8F4@82!L:7-T(&]F(&-O M;'5M;B!N86UE<R(L"BL)"0D)"0D)"61E9F5L+3YD969N86UE*2DI.PHK"0E] M"B )"65L<V4@:68@*'-T<F-M<"AD969E;"T^9&5F;F%M92P@(F5N8V]D:6YG M(BD@/3T@,"D*( D)>PH@"0D):68@*&-S=&%T92T^9FEL95]E;F-O9&EN9R ^ M/2 P*0I 0" M,3,P-RPV("LQ,S(W+#(X($! ($)E9VEN0V]P>2AB;V]L(&ES M7V9R;VTL"B )"7T*( E]"B **PDO*B!#;VYV97)T(&-O;G9E<G0@8FEN87)Y M(&YA;64@;&ES="!T;R!P97(M8V]L=6UN(&9L86=S+"!C:&5C:R!V86QI9&ET M>2 J+PHK"6-S=&%T92T^8V]N=F5R=%]B:6YA<GE?9FQA9W,@/2 H8F]O;" J M*2!P86QL;V,P*&YU;5]P:'ES7V%T=')S("H@<VEZ96]F*&)O;VPI*3L**PEI M9B H8W-T871E+3YC;VYV97)T7V)I;F%R>2D**PE["BL)"4QI<W0)(" @*F%T M=&YU;7,["BL)"4QI<W1#96QL(" @*F-U<CL**PHK"0EA='1N=6US(#T@0V]P M>4=E=$%T=&YU;7,H='5P1&5S8RP@8W-T871E+3YR96PL(&-S=&%T92T^8V]N M=F5R=%]B:6YA<GDI.PHK"BL)"69O<F5A8V@H8W5R+"!A='1N=6US*0HK"0E[ M"BL)"0EI;G0)"0EA='1N=6T@/2!L9FER<W1?:6YT*&-U<BD["BL**PD)"6EF M("@A;&ES=%]M96UB97)?:6YT*&-S=&%T92T^871T;G5M;&ES="P@871T;G5M M*2D**PD)"0EE<F5P;W)T*$524D]2+ HK"0D)"0D)*&5R<F-O9&4H15)20T]$ M15])3E9!3$E$7T-/3%5-3E]2149%4D5.0T4I+ HK"0D)"65R<FUS9R@B<V5L M96-T960@8V]L=6UN(%PB)7-<(B!N;W0@<F5F97)E;F-E9"!B>2!#3U!9(BP* M*PD)"0D)(" @3F%M95-T<BAT=7!$97-C+3YA='1R<UMA='1N=6T@+2 Q72T^ M871T;F%M92DI*2D["BL)"0EC<W1A=&4M/F-O;G9E<G1?8FEN87)Y7V9L86=S M6V%T=&YU;2 M(#%=(#T@=')U93L**PD)?0HK"7T**PH@"2\J(%5S92!C;&EE M;G0@96YC;V1I;F<@=VAE;B!%3D-/1$E.1R!O<'1I;VX@:7,@;F]T('-P96-I M9FEE9"X@*B\*( EI9B H8W-T871E+3YF:6QE7V5N8V]D:6YG(#P@,"D*( D) M8W-T871E+3YF:6QE7V5N8V]D:6YG(#T@<&=?9V5T7V-L:65N=%]E;F-O9&EN M9R@I.PI 0" M,C4V-2PR,R K,C8P-RPR-R! 0"!.97AT0V]P>49R;VTH0V]P M>5-T871E(&-S=&%T92P@17AP<D-O;G1E>'0@*F5C;VYT97AT+ H@"0D)"0D) M"0E.86UE4W1R*&%T=');;5TM/F%T=&YA;64I*2DI.PH@"0D)<W1R:6YG(#T@ M9FEE;&1?<W1R:6YG<UMF:65L9&YO*RM=.PH@"BT)"0EI9B H8W-T871E+3YC M<W9?;6]D92 F)B!S=')I;F<@/3T@3E5,3" F)@HM"0D)"6-S=&%T92T^9F]R M8V5?;F]T;G5L;%]F;&%G<UMM72D**PD)"6EF("@A8W-T871E+3YC;VYV97)T M7W-E;&5C=&EV96QY('Q\"BL)"0D)8W-T871E+3YC;VYV97)T7V)I;F%R>5]F M;&%G<UMM72D*( D)"7L*+0D)"0DO*B!';R!A:&5A9"!A;F0@<F5A9"!T:&4@ M3E5,3"!S=')I;F<@*B\*+0D)"0ES=')I;F<@/2!C<W1A=&4M/FYU;&Q?<')I M;G0["BT)"0E]"BL)"0D):68@*&-S=&%T92T^8W-V7VUO9&4@)B8@<W1R:6YG M(#T]($Y53$P@)B8**PD)"0D)8W-T871E+3YF;W)C95]N;W1N=6QL7V9L86=S M6VU=*0HK"0D)"7L**PD)"0D)+RH@1V\@86AE860@86YD(')E860@=&AE($Y5 M3$P@<W1R:6YG("HO"BL)"0D)"7-T<FEN9R ](&-S=&%T92T^;G5L;%]P<FEN M=#L**PD)"0E]"B *+0D)"6-S=&%T92T^8W5R7V%T=&YA;64@/2!.86UE4W1R M*&%T=');;5TM/F%T=&YA;64I.PHM"0D)8W-T871E+3YC=7)?871T=F%L(#T@ M<W1R:6YG.PHM"0D)=F%L=65S6VU=(#T@26YP=71&=6YC=&EO;D-A;&PH)FEN M7V9U;F-T:6]N<UMM72P*+0D)"0D)"0D)"0D@('-T<FEN9RP*+0D)"0D)"0D) M"0D@('1Y<&EO<&%R86US6VU=+ HM"0D)"0D)"0D)"2 @871T<EMM72T^871T M='EP;6]D*3L*+0D)"6EF("AS=')I;F<@(3T@3E5,3"D*+0D)"0EN=6QL<UMM M72 ](&9A;'-E.PHM"0D)8W-T871E+3YC=7)?871T;F%M92 ]($Y53$P["BT) M"0EC<W1A=&4M/F-U<E]A='1V86P@/2!.54Q,.PHK"0D)"6-S=&%T92T^8W5R M7V%T=&YA;64@/2!.86UE4W1R*&%T=');;5TM/F%T=&YA;64I.PHK"0D)"6-S M=&%T92T^8W5R7V%T='9A;" ]('-T<FEN9SL**PD)"0EV86QU97-;;5T@/2!) M;G!U=$9U;F-T:6]N0V%L;"@F:6Y?9G5N8W1I;VYS6VU=+ HK"0D)"0D)"0D) M"0D@('-T<FEN9RP**PD)"0D)"0D)"0D)("!T>7!I;W!A<F%M<UMM72P**PD) M"0D)"0D)"0D)("!A='1R6VU=+3YA='1T>7!M;V0I.PHK"0D)"6EF("AS=')I M;F<@(3T@3E5,3"D**PD)"0D);G5L;'-;;5T@/2!F86QS93L**PD)"0EC<W1A M=&4M/F-U<E]A='1N86UE(#T@3E5,3#L**PD)"0EC<W1A=&4M/F-U<E]A='1V M86P@/2!.54Q,.PHK"0D)?0H@"0E]"B *( D)07-S97)T*&9I96QD;F\@/3T@ *;F9I96QD<RD["@`` ` end
Fujita-san, The revised patch is almost good for me. Only point I noticed is the check for redundant or duplicated options I pointed out on the previous post. So, if you agree with the attached patch, I'd like to hand over this patch for committers. All I changed is: --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644 } + else if (strcmp(defel->defname, "convert_binary") == 0) + { -+ if (cstate->convert_binary) ++ if (cstate->convert_selectively) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); Thanks, 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > Hi KaiGai-san, > > Thank you for the review. > >> -----Original Message----- >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai >> Sent: Wednesday, June 20, 2012 1:26 AM >> To: Etsuro Fujita >> Cc: Robert Haas; pgsql-hackers@postgresql.org >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file >> foreign tables >> >> Hi Fujita-san, >> >> Could you rebase this patch towards the latest tree? >> It was unavailable to apply the patch cleanly. > > Sorry, I updated the patch. Please find attached an updated version of the > patch. > >> I looked over the patch, then noticed a few points. >> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it? >> If so, cstate->convert_binary is not a suitable flag to check redundant >> option. It seems to me cstate->convert_selectively is more suitable flag >> to check it. >> >> + else if (strcmp(defel->defname, "convert_binary") == 0) >> + { >> + if (cstate->convert_binary) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("conflicting or redundant options"))); >> + cstate->convert_selectively = true; >> + if (defel->arg == NULL || IsA(defel->arg, List)) >> + cstate->convert_binary = (List *) defel->arg; >> + else >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("argument to option \"%s\" must be a >> list of column names", >> + defel->defname))); >> + } > > Yes, defel->arg can be NIL. defel->arg is a List structure listing all the > columns needed to be converted to binary representation, which is NIL for > the case where no columns are needed to be converted. For example, > defel->arg is NIL for SELECT COUNT(*). In this case, while > cstate->convert_selectively is set to true, no columns are converted at > NextCopyFrom(). Most efficient case! In short, cstate->convert_selectively > represents whether to do selective binary conversion at NextCopyFrom(), and > cstate->convert_binary represents all the columns to be converted at > NextCopyFrom(), that can be NIL. > >> At NextCopyFrom(), this routine computes default values if configured. >> In case when these values are not referenced, it might be possible to skip >> unnecessary calculations. >> Is it unavailable to add logic to avoid to construct cstate->defmap on >> unreferenced columns at BeginCopyFrom()? > > I think that we don't need to add the above logic because file_fdw does > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom() > doesn't construct cstate->defmap at all. > > I fixed a bug plus some minor optimization in check_binary_conversion() that > is renamed to check_selective_binary_conversion() in the updated version, > and now file_fdw gives up selective binary conversion for the following > cases: > > a) BINARY format > b) CSV/TEXT format and whole row reference > c) CSV/TEXT format and all the user attributes needed > > > Best regards, > Etsuro Fujita > >> Thanks, >> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: >> >> -----Original Message----- >> >> From: Robert Haas [mailto:robertmhaas@gmail.com] >> >> Sent: Friday, May 11, 2012 1:36 AM >> >> To: Etsuro Fujita >> >> Cc: pgsql-hackers@postgresql.org >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV >> >> file foreign tables >> >> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita >> > <fujita.etsuro@lab.ntt.co.jp> >> >> wrote: >> >> > I would like to propose to improve parsing efficiency of >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et >> >> > al.[1], which means that for a CSV/TEXT file foreign table, >> >> > file_fdw performs binary conversion only for the columns needed for >> >> > query processing. Attached is a WIP patch implementing the feature. >> >> >> >> Can you add this to the next CommitFest? Looks interesting. >> > >> > Done. >> > >> > Best regards, >> > Etsuro Fujita >> > >> >> -- >> >> Robert Haas >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL >> >> Company >> >> >> > >> > >> > >> > -- >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To >> > make changes to your subscription: >> > http://www.postgresql.org/mailpref/pgsql-hackers >> >> >> >> -- >> KaiGai Kohei <kaigai@kaigai.gr.jp> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make >> changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > > > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Hi Kaigai-san, > -----Original Message----- > From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp] > Sent: Monday, June 25, 2012 9:49 PM > To: Etsuro Fujita > Cc: Robert Haas; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > Fujita-san, > > The revised patch is almost good for me. > Only point I noticed is the check for redundant or duplicated options I pointed > out on the previous post. > So, if you agree with the attached patch, I'd like to hand over this patch for > committers. OK I agree with you. Thanks for the revision! Besides the revision, I modified check_selective_binary_conversion() to run heap_close() in the whole-row-reference case. Attached is an updated version of the patch. Thanks. Best regards, Etsuro Fujita > All I changed is: > --- a/src/backend/commands/copy.c > +++ b/src/backend/commands/copy.c > @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index > 98bcb2f..0143d81 100644 > } > + else if (strcmp(defel->defname, "convert_binary") == 0) > + { > -+ if (cstate->convert_binary) > ++ if (cstate->convert_selectively) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("conflicting or redundant options"))); > > Thanks, > > 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > > Hi KaiGai-san, > > > > Thank you for the review. > > > >> -----Original Message----- > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai > >> Sent: Wednesday, June 20, 2012 1:26 AM > >> To: Etsuro Fujita > >> Cc: Robert Haas; pgsql-hackers@postgresql.org > >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV > >> file foreign tables > >> > >> Hi Fujita-san, > >> > >> Could you rebase this patch towards the latest tree? > >> It was unavailable to apply the patch cleanly. > > > > Sorry, I updated the patch. Please find attached an updated version > > of the patch. > > > >> I looked over the patch, then noticed a few points. > >> > >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it? > >> If so, cstate->convert_binary is not a suitable flag to check > >> redundant option. It seems to me cstate->convert_selectively is more > >> suitable flag to check it. > >> > >> + else if (strcmp(defel->defname, "convert_binary") == 0) > >> + { > >> + if (cstate->convert_binary) > >> + ereport(ERROR, > >> + (errcode(ERRCODE_SYNTAX_ERROR), > >> + errmsg("conflicting or redundant > >> + options"))); > >> + cstate->convert_selectively = true; > >> + if (defel->arg == NULL || IsA(defel->arg, List)) > >> + cstate->convert_binary = (List *) defel->arg; > >> + else > >> + ereport(ERROR, > >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > >> + errmsg("argument to option \"%s\" must be a > >> list of column names", > >> + defel->defname))); > >> + } > > > > Yes, defel->arg can be NIL. defel->arg is a List structure listing > > all the columns needed to be converted to binary representation, which > > is NIL for the case where no columns are needed to be converted. For > > example, > > defel->arg is NIL for SELECT COUNT(*). In this case, while > > cstate->convert_selectively is set to true, no columns are converted > > cstate->at > > NextCopyFrom(). Most efficient case! In short, > > cstate->convert_selectively represents whether to do selective binary > > conversion at NextCopyFrom(), and > > cstate->convert_binary represents all the columns to be converted at > > NextCopyFrom(), that can be NIL. > > > >> At NextCopyFrom(), this routine computes default values if configured. > >> In case when these values are not referenced, it might be possible to > >> skip unnecessary calculations. > >> Is it unavailable to add logic to avoid to construct cstate->defmap > >> on unreferenced columns at BeginCopyFrom()? > > > > I think that we don't need to add the above logic because file_fdw > > does > > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom() > > doesn't construct cstate->defmap at all. > > > > I fixed a bug plus some minor optimization in > > check_binary_conversion() that is renamed to > > check_selective_binary_conversion() in the updated version, and now > > file_fdw gives up selective binary conversion for the following > > cases: > > > > a) BINARY format > > b) CSV/TEXT format and whole row reference > > c) CSV/TEXT format and all the user attributes needed > > > > > > Best regards, > > Etsuro Fujita > > > >> Thanks, > >> > >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > >> >> -----Original Message----- > >> >> From: Robert Haas [mailto:robertmhaas@gmail.com] > >> >> Sent: Friday, May 11, 2012 1:36 AM > >> >> To: Etsuro Fujita > >> >> Cc: pgsql-hackers@postgresql.org > >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of > >> >> CSV file foreign tables > >> >> > >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita > >> > <fujita.etsuro@lab.ntt.co.jp> > >> >> wrote: > >> >> > I would like to propose to improve parsing efficiency of > >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et > >> >> > al.[1], which means that for a CSV/TEXT file foreign table, > >> >> > file_fdw performs binary conversion only for the columns needed > >> >> > for query processing. Attached is a WIP patch implementing the feature. > >> >> > >> >> Can you add this to the next CommitFest? Looks interesting. > >> > > >> > Done. > >> > > >> > Best regards, > >> > Etsuro Fujita > >> > > >> >> -- > >> >> Robert Haas > >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise > >> >> PostgreSQL Company > >> >> > >> > > >> > > >> > > >> > -- > >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > >> > To make changes to your subscription: > >> > http://www.postgresql.org/mailpref/pgsql-hackers > >> > >> > >> > >> -- > >> KaiGai Kohei <kaigai@kaigai.gr.jp> > >> > >> -- > >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To > >> make changes to your subscription: > >> http://www.postgresql.org/mailpref/pgsql-hackers > >> > > > > > > > > > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp>
2012/6/26 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > Hi Kaigai-san, > >> -----Original Message----- >> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp] >> Sent: Monday, June 25, 2012 9:49 PM >> To: Etsuro Fujita >> Cc: Robert Haas; pgsql-hackers@postgresql.org >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file >> foreign tables >> >> Fujita-san, >> >> The revised patch is almost good for me. >> Only point I noticed is the check for redundant or duplicated options I > pointed >> out on the previous post. >> So, if you agree with the attached patch, I'd like to hand over this patch for >> committers. > > OK I agree with you. Thanks for the revision! > > Besides the revision, I modified check_selective_binary_conversion() to run > heap_close() in the whole-row-reference case. Attached is an updated version of > the patch. > Sorry, I overlooked this code path. It seems to me fair enough. So, I'd like to take over this patch for committers. Thanks, > Thanks. > > Best regards, > Etsuro Fujita > >> All I changed is: >> --- a/src/backend/commands/copy.c >> +++ b/src/backend/commands/copy.c >> @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index >> 98bcb2f..0143d81 100644 >> } >> + else if (strcmp(defel->defname, "convert_binary") == 0) >> + { >> -+ if (cstate->convert_binary) >> ++ if (cstate->convert_selectively) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("conflicting or redundant options"))); >> >> Thanks, >> >> 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: >> > Hi KaiGai-san, >> > >> > Thank you for the review. >> > >> >> -----Original Message----- >> >> From: pgsql-hackers-owner@postgresql.org >> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai >> >> Sent: Wednesday, June 20, 2012 1:26 AM >> >> To: Etsuro Fujita >> >> Cc: Robert Haas; pgsql-hackers@postgresql.org >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV >> >> file foreign tables >> >> >> >> Hi Fujita-san, >> >> >> >> Could you rebase this patch towards the latest tree? >> >> It was unavailable to apply the patch cleanly. >> > >> > Sorry, I updated the patch. Please find attached an updated version >> > of the patch. >> > >> >> I looked over the patch, then noticed a few points. >> >> >> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it? >> >> If so, cstate->convert_binary is not a suitable flag to check >> >> redundant option. It seems to me cstate->convert_selectively is more >> >> suitable flag to check it. >> >> >> >> + else if (strcmp(defel->defname, "convert_binary") == 0) >> >> + { >> >> + if (cstate->convert_binary) >> >> + ereport(ERROR, >> >> + (errcode(ERRCODE_SYNTAX_ERROR), >> >> + errmsg("conflicting or redundant >> >> + options"))); >> >> + cstate->convert_selectively = true; >> >> + if (defel->arg == NULL || IsA(defel->arg, List)) >> >> + cstate->convert_binary = (List *) defel->arg; >> >> + else >> >> + ereport(ERROR, >> >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> >> + errmsg("argument to option \"%s\" must be a >> >> list of column names", >> >> + defel->defname))); >> >> + } >> > >> > Yes, defel->arg can be NIL. defel->arg is a List structure listing >> > all the columns needed to be converted to binary representation, which >> > is NIL for the case where no columns are needed to be converted. For >> > example, >> > defel->arg is NIL for SELECT COUNT(*). In this case, while >> > cstate->convert_selectively is set to true, no columns are converted >> > cstate->at >> > NextCopyFrom(). Most efficient case! In short, >> > cstate->convert_selectively represents whether to do selective binary >> > conversion at NextCopyFrom(), and >> > cstate->convert_binary represents all the columns to be converted at >> > NextCopyFrom(), that can be NIL. >> > >> >> At NextCopyFrom(), this routine computes default values if configured. >> >> In case when these values are not referenced, it might be possible to >> >> skip unnecessary calculations. >> >> Is it unavailable to add logic to avoid to construct cstate->defmap >> >> on unreferenced columns at BeginCopyFrom()? >> > >> > I think that we don't need to add the above logic because file_fdw >> > does >> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom() >> > doesn't construct cstate->defmap at all. >> > >> > I fixed a bug plus some minor optimization in >> > check_binary_conversion() that is renamed to >> > check_selective_binary_conversion() in the updated version, and now >> > file_fdw gives up selective binary conversion for the following >> > cases: >> > >> > a) BINARY format >> > b) CSV/TEXT format and whole row reference >> > c) CSV/TEXT format and all the user attributes needed >> > >> > >> > Best regards, >> > Etsuro Fujita >> > >> >> Thanks, >> >> >> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: >> >> >> -----Original Message----- >> >> >> From: Robert Haas [mailto:robertmhaas@gmail.com] >> >> >> Sent: Friday, May 11, 2012 1:36 AM >> >> >> To: Etsuro Fujita >> >> >> Cc: pgsql-hackers@postgresql.org >> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of >> >> >> CSV file foreign tables >> >> >> >> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita >> >> > <fujita.etsuro@lab.ntt.co.jp> >> >> >> wrote: >> >> >> > I would like to propose to improve parsing efficiency of >> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et >> >> >> > al.[1], which means that for a CSV/TEXT file foreign table, >> >> >> > file_fdw performs binary conversion only for the columns needed >> >> >> > for query processing. Attached is a WIP patch implementing the > feature. >> >> >> >> >> >> Can you add this to the next CommitFest? Looks interesting. >> >> > >> >> > Done. >> >> > >> >> > Best regards, >> >> > Etsuro Fujita >> >> > >> >> >> -- >> >> >> Robert Haas >> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise >> >> >> PostgreSQL Company >> >> >> >> >> > >> >> > >> >> > >> >> > -- >> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> >> > To make changes to your subscription: >> >> > http://www.postgresql.org/mailpref/pgsql-hackers >> >> >> >> >> >> >> >> -- >> >> KaiGai Kohei <kaigai@kaigai.gr.jp> >> >> >> >> -- >> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To >> >> make changes to your subscription: >> >> http://www.postgresql.org/mailpref/pgsql-hackers >> >> >> > >> > >> > >> > >> >> >> >> -- >> KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san, > -----Original Message----- > From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp] > Sent: Tuesday, June 26, 2012 11:05 PM > To: Etsuro Fujita > Cc: Robert Haas; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > 2012/6/26 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > > Hi Kaigai-san, > > > >> -----Original Message----- > >> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp] > >> Sent: Monday, June 25, 2012 9:49 PM > >> To: Etsuro Fujita > >> Cc: Robert Haas; pgsql-hackers@postgresql.org > >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > >> foreign tables > >> > >> Fujita-san, > >> > >> The revised patch is almost good for me. > >> Only point I noticed is the check for redundant or duplicated options I > > pointed > >> out on the previous post. > >> So, if you agree with the attached patch, I'd like to hand over this patch > for > >> committers. > > > > OK I agree with you. Thanks for the revision! > > > > Besides the revision, I modified check_selective_binary_conversion() to run > > heap_close() in the whole-row-reference case. Attached is an updated version > of > > the patch. > > > Sorry, I overlooked this code path. No, It's my fault. > So, I'd like to take over this patch for committers. Thanks, Best regards, Etsuro Fujita > > Thanks, > > > Thanks. > > > > Best regards, > > Etsuro Fujita > > > >> All I changed is: > >> --- a/src/backend/commands/copy.c > >> +++ b/src/backend/commands/copy.c > >> @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index > >> 98bcb2f..0143d81 100644 > >> } > >> + else if (strcmp(defel->defname, "convert_binary") == 0) > >> + { > >> -+ if (cstate->convert_binary) > >> ++ if (cstate->convert_selectively) > >> + ereport(ERROR, > >> + (errcode(ERRCODE_SYNTAX_ERROR), > >> + errmsg("conflicting or redundant options"))); > >> > >> Thanks, > >> > >> 2012/6/20 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > >> > Hi KaiGai-san, > >> > > >> > Thank you for the review. > >> > > >> >> -----Original Message----- > >> >> From: pgsql-hackers-owner@postgresql.org > >> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai > >> >> Sent: Wednesday, June 20, 2012 1:26 AM > >> >> To: Etsuro Fujita > >> >> Cc: Robert Haas; pgsql-hackers@postgresql.org > >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV > >> >> file foreign tables > >> >> > >> >> Hi Fujita-san, > >> >> > >> >> Could you rebase this patch towards the latest tree? > >> >> It was unavailable to apply the patch cleanly. > >> > > >> > Sorry, I updated the patch. Please find attached an updated version > >> > of the patch. > >> > > >> >> I looked over the patch, then noticed a few points. > >> >> > >> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it? > >> >> If so, cstate->convert_binary is not a suitable flag to check > >> >> redundant option. It seems to me cstate->convert_selectively is more > >> >> suitable flag to check it. > >> >> > >> >> + else if (strcmp(defel->defname, "convert_binary") == 0) > >> >> + { > >> >> + if (cstate->convert_binary) > >> >> + ereport(ERROR, > >> >> + (errcode(ERRCODE_SYNTAX_ERROR), > >> >> + errmsg("conflicting or redundant > >> >> + options"))); > >> >> + cstate->convert_selectively = true; > >> >> + if (defel->arg == NULL || IsA(defel->arg, List)) > >> >> + cstate->convert_binary = (List *) defel->arg; > >> >> + else > >> >> + ereport(ERROR, > >> >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > >> >> + errmsg("argument to option \"%s\" must be a > >> >> list of column names", > >> >> + defel->defname))); > >> >> + } > >> > > >> > Yes, defel->arg can be NIL. defel->arg is a List structure listing > >> > all the columns needed to be converted to binary representation, which > >> > is NIL for the case where no columns are needed to be converted. For > >> > example, > >> > defel->arg is NIL for SELECT COUNT(*). In this case, while > >> > cstate->convert_selectively is set to true, no columns are converted > >> > cstate->at > >> > NextCopyFrom(). Most efficient case! In short, > >> > cstate->convert_selectively represents whether to do selective binary > >> > conversion at NextCopyFrom(), and > >> > cstate->convert_binary represents all the columns to be converted at > >> > NextCopyFrom(), that can be NIL. > >> > > >> >> At NextCopyFrom(), this routine computes default values if configured. > >> >> In case when these values are not referenced, it might be possible to > >> >> skip unnecessary calculations. > >> >> Is it unavailable to add logic to avoid to construct cstate->defmap > >> >> on unreferenced columns at BeginCopyFrom()? > >> > > >> > I think that we don't need to add the above logic because file_fdw > >> > does > >> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom() > >> > doesn't construct cstate->defmap at all. > >> > > >> > I fixed a bug plus some minor optimization in > >> > check_binary_conversion() that is renamed to > >> > check_selective_binary_conversion() in the updated version, and now > >> > file_fdw gives up selective binary conversion for the following > >> > cases: > >> > > >> > a) BINARY format > >> > b) CSV/TEXT format and whole row reference > >> > c) CSV/TEXT format and all the user attributes needed > >> > > >> > > >> > Best regards, > >> > Etsuro Fujita > >> > > >> >> Thanks, > >> >> > >> >> 2012/5/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>: > >> >> >> -----Original Message----- > >> >> >> From: Robert Haas [mailto:robertmhaas@gmail.com] > >> >> >> Sent: Friday, May 11, 2012 1:36 AM > >> >> >> To: Etsuro Fujita > >> >> >> Cc: pgsql-hackers@postgresql.org > >> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of > >> >> >> CSV file foreign tables > >> >> >> > >> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita > >> >> > <fujita.etsuro@lab.ntt.co.jp> > >> >> >> wrote: > >> >> >> > I would like to propose to improve parsing efficiency of > >> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et > >> >> >> > al.[1], which means that for a CSV/TEXT file foreign table, > >> >> >> > file_fdw performs binary conversion only for the columns needed > >> >> >> > for query processing. Attached is a WIP patch implementing the > > feature. > >> >> >> > >> >> >> Can you add this to the next CommitFest? Looks interesting. > >> >> > > >> >> > Done. > >> >> > > >> >> > Best regards, > >> >> > Etsuro Fujita > >> >> > > >> >> >> -- > >> >> >> Robert Haas > >> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise > >> >> >> PostgreSQL Company > >> >> >> > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > >> >> > To make changes to your subscription: > >> >> > http://www.postgresql.org/mailpref/pgsql-hackers > >> >> > >> >> > >> >> > >> >> -- > >> >> KaiGai Kohei <kaigai@kaigai.gr.jp> > >> >> > >> >> -- > >> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To > >> >> make changes to your subscription: > >> >> http://www.postgresql.org/mailpref/pgsql-hackers > >> >> > >> > > >> > > >> > > >> > > >> > >> > >> > >> -- > >> KaiGai Kohei <kaigai@kaigai.gr.jp> > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> >
"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes: > Besides the revision, I modified check_selective_binary_conversion() to run > heap_close() in the whole-row-reference case. Attached is an updated version of > the patch. Applied with minor, mostly-cosmetic revisions. I did fix check_selective_binary_conversion to not continue touching the relation's tupledesc after heap_close. Also I thought "convert_selectively" was a better name for the hidden COPY option. regards, tom lane
Thanks! Best regards, Etsuro Fujita > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, July 13, 2012 5:30 AM > To: Etsuro Fujita > Cc: 'Kohei KaiGai'; 'Robert Haas'; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file > foreign tables > > "Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes: > > Besides the revision, I modified check_selective_binary_conversion() to run > > heap_close() in the whole-row-reference case. Attached is an updated version > of > > the patch. > > Applied with minor, mostly-cosmetic revisions. I did fix > check_selective_binary_conversion to not continue touching the > relation's tupledesc after heap_close. Also I thought > "convert_selectively" was a better name for the hidden COPY option. > > regards, tom lane