On Mon, Jan 27, 2025 at 4:17 PM jian he <jian.universality@gmail.com> wrote:
>
> hi.
>
> There are two ways we can use to represent the new copy format: json.
> 1.
> typedef struct CopyFormatOptions
> {
> bool binary; /* binary format? */
> bool freeze; /* freeze rows on loading? */
> bool csv_mode; /* Comma Separated Value format? */
> bool json_mode; /* JSON format? */
> ...
> }
>
> 2.
> typedef struct CopyFormatOptions
> {
> CopyFormat format; /* format of the COPY operation */
> .....
> }
>
> typedef enum CopyFormat
> {
> COPY_FORMAT_TEXT = 0,
> COPY_FORMAT_BINARY,
> COPY_FORMAT_CSV,
> COPY_FORMAT_JSON,
> } CopyFormat;
>
> both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184.
> so the struct size will not influence the performance.
>
> I also did some benchmarks when using CopyFormat.
> the following are the benchmarks info:
> -------------------------------------------------------------------------------------------------------
> create unlogged table t as select g from generate_series(1, 1_000_000) g;
>
> build_type=release patch:
> copy t to '/dev/null' json \watch i=0.1 c=10
> last execution Time: 108.741 ms
>
> copy t to '/dev/null' (format text) \watch i=0.1 c=10
> last execution Time: 42.600 ms
>
> build_type=release master:
> copy t to '/dev/null' (format text) \watch i=0.1 c=10
> last execution Time Time: 42.948 ms
>
> ---------------------------------------------------------
> so a new version is attached, using the struct CopyFormatOptions.
>
> changes mainly in CopyOneRowTo.
> now it is:
> """"
> if(!cstate->rel)
> {
> memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
> TupleDescAttr(cstate->queryDesc->tupDesc, 0),
> cstate->queryDesc->tupDesc->natts *
> sizeof(FormData_pg_attribute));
> for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
> populate_compact_attribute(slot->tts_tupleDescriptor, i);
> BlessTupleDesc(slot->tts_tupleDescriptor);
> }
> """"
> reasoning for change:
> for composite_to_json to construct json key, we only need
> FormData_pg_attribute.attname
> but code path
> composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
> means we also need to call populate_compact_attribute to populate
> other attributes.
>
> v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch,
> author is by Joel Jacobson.
> As I mentioned in above,
> replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions.
> but consolidated 3 bool fields into one enum to make code more lean.
> I think the refactoring (v14-0001) is worth it.
I've refactored the patch to adapt the newly introduced CopyToRoutine struct,
see 2e4127b6d2.
v15-0001 is the merged one of v14-0001 and v14-0002
There are some other ongoing *copy to/from* refactors[1] which we can benefit
to make the code cleaner, especially the checks done in ProcessCopyOptions.
[1]: https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com
--
Regards
Junwang Zhao