Re: New "raw" COPY format - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: New "raw" COPY format |
Date | |
Msg-id | CACJufxGWet+n+E7-ymwMxA8cFPGc65CmBpxOfT_hi9OPnou3Gg@mail.gmail.com |
In response to | Re: New "raw" COPY format (Tatsuo Ishii <ishii@postgresql.org>) |
List | pgsql-hackers |
On Wed, Oct 16, 2024 at 2:37 PM Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Oct 16, 2024, at 05:31, jian he wrote:
> > Hi.
> > I only checked 0001, 0002, 0003.
> > the raw format patch is v9-0016.
> > 003-0016 is a lot of small patches, maybe you can consolidate it to
> > make the review more easier.
>
> Thanks for reviewing.
>
> OK, I've consolidated the v9 0003-0016 into a single patch.
>

+   <refsect2>
+    <title>Raw Format</title>
+
+    <para>
+     This format option is used for importing and exporting files containing
+     unstructured text, where each line is treated as a single field. It is
+     ideal for data that does not conform to a structured, tabular format and
+     lacks delimiters.
+    </para>
+
+    <para>
+     In the <literal>raw</literal> format, each line of the input or output is
+     considered a complete value without any field separation. There are no
+     field delimiters, and all characters are taken literally. There is no
+     special handling for quotes, backslashes, or escape sequences. All
+     characters, including whitespace and special characters, are preserved
+     exactly as they appear in the file. However, it's important to note that
+     the text is still interpreted according to the specified <literal>ENCODING</literal>
+     option or the current client encoding for input, and encoded using the
+     specified <literal>ENCODING</literal> or the current client encoding for output.
+    </para>
+
+    <para>
+     When using this format, the <command>COPY</command> command must specify
+     exactly one column. Specifying multiple columns will result in an error.
+     If the table has multiple columns and no column list is provided, an error
+     will occur.
+    </para>
+
+    <para>
+     The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+     value from an empty string. Empty lines are imported as empty strings, not
+     as <literal>NULL</literal> values.
+    </para>
+
+    <para>
+     Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+    </para>
+
+   </refsect2>
+
+   <refsect2>
+    <title>Raw Format</title>
+
+    <para>
+     This format option is used for importing and exporting files containing
+     unstructured text, where each line is treated as a single field. It is
+     ideal for data that does not conform to a structured, tabular format and
+     lacks delimiters.
+    </para>
+
+    <para>
+     In the <literal>raw</literal> format, each line of the input or output is
+     considered a complete value without any field separation. There are no
+     field delimiters, and all characters are taken literally. There is no
+     special handling for quotes, backslashes, or escape sequences. All
+     characters, including whitespace and special characters, are preserved
+     exactly as they appear in the file. However, it's important to note that
+     the text is still interpreted according to the specified <literal>ENCODING</literal>
+     option or the current client encoding for input, and encoded using the
+     specified <literal>ENCODING</literal> or the current client encoding for output.
+    </para>
+
+    <para>
+     When using this format, the <command>COPY</command> command must specify
+     exactly one column. Specifying multiple columns will result in an error.
+     If the table has multiple columns and no column list is provided, an error
+     will occur.
+    </para>
+
+    <para>
+     The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+     value from an empty string. Empty lines are imported as empty strings, not
+     as <literal>NULL</literal> values.
+    </para>
+
+    <para>
+     Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+    </para>
+
+   </refsect2>
+
    <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
     <title>Binary Format</title>

The <refsect2> <title>Raw Format</title> section is duplicated.
The <title>Raw Format</title> section doesn't mention the special handling of
the end-of-data marker.

+COPY copy_raw_test (col) FROM :'filename' RAW;

we may need to support this.
Since we do not allow
COPY x from stdin text;
COPY x to stdout text;
I think adding the RAW keyword in gram.y may not be necessary.

    /* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
    else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
        COMPLETE_WITH("binary", "csv", "text");

In src/bin/psql/tab-complete.in.c, we can also add "raw".

    /* --- ESCAPE option --- */
    if (opts_out->escape)
    {
        if (opts_out->format != COPY_FORMAT_CSV)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
            /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                     errmsg("COPY %s requires CSV mode", "ESCAPE")));
    }

The escape option has no regression test.

    /* --- QUOTE option --- */
    if (opts_out->quote)
    {
        if (opts_out->format != COPY_FORMAT_CSV)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
            /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                     errmsg("COPY %s requires CSV mode", "QUOTE")));
    }

The escape option has no regression test.

CopyOneRowTo:

    else if (cstate->opts.format == COPY_FORMAT_RAW)
    {
        int         attnum;
        Datum       value;
        bool        isnull;

        /* Ensure only one column is being copied */
        if (list_length(cstate->attnumlist) != 1)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("COPY with format 'raw' must specify exactly one column")));

        attnum = linitial_int(cstate->attnumlist);
        value = slot->tts_values[attnum - 1];
        isnull = slot->tts_isnull[attnum - 1];

        if (!isnull)
        {
            char       *string = OutputFunctionCall(&out_functions[attnum - 1],
                                                    value);

            CopyAttributeOutRaw(cstate, string);
        }
        /* For RAW format, we don't send anything for NULL values */
    }

We already check the column count in BeginCopyTo, so is the
"if (list_length(cstate->attnumlist) != 1)" error check in CopyOneRowTo
still needed?
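
To illustrate the last point, here is a minimal standalone sketch (not
PostgreSQL code) of checking the column count once at setup and only asserting
it in the per-row path. RawCopyState, begin_copy_raw and copy_one_row_raw are
hypothetical names made up for this illustration; in the real patch the
one-time check would presumably stay as an ereport() in BeginCopyTo, and the
per-row branch in CopyOneRowTo could at most keep an Assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the COPY state; not PostgreSQL's real struct. */
typedef struct RawCopyState
{
    int         ncolumns;   /* number of columns selected for COPY */
    bool        validated;  /* true once begin_copy_raw() has succeeded */
} RawCopyState;

/*
 * One-time setup, similar in spirit to BeginCopyTo: accept exactly one
 * column. Returns false where the real code would raise an error.
 */
static bool
begin_copy_raw(RawCopyState *state, int ncolumns)
{
    state->ncolumns = ncolumns;
    state->validated = (ncolumns == 1);
    return state->validated;
}

/*
 * Per-row output, similar in spirit to the raw branch of CopyOneRowTo.
 * The column count is only asserted here, since setup already rejected
 * anything else. A NULL value writes nothing. Returns the number of bytes
 * written to buf.
 */
static int
copy_one_row_raw(const RawCopyState *state, const char *value,
                 char *buf, size_t buflen)
{
    assert(state->validated && state->ncolumns == 1);

    if (value == NULL)
        return 0;
    return snprintf(buf, buflen, "%s", value);
}
```

With this split, a multi-column request fails once in begin_copy_raw(), and
copy_one_row_raw() does no user-facing error check per row.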