Re: [Proposal] Adding TRIM_SPACE option to COPY - Mailing list pgsql-hackers

From 河田達也
Subject Re: [Proposal] Adding TRIM_SPACE option to COPY
Date
Msg-id CAHza6qfffimVp=+SOr533JHXMazNE6F6UFhxE0HjzQu9JE5-EA@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Adding TRIM_SPACE option to COPY  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi all,

Thank you for the feedback and for taking the time to review this patch.

## Performance Measurement Results
> I also agree that this feature probably won't add noticeable overhead to
> COPY when it isn't used, but it would still be good to measure
> the performance impact of the patch.
I conducted simple performance measurements with 5 million rows to assess the overhead
when TRIM_SPACE is not used:

**BEFORE patch (master branch):**
Time: 5565.584 ms (00:05.566)
Time: 5626.593 ms (00:05.627)

**AFTER patch (TRIM_SPACE disabled):**
Time: 5840.472 ms (00:05.840)
Time: 5523.806 ms (00:05.524)

The overhead appears to be approximately 1.5%, which seems to be within
an acceptable range for typical COPY operations.

## Regarding COPY PROGRAM as an Alternative

I understand Tom's concern about adding features to core that could be
implemented externally.

> As for TRIM_SPACE, it seems possible to implement it as an external module
> and call it via COPY PROGRAM. Is this true?
You're right that TRIM_SPACE can technically be
achieved using COPY PROGRAM with an external script. I'd like to share
some observations about this approach:

**Example implementation with COPY PROGRAM:**

```python
#!/usr/bin/env python3
import sys
import csv

reader = csv.reader(sys.stdin)
writer = csv.writer(sys.stdout)

for row in reader:
    trimmed_row = [field.strip() for field in row]
    writer.writerow(trimmed_row)
```

```sql
COPY users FROM PROGRAM 'python3 /tmp/trim_csv.py'
WITH (FORMAT csv, HEADER true);
```

**Some considerations:**

While the COPY PROGRAM approach works, it does have some practical limitations:
it requires elevated privileges (`pg_execute_server_program`), may not be
available on all platforms (particularly Windows), and requires users to
handle parsing correctly. That said, I understand the concern about
where to draw the line for features in core, especially given that options
like FORCE_NULL follow a similar pattern.

## My Thoughts

> So the question is which ones belong in core. On second thought,
> perhaps features that are difficult or impossible to implement outside the core
> are the ones that can be considered for COPY itself. Otherwise, it might be
> better to avoid expanding COPY unnecessarily. Anyway I'd like to hear more
> opinons on this.
I can see the validity of the concern about feature creep.
I'm open to the community's perspective on whether this
belongs in core.

If the general feeling is that this functionality should remain external,
I'm happy to withdraw the patch.

Thank you again for your time and valuable feedback. This discussion has been
very educational for me.

Best regards,
Tatsuya Kawata

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [buildfarm related] Machines gcc experimental failed test_lfind
Next
From: Nathan Bossart
Date:
Subject: Re: [buildfarm related] Machines gcc experimental failed test_lfind