Thread: COPY and default values

COPY and default values

From

Neil Conway

Date:

27 May 2002, 15:44:00

The attached patch changes COPY so that if no value for an attribute
is specified, the default value for the attribute (or the attribute's
type) will be assigned, rather than NULL.

I'll write some regression tests and update the docs later. Also,
there's a possible optimization that I didn't implement: if the
default value Node is a 'Const', we can safely evaluate and store
the default value once for the lifetime of the COPY, rather than
every time that it is needed. However, Const Nodes are evaluated
very quickly by ExecEvalExpr(), so I'm not sure if it's worth
worrying about...

There's one major problem with the patch, which I haven't been
able to fix: when there are multiple rows of data which require
default values, the COPY fails:

nconway=# create table test (a int, b text default 'def value');
CREATE TABLE
nconway=# copy test from stdin;
>> 5
>> \.
nconway=# copy test from stdin;
>> 10
>> 15
>> \.
ERROR:  copy: line 2, Memory exhausted in AllocSetAlloc(1065320379)

I think I've mismanaged the memory contexts involved somehow, but
I'm not sure what the problem is. Any help would be appreciated...

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Attachment

copy-def-value-2.patch

Re: COPY and default values

From

Tom Lane

Date:

27 May 2002, 17:28:13

Neil Conway <nconway@klamath.dyndns.org> writes:
> ERROR:  copy: line 2, Memory exhausted in AllocSetAlloc(1065320379)

> I think I've mismanaged the memory contexts involved somehow, but
> I'm not sure what the problem is. Any help would be appreciated...

Well, for one thing you're doing

+         for (i = 0; i < attr_count; i++)
+         {
+             if (nulls[i] == 'd' && values[i] == 0)
+             {
+                 bool isNull;
+
+                 values[i] = ExecEvalExprSwitchContext(defaults[i], econtext,
+                                                       &isNull, NULL);
+
+                 /* If it's NULL, return value is meaningless */
+                 if (isNull)
+                 {
+                     values[i] = 0;
+                     nulls[i] = 'n';
+                 }
+                 else
+                     nulls[i] = ' ';
+
+                 ResetPerTupleExprContext(estate);
+             }
+         }

which means you reset the memory context containing the default results
before those results can ever get used, leaving dangling pointers in
values[i].  (Hint: there's a reason why the econtext is called the
"per-tuple" context, not "per-column" context.  The one reset that's in
there now is sufficient.)

The nulls = 'd' mechanism is ugly and unnecessary, I think.  We were
intending to modify COPY's behavior to prohibit missing/extra fields
in incoming rows anyway, so there's no reason not to know in advance
exactly which columns need to have defaults inserted, and to set up
default info for only those columns.  I'm also somewhat uncomfortable
with the fact that the patch invokes ExecEvalExpr on a NULL pointer
if there is a default-less column involved --- perhaps that works at
the moment but I don't like it.  Should special-case that.

The pfree's you've added near line 1021 look rather pointless, seeing
as how (a) they're only pfree'ing the topmost node of the default
expressions, and (b) the whole context is about to get reset anyhow.
There isn't a lot of percentage in any of those end-of-statement
pfrees that are there now...

I didn't like the aspect of the patch that moves build_column_default
into execUtils.c.  It's not an executor routine (as evidenced by the
fact that you had to add a pile of new inclusions to execUtils.c to
make it compile).  I'm not sure of the cleanest place for it; maybe
someplace in backend/catalog/, though really there's nothing wrong with
keeping it in the rewriter.

I haven't tried to run the patch, but those were some things I noticed
in a quick eyeball review...

            regards, tom lane

Re: COPY and default values

From

Neil Conway

Date:

27 May 2002, 19:45:11

On Mon, 27 May 2002 17:15:21 -0400
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > ERROR:  copy: line 2, Memory exhausted in AllocSetAlloc(1065320379)
>
> > I think I've mismanaged the memory contexts involved somehow, but
> > I'm not sure what the problem is. Any help would be appreciated...

> [You] reset the memory context containing the default results
> before those results can ever get used, leaving dangling pointers in
> values[i].  (Hint: there's a reason why the econtext is called the
> "per-tuple" context, not "per-column" context.  The one reset that's in
> there now is sufficient.)

Woops :-) Ok, removed that. Actually, I had just inserted that when
I was trying to debug the assertion failure I mentioned earlier.

> The nulls = 'd' mechanism is ugly and unnecessary, I think.  We were
> intending to modify COPY's behavior to prohibit missing/extra fields
> in incoming rows anyway, so there's no reason not to know in advance
> exactly which columns need to have defaults inserted, and to set up
> default info for only those columns.

Can you elaborate on exactly how you'd like to see COPY's behavior
changed? What's the rationale for disallowing missing fields in
COPY input?

> I'm also somewhat uncomfortable
> with the fact that the patch invokes ExecEvalExpr on a NULL pointer
> if there is a default-less column involved --- perhaps that works at
> the moment but I don't like it.  Should special-case that.

Good spot -- there's a special case for that now.

> The pfree's you've added near line 1021 look rather pointless, seeing
> as how (a) they're only pfree'ing the topmost node of the default
> expressions, and (b) the whole context is about to get reset anyhow.
> There isn't a lot of percentage in any of those end-of-statement
> pfrees that are there now...

OK, removed them.

> I didn't like the aspect of the patch that moves build_column_default
> into execUtils.c.  It's not an executor routine (as evidenced by the
> fact that you had to add a pile of new inclusions to execUtils.c to
> make it compile).  I'm not sure of the cleanest place for it; maybe
> someplace in backend/catalog/, though really there's nothing wrong with
> keeping it in the rewriter.

Oh, yeah -- I forgot to mention that. I wasn't really sure where that
code should go, so I basically picked execUtils.c at random -- if you'd
prefer that I move the code to somewhere in catalog/ that's fine,
just let me know where. I don't have a strong preference myself.

> I haven't tried to run the patch, but those were some things I noticed
> in a quick eyeball review...

Thanks Tom. However, after applying the changes noted above, the test
case still fails (and I'm still scratching my head about what's
causing the problem). A new version of the patch is attached.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Attachment

copy-def-value-3.patch

Re: COPY and default values

From

Tom Lane

Date:

27 May 2002, 20:04:42

Neil Conway <nconway@klamath.dyndns.org> writes:
> Can you elaborate on exactly how you'd like to see COPY's behavior
> changed? What's the rationale for disallowing missing fields in
> COPY input?

Robustness.  We've seen problem reports in the past when COPY got out of
sync with the data stream because of loss or insertion of a newline (eg,
due to faulty quoting logic in an application; this scenario is not as
improbable as you might think).  Rejecting lines that don't seem to have
the right number of fields would go a long way towards detecting that
sort of mistake.  And, since the existing behavior with unexpected
numbers of fields has never been documented, no one feels any strong
compunction about changing it.

This has been hashed out in several prior threads, and in fact I think
Brent Verner has a nearly-ready-to-go patch implementing the agreed-on
behavior.  Now that I think about it, your patch is heading off in a
quite different direction from what was agreed to: we were never
intending that omitting trailing fields would be the cue for inserting
default values.  Without a way to specify a target column list, that's
far too restrictive; and with it, it's unnecessary.

            regards, tom lane

Re: COPY and default values

From

Neil Conway

Date:

27 May 2002, 20:22:31

On Mon, 27 May 2002 19:28:30 -0400
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> This has been hashed out in several prior threads, and in fact I think
> Brent Verner has a nearly-ready-to-go patch implementing the agreed-on
> behavior.  Now that I think about it, your patch is heading off in a
> quite different direction from what was agreed to: we were never
> intending that omitting trailing fields would be the cue for inserting
> default values.  Without a way to specify a target column list, that's
> far too restrictive; and with it, it's unnecessary.

Ok, sounds fair. I think my patch is an improvement over the current
behavior (if you're going to silently insert arbitrary data for
missing attributes, I'd say the default value is a better choice
than NULL), but I would agree that this kind of behavior as a whole
is not optimal. Assuming that the patch from Brent materializes, I'll
be happy to take his solution over mine.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: COPY and default values

From

Tom Lane

Date:

27 May 2002, 22:39:04

Neil Conway <nconway@klamath.dyndns.org> writes:
> Thanks Tom. However, after applying the changes noted above, the test
> case still fails (and I'm still scratching my head about what's
> causing the problem). A new version of the patch is attached.

I got curious, so I tried out this version of the patch.  Some debugging
demonstrated that the value being pointed to by the default expression's
Const node was getting freed :-(, evidently by the loop at copy.c lines
1014-1018 (as patched).  Said loop is really pretty bogus anyway, since
it effectively assumes that every pass-by-ref datatype's input function
always returns a freshly-palloc'd value.  (This isn't the only place
that assumes that, if memory serves; but over time I'd like to eliminate
such assumptions.)  In the presence of values generated by means other
than datatype input functions, the assumption breaks down completely.

What I would like to do about this is to eliminate the explicit pfree's
entirely; they are both dangerous, as shown by this example, and
inadequate to guarantee no memory leak.  (For example, any storage
leaked internally by a datatype conversion function cannot get reclaimed
by this approach.)  A much more bulletproof approach in both COPY IN and
COPY OUT would be to run the per-tuple loop in a memory context that we
reset at the top of the loop.  Then any palloc'd leaked storage will be
reclaimed at the end of the tuple cycle.  COPY IN may as well use the
per-tuple exprcontext's memory context, which we're already using and
resetting.  COPY OUT doesn't seem to need an exprcontext at the moment,
but it might be best to use parallel coding anyway.

Brent Verner hasn't submitted any version of his patch yet, but he did
pop up a day or so ago to say that he still intended to do so.  Maybe
you should offer to help him whip it into shape ...

            regards, tom lane

Re: COPY and default values

From

Brent Verner

Date:

27 May 2002, 23:59:30

[2002-05-27 19:28] Tom Lane said:
| Neil Conway <nconway@klamath.dyndns.org> writes:
| > Can you elaborate on exactly how you'd like to see COPY's behavior
| > changed? What's the rationale for disallowing missing fields in
| > COPY input?
|
| Robustness.  We've seen problem reports in the past when COPY got out of
| sync with the data stream because of loss or insertion of a newline (eg,
| due to faulty quoting logic in an application; this scenario is not as
| improbable as you might think).  Rejecting lines that don't seem to have
| the right number of fields would go a long way towards detecting that
| sort of mistake.  And, since the existing behavior with unexpected
| numbers of fields has never been documented, no one feels any strong
| compunction about changing it.
|
| This has been hashed out in several prior threads, and in fact I think
| Brent Verner has a nearly-ready-to-go patch implementing the agreed-on
| behavior.  Now that I think about it, your patch is heading off in a
| quite different direction from what was agreed to: we were never
| intending that omitting trailing fields would be the cue for inserting
| default values.  Without a way to specify a target column list, that's
| far too restrictive; and with it, it's unnecessary.

attached is an updated patch (copy.colspec.20020527.diff), against
HEAD, implementing the following COPY syntax

  COPY sleep (n,o,w) FROM stdin;

  where (n,o,w) may be a subset of columns in the "sleep" table.
If a column is not specified, and has an attribute default, the
default value will be evaluated and used.  There is no support for
COPY (<attlist>) TO yet, but that should not be too hairy.  A second
patch for pg_dump to support this syntax is attached, but I have
not even tried to apply it, sorry.  The pg_dump patch is against
a circa 7.2-RELEASE cvs head.  I attach it in case someone else
wants to help wrap this up. I have been death-marching for 3-4 months
and I'm not sure I see the top of the hill yet...



How it does what it does.

  I modified gram.y to add an opt_column_list to CopyStmt.  This
List is sent down to DoCopy.  If no attlist is present, I build
an attlist from the relation's available attributes, enabling
the old behavior.  Once inside CopyFrom, the List attlist is
scanned to build two arrays, one representing the actual index
of the specified column, and another containing available default
exprs.  Attributes are read from the input stream and assigned to
their appropriate location in the new tuple.  If there are any
defaults available, they are then evaluated and placed in the new
tuple arrays.  That's really it.  I'm not _at all_ confident that
there is not a better way to do this, but this is what I've done.
  The patch will elog(ERROR) in DoCopy, before Copy(To|From) is
entered, if a column list is specified that cannot be handled
(non-existent columns, duplicate columns).  Should I be waiting
until I'm _in_ Copy(To|From) to elog(ERROR) out?

I'll try to add some commentary to that snakey code over the next
week, and will resubmit an updated patch.


Anyone questions regarding this patch should be sent 'To:' me and
'CC:' the list, as I haven't had the luxury of reading the good
list lately :-(.

comments/brain-death-fixes are welcome.

cheers.
  b

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman