Re: number of loaded/unloaded COPY rows - Mailing list pgsql-hackers

From Volkan YAZICI
Subject Re: number of loaded/unloaded COPY rows
Msg-id 20051217115049.GA518@alamut
In response to Re: number of loaded/unloaded COPY rows  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: number of loaded/unloaded COPY rows
List pgsql-hackers
On Dec 16 08:47, Bruce Momjian wrote:
> I think an int64 is the proper solution. If int64 isn't really
> 64-bits, the code should still work though.

(I'd prefer uint64 over int64.) One way or another,
src/include/c.h always defines a uint64 type, so this shouldn't
be a problem on either 64-bit or non-64-bit architectures.

I've attached the updated patch. This one uses uint64, and
UINT64_FORMAT when printing the uint64 value into a string.

I used char[20+1] as the buffer size for the string representation
of a uint64 value. (AFAIK, the largest uint64 value, 2^64 - 1 =
18446744073709551615, has 20 decimal digits; the smallest int64
also needs 20 characters including the sign.) For comparison, the
existing usages I looked at (for instance
backend/commands/sequence.c) store it in a char[100] buffer.
Maybe we should define a constant in pg_config.h like
INT64_PRINT_LEN. Together with INT64_FORMAT, this would give us a
standard way of printing integers into strings.

For instance:
  char buf[INT64_PRINT_LEN+1];
  snprintf(buf, sizeof(buf), INT64_FORMAT, var);
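
To make the sizing concrete, here is a minimal, self-contained
sketch. It is not taken from the patch: it uses the standard PRIu64
macro from <inttypes.h> as a stand-in for UINT64_FORMAT, and both
UINT64_PRINT_LEN and format_row_count() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical constant: UINT64_MAX (18446744073709551615) has 20 digits. */
#define UINT64_PRINT_LEN 20

/*
 * Format "count" into "buf" as a decimal string.
 * Returns the number of characters written (excluding the trailing NUL),
 * or -1 if the buffer is too small to hold the whole value.
 */
int
format_row_count(uint64_t count, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "%" PRIu64, count);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output error or truncation */
    return n;
}
```

With a char[UINT64_PRINT_LEN + 1] buffer the call succeeds even for
UINT64_MAX, while a char[20] buffer is one byte short and gets the
-1 result.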

> In fact we have this TODO, which is related:
>
>     * Change LIMIT/OFFSET and FETCH/MOVE to use int8
>
> This requires the same type of change.
>
> I have added this TODO:
>
>     * Allow the count returned by SELECT, etc to be represented
>     as an int64 to allow a higher range of values
>
> This requires a change to es_processed, I think.

I think so. es_processed is defined as uint32. It should be
uint64 too.
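
To illustrate why the width matters, here is a small sketch (not
backend code; add_rows_u32() and add_rows_u64() are made-up helper
names) showing that a 32-bit counter silently wraps to zero once it
passes 2^32 - 1 rows, while a 64-bit one keeps counting:

```c
#include <stdint.h>

/* Add n rows to a 32-bit counter; the result wraps modulo 2^32. */
uint32_t
add_rows_u32(uint32_t processed, uint32_t n)
{
    return processed + n;
}

/* Add n rows to a 64-bit counter; no wrap until 2^64 - 1. */
uint64_t
add_rows_u64(uint64_t processed, uint32_t n)
{
    return processed + n;
}
```

Starting from 4294967295 rows, adding one more row gives 0 with the
32-bit counter and 4294967296 with the 64-bit one.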

I tried to prepare a patch for the es_processed issue. But when I
looked further into the code, I found lots of mixed usages of
"uint32" and "long" for row count tracking. (Moreover, as you can
see from the patch, there's a problem with the ULLONG_MAX usage
in there.)

I'm aware that the patch isn't usable as is; I just tried to
point out some (IMHO) problems.

Last minute edit: Proposal: Maybe we can define a (./configure
controlled) type like pg_int (with bounds like PG_INT_MAX) to use
for counter-related code.
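
A rough sketch of what I have in mind follows. Only pg_int and
PG_INT_MAX are names from the proposal; PG_USE_64BIT_COUNTERS is a
hypothetical symbol that ./configure would define, and the
saturating increment is just one possible policy, assumed here for
illustration.

```c
#include <stdint.h>

/* PG_USE_64BIT_COUNTERS is hypothetical; ./configure would define it. */
#ifdef PG_USE_64BIT_COUNTERS
typedef uint64_t pg_int;
#define PG_INT_MAX UINT64_MAX
#else
typedef uint32_t pg_int;
#define PG_INT_MAX UINT32_MAX
#endif

/*
 * Increment a counter, sticking at PG_INT_MAX instead of wrapping.
 * (Saturation is an assumption of this sketch, not part of the proposal.)
 */
pg_int
pg_int_increment(pg_int counter)
{
    return (counter < PG_INT_MAX) ? counter + 1 : PG_INT_MAX;
}
```

Callers would then only deal with pg_int and PG_INT_MAX, and the
underlying width could change without touching them.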

- * -

AFAIK, there're two ways to implement a counter:

1. Using integer types supplied by the compiler, like uint64 as we
   discussed above.
   Pros: All arithmetic is handled by the compiler.
   Cons: The range is bounded by the system architecture.

2. Using arrays to hold numeric values, like we did in the
   implementation of SQL numeric types.
   Pros: Value length is bounded only by available memory.
   Cons: Arithmetic has to be handled in software. Therefore,
         this will cause a small performance overhead compared
         to the first approach.
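
As a rough sketch of the second approach (this is not the numeric
code; ArrayCounter, the base of 10000 per element and the fixed
array size are all assumptions chosen to keep the example short),
an increment with carry could look like this:

```c
#include <stddef.h>
#include <stdint.h>

#define COUNTER_NDIGITS 8       /* fixed for brevity; a real one would grow */
#define COUNTER_BASE    10000   /* each element holds a value in 0..9999 */

typedef struct
{
    uint16_t digits[COUNTER_NDIGITS];   /* least significant element first */
} ArrayCounter;

/*
 * Add one to the counter, propagating the carry in software.
 * Returns 0 on success, or -1 if the carry ran off the end of the
 * array (in which case every element has wrapped back to zero).
 */
int
array_counter_increment(ArrayCounter *c)
{
    size_t i;

    for (i = 0; i < COUNTER_NDIGITS; i++)
    {
        if (++c->digits[i] < COUNTER_BASE)
            return 0;           /* no carry needed, done */
        c->digits[i] = 0;       /* carry into the next element */
    }
    return -1;
}
```

The loop is where the software overhead comes from: most increments
touch only the first element, but each one still costs a memory
update and a compare instead of a single machine add.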

I'm not sure the second implementation is acceptable (from a
performance point of view) for the COPY command's counter. But IMHO
it would be acceptable for the counters of
SELECT/INSERT/UPDATE/DELETE operations. OTOH, this way we'd get a
proper method for counting without any (logical) bounds.

What's your opinion? If you agree, I'll try to use the second
implementation for counters - except COPY.


Regards.

--
"We are the middle children of history, raised by television to believe
that someday we'll be millionaires and movie stars and rock stars, but
we won't. And we're just learning this fact," Tyler said. "So don't
fuck with us."
