Thread: Re: [GENERAL] Large databases, performance

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 3 Oct 2002 at 8:54, Charles H. Woloszynski wrote:

> I'd be curious what happens when you submit more queries than you have
> processors (you had four concurrent queries and four CPUs), if you care
> to run any additional tests.  Also, I'd report the query time in
> absolute (like you did) and also in 'Time/number of concurrent queries'.
>  This will give you a sense of how the system is scaling as the workload
> increases.  Personally I am more concerned about this aspect than the
> load time, since I am going to guess that this is where all the time is
> spent.

OK. I am back from my cave after some more tests were done. Here are the
results. I am not repeating large parts of it, just answering your questions..

Don't ask me how these numbers changed. I am not the person who conducts the
tests, nor do I have access to the system. The rest (or most) of the setup
remains the same..

MySQL 3.23.52 with innodb transaction support:

4 concurrent queries     :-  257.36 ms
40 concurrent queries    :-  35.12 ms

Postgresql 7.2.2

4 concurrent queries         :- 257.43 ms
40 concurrent queries        :- 41.16 ms

Though I cannot report Oracle numbers, suffice it to say that they fall in
between these two.

Oracle seems to be a hell of a lot faster than mysql/postgresql at loading raw
data, even when it's installed on reiserfs. We plan to run XFS tests later in
the hope that it will improve mysql/postgresql load times.

In this run postgresql has a better load time than mysql/innodb (17031 sec v/s
18270 sec). Index creation times are faster as well (100 sec v/s 130 sec).
Don't know what parameters were changed.

Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
numbers include indexes. This is really going to be a problem when things are
deployed. Any idea how it can be taken down?

WAL is out, it's not counted.

Schema optimisation is a later issue. Right now all three databases are using
the same schema..

Will it help in this situation if I recompile postgresql with a block size of
say 32K rather than the 8K default? Will it save some overhead and offer better
performance in data load etc.?
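
(For reference, my understanding is that the block size is a compile-time
constant -- the BLCKSZ define in the source headers.  The exact file name is
from memory, so treat this as a sketch only:

    /* src/include/pg_config.h -- size of a disk block.  Must be a power
     * of 2, and in this era at most 32k.  Default is 8192. */
    #define BLCKSZ 32768

The server has to be rebuilt and the cluster re-initdb'd after changing it.)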

Will keep you guys updated..

Regards,
 Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901
-----------------------------------------------------------


Re: [GENERAL] Large databases, performance

From: Manfred Koizar
On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
>numbers include indexes. This is really going to be a problem when things are
>deployed. Any idea how it can be taken down?

Shridhar,

if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
integer specifying the length followed by as many characters as the
length tells.  On 32-bit Intel hardware this structure is aligned on a
4-byte boundary.
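
For reference, that's the generic varlena layout (a slightly abridged sketch
of the struct in src/include/c.h):

    struct varlena
    {
        int32    vl_len;     /* total size in bytes, including this header */
        char     vl_dat[1];  /* the characters follow, not null-terminated */
    };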

For your row layout this gives the following sizes (look at the "phys
size" column):

| Field    Field     Null Indexed phys  mini
| Name     Type                   size  size
|--------------------------------------------
| type     int        no    no       4     4
| esn      char (10)  no    yes     16    11
| min      char (10)  no    yes     16    11
| datetime timestamp  no    yes      8     8
| opc0     char (3)   no    no       8     4
| opc1     char (3)   no    no       8     4
| opc2     char (3)   no    no       8     4
| dpc0     char (3)   no    no       8     4
| dpc1     char (3)   no    no       8     4
| dpc2     char (3)   no    no       8     4
| npa      char (3)   no    no       8     4
| nxx      char (3)   no    no       8     4
| rest     char (4)   no    no       8     5
| field0   int        yes   no       4     4
| field1   char (4)   yes   no       8     5
| field2   int        yes   no       4     4
| field3   char (4)   yes   no       8     5
| field4   int        yes   no       4     4
| field5   char (4)   yes   no       8     5
| field6   int        yes   no       4     4
| field7   char (4)   yes   no       8     5
| field8   int        yes   no       4     4
| field9   char (4)   yes   no       8     5
|                                 ----- -----
|                                  176   116

Ignoring nulls for now, you have to add 32 bytes for a v7.2 heap tuple
header and 4 bytes for ItemIdData per tuple, ending up with 212 bytes
per tuple; 212 bytes * 432000000 tuples is ca. 91.6e9 bytes, or ca. 85 GB
of heap space.  Depending on fill factor, similar calculations give some
30 GB for your index.

Now if we had a datatype with only one byte for the string length,
char columns could be byte aligned and we'd have column sizes given
under "mini" in the table above.  The columns would have to be
rearranged according to alignment requirements.

Thus 60 bytes per heap tuple and 8 bytes per index tuple could be
saved, resulting in a database size of ~ 85 GB (index included).  And
I bet this would be significantly faster, too.
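
Concretely, I'm thinking of a hypothetical layout like this (a sketch only,
not working code):

    typedef struct ShortStr
    {
        uint8    len;        /* one byte of length, 0..255, instead of int32 */
        char     data[1];    /* 'len' characters follow, not null-terminated */
    } ShortStr;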

Hackers, do you think it's possible to hack together a quick and dirty
patch, so that string length is represented by one byte?  IOW can a
database be built that doesn't contain any char/varchar/text value
longer than 255 characters in the catalog?

If I'm not told that this is impossible, I'd give it a try.  Shridhar,
if such a patch can be made available, would you be willing to test
it?

What can you do right now?  Try using v7.3 beta and creating your
table WITHOUT OIDS.  This saves 8 bytes per tuple; not much, but
better save 4% than nothing.
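
In 7.3 that's just (a sketch; column list abridged):

    CREATE TABLE tbl (
        type     int,
        esn      char(10),
        min      char(10),
        datetime timestamp
        -- ... and so on ...
    ) WITHOUT OIDS;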

Servus
 Manfred

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 7 Oct 2002 at 16:10, Manfred Koizar wrote:
> if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
> integer specifying the length followed by as many characters as the
> length tells.  On 32-bit Intel hardware this structure is aligned on a
> 4-byte boundary.

That shouldn't be necessary for a char field, as space is always pre-allocated.
Sounds like a possible area of improvement to me, if that's the case..

> Hackers, do you think it's possible to hack together a quick and dirty
> patch, so that string length is represented by one byte?  IOW can a
> database be built that doesn't contain any char/varchar/text value
> longer than 255 characters in the catalog?

I say if it's a char field, there should be no indicator of length as it's not
required. Just store those many characters straight ahead..

>
> If I'm not told that this is impossibly, I'd give it a try.  Shridhar,
> if such a patch can be made available, would you be willing to test
> it?

Sure. But the server machine is not available this week. Some other project is
using it. So the results won't be out until at least a week from now.


> What can you do right now?  Try using v7.3 beta and creating your
> table WITHOUT OIDS.  This saves 8 bytes per tuple; not much, but
> better save 4% than nothing.

IIRC there was some header optimisation which saved 4 bytes. So without OIDs
that should save 8. I'll do that as the very next thing.

I talked to my friend regarding postgresql surpassing mysql substantially in
this test. He told me that in the last test, where postgresql took 23000+/150
sec for load/index and mysql took 18,000+/130 sec, postgresql was running in
the default configuration. He forgot to copy postgresql.conf to the data
directory after he modified it.

This time the results are correct. Postgresql loads data faster, indexes it
faster, and queries in almost the same time.. Way to go..

Regards,
 Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901
-----------------------------------------------------------


Re: [GENERAL] Large databases, performance

From: Tom Lane
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> MySQL 3.23.52 with innodb transaction support:

> 4 concurrent queries     :-  257.36 ms
> 40 concurrent queries    :-  35.12 ms

> Postgresql 7.2.2

> 4 concurrent queries         :- 257.43 ms
> 40 concurrent queries        :- 41.16 ms

I find this pretty fishy.  The extreme similarity of the 4-client
numbers seems improbable, from what I know of the two databases.
I suspect your numbers are mostly measuring some non-database-related
overhead --- communications overhead, maybe?

> Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
> numbers include indexes. This is really going to be a problem when things are
> deployed. Any idea how it can be taken down?

7.3 should be a little bit better because of Manfred's work on reducing
tuple header size --- if you create your tables WITHOUT OIDS, you should
save 8 bytes per row compared to earlier releases.

            regards, tom lane

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 7 Oct 2002 at 10:30, Tom Lane wrote:

> "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > MySQL 3.23.52 with innodb transaction support:
>
> > 4 concurrent queries     :-  257.36 ms
> > 40 concurrent queries    :-  35.12 ms
>
> > Postgresql 7.2.2
>
> > 4 concurrent queries         :- 257.43 ms
> > 40 concurrent queries        :- 41.16 ms
>
> I find this pretty fishy.  The extreme similarity of the 4-client
> numbers seems improbable, from what I know of the two databases.
> I suspect your numbers are mostly measuring some non-database-related
> overhead --- communications overhead, maybe?

I don't know, but all three numbers, postgresql/mysql/oracle, are 25x.xx ms.
The clients were on the same machine as the server. So no real area to point
at..
>
> > Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
> > numbers include indexes. This is really going to be a problem when things are
> > deployed. Any idea how it can be taken down?
>
> 7.3 should be a little bit better because of Manfred's work on reducing
> tuple header size --- if you create your tables WITHOUT OIDS, you should
> save 8 bytes per row compared to earlier releases.

Got it..

Bye
 Shridhar

--
Sweater, n.:  A garment worn by a child when its mother feels chilly.


Re: [GENERAL] Large databases, performance

From: Manfred Koizar
On Mon, 07 Oct 2002 19:48:31 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>I say if it's a char field, there should be no indicator of length as it's not
>required. Just store those many characters straight ahead..

This is out of reach for a quick hack ...

>Sure. But the server machine is not available this week. Some other project is
>using it. So the results won't be out until at least a week from now.

 :-)

>This time the results are correct. Postgresql loads data faster, indexes it
>faster, and queries in almost the same time.. Way to go..

Great!  And now let's work on making selects faster, too.

Servus
 Manfred

Re: [GENERAL] Large databases, performance

From: Tom Lane
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> I say if it's a char field, there should be no indicator of length as
> it's not required. Just store those many characters straight ahead..

Your assumption fails when considering UNICODE or other multibyte
character encodings.

            regards, tom lane

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 7 Oct 2002 at 11:21, Tom Lane wrote:

> "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > I say if it's a char field, there should be no indicator of length as
> > it's not required. Just store those many characters straight ahead..
>
> Your assumption fails when considering UNICODE or other multibyte
> character encodings.

Correct, but is it possible to have a real char string when the database is
not unicode or, to be exact, when the locale defines the size of a char?

In my case varchar does not make sense, as all strings are guaranteed to be of
a defined length. While the argument you have put forward is correct, it's
causing a disk space leak, so to speak.

Bye
 Shridhar

--
Boucher's Observation:  He who blows his own horn always plays the music
several octaves higher than originally written.


Re: [GENERAL] Large databases, performance

From: Martijn van Oosterhout
On Tue, Oct 08, 2002 at 11:14:11AM +0530, Shridhar Daithankar wrote:
> On 7 Oct 2002 at 11:21, Tom Lane wrote:
>
> > "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > > I say if it's a char field, there should be no indicator of length as
> > > it's not required. Just store those many characters straight ahead..
> >
> > Your assumption fails when considering UNICODE or other multibyte
> > character encodings.
>
> Correct, but is it possible to have a real char string when the database is
> not unicode or, to be exact, when the locale defines the size of a char?
>
> In my case varchar does not make sense, as all strings are guaranteed to be
> of a defined length. While the argument you have put forward is correct,
> it's causing a disk space leak, so to speak.

Well, maybe. But since 7.1 or so char() and varchar() simply became text
with some length restrictions. This was one of the reasons. It also
simplified a lot of code.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: [GENERAL] Large databases, performance

From: Ron Johnson
On Tue, 2002-10-08 at 02:20, Martijn van Oosterhout wrote:
> On Tue, Oct 08, 2002 at 11:14:11AM +0530, Shridhar Daithankar wrote:
> > On 7 Oct 2002 at 11:21, Tom Lane wrote:
> >
> > > "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > > > I say if it's a char field, there should be no indicator of length as
> > > > it's not required. Just store those many characters straight ahead..
> > >
> > > Your assumption fails when considering UNICODE or other multibyte
> > > character encodings.
> >
> > Correct, but is it possible to have a real char string when the database
> > is not unicode or, to be exact, when the locale defines the size of a char?
> >
> > In my case varchar does not make sense, as all strings are guaranteed to
> > be of a defined length. While the argument you have put forward is correct,
> > it's causing a disk space leak, so to speak.

Not only that, but you get INSERT, UPDATE, DELETE and SELECT performance
gains with fixed length records, since you don't get fragmentation.

For example:
CREATE TABLE T (
    F1    INTEGER,
    F2    VARCHAR(200)
);

INSERT INTO T VALUES (1, 'FOO BAR');
INSERT INTO T VALUES (2, 'SNAFU');

Next,
UPDATE T SET F2 = 'WIGGLE WAGGLE WUMPERSTUMPER' WHERE F1 = 1;

Unless there is a big gap on disk between the 2 inserted records,
postgresql must then look somewhere else for space to put the new
version of the row WHERE F1 = 1.

With fixed-length records, you know exactly where you can put the
new value of F2, thus minimizing IO.

> Well, maybe. But since 7.1 or so char() and varchar() simply became text
> with some length restrictions. This was one of the reasons. It also
> simplified a lot of code.

How much simpler can you get than fixed-length records?

Of course, then there are 2 code paths, 1 for fixed length, and
1 for variable length.

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "they love our milk and honey, but preach about another    |
|  way of living"                                            |
|    Merle Haggard, "The Fighting Side Of Me"                |
+------------------------------------------------------------+


Re: [GENERAL] Large databases, performance

From: Tom Lane
Ron Johnson <ron.l.johnson@cox.net> writes:
> Not only that, but you get INSERT, UPDATE, DELETE and SELECT performance
> gains with fixed length records, since you don't get fragmentation.

That argument loses a lot of its force when you consider that Postgres
uses non-overwriting storage management.  We never do an UPDATE in-place
anyway, and so it matters little whether the updated record is the same
size as the original.

>> Well, maybe. But since 7.1 or so char() and varchar() simply became text
>> with some length restrictions. This was one of the reasons. It also
>> simplified a lot of code.

> How much simpler can you get than fixed-length records?

It's not simpler: it's more complicated, because you need an additional
input item to figure out the size of any given column in a record.
Making sure that that info is available every place it's needed is one
of the costs of supporting a feature like this.

            regards, tom lane

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 8 Oct 2002 at 10:38, Tom Lane wrote:

> Ron Johnson <ron.l.johnson@cox.net> writes:
> It's not simpler: it's more complicated, because you need an additional
> input item to figure out the size of any given column in a record.
> Making sure that that info is available every place it's needed is one
> of the costs of supporting a feature like this.

I understand. Can we put this in, say, the page header instead of the tuple
header? While all the arguments you have put forward are really good, such
wholesale redundancy could certainly do with a mid-way solution.

Just a thought..

Bye
 Shridhar

--
bit, n:  A unit of measure applied to color.  Twenty-four-bit color refers to
expensive $3 color as opposed to the cheaper 25 cent, or two-bit, color that
used to be available a few years ago.


Re: [GENERAL] Large databases, performance

From: Ron Johnson
On Tue, 2002-10-08 at 09:38, Tom Lane wrote:
> Ron Johnson <ron.l.johnson@cox.net> writes:
> > Not only that, but you get INSERT, UPDATE, DELETE and SELECT performance
> > gains with fixed length records, since you don't get fragmentation.
>
> That argument loses a lot of its force when you consider that Postgres
> uses non-overwriting storage management.  We never do an UPDATE in-place
> anyway, and so it matters little whether the updated record is the same
> size as the original.

Must you update any relevant indexes, in order to point to the
new location of the record?

> >> Well, maybe. But since 7.1 or so char() and varchar() simply became text
> >> with some length restrictions. This was one of the reasons. It also
> >> simplified a lot of code.
>
> > How much simpler can you get than fixed-length records?
>
> It's not simpler: it's more complicated, because you need an additional
> input item to figure out the size of any given column in a record.

With fixed-length, why?  From the metadata, you can compute the intra-
record offsets.  That's how it works with the commercial RDBMS that
I use at work.

On that system, even variable-length records don't need record-size
fields.  Any repeating text (more than ~4 chars) is replaced with
run-length encoding.  This includes the phantom spaces at the end
of the field.

> Making sure that that info is available every place it's needed is one
> of the costs of supporting a feature like this.

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "they love our milk and honey, but preach about another    |
|  way of living"                                            |
|    Merle Haggard, "The Fighting Side Of Me"                |
+------------------------------------------------------------+


Re: CHAR, VARCHAR, TEXT (Was Large Databases)

From: "Josh Berkus"
Ron, Shridhar,

Maybe I missed something on this thread, but can either of you give me
an example of a real database where the PostgreSQL approach of "all
strings are TEXT" versus the more traditional CHAR implementation have
resulted in measurable performance loss?

Otherwise, this discussion is rather academic ...

-Josh Berkus

Re: [GENERAL] Large databases, performance

From: Tom Lane
Ron Johnson <ron.l.johnson@cox.net> writes:
> On Tue, 2002-10-08 at 09:38, Tom Lane wrote:
>> That argument loses a lot of its force when you consider that Postgres
>> uses non-overwriting storage management.  We never do an UPDATE in-place
>> anyway, and so it matters little whether the updated record is the same
>> size as the original.

> Must you update any relevant indexes, in order to point to the
> new location of the record?

We make new index entries for the new record, yes.  Both the old and new
records must be indexed (until one or the other is garbage-collected by
VACUUM) so that transactions can find whichever version they are
supposed to be able to see according to the tuple visibility rules.

>> It's not simpler: it's more complicated, because you need an additional
>> input item to figure out the size of any given column in a record.

> With fixed-length, why?  From the metadata, you can compute the intra-
> record offsets.

Sure, but you need one more item of metadata than you otherwise
would (this is atttypmod, in Postgres terms).  I'm not certain that the
typmod is available everyplace that would need to be able to figure out
the physical width of a column.

> On that system, even variable-length records don't need record-size
> fields.  Any repeating text (more than ~4 chars) is replaced with
> run-length encoding.  This includes the phantom spaces at the end
> of the field.

Interesting that you should bring that up in the context of an argument
for supporting fixed-width fields ;-).  Doesn't any form of data
compression bring you right back into variable-width land?

Postgres' approach to data compression is that it's done per-field,
and only on variable-width fields.  We steal a couple of bits from the
length word to allow flagging of compressed and out-of-line values.
If we were to make CHAR(n) fixed-width then it would lose the ability
to participate in either compression or out-of-line storage.
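
From memory, the relevant bits look about like this in src/include/postgres.h
(a sketch, not the authoritative source):

    #define VARATT_FLAG_EXTERNAL    0x80000000  /* value stored out-of-line */
    #define VARATT_FLAG_COMPRESSED  0x40000000  /* value is compressed */
    #define VARATT_MASK_FLAGS       0xc0000000
    #define VARATT_MASK_SIZE        0x3fffffff  /* actual size lives here */

A fixed-width type would have no length word to steal those bits from.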

Between that and the multibyte-encoding issue, I think it's very
difficult to make a case that the general-purpose CHAR(n) type should
be implemented as fixed-width.  If someone has a specialized application
where they need a restricted fixed-width string type, it's not that
hard to make a user-defined type that supports only a single column
width (and thereby gets around the typmod issue).  So I'm satisfied with
saying "define your own type if you want this".

            regards, tom lane

Re: CHAR, VARCHAR, TEXT (Was Large Databases)

From: Ron Johnson
On Tue, 2002-10-08 at 10:33, Josh Berkus wrote:
> Ron, Shridhar,
>
> Maybe I missed something on this thread, but can either of you give me
> an example of a real database where the PostgreSQL approach of "all
> strings are TEXT" versus the more traditional CHAR implementation has
> resulted in measurable performance loss?

??????

> Otherwise, this discussion is rather academic ...

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "they love our milk and honey, but preach about another    |
|  way of living"                                            |
|    Merle Haggard, "The Fighting Side Of Me"                |
+------------------------------------------------------------+


Re: CHAR, VARCHAR, TEXT (Was Large Databases)

From: Josh Berkus
Ron,

> > Maybe I missed something on this thread, but can either of you give me
> > an example of a real database where the PostgreSQL approach of "all
> > strings are TEXT" versus the more traditional CHAR implementation has
> > resulted in measurable performance loss?
>
> ??????

In other words, if it ain't broke, don't fix it.

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco

Re: CHAR, VARCHAR, TEXT (Was Large Databases)

From: Josh Berkus
Ron,

> > > > Maybe I missed something on this thread, but can either of you give
> > > > me an example of a real database where the PostgreSQL approach of
> > > > "all strings are TEXT" versus the more traditional CHAR
> > > > implementation has resulted in measurable performance loss?
> > >
> > > ??????
> >
> > In other words, if it ain't broke, don't fix it.
>
> Well, does Really Slow Performance qualify as "broke"?

That's what I was asking.   Can you explain where your slow performance is
attributable to the CHAR implementation issues?   I missed that, if it was
explained earlier in the thread.

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco

Re: [GENERAL] Large databases, performance

From: Manfred Koizar
On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql.

Shridhar,

here is an implementation of a set of user types: char3, char4,
char10.  Put the attached files into a new directory contrib/fixchar,
make, make install, and run fixchar.sql through psql.  Then create
your table as
    CREATE TABLE tbl (
        type     int,
        esn      char10,
        min      char10,
        datetime timestamp,
        opc0     char3,
        ...
        rest     char4,
        field0   int,
        field1   char4,
        ...
    )

This should save 76 bytes per heap tuple and 12 bytes per index tuple,
giving a database size of ~ 76 GB.  I'd be very interested how this
affects performance.

Code has been tested on v7.2; it crashes on v7.3 beta 1.  If this is
a problem, let me know.

Servus
 Manfred


Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 9 Oct 2002 at 10:00, Manfred Koizar wrote:

> On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
> <shridhar_daithankar@persistent.co.in> wrote:
> >Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql.
>
> Shridhar,
>
> here is an implementation of a set of user types: char3, char4,
> char10.  Put the attached files into a new directory contrib/fixchar,
> make, make install, and run fixchar.sql through psql.  Then create
> your table as
>     CREATE TABLE tbl (
>         type     int,
>         esn      char10,
>         min      char10,
>         datetime timestamp,
>         opc0     char3,
>         ...
>         rest     char4,
>         field0   int,
>         field1   char4,
>         ...
>     )
>
> This should save 76 bytes per heap tuple and 12 bytes per index tuple,
> giving a database size of ~ 76 GB.  I'd be very interested how this
> affects performance.
>
> Code has been tested for v7.2, it crashes on v7.3 beta 1.  If this is
> a problem, let me know.

Thank you very much for this. I would certainly give it a try. Please be
patient, as the next test is scheduled for Monday.

Bye
 Shridhar

--
love, n.:  When it's growing, you don't mind watering it with a few tears.


Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 9 Oct 2002 at 10:00, Manfred Koizar wrote:

> On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
> <shridhar_daithankar@persistent.co.in> wrote:
> >Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql.
>
> Shridhar,
>
> here is an implementation of a set of user types: char3, char4,
> char10.  Put the attached files into a new directory contrib/fixchar,
> make, make install, and run fixchar.sql through psql.  Then create
> your table as

I had a quick look at things. I think it's great learning material for pg
internals.. ;-)

I have a suggestion. In the README, it would be worth mentioning that new
types can be added just by changing the Makefile. E.g. changing the line

OBJS = char3.o char4.o char10.o

to

OBJS = char3.o char4.o char5.o char10.o

would add the datatype char5 as well.

Obviously this is for those who might not take the effort to read the source.
(Personally I wouldn't have, had it been part of the entire postgres source
dump. I would just have done ./configure; make; make install.)

Thanks for the solution. It wouldn't have occurred to me in ages to create a
type for this. I guess that's partly because I never used postgresql beyond
select/insert/update/delete. Anyway, I should have been awake..

Thanks once again


Bye
 Shridhar

--
But it's real.  And if it's real it can be affected ... we may not be able to
break it, but I'll bet you credits to Navy Beans we can put a dent in it.
        -- deSalle, "Catspaw", stardate 3018.2


Re: [GENERAL] Large databases, performance

From: Tom Lane
Manfred Koizar <mkoi-pg@aon.at> writes:
> here is an implementation of a set of user types: char3, char4,
> char10.

Coupla quick comments on these:

> CREATE FUNCTION charNN_lt(charNN, charNN)
>     RETURNS boolean
>     AS '$libdir/fixchar'
>     LANGUAGE 'c';

> bool
> charNN_lt(char *a, char *b)
> {
>     return (strncmp(a, b, NN) < 0);
> }/*charNN_lt*/

These functions are dangerous as written, because they will crash on
null inputs.  I'd suggest marking them strict in the function
declarations.  Some attention to volatility declarations (isCachable
or isImmutable) would be a good idea too.

Also, it'd be faster and more portable to write the functions with
version-1 calling conventions.
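
For instance, charNN_lt recast with version-1 conventions might look roughly
like this (an untested sketch, with NN expanded to 10):

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(char10_lt);

    Datum
    char10_lt(PG_FUNCTION_ARGS)
    {
        /* never called with NULLs once the function is declared isStrict */
        char   *a = (char *) PG_GETARG_POINTER(0);
        char   *b = (char *) PG_GETARG_POINTER(1);

        PG_RETURN_BOOL(strncmp(a, b, 10) < 0);
    }

with the declaration marked strict and cachable:

    CREATE FUNCTION char10_lt(char10, char10)
        RETURNS boolean
        AS '$libdir/fixchar'
        LANGUAGE 'c' WITH (isStrict, isCachable);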

Using the Makefile to auto-create the differently sized versions is
a slick trick...

            regards, tom lane

Re: [GENERAL] Large databases, performance

From: "Shridhar Daithankar"
On 9 Oct 2002 at 9:32, Tom Lane wrote:

> Manfred Koizar <mkoi-pg@aon.at> writes:
> > here is an implementation of a set of user types: char3, char4,
> > char10.
>
> Coupla quick comments on these:
>
> > CREATE FUNCTION charNN_lt(charNN, charNN)
> >     RETURNS boolean
> >     AS '$libdir/fixchar'
> >     LANGUAGE 'c';
>
> > bool
> > charNN_lt(char *a, char *b)
> > {
> >     return (strncmp(a, b, NN) < 0);
> > }/*charNN_lt*/
>
> These functions are dangerous as written, because they will crash on
> null inputs.  I'd suggest marking them strict in the function
> declarations.  Some attention to volatility declarations (isCachable
> or isImmutable) would be a good idea too.

Let me add something. Using char* is a bad idea. I recently faced a situation
on HP-UX 11 where, with a libc patch, isspace broke for char values >127. The
fix was to use unsigned char. There are other places too where the input
character is used internally as an index into an array, which can cause weird
behaviour for values >127.

I will apply both corrections here. Will post the final stuff soon.

Bye
 Shridhar

--
Hacker's Quicky #313:  Sour Cream -n- Onion Potato Chips / Microwave Egg Roll /
Chocolate Milk


Re: [GENERAL] Large databases, performance

From: Manfred Koizar
On Wed, 09 Oct 2002 09:32:50 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>Coupla quick comments on these:

My first attempt at user types; thanks for the tips.

>These functions are dangerous as written, because they will crash on
>null inputs.  I'd suggest marking them strict in the function
>declarations.

I was not aware of this; I just wondered why the bpchar routines didn't
crash :-)  Fixed.

>Some attention to volatility declarations (isCachable
>or isImmutable) would be a good idea too.
>Also, it'd be faster and more portable to write the functions with
>version-1 calling conventions.

Done, too.  In the meantime I've found out why it crashed with 7.3:
INSERT INTO pg_opclass is now obsolete; you have to use CREATE OPERATOR
CLASS ...
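
i.e. something along these lines for the 7.3 build (a sketch; it assumes the
btree support function is called char10_cmp):

    CREATE OPERATOR CLASS char10_ops
        DEFAULT FOR TYPE char10 USING btree AS
            OPERATOR    1    <,
            OPERATOR    2    <=,
            OPERATOR    3    =,
            OPERATOR    4    >=,
            OPERATOR    5    >,
            FUNCTION    1    char10_cmp(char10, char10);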

Servus
 Manfred

contrib/fixchar (Was: Large databases, performance)

From: Manfred Koizar
On Wed, 09 Oct 2002 10:00:03 +0200, I wrote:
>here is an implementation of a set of user types: char3, char4,
>char10.

New version available.  As I don't want to spam the list with various
versions until I get it right eventually, you can get it from
http://members.aon.at/pivot/pg/fixchar20021010.tgz if you are
interested.

What's new:

. README updated (per Shridhar's suggestion)
. doesn't crash on NULL (p. Tom)
. version-1 calling conventions (p. Tom)
. isCachable (p. Tom)
. works for 7.2 (as delivered) and for 7.3 (make for73)

Shridhar, you were concerned about signed/unsigned chars;  looking at
the code I can not see how this is a problem.  So no change in this
regard.

Thanks for your comments.  Have fun!

Servus
 Manfred

Re: contrib/fixchar (Was: Large databases, performance)

From: "Shridhar Daithankar"
On 10 Oct 2002 at 15:30, Manfred Koizar wrote:

> On Wed, 09 Oct 2002 10:00:03 +0200, I wrote:
> >here is an implementation of a set of user types: char3, char4,
> >char10.
>
> New version available.  As I don't want to spam the list with various
> versions until I get it right eventually, you can get it from
> http://members.aon.at/pivot/pg/fixchar20021010.tgz if you are
> interested.
>
> What's new:
>
> . README updated (per Shridhar's suggestion)
> . doesn't crash on NULL (p. Tom)
> . version-1 calling conventions (p. Tom)
> . isCachable (p. Tom)
> . works for 7.2 (as delivered) and for 7.3 (make for73)
>
> Shridhar, you were concerned about signed/unsigned chars;  looking at
> the code I can not see how this is a problem.  So no change in this
> regard.

Well, this is not related to postgresql exactly, but to summarise the problem:
with libc patch PHCO_19090 or compatible upwards on HP-UX 11, isspace does not
work correctly if the input value is >127. That can cause a lot of problems
for an external app. It works fine with unsigned char.

It does not make a difference from postgresql's point of view, but it would
break non-english locales if they wanted to use this fix in some situations.

But I agree: unless somebody reports it, there's no point fixing it, and we
know the fix anyway..


Bye
 Shridhar

--
Live long and prosper.  -- Spock, "Amok Time", stardate 3372.7