Thread: Urgent need of (paid) PostgreSQL support in New York City area

Urgent need of (paid) PostgreSQL support in New York City area

From: Doug Fields

Hello all,

We've run up against a wall with our PostgreSQL installation and we're in
urgent need of help. If you know anyone who can help us (commercially) I
would appreciate a reference.

This is the kind of help where we pay for it, of course. It would be for as
long as it takes, in the New York City area - in-person preferred.

The solutions to our problems could include:

1) Different Linux Kernel
2) Different Hardware
3) Different PostgreSQL version/configuration
4) Different queries

We just don't know what the problem is and after banging up against it we
need some serious help.

Our database is about 70 GB on disk (about 25 GB when dumped). We have
several tables in the 100-million-row range. We have several major problems:

1) Seemingly simple queries taking hours and even days, even with full use
of indices
2) Imports of data taking days, even for 7000 rows at a time

There are other minor problems, but those are the real ones. I welcome any
suggestions of whom to contact for this support. It's urgent, immediate,
and large-scale.

Thanks,

Doug


Re: Urgent need of (paid) PostgreSQL support in New

From: Doug Fields

Hello,

Thanks for the response. I believe I have these bases covered:

>         - You run VACUUM ANALYZE often. This is one of the most important
>procedures and it needs to be frequent. If you fail to do so, your database
>will most likely slow down.

I ANALYZE every 8 hours at most, whenever new data is finished being imported.

I VACUUM irregularly, as my tables mostly grow in size and don't get
trimmed regularly. Hence, I VACUUM whenever there is any major shrinkage.
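
Roughly, that maintenance pass looks like this (the table name is just a
stand-in for one of ours):

    -- after each import finishes: refresh the planner statistics
    ANALYZE events;

    -- only after a major delete/shrink: reclaim dead space as well
    VACUUM ANALYZE events;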

>         - Are you REALLY sure that indices are being used? For that, use
>EXPLAIN <query> to check. Note that I had real trouble until I noticed
>that PostgreSQL still does not cast query constants automatically, so for
>instance, if you have a bigint key, a "select * from table where key = 12312"
>will not use the index. A "select * from table where key = 12312::int8" will
>be necessary. This is valid for EVERY "non-standard" type.

I'm certain that the indices are being used. I've EXPLAINed to death over
the last year. I've even made my queries do things like "SET
ENABLE_NESTLOOP=OFF;SET ENABLE_SEQSCAN=OFF;query...;RESET..." in order to
force usage of the indices for some of the queries.
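
For example, one of those pinned-down queries looks roughly like this (the
table and column names are illustrative, not our real schema):

    -- first, confirm what the planner wants to do
    EXPLAIN SELECT * FROM events WHERE customer_id = 42;

    -- then, for the stubborn cases, steer it toward the index
    SET ENABLE_NESTLOOP = OFF;
    SET ENABLE_SEQSCAN = OFF;
    SELECT * FROM events WHERE customer_id = 42;
    RESET ENABLE_NESTLOOP;
    RESET ENABLE_SEQSCAN;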

We don't use any non-standard types (although we may move to a BIGSERIAL
one day) except for BOOLEAN.

>         - If your "data importing" is done via inserts, make sure that the
>batch uses a transaction for every 200 or so inserts. If you don't, each
>insert will be its own transaction, which will slow you down.

Indeed. At first, I did:

BEGIN WORK;INSERT...; INSERT...;COMMIT; and so forth to ensure they were in
a transaction.

Later, I imported the whole thing into a temporary table, then INSERT INTO
real_table SELECT * FROM temp_table to make it even faster (or so I thought).
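
In outline, an import batch now looks something like this (the column layout
is invented just to show the shape of it):

    -- staging table mirrors the destination's columns
    CREATE TEMP TABLE temp_table (list_id integer, email text);

    BEGIN WORK;
    INSERT INTO temp_table VALUES (17, 'someone@example.com');
    INSERT INTO temp_table VALUES (17, 'someone.else@example.com');
    -- ...a few thousand more of these per batch...
    COMMIT;

    -- then one set-oriented move into the real table
    INSERT INTO real_table SELECT * FROM temp_table;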

The biggest slowdown seems to come when there are queries of the form:

1) INSERT INTO tableA SELECT * FROM temp_table
executing simultaneously with queries of the form
2) INSERT INTO tableB SELECT column FROM tableA WHERE various clauses

Of course, #1 happens after a bunch of inserts into temp_table, but those
go very fast.

Taken individually, those queries are either slow (#2) or frequent (#1).

We have 8 GB of RAM. I've allocated about 700 megs to shared memory. The rest
is used by the OS as buffer cache. I can't afford a 32 GB server, as 2 GB RAM
modules are exorbitantly expensive. The database won't fit into RAM anyway.
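
For reference, the relevant postgresql.conf line works out to something like
this (assuming the stock 8 kB block size; the exact figure is from memory):

    # roughly 700 MB of shared buffers at 8 kB per buffer
    shared_buffers = 89600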

At this point, after working with variants of this for a year and watching
my database grow to several tables of 100 million records, I need
professional, high-quality, in-depth help.

Thanks,

Doug


Re: Urgent need of (paid) PostgreSQL support in New

From: Ezra

Hello Doug:

Professional help might cost more than a powerful server.

Ezra Taylor






Re: Urgent need of (paid) PostgreSQL support in New

From: Joseph Shraibman

Is it blocking on CPU or disk?  Perhaps simply buying faster disks will work.


Re: Urgent need of (paid) PostgreSQL support in New

From: Hadley Willan

Or, more importantly, is it blocking on the bus? I take it you're using a
SCSI subsystem, most likely with a RAID container, lots of cache too, level 5?

After all, regardless of how powerful your CPU is or how fast your disks
are, if you're trying to squeeze it all down a straw, that won't help.

We've experienced DB slowdowns before when joining tables with long string
keys (GUID/OID-like).

Do you really need all the data in the tables available? Have you
considered an archiving process to a second database for data
warehousing? For example, most banks keep only a month's worth of live
data in the actual system and archive nightly, on a sliding window, out
to a secondary database; that is then archived off to tape for long-term
storage. This keeps the system at a steady size for performance. Often
historical information can also be kept in a summary (optimised) form
rather than being regenerated from the raw data.
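
A rough sketch of that sliding window (into an archive table here, but the
same idea applies to a second database; the names are made up):

    -- nightly: move anything older than a month out of the live table
    BEGIN;
    INSERT INTO orders_archive
        SELECT * FROM orders WHERE order_date < CURRENT_DATE - 30;
    DELETE FROM orders WHERE order_date < CURRENT_DATE - 30;
    COMMIT;

    -- then reclaim the space in the live table
    VACUUM ANALYZE orders;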

Also, when doing HUGE INSERT sets, it can often be more efficient to drop
the indexes (or, if the DB engine supports it, defer index maintenance) and
then rebuild them after the inserts.
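
For example (the index definition is made up):

    DROP INDEX orders_customer_idx;
    -- ...run the big INSERT ... SELECT or COPY here...
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ANALYZE orders;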

Some ideas that you may have already covered, but, hey! They're free
;-)

Hadley

On Wed, 2002-12-11 at 16:21, Joseph Shraibman wrote:
> Is it blocking on cpu or disk?  Perhaps simply buying faster disks will work.
--
Hadley Willan > Systems Development > Deeper Design Limited.
hadley@deeper.co.nz > www.deeperdesign.com > +64 (21) 28 41 463
Level 1, 4 Tamamutu St, PO Box 90, TAUPO 2730, New Zealand.



Re: Urgent need of (paid) PostgreSQL support in New

From: "Nicolai Tufar"

----- Original Message -----
From: "Doug Fields" <dfields-pg-general@pexicom.com>
>
> We have 8GB RAM. I've allocated about 700 megs to shared memory. The rest
> is buffer cached by the O/S. I can't afford a 32 gig server as 2 gig RAM
> modules are exorbitantly expensive. The database won't fit into RAM anyway.

OS caching is generally considered a waste of resources in databases.
Try to allocate as much as possible to shared buffers and keep OS caching
to a minimum.

Regards,
Nic




Re: Urgent need of (paid) PostgreSQL support in New

From: "Shridhar Daithankar"

On 11 Dec 2002 at 12:52, Nicolai Tufar wrote:

> OS caching is generally considered a waste of resource in databases.
> Try to allocate as much as possible to shared buffers and set OS caching
> to minimum.

That is exactly the opposite of the truth. Leave as much as possible for the OS
cache and make minimal use of shared buffers.

Initially, upping the shared buffers helps, but at some point the advantage starts
to disappear. Decide on that figure by trial and error, but it will certainly be
around 100-200 MB for most cases.

Bye
 Shridhar

--
Intuition, however illogical, is recognized as a command prerogative.        -- Kirk,
"Obsession", stardate 3620.7


Re: Urgent need of (paid) PostgreSQL support in New

From: "Ricardo Ryoiti S. Junior"

    Hi,

On Wed, 11 Dec 2002, Nicolai Tufar wrote:

> > We have 8GB RAM. I've allocated about 700 megs to shared memory. The rest
> > is buffer cached by the O/S. I can't afford a 32 gig server as 2 gig RAM
> > modules are exorbitantly expensive. The database won't fit into RAM anyway.

> OS caching is generally considered a waste of resource in databases.
> Try to allocate as much as possible to shared buffers and set OS caching
> to minimum.

    Hmm, at a lower priority level, it's also worth checking how much
sort_mem you have, especially if your SELECTs sort and return large numbers
of tuples. However, his problem seems to be trickier than these details. :)

    []'s
    Ricardo.



Re: Urgent need of (paid) PostgreSQL support in New

From: "Ricardo Ryoiti S. Junior"

    Hi,

On Wed, 11 Dec 2002, Shridhar Daithankar wrote:

> > OS caching is generally considered a waste of resource in databases.
> > Try to allocate as much as possible to shared buffers and set OS
> > caching to minimum.
>
> That is exactly the opposite of the truth. Leave as much as possible for the
> OS cache and make minimal use of shared buffers.
>
> Initially, upping the shared buffers helps, but at some point the advantage starts
> to disappear. Decide on that figure by trial and error, but it will certainly be
> around 100-200 MB for most cases.

    Are there any studies on this? I remember that there were other
people saying the same thing. But at least logically, it seems that the
database server should know better than the OS what needs to be cached or
not. Also, doesn't the DB buffer cache require a bit less overhead for
data to be accessed?


    []'s
    Ricardo.



Re: Urgent need of (paid) PostgreSQL support in New

From: "Shridhar Daithankar"

On 11 Dec 2002 at 9:08, Ricardo Ryoiti S. Junior wrote:
> > Initially, upping the shared buffers helps, but at some point the advantage starts
> > to disappear. Decide on that figure by trial and error, but it will certainly be
> > around 100-200 MB for most cases.
>
>     Are there any studies on this? I remember that there were

Well, you should be able to test it if you have a big enough setup... anyway,
I don't have one right now either.

> other people saying the same thing. But at least logically, it seems that
> the database server should know better than the OS what needs to be cached or
> not. Also, doesn't the DB buffer cache require a bit less overhead for
> data to be accessed?

Well, the thing is, PostgreSQL caches data tuples and not the data file. So only
the tuples that are required will be in memory, but the OS can cache the data
file and serve the data faster.

And for any database, file caching is as important as data caching.
However, PostgreSQL does not try to outsmart the OS the way, say, Oracle does.
It relies on the OS to be smart enough to handle caching properly.

A test of this is to load a database and put continuous, heavy, real-life
activity on it. To start with, set shared buffers very high. Monitor the output
of ipcs to see how much shared memory PostgreSQL typically uses. That is all it
needs, and typically it is much less than half the RAM on multi-gig machines.
In that case, the rest of PostgreSQL's shared buffers are actually wasted, as
shared memory pages are locked in memory and cannot be reused by the OS.

HTH

Bye
 Shridhar

--
Sometimes a man will tell his bartender things he'll never tell his doctor.        --
Dr. Phillip Boyce, "The Menagerie" ("The Cage"),           stardate unknown.


Re: Urgent need of (paid) PostgreSQL support in New

From: Jeff Eckermann

--- Doug Fields <dfields-pg-general@pexicom.com>
wrote:
>
> The biggest slowdown seems to come when there are
> queries of the form:
>
> 1) INSERT INTO tableA SELECT * FROM temp_table
> executing simultaneously with queries of the form
> 2) INSERT INTO tableB SELECT column FROM tableA
> WHERE various clauses
>
You will probably find it more efficient to write a procedure that
handles the inserts into both tableA and tableB simultaneously, based on
the temp_table data.
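
Something along these lines, as a pure sketch (the column, the filter and the
function name are invented, and it assumes PL/pgSQL is installed):

    CREATE FUNCTION import_batch() RETURNS integer AS '
    BEGIN
        -- feed both destinations from the staged rows in one pass
        INSERT INTO tableA SELECT * FROM temp_table;
        INSERT INTO tableB SELECT some_column FROM temp_table WHERE some_flag;
        RETURN 0;
    END;
    ' LANGUAGE 'plpgsql';

Then the import just calls SELECT import_batch(); once the temp table is loaded.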


Re: Urgent need of (paid) PostgreSQL support in New

From: "scott.marlowe"

On Wed, 11 Dec 2002, Shridhar Daithankar wrote:

> On 11 Dec 2002 at 9:08, Ricardo Ryoiti S. Junior wrote:
> > > Initially, upping the shared buffers helps, but at some point the advantage starts
> > > to disappear. Decide on that figure by trial and error, but it will certainly be
> > > around 100-200 MB for most cases.
> >
> >     Are there any studies on this? I remember that there were
>
> Well, you should be able to test it if you have big enough setup but.. anyway
> (I don't have it now either)

I have a machine with 1.5 gigs of RAM, and so far upping shared buffers
past 4096 (32 megs) hasn't really seemed to make much of a difference in
performance.

I think what people may be forgetting here is that it is likely that the
Linux kernel's file caching algorithms are more efficient than the
ones in PostgreSQL.

If the ones in the Linux kernel are optimized to cache hundreds and
hundreds of megs of data, while the ones in PostgreSQL were designed to
handle tens of megs of data, then it might well be slower to have PostgreSQL
try to cache the files.

In the early days of CPU design, it was not uncommon to have chips run
slower as their caches got bigger due to issues of cache lookup taking
longer and longer.  I.e. you've got to "index" your cache, and indexing
isn't free.  So, if the kernel is significantly more efficient at caching
large datasets, then letting the kernel do it makes the most sense.

Don't ASSUME your database is better at caching than the kernel; prove it
to yourself first if you are gonna try huge caches.

My experience has been that beyond 200 megs or so, postgresql caching
doesn't seem to speed things up much, no matter how large the data set.


Re: Batch Inserts

From: Doug Fields

Hi Ricardo, list,

One quick question:

>         - If your "data importing" is done via inserts, make sure that the
>batch uses a transaction for every 200 or so inserts. If you don't, each
>insert will be its own transaction, which will slow you down.

I use JDBC and use it with the default "AUTOCOMMIT ON."

Does doing a statement, in one JDBC execution, of the form:

BEGIN WORK; INSERT ... ; INSERT ... ; INSERT ...; COMMIT;

count as N individual transactions (due to the autocommit setting), or does the
surrounding BEGIN WORK; ... COMMIT; override that setting?

Thanks,

Doug


Re: Urgent need of (paid) PostgreSQL support in New

From: "Fred Moyer"

I have noticed that increasing the shared buffers has always had a
positive performance effect on my system.  I'm not saying it will help
everyone but check out the attached (simple) benchmarks I ran.  The
results have been repeatable.

I always use as many shared buffers as I can, but right now I haven't been
able to go above 2 GB worth until I reconfigure the kernel to allow more
than 2 GB of shared memory.  Right now /proc/sys/kernel/shmmax is at
2192000000.  Note that for some reason I was able to configure 2 GB of shared
memory on a machine with 1.5 GB of RAM (I was testing against my production
server, which has 4 GB). Not sure why that is, but definitely try this
yourself.



Attachment

Re: Urgent need of (paid) PostgreSQL support in New

From: Tom Lane

"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> On 11 Dec 2002 at 12:52, Nicolai Tufar wrote:
>> OS caching is generally considered a waste of resource in databases.
>> Try to allocate as much as possible to shared buffers and set OS caching
>> to minimum.

> That is exactly the opposite of the truth. Leave as much as possible for the
> OS cache and make minimal use of shared buffers.

There are varying opinions about that.  Some say "push PG's
shared-buffers setting as high as you can make it".  Some say "use a
conservatively small shared-buffers setting and expect the kernel to use
the rest of physical memory as kernel disk buffers".  But both camps
agree that a shared-buffers setting near 50% of physical RAM will suck:
then any given page of database is likely to get cached *both* in PG's
buffers and in kernel buffers.  That behavior can't win by any measure.

            regards, tom lane

Re: Urgent need of (paid) PostgreSQL support in New

From: "Shridhar Daithankar"

On 11 Dec 2002 at 17:53, Fred Moyer wrote:

> I have noticed that increasing the shared buffers has always had a
> positive performance effect on my system.  I'm not saying it will help
> everyone but check out the attached (simple) benchmarks I ran.  The
> results have been repeatable.
>
> I always use as many shared buffers as I can, but right now I haven't been
> able to go above 2 GB worth until I reconfigure the kernel to allow more
> than 2 GB shared memory.  Right now /proc/sys/kernel/shmmax is at
> 2192000000.  Note that for some reason I was able to configure 2 GB shared
> memory on a machine with 1.5 GB ram (was testing against my production
> server which has 4 GB). Not sure why that is but definitely try this
> yourself.

I believe that even if you push kernel shared memory past 2 GB, you cannot make
PostgreSQL use it, because IIRC Tom said the shared buffer count is an int and
cannot go beyond roughly 250K on 32-bit machines.

Possibly recompiling PostgreSQL with a long long shared buffer count and/or a
bigger page size might make a difference.

Personally, I would recompile PostgreSQL with a 16 KB page size if the database
size is past a gig. It should avoid some fragmentation. I don't have any results
to say that it will work better, but I believe it is worth an experiment.
Anything more than that might be overkill.



Bye
 Shridhar

--
Wedding, n:    A ceremony at which two persons undertake to become one, one
undertakes    to become nothing and nothing undertakes to become supportable.        --
Ambrose Bierce


Re: Urgent need of (paid) PostgreSQL support in New

From: Vivek Khera

>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> There are varying opinions about that.  Some say "push PG's
TL> shared-buffers setting as high as you can make it".  Some say "use a
TL> conservatively small shared-buffers setting and expect the kernel to use
TL> the rest of physical memory as kernel disk buffers".  But both camps

I think that once your DB size gets big enough, there will be queries
that suck no matter what, because you have to scan through a boatload
of disk pages.  Basically, once your working set size is bigger than
the shared buffer space, you're hosed.  Making shared buffer space
bigger than 50% of RAM will suck, as Tom said.  I used to do that, now
I have about 25-30% of RAM as shared bufs.  It still sucks because the
data size is too big.  I've got the fastest disks I can get, and split
the data across multiple spindles using RAID5 (yes, I could probably
get faster with RAID 1+0 or something, but it is too late for that
now...)


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/