Thread: Point-in-time data recovery - v.7.4
Hello, I am trying to find out when 'Point-in-time data recovery' functionality will be available in PostgreSQL, but I cannot find concrete info about this. References in mailing lists talk about version 7.4, and in the TODO list it is under the section 'urgent'. Does anybody know when this functionality will be available with production quality? -- With regards Rafael Martinez USIT, University of Oslo
I think it was delayed until 7.5... same for win32 port.
Here is Bruce's message talking about both topics: http://archives.postgresql.org/pgsql-hackers/2003-07/msg00284.php
On Tue, 2003-11-18 at 11:01, Rafael Martinez Guerrero wrote:
Hello I am trying to find out when 'Point-in-time data recovery' functionality will be available with postgreSQL but I can not find concrete info about this. References in mailinglists talk about version 7.4 and in the TODO list is under the section 'urgent'. Anybody knows when this functionality will be available with prod-quality?
Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no> writes: > Anybody knows when this functionality will be available with > prod-quality? It's likely to show up in 7.5, which it is hoped will be released around the middle of next year. However, as this is free software, dependent on volunteers and other people's priorities, neither the feature set of the next release nor the exact release date is promised. -- greg
It was planned for 7.4 but got bumped since the work was not finished. There are patches floating around and people are putting some effort into it, so hopefully we will see something in 7.5, but it is dependent on the code being finished before the end of the 7.5 development cycle. If you want to help code PITR, please send a message to hackers. Robert Treat On Tue, 2003-11-18 at 09:01, Rafael Martinez Guerrero wrote: > Hello > > I am trying to find out when 'Point-in-time data recovery' functionality > will be available with postgreSQL but I can not find concrete info about > this. > > References in mailinglists talk about version 7.4 and in the TODO list > is under the section 'urgent'. > > Anybody knows when this functionality will be available with > prod-quality? > > -- > With regards > Rafael Martinez > USIT, University of Oslo -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Tue, Nov 18, 2003 at 03:01:15PM +0100, Rafael Martinez Guerrero wrote: > References in mailinglists talk about version 7.4 and in the TODO list > is under the section 'urgent'. It didn't get done. > Anybody knows when this functionality will be available with > prod-quality? I don't think so. Jan Wieck has a proposal for a new replication system which will offer a trick for producing point-in-time recovery as a side benefit. We are aiming to have that software in use sooner rather than later, but it hasn't been written yet. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Tue, 2003-11-18 at 08:29, Andrew Sullivan wrote: > On Tue, Nov 18, 2003 at 03:01:15PM +0100, Rafael Martinez Guerrero wrote: > > References in mailinglists talk about version 7.4 and in the TODO list > > is under the section 'urgent'. > > It didn't get done. > > > Anybody knows when this functionality will be available with > > prod-quality? > > I don't think so. > > Jan Wieck has a proposal for a new replication system which will > offer a trick for producing point in time as a side benefit. We are > aiming to have that software in use sooner rather than later, but it > hasn't been written yet. Supplanting PITR with Replication? If so, I don't think that's a good idea, since you'd either have to buy a new server, or double disk capacity, both of which can cost noticeable amounts of money, if SCSI disks are used. -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA When Swedes start committing terrorism, I'll become suspicious of Scandinavians.
What is the best method for storing files in postgres? Is it better to use the large object functions or to just encode the data and store it in a regular text or data field?
On Tue, Nov 18, 2003 at 12:39:40PM -0600, Ron Johnson wrote: > Supplanting PITR with Replication? If so, I don't think that's No, I think the idea is that if you're already using the replication system, you can get this for nothing along with it. Yes, it'll cost you in hardware. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
"Rick Gigger" <rick@alpinenetworking.com> writes: > What is the best method for storing files in postgres? Is it better to use > the large object functions or to just encode the data and store it in a > regular text or data field? Depends on whether you need "random access" to the contents. You can lo_seek() inside a large object and retrieve parts of the data with lo_read(), while 'text' and 'bytea' currently require fetching the whole field. -Doug
Quoting Rick Gigger <rick@alpinenetworking.com>: > What is the best method for storing files in postgres? Is it better to use > the large object functions or to just encode the data and store it in a > regular text or data field? Rick, This has been discussed recently so you might want to dig through the archives, but one thing to be aware of is that with large objects you have to dump your database in a different format - a non-text format. This is less portable than storing files as byteas, which can be dumped in both the text and non-text formats. Arguably, if all you use is PostgreSQL then this might not be a big deal. Performance-wise I'm not qualified to speak to which is "better". As always, it is going to depend on your specific application and environment. -- Keith C. Perry, MS E.E. Director of Networks & Applications VCSN, Inc. http://vcsn.com ____________________________________ This email account is being hosted by: VCSN, Inc : http://vcsn.com
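To make the dump issue concrete, the difference is roughly this (command lines are only a sketch and the database names are made up): a plain SQL dump carries bytea columns fine, while large objects only come along in the non-text formats with the blobs option.

    pg_dump mydb > mydb.sql            # plain text dump; bytea columns are included
    pg_dump -Fc -b mydb > mydb.dump    # custom format with -b to include large objects
    pg_restore -d newdb mydb.dump      # restore the custom-format dump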
I will search the archives but does anyone know off the top of their head which performs better? ----- Original Message ----- From: "Keith C. Perry" <netadmin@vcsn.com> To: "Rick Gigger" <rick@alpinenetworking.com> Cc: "PgSQL General ML" <pgsql-general@postgresql.org> Sent: Tuesday, November 18, 2003 12:25 PM Subject: Re: [GENERAL] uploading files > Quoting Rick Gigger <rick@alpinenetworking.com>: > > > What is the best method for storing files in postgres? Is it better to use > > the large object functions or to just encode the data and store it in a > > regular text or data field? > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > Rick, > > This has been discussed recently so you might want to dig through the archives > but one thing to be aware of is that with large objects, you have have to dump > your database in a different format- a non-text format. This is less portable > than storing files as bytea's which can be dumped in the text and non-text > formats. Argueably is all you use is PosgreSQL then this might not be a big deal. > > Performance-wise I'm not qualified to speak to which is "better". As always, it > is going to depend on your specific application and environment. > > -- > Keith C. Perry, MS E.E. > Director of Networks & Applications > VCSN, Inc. > http://vcsn.com > > ____________________________________ > This email account is being host by: > VCSN, Inc : http://vcsn.com >
"Rick Gigger" <rick@alpinenetworking.com> writes: > I will search the archives but does anyone know off the top of their head > which performs better? It really depends on what you are trying to do. If you are always fetching the entire file from the database, 'bytea' *might* have a little less overhead, but it's not clearcut. Both of them will be ultimately limited by how fast you can get data off the disk and and pump it through your client connection. -Doug
> Depends on whether you need "random access" to the contents. You can > lo_seek() inside a large object and retrieve parts of the data with > lo_read(), while 'text' and 'bytea' currently require fetching the > whole file. Not so unless I misunderstand. We use substr() on bytea for chunking access to large files on OSs with inferior TCP/IP buffer implementations. That's probably suboptimal performance-wise, though. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
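Concretely, that chunked bytea access looks something like this (table and column names invented; substr offsets are 1-based):

    SELECT substr(filedata, 1, 65536)     FROM files WHERE id = 1;  -- bytes 1..65536
    SELECT substr(filedata, 65537, 65536) FROM files WHERE id = 1;  -- bytes 65537..131072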
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: > > Depends on whether you need "random access" to the contents. You can > > lo_seek() inside a large object and retrieve parts of the data with > > lo_read(), while 'text' and 'bytea' currently require fetching the > > whole file. > Not so unless I misunderstand. We use substr() on bytea for > chunking access to large files on OSs with inferior TCP/IP buffer > implementations. That's probably suboptimal performance-wise, > though. I think the field will still be completely loaded into memory on the server side though, while LOs are stored in "chunks" and can theoretically be streamed to the client. I'm not really a definitive authority, though... -Doug
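For anyone who hasn't used the large object interface, a minimal sketch of that kind of streamed, random access through the SQL-level functions looks like this (the table and column names are invented, and the descriptor number returned by lo_open can vary):

    BEGIN;
    -- open the large object referenced by documents.content for reading
    -- (262144 is the INV_READ flag)
    SELECT lo_open(content, 262144) FROM documents WHERE id = 1;
    -- assuming lo_open returned descriptor 0:
    SELECT lo_lseek(0, 100000, 0);   -- seek to byte 100000 (0 = SEEK_SET)
    SELECT loread(0, 8192);          -- read 8 kB starting there
    SELECT lo_close(0);
    COMMIT;                          -- descriptors only live inside the transaction

The libpq calls lo_open(), lo_seek() and lo_read() do the same thing from C without pushing every chunk through a SELECT.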
I am currently trying to import a text data file with about 45,000 records. At the end of the import it does an update on each of the 45,000 records. Doing all of the inserts completes in a fairly short amount of time (about 2 1/2 minutes). Once it gets to the updates, though, it slows to a crawl. After about 10 minutes it has only done about 3000 records. Is that normal? Is it because it's inside such a large transaction? Is there anything I can do to speed that up? It seems awfully slow to me. I didn't think that giving it more shared buffers would help, but I tried anyway. It didn't help. I also tried a full vacuum with analyze (vacuumdb -z -f) and it cleaned up a lot of stuff, but it didn't speed up the updates at all. I am using a dual 800MHz Xeon box with 2 GB of RAM. I've tried anywhere from about 16,000 to 65,000 shared buffers. What other factors are involved here?
> I think the field will still be competely loaded into memory on the > server side though, while LOs are stored in "chunks" and can > theoretically be streamed to the client. I'm not really a definitive > authority, though... Ah ! Sounds about right ! Something new to learn every day :-) Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Rick Gigger wrote: > I am currently trying to import a text data file without about 45,000 > records. At the end of the import it does an update on each of the 45,000 > records. Doing all of the inserts completes in a fairly short amount of > time (about 2 1/2 minutes). Once it gets to the the updates though it slows > to a craw. After about 10 minutes it's only done about 3000 records. > > Is that normal? Is it because it's inside such a large transaction? Is > there anything I can do to speed that up. It seems awfully slow to me. > > I didn't think that giving it more shared buffers would help but I tried > anyway. It didn't help. > > I tried doing a analyze full on it (vacuumdb -z -f) and it cleaned up a lot > of stuff but it didn't speed up the updates at all. > > I am using a dual 800mhz xeon box with 2 gb of ram. I've tried anywhere > from about 16,000 to 65000 shared buffers. > > What other factors are involved here? It is difficult to say without knowing either the definition of the relation(s) or the update queries involved. Are there indexes being created after the import that would allow PostgreSQL to locate the rows being updated quickly, or is the update an unqualified update (no WHERE clause) that affects all tuples? EXPLAIN ANALYZE is your friend... Mike Mascari mascarm@mascari.com
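As a concrete illustration of that (table, column, and index names here are invented), the thing to look for in the plan is an index scan driven by the UPDATE's WHERE clause:

    -- note that EXPLAIN ANALYZE actually executes the statement,
    -- so run it inside a transaction you can roll back
    EXPLAIN ANALYZE
    UPDATE import_data SET status = 'done' WHERE record_id = 12345;
    -- good:    Index Scan using import_data_record_id_idx on import_data ...
    -- trouble: Seq Scan on import_data ...   (45,000 of these will crawl)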
On Tue, 18 Nov 2003, Rick Gigger wrote: > I am currently trying to import a text data file without about 45,000 > records. At the end of the import it does an update on each of the 45,000 > records. Doing all of the inserts completes in a fairly short amount of > time (about 2 1/2 minutes). Once it gets to the the updates though it slows > to a craw. After about 10 minutes it's only done about 3000 records. Are you doing your updates like this: update table1 set field1='1' where id=1; update table1 set field2=4 where id=1; ... update table1 set field10='something else' where id=1; update table1 set field1='3' where id=2; ... Maybe an analyze after the import and before the updates would help. As might a vacuum [full]. If the table isn't updated by other processes, probably not. Maybe you've got a foreign key mismatch going on and a lot of sequential scanning? > Is that normal? Is it because it's inside such a large transaction? Is > there anything I can do to speed that up. It seems awfully slow to me. Possibly. If you are creating a lot of dead tuples, then the operations can get slower and slower. Have you checked your fsm settings et al.? > I didn't think that giving it more shared buffers would help but I tried > anyway. It didn't help. Usually doesn't. More sort_mem might though. Make it something like 16384 or 32768 (it's measured in kbytes). > I tried doing a analyze full on it (vacuumdb -z -f) and it cleaned up a lot > of stuff but it didn't speed up the updates at all. You probably need to do the analyze between the import and the update. > I am using a dual 800mhz xeon box with 2 gb of ram. I've tried anywhere > from about 16,000 to 65000 shared buffers. That's VERY high. When postgresql has to manage a lot of buffers it actually is slower than letting the kernel in Linux or BSD do it for you. > What other factors are involved here? Not sure. More concrete examples would help. Have you run your queries with explain analyze at the front and looked for differences in number of rows / loops? Those are the dead giveaways. Take a look here: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
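To make those two suggestions concrete (a sketch only, with invented names): fold the per-column updates into a single statement, and raise sort_mem just for the session doing the load.

    SET sort_mem = 32768;   -- kB, for this session only
    -- one UPDATE touching all the columns beats ten UPDATEs of one column each:
    UPDATE table1
       SET field1 = '3', field2 = 4, field10 = 'something else'
     WHERE id = 2;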
Uh, I feel a little silly now. I had an index on the field in question (needed to locate the row to update) but later recreated the table and forgot to re-add it. I had assumed that it was there, but double-checked just now and it was gone. I then re-added the index and it finished in a few minutes. Sorry about that one. Thanks for the help. rg ----- Original Message ----- From: "Mike Mascari" <mascarm@mascari.com> To: "Rick Gigger" <rick@alpinenetworking.com> Cc: "PgSQL General ML" <pgsql-general@postgresql.org> Sent: Tuesday, November 18, 2003 2:03 PM Subject: Re: [GENERAL] performance problem > It is difficult to say without knowing either the definition of the > relation(s) or the update queries involved. Are there indexes being > created after the import that would allow PostgreSQL to locate the > rows being updated quickly, or is the update an unqualified update (no > WHERE clause) that affects all tuples? > > EXPLAIN ANALYZE is your friend... > > Mike Mascari > mascarm@mascari.com
On Tue, 18 Nov 2003 17:43, Rick Gigger wrote: > I am currently trying to import a text data file without about 45,000 > records. At the end of the import it does an update on each of the 45,000 > records. Doing all of the inserts completes in a fairly short amount of > time (about 2 1/2 minutes). Once it gets to the the updates though it slows > to a craw. After about 10 minutes it's only done about 3000 records. That's not a big number of rows. It shouldn't be causing that much trouble, unless you have something wrong with the update query. Try inserting all the 45K rows and then run an explain analyze on the update query to see what's wrong (reply with the output of explain analyze to the list). The update query itself would help a lot, too. > Is that normal? Is it because it's inside such a large transaction? Is > there anything I can do to speed that up. It seems awfully slow to me. Not sure, but I have had lots of big transactions with heavy load, and never had a problem like the one you describe. > I didn't think that giving it more shared buffers would help but I tried > anyway. It didn't help. > > I tried doing a analyze full on it (vacuumdb -z -f) and it cleaned up a lot > of stuff but it didn't speed up the updates at all. > > I am using a dual 800mhz xeon box with 2 gb of ram. I've tried anywhere > from about 16,000 to 65000 shared buffers. How's memory performance while you're running the updates? What does free say (if you are in some Unix environment)? -- select 'mmarques' || '@' || 'unl.edu.ar' AS email; ----------------------------------------------------------------- Martín Marqués | mmarques@unl.edu.ar Programador, Administrador, DBA | Centro de Telemática Universidad Nacional del Litoral -----------------------------------------------------------------
On Tue, 18 Nov 2003, Andrew Sullivan wrote: > On Tue, Nov 18, 2003 at 12:39:40PM -0600, Ron Johnson wrote: > > Supplanting PITR with Replication? If so, I don't think that's > > No, I think the idea is that if you're already using the replication > system, you can get this for nothing along with it. Yes, it'll cost > you in hardware. No, it's more along the lines of "once the replication system is written, it can be adapted to provide PITR (without the need for actual replication) as a side effect."
scott.marlowe wrote: > On Tue, 18 Nov 2003, Andrew Sullivan wrote: > > >>On Tue, Nov 18, 2003 at 12:39:40PM -0600, Ron Johnson wrote: >> >>>Supplanting PITR with Replication? If so, I don't think that's >> >>No, I think the idea is that if you're already using the replication >>system, you can get this for nothing along with it. Yes, it'll cost >>you in hardware. > > > No, it's more along the lines of "once the replication system is written, > it can be adapted to provide PITR (without the need for actual > replication) as a side effect. I think the opposite: once PITR is written, it can be used for replication as a side effect. For example, Sybase replication is implemented in this way. Regards Gaetano Mendola
> I think the opposit: > > Once the PITR is written it can be used for Replication as side > effect. > Command Prompt has found the opposite to be the case. It is our replication product that is going to allow PITR, even from 7.3. Sincerely, Joshua Drake > For example the Sybase Replication is implemented in this way. > > > Regards > Gaetano Mendola -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
> Usually doesn't. More sort_mem might though. Make it something like > 16384 or 32768 (it's measured in kbytes) I thought that it was measured in 8k blocks. http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node3.html > That's VERY high. When postgresql has to manage a lot of buffers it > actually is slower than letting the kernel in Linux or BSD do it for you. Even if you've got the memory to spare? Does postgres actually slow down just because it's slower to manage a lot of them, or because you're taking the memory away from the kernel so the kernel has to swap more? rg
"Rick Gigger" <rick@alpinenetworking.com> writes: > > That's VERY high. When postgresql has to manage a lot of buffers it > > actually is slower than letting the kernel in Linux or BSD do it for you. > > Even if you've got the memory to spare? Does postgres actually slow down > just because it's slower to manager a lot of them just or because you're > taking the memory away from the kernel so the kernel has to swap more? The latter, mainly, I think. Also you *really* don't want your kernel to swap out any of your shared buffers, which can happen if they take up a significant portion of RAM... -Doug
On Tue, Nov 18, 2003 at 04:54:59PM -0700, Rick Gigger wrote: > > That's VERY high. When postgresql has to manage a lot of buffers it > > actually is slower than letting the kernel in Linux or BSD do it for you. > > Even if you've got the memory to spare? Does postgres actually slow down > just because it's slower to manager a lot of them just or because you're > taking the memory away from the kernel so the kernel has to swap more? It's a combination. Any buffer that's stored in the postgresql cache is also in the kernel cache, so that's just slightly inefficient. The kernel pool manager tends to be more optimised for handling large numbers of buffers. Because it's the kernel, it can tailor for hardware quirks like high memory and bounce buffers and such. And when you're not using your shared memory it's wasted, but the kernel can reallocate any memory it likes for any purpose. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
> Command Prompt has found the opposite to be the case. It is our > replication product > that is going to allow PITR, even from 7.3. > I must have been asleep as this is the first I've heard of Command Prompt doing a replication product. Are you able to provide more details? Zembu Labs (http://www.zembu.com/) apparently had a postgreSQL replication thing happening last year but they've since closed their doors. Anyone know anything about that? Cheers, Graeme
So are you donating your replication work to the community then? Or do we still have to write PITR? Dave On Tue, 2003-11-18 at 19:55, Joshua D. Drake wrote: > > I think the opposit: > > > > Once the PITR is written it can be used for Replication as side > > effect. > > > Command Prompt has found the opposite to be the case. It is our > replication product > that is going to allow PITR, even from 7.3. > > Sincerely, > > Joshua Drake > > > > > For example the Sybase Replication is implemented in this way. > > > > > > Regards > > Gaetano Mendola > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Don't 'kill -9' the postmaster >
Dave Cramer wrote: > So are you donating your replication work to the community then? Or do > we still have to write PITR? Jan is working on a community replication solution, and we will still need PITR anyway because not everyone has two machines. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Hi all, Just another question on all this: when will postgres have native win32 support? I've been gathering information from bits and pieces I found on the net, but never got an official answer. Some said it would be in 7.4, but it's not there, and what I gathered about a month ago was that it was aimed for release in version 8.0, so I have been under the impression that there will never be a 7.5 release. Is there anyone that can help me clear up my confusion in this matter? Franco Bruno Borghesi explained: => I think it was delayed until 7.5... same for win32 port.
On Tue, 18 Nov 2003 21:22:56 +0100, Karsten Hilbert wrote: >> I think the field will still be completely loaded into memory on the >> server side though, while LOs are stored in "chunks" and can >> theoretically be streamed to the client. I'm not really a definitive >> authority, though... > Ah ! Sounds about right ! Something new to learn every day :-) > Actually, bytea and text are chunked behind the scenes (the technique known as TOAST). They are also compressed (LZ) by default. However, if you anticipate substringing them a lot (or if they are not very compressible anyway), use ALTER TABLE xxx ALTER COLUMN yyy SET STORAGE EXTERNAL to turn off the automatic compression of values. This allows the substr operation to read the minimum number of chunks necessary off disk. Bear in mind, of course, that the whole value will take up more disk space, so reading it in its entirety will be slower (I/O is normally the limiting performance factor - CPU on a DB server is often cheap by comparison, so decompressing/compressing to save I/O is a good idea). If, however, you always fetch small parts (e.g. you store large images and usually want to read the header info from them), EXTERNAL is a good bet (and depending on the image format, the compression might not compress them very much anyway). Finally, note that the substr optimisation for text only really buys you anything if the character set is single-byte. Hope this helps, John Gray (implementer of substr optimisation many moons ago!)
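In other words, something along these lines (table and column names invented) keeps substr() cheap at the price of more disk space:

    -- store the column toasted but uncompressed; only affects newly written values
    ALTER TABLE files ALTER COLUMN filedata SET STORAGE EXTERNAL;
    -- substr() now only has to read the chunks covering the requested range
    SELECT substr(filedata, 1, 8192) FROM files WHERE id = 1;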
Stef wrote: > Hi all, > > Just another question on all this : > When will postgres have win32 native > support. I've been gathering information > from bits and pieces I found on the net, > but never got an official answer. > > Some said it will be in 7.4, but it's not there, > and from what I gathered about a month ago > was that it was aimed for release in version 8.0 > So I have been under the impression that there > will never be a 7.5 release. Actually there is a debate ongoing right now, about what to call the next release? 7.5 or 8.0 So native win32 port is aimed for next major release whatever that might be..:-) Shridhar
Stef wrote: > And the world rejoiced as Shridhar Daithankar said : > > => Actually there is a debate ongoing right now, about what to call the next > => release? 7.5 or 8.0 > => > => So native win32 port is aimed for next major release whatever that might be..:-) > > Well, put like this, it doesn't really make a difference in the end :) As long as win32 makes it into the next release..:-) Shridhar
On Wed, Nov 19, 2003 at 11:42:40AM -0500, Bruce Momjian wrote: > Jan is working on a community replication solution, and we will still > need PITR anyway because everyone doesn't have two machines. Jan's system, if I understand him correctly, will not actually require two machines to get you PITR either, although it probably _would_ require two databases and a hunk of disk. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
> > > That's VERY high. When postgresql has to manage a lot of buffers it > > > actually is slower than letting the kernel in Linux or BSD do it for you. I am confused. In this tutorial (by Bruce Momjian) http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% for sort size." If I've got 2 GB of RAM then that is 2097152k. 25% of that = 524288k. So the tutorial is saying that 524288k is a good starting point for shared buffers with this amount of RAM. If each buffer is 8k (http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node3.html) then that would be 65536 buffers. I'm pretty sure that that is a lot more than I need to cache every tuple in my database. Now every time I see someone comment on this list about appropriate numbers for shared buffers they say something like "(65536 buffers is) VERY high." Since I obviously don't need that much shared cache I am not concerned, but it seems to me that one of the following must be true: my calculations here are wrong, or the tutorial is not accurate in saying that 25% is a good starting point, or the people making comments that 65536 is "VERY high" are wrong. Am I just confused or does this make sense? > > Even if you've got the memory to spare? Does postgres actually slow down > > just because it's slower to manager a lot of them just or because you're > > taking the memory away from the kernel so the kernel has to swap more? > > The latter, mainly, I think. Also you *really* don't want your kernel > to swap out any of your shared buffers, which can happen if they take > up a significant portion of RAM... So if I'm not swapping at all and I've got over 1 GB of unused memory then I'm probably OK even with the very high buffer setting? (Although I will probably reduce it anyway since it is unnecessary.) rg
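For what it's worth, the arithmetic itself checks out; the disagreement further down the thread is with the guideline, not the math:

    2 GB of RAM          = 2,097,152 kB
    25% of that          =   524,288 kB
    524,288 kB / 8 kB    =    65,536 buffers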
Ok, adding the index back in worked the first time, but then I tried generating the database from scratch again, this time adding the index right from the start. It added the 45,000 rows in about a minute but then was going dog slow on the updates again. So I did an explain and sure enough it was not using the index. After some investigation I determined that it was not using the index because when the transaction started there were only 4 rows in the table, so at that point it didn't want to use it. It apparently doesn't gather analysis data fast enough to handle this kind of transaction. I worked around this by starting the transaction and inserting the 45,000 rows and then killing it. Then I removed the index and re-added it, which apparently gathered some stats, and since there were all of the dead tuples in there from the failed transaction it now decided that it should use the index. I reran the script and this time it took 5 minutes again instead of 1 1/2 hours. I am using 7.2.4. Has this improved in later versions? I'm not concerned, since this is a very rare thing to need to do and it's obviously possible to work around, but it would be nice if postgres could figure things like that out on its own. (It certainly would have saved me a lot of time and confusion last night at about 3 am.) Is there a way to force the use of a specific index on a query? rg ----- Original Message ----- From: "Rick Gigger" <rick@alpinenetworking.com> To: "Mike Mascari" <mascarm@mascari.com> Cc: "PgSQL General ML" <pgsql-general@postgresql.org> Sent: Tuesday, November 18, 2003 2:41 PM Subject: Re: [GENERAL] performance problem > Uh, I feel a little silly now. I had and index on the field in question > (needed to locate the row to update) but later recreated the table and > forgot to readd it. I had assumed that it was there but double checked just > now and it was gone. I then readded the index and and it finished in a few > minutes. > Sorry about that one. Thanks for the help. > > rg
As discussed on previous posts and mentioned below, the kernel is often better at handling large amounts of buffers than Postgres. From researching previous posts and testing on my own setups, 256mb is about the max you want to go with shared buffers (if you have at least 1 gig ram). After that the smaller queries (which there are usually a lot of) slow down, and you only gain marginal improvement on the larger queries. You'll see an improvement slightly on the large queries going over 256 but not enough to improve the overall performance. There's a few more technically detailed explanations of this in the list archive I believe, by those who can explain it better than I. >> > > That's VERY high. When postgresql has to manage a lot of buffers it >> > > actually is slower than letting the kernel in Linux or BSD do it for > you. > > I am confused. In this tutorial (by Bruce Momjian) > http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% > for > sort size." > > If I've got 2g of RAM then that is 2097152k. 25% of that = 524288k. So > the > tutorial is saying that 524288k is a good starting point for shared > buffers > with this amount of RAM. > > If each buffer is 8k > (http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node3.html) > then that would be 65536 buffers. > > I'm pretty sure that that is a lot more than I need to cache every tuple > in > my database. Now everytime I see someone comment on this list about > appropriate numbers for shared buffers they say something like "(65536 > buffers is) VERY high. Now since I obviously don't need that much shared > cache so I am not concerned but it seems to me that one of the following > must be true. > > My calculations here are wrong. > or The tutorial is not accurate in saying that 25% is a good starting > point. > or The people making comments that 65536 is "VERY high" are wrong. > > Am I just confused or does this make sense? > >> > Even if you've got the memory to spare? Does postgres actually slow > down >> > just because it's slower to manager a lot of them just or because >> you're >> > taking the memory away from the kernel so the kernel has to swap more? >> >> The latter, mainly, I think. Also you *really* don't want your kernel >> to swap out any of your shared buffers, which can happen if they take >> up a significant portion of RAM... > > So if I'm not swapping at all and I've got over 1g of unused memory then > I'm > probably ok even with the very high buffer setting? (Although I will > probably reduce it anyway since it is unnecessary). > > rg > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
On Thu, Nov 20, 2003 at 01:52:10PM -0700, Rick Gigger wrote: > I worked around this by starting the transaction and inserting the 45,000 > rows and then killing it. The I removed the index and readded it which > apparently gathered some stats and since there were all of the dead tuples > in there from the failed transaction it now decided that it should use the > index. I reran the script and this time it took 5 minutes again instead of > 1 1/2 hours. Stats are not collected automatically. You should run ANALYZE after importing your data. And it's probably faster to create the index after the data is loaded, too. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Y una voz del caos me habló y me dijo "Sonríe y sé feliz, podría ser peor". Y sonreí. Y fui feliz. Y fue peor.
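That is, something like this right after the bulk insert and before the updates (table and column names invented):

    ANALYZE import_data;              -- safe inside a transaction block, unlike VACUUM
    -- or refresh statistics for just the column the WHERE clause uses:
    ANALYZE import_data (record_id);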
Ah, so then the 25% recomendation for a starting point is probably not a bad one, but if you get over a 1 gig of ram you might as well stop at about 256mb for the shared buffers because it just won't do you much good and will start to slow down small queries. That makes sense to me. ----- Original Message ----- From: <fred@redhotpenguin.com> To: "Rick Gigger" <rick@alpinenetworking.com> Cc: "Doug McNaught" <doug@mcnaught.org>; "scott.marlowe" <scott.marlowe@ihs.com>; "PgSQL General ML" <pgsql-general@postgresql.org> Sent: Thursday, November 20, 2003 1:59 PM Subject: Re: [GENERAL] performance problem > As discussed on previous posts and mentioned below, the kernel is often > better at handling large amounts of buffers than Postgres. > > From researching previous posts and testing on my own setups, 256mb is > about the max you want to go with shared buffers (if you have at least 1 > gig ram). After that the smaller queries (which there are usually a lot > of) slow down, and you only gain marginal improvement on the larger > queries. You'll see an improvement slightly on the large queries going > over 256 but not enough to improve the overall performance. There's a few > more technically detailed explanations of this in the list archive I > believe, by those who can explain it better than I. > > >> > > That's VERY high. When postgresql has to manage a lot of buffers it > >> > > actually is slower than letting the kernel in Linux or BSD do it for > > you. > > > > I am confused. In this tutorial (by Bruce Momjian) > > http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html > > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% > > for > > sort size." > > > > If I've got 2g of RAM then that is 2097152k. 25% of that = 524288k. So > > the > > tutorial is saying that 524288k is a good starting point for shared > > buffers > > with this amount of RAM. > > > > If each buffer is 8k > > (http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node3.html) > > then that would be 65536 buffers. > > > > I'm pretty sure that that is a lot more than I need to cache every tuple > > in > > my database. Now everytime I see someone comment on this list about > > appropriate numbers for shared buffers they say something like "(65536 > > buffers is) VERY high. Now since I obviously don't need that much shared > > cache so I am not concerned but it seems to me that one of the following > > must be true. > > > > My calculations here are wrong. > > or The tutorial is not accurate in saying that 25% is a good starting > > point. > > or The people making comments that 65536 is "VERY high" are wrong. > > > > Am I just confused or does this make sense? > > > >> > Even if you've got the memory to spare? Does postgres actually slow > > down > >> > just because it's slower to manager a lot of them just or because > >> you're > >> > taking the memory away from the kernel so the kernel has to swap more? > >> > >> The latter, mainly, I think. Also you *really* don't want your kernel > >> to swap out any of your shared buffers, which can happen if they take > >> up a significant portion of RAM... > > > > So if I'm not swapping at all and I've got over 1g of unused memory then > > I'm > > probably ok even with the very high buffer setting? (Although I will > > probably reduce it anyway since it is unnecessary). > > > > rg > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > > >
On Thu, Nov 20, 2003 at 01:41:13PM -0700, Rick Gigger wrote: > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% for > sort size." It's an old rule of thumb which has been superseded by new tests. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Rick Gigger wrote: > Ok, adding the index back in worked the first time but then I tried > generating the database from scratch again, this time adding the index right > from the start. It added the 45000 rows in about a minute but then was > going dog slow on the updates again. So I did an explain and sure enough it > was not using the index. After some investigation I determined that it was > not using the index because when the transaction started there were only 4 > rows in the table so at that point it didn't want to use it. It apparently > doesn't gather analysis data fast enough to handle this kind of transaction. The statistics are collected as a result of an ANALYZE command: http://www.postgresql.org/docs/current/static/sql-analyze.html This does not happen automatically. EXPLAIN output will show a default assumption of 1000 rows, IIRC. > I worked around this by starting the transaction and inserting the 45,000 > rows and then killing it. The I removed the index and readded it which > apparently gathered some stats and since there were all of the dead tuples > in there from the failed transaction it now decided that it should use the > index. I reran the script and this time it took 5 minutes again instead of > 1 1/2 hours. If you examine the behavior of pg_dump output, you'll notice that it doesn't build indexes until after the COPY command has completed the data import. It's a waste of CPU cycles and disk bandwidth to update indexes upon every insert. Your script should:

1) Drop all indexes on the relation
2) Use COPY if possible instead of INSERT to import the data
3) Recreate the indexes
4) Use UPDATE to update as necessary

> I am using 7.2.4. Has this improved in later versions? I'm not concerened > since this is a very rare thing to need to do and it's obviously possible to > work around but it would be nice if postgres could figure things like that > out on it's own. (It certainly would have saved me a lot of time and > confusion last night at about 3 am). Is there a way to for the use of a > specific index on a query? You can force the use of an index scan by turning sequential scans off: SET ENABLE_SEQSCAN TO OFF; But the problem is that the statistics didn't match the data. You could have:

1) Used INSERTs to insert the data into a relation with an index
2) Executed ANALYZE <foo> to update the statistics
3) Performed the UPDATE

After the UPDATE, you'll still have dead tuples (the original rows) which need to be marked as dead, and so you should occasionally run either VACUUM to mark them as such, or VACUUM FULL to reclaim the dead space, or VACUUM FULL ANALYZE to also update the relation's statistics. I would just execute the steps I outlined above with COPY and not worry about ANALYZEs and VACUUMs in a script. People often run VACUUM ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron job. Then, of course, there's REINDEX... Hope that helps, Mike Mascari mascarm@mascari.com
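Put together as a script, the sequence described above would look roughly like this (all object names invented; COPY FROM a server-side file needs superuser, while COPY FROM stdin works for any user):

    DROP INDEX import_data_record_id_idx;       -- 1) drop indexes before the load

    COPY import_data FROM '/tmp/import.txt';    -- 2) bulk load instead of 45,000 INSERTs

    CREATE INDEX import_data_record_id_idx      -- 3) rebuild the index afterwards
        ON import_data (record_id);
    ANALYZE import_data;                        --    and refresh the planner statistics

    UPDATE import_data SET status = 'done'      -- 4) now the updates can use the index
     WHERE record_id = 12345;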
Ah, I didn't realize that you could just do an ANALYZE. I thought there was only VACUUM ANALYZE but that can't run inside of a transaction. Thanks, rg ----- Original Message ----- From: "Alvaro Herrera Munoz" <alvherre@dcc.uchile.cl> To: "Rick Gigger" <rick@alpinenetworking.com> Cc: "Mike Mascari" <mascarm@mascari.com>; "PgSQL General ML" <pgsql-general@postgresql.org> Sent: Thursday, November 20, 2003 2:06 PM Subject: Re: [GENERAL] performance problem On Thu, Nov 20, 2003 at 01:52:10PM -0700, Rick Gigger wrote: > I worked around this by starting the transaction and inserting the 45,000 > rows and then killing it. The I removed the index and readded it which > apparently gathered some stats and since there were all of the dead tuples > in there from the failed transaction it now decided that it should use the > index. I reran the script and this time it took 5 minutes again instead of > 1 1/2 hours. Stats are not collected automatically. You should run ANALYZE after importing your data. And it's probably faster to create the index after the data is loaded, too. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Y una voz del caos me habló y me dijo "Sonríe y sé feliz, podría ser peor". Y sonreí. Y fui feliz. Y fue peor.
"Rick Gigger" <rick@alpinenetworking.com> writes: > I am confused. In this tutorial (by Bruce Momjian) > http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% for > sort size." That advice is widely considered obsolete --- it was developed in the days when typical RAM sizes were a lot less than today. I don't believe anyone has shown a good case for setting shared_buffers much higher than 10000. regards, tom lane
Andrew Sullivan wrote: > On Wed, Nov 19, 2003 at 11:42:40AM -0500, Bruce Momjian wrote: >> Jan is working on a community replication solution, and we will still >> need PITR anyway because everyone doesn't have two machines. > > Jan's system, if I understand him correctly, will not actually > require two machines to get you PITR either, although it probably > _would_ require two datatabases and a hunk of disk. You do understand me correctly. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> If you examine the behavior of pg_dump output, you'll notice that it > doesn't built indexes until after the COPY command has completed the > data import. It's a waste of cpu cycles and disk bandwidth to update > indexes on upon every insert. Your script should: > > 1) Drop all indexes on the relation > 2) Use COPY if possible instead of INSERT to import the data > 3) Recreate the indexes > 4) Use UPDATE to update as necessary I never thought of dropping the indexes in the middle of the script and re-adding them. I guess that would be fine as long as nothing else was running that needed the index while it was gone. I don't think 7.2.x supports using COPY unless you are inserting all of the fields, which I don't want to do, although I suppose I could just insert all of the defaults. Is that correct? > > I am using 7.2.4. Has this improved in later versions? I'm not concerened > > since this is a very rare thing to need to do and it's obviously possible to > > work around but it would be nice if postgres could figure things like that > > out on it's own. (It certainly would have saved me a lot of time and > > confusion last night at about 3 am). Is there a way to for the use of a > > specific index on a query? > > You can force the use of an index scan by turning sequential scans to off: > SET ENABLE_SEQSCAN TO OFF; Thanks. > But the problem is that the statistics didn't match the data. You > could have: > > 1) Used INSERTs to insert the data into a relation with an index > 2) Executed ANALYZE <foo> to update the statistics > 3) Perform the UPDATE That would have been the easiest thing. I wanted to do that, but I thought that you had to do a vacuum (which I couldn't do in the transaction) together with analyze. I didn't realize that analyze was a command all by itself. > After the UPDATE, you'll still have dead tuples (the original rows) > which require that they be marked as dead, and so you should > occassionally run either VACUUM to mark them as such or VACUUM FULL to > reclaim the dead space or VACUUM FULL ANALYZE to also update the > relation's statistics. I do them all nightly with cron on all my important databases. I just had a problem with needing the analyze to happen in the middle of the transaction in this one special case. > I would just execute the steps I outlined above with COPY and not > worry about ANALYZEs and VACUUMs in a script. People often run VACUUM > ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron > job. Then, of course, there's REINDEX... Doing the vacuum full analyze doesn't take long to do if I do it once a day, so I just do that.
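For completeness, the COPY limitation in question looks like this (if memory serves, the explicit column list arrived in 7.3; table and column names are invented):

    COPY import_data FROM stdin;                         -- 7.2: the data must supply every column
    COPY import_data (record_id, payload) FROM stdin;    -- 7.3 and later: load just these columns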
> "Rick Gigger" <rick@alpinenetworking.com> writes: > > I am confused. In this tutorial (by Bruce Momjian) > > http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html > > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% for > > sort size." > > That advice is widely considered obsolete --- it was developed in the > days when typical RAM sizes were a lot less than today. I don't believe > anyone has shown a good case for setting shared_buffers much higher than > 10000. > > regards, tom lane Thank you!
Tom Lane wrote: > "Rick Gigger" <rick@alpinenetworking.com> writes: > > I am confused. In this tutorial (by Bruce Momjian) > > http://candle.pha.pa.us/main/writings/pgsql/hw_performance/node8.html > > it says: "As a start for tuning, use 25% of RAM for cache size, and 2-4% for > > sort size." > > That advice is widely considered obsolete --- it was developed in the > days when typical RAM sizes were a lot less than today. I don't believe > anyone has shown a good case for setting shared_buffers much higher than > 10000. Yes, I will update that page shortly. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
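In postgresql.conf terms, the revised advice works out to something like the following (figures are illustrative, not gospel):

    shared_buffers = 10000      # 8 kB pages, so roughly 80 MB
    sort_mem = 16384            # kB per sort, per backend; raise it for big batch jobs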
On Tue, 18 Nov 2003, Rick Gigger wrote: > I will search the archives but does anyone know off the top of their head > which performs better? lo_* but then I forgot to turn off compression on the bytea column. -- Nigel Andrews > > ----- Original Message ----- > From: "Keith C. Perry" <netadmin@vcsn.com> > To: "Rick Gigger" <rick@alpinenetworking.com> > Cc: "PgSQL General ML" <pgsql-general@postgresql.org> > Sent: Tuesday, November 18, 2003 12:25 PM > Subject: Re: [GENERAL] uploading files > > > > Quoting Rick Gigger <rick@alpinenetworking.com>: > > > > > What is the best method for storing files in postgres? Is it better to > use > > > the large object functions or to just encode the data and store it in a > > > regular text or data field? > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 3: if posting/reading through Usenet, please send an appropriate > > > subscribe-nomail command to majordomo@postgresql.org so that your > > > message can get through to the mailing list cleanly > > > > > > > Rick, > > > > This has been discussed recently so you might want to dig through the > archives > > but one thing to be aware of is that with large objects, you have have to > dump > > your database in a different format- a non-text format. This is less > portable > > than storing files as bytea's which can be dumped in the text and non-text > > formats. Argueably is all you use is PosgreSQL then this might not be a > big deal. > > > > Performance-wise I'm not qualified to speak to which is "better". As > always, it > > is going to depend on your specific application and environment. > > > > -- > > Keith C. Perry, MS E.E. > > Director of Networks & Applications > > VCSN, Inc. > > http://vcsn.com
And the world rejoiced as Shridhar Daithankar said : => Actually there is a debate ongoing right now, about what to call the next => release? 7.5 or 8.0 => => So native win32 port is aimed for next major release whatever that might be..:-) Well, put like this, it doesn't really make a difference in the end :) Thanks. Stef