Thread: Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
lynch@lscorp.com (Richard Lynch)
Date:
At 2:41 PM 7/24/98, Steve Doliov wrote:
>On Fri, 24 Jul 1998, Richard Lynch wrote:
>> At 8:28 AM 7/24/98, Marc Fournier wrote:
>> >        So, essentially, our VACUUM command provides functionality that
>> >Oracle *doesn't* have, right?
>> Yes, but yours doesn't run automatically.
>> <NAIVE>
>> Ideally, when one created a database, one could specify vacuum frequency
>> and/or time slot, and PostgreSQL would just do it...
>> </NAIVE>
>
>uh...isn't that what cron is for????

Sure.  If:

A.  The person who creates the database is allowed to add cron jobs.
    [A host ISP could easily and quite reasonably make this not be true.]
B.  You know that you need to run vacuum regularly.
C.  You know that cron runs regularly for tasks like this.
D.  The webmaster who creates the db knows enough Unix to add cron jobs.
    [See *long* section below:]

For most folks creating databases, all of the above is probably true.  But,
as PostgreSQL becomes more popular, and with the explosion of virtual
hosting, you're going to see a whole lot of users who will fail at least
one of the above.  Do you really want to keep answering this vacuum/cron
question every week?

Why should every PostgreSQL user have to set this up for every editable
database?  Isn't it an axiom to set things up for the most common
situtation, with over-rides for the uncommon, unless there are significant
performance penalties or other, similar, more important reasons not to do
so?

Actually...

<NAIVE MODE=SUPERNAIVE>
How hard would it be for PostgreSQL to count how many deletes there have
been (or measure fragmentation), and run vacuum after XXX deletes or when
needed?  What would be the performance penalty?  Then it could be totally
automatic, transparent to the user, and only hard-core folks who really
want to tune their performance would need to over-ride the automatic
check/vacuum somehow.
</NAIVE>

Disclaimer:
I'm a Macintosh Lisp hacker.  I'm lucky to remember the right flags to
untar something.  (-xvf?) Correction.  I have to go look them up to be
sure. I had to go look up my notes from a Unix friend just to do 'man 5
crontab' just now so I could get to the syntax of the date/time stuff in
the cron file.  More and more webmasters on virtual hosts are going to be
Unix-disabled and wanting to use PostgreSQL.


Super-long section telling you just why I have yet to set up cron to vacuum
my database every night.

Here's what I went through today trying to research this cron thing.  I
already know that cron automagically schedules tasks to happen on a regular
frequent basis, because somebody told me so.  That's all I know about it.

man cron.  Ah.  Okay.  I want crontab(1) or (5) to edit the cron files.

man crontab.  Okay.  What's the syntax for the crontab file?  Must be in (5).

Oh.  How do I get crontab (5)?

man man

Hmm.  Is that [section] thing the 5?  Would it always say "See also xxx(#),
but then the syntax to actually look it up is backwards?:  "man # xxx"
Yup.  God, that's stupid.  Why does Unix always have to be so damn
bassackwards?  They could at least have had an example.

man 5 crontab.  Wow.  Cool.  All sorts of permutations on time.

That MAILTO thing... What if it's defined outside, like, in the
"environment" I keep hearing about and never know how to set?
And those last two examples... they get mailed to Paul also, right?  No,
wait, just because I sent mail to Joe to annoy him doesn't mean Paul
doesn't get mail telling him Joe got mail just to annoy him... Right?  So
that first example must be the only one that doesn't send mail to Paul.
Otherwise, why would the second one say it sends mail to Paul.  Now I know
that 2>&1 stuff sends the errors somewhere...  Where?  Is this what
over-rides the mail to Paul?  Or is mail to Paul sent for all of these just
like it says in the comment before the MAILTO, and that note in the second
one is just totally superfluous?...  And what exactly does it mean "if it
has any reason to send mail as a result of running commands in ''this''
crontab"?  Oh, that's in cron(8).  Look that up later.

Well, let's try crontab -e and mess around to find out the answers to these
questions.  Or at least to see if I can just ignore this whole MAILTO thing
for now.  I can always do crontab -r and blow it away.

Damn! What's this PICO thing?  How do I make it use vi?

Thank god for PICO's instructions so I know how to quit!

man 5 crontab  I know it said something about the editor, but now I don't
see it...

man 5 crontab  Still don't see it...  Oh.  Duh.

man crontab   Ah.  Okay, how do I set the VISUAL or EDITOR variable[s]?
Which one do I set?  Both?  Is it, like, variant based on
BSD/Linux/ATT/whatever?

And if I do find out how to set it, how do I make it always stay set?

Oh hell.  Look at the time.  I give up.

Guess I'll just do vacuum by hand for now.  Sigh.

Maybe next week I'll have a free hour to waste looking up cron(8) and
whatever that leads me to, and trying to figure out this environment
variable crap so I can actually edit a crontab file using an editor I
learned 20 years ago, and still hate, but at least I know how it works.
And I guess I'll need to reread crontab(5) to figure out the time fields
again.  Joy.

--
--
-- "TANSTAAFL" Rich lynch@lscorp.com



Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
Herouth Maoz
Date:
At 23:36 +0300 on 24/7/98, Richard Lynch wrote:


> Well, let's try crontab -e and mess around to find out the answers to these
> questions.  Or at least to see if I can just ignore this whole MAILTO thing
> for now.  I can always do crontab -r and blow it away.
>
> Damn! What's this PICO thing?  How do I make it use vi?
>
> Thank god for PICO's instructions so I know how to quit!
>
> man 5 crontab  I know it said something about the editor, but now I don't
> see it...
>
> man 5 crontab  Still don't see it...  Oh.  Duh.
>
> man crontab   Ah.  Okay, how do I set the VISUAL or EDITOR variable[s]?
> Which one do I set?  Both?  Is it, like, variant based on
> BSD/Linux/ATT/whatever?
>
> And if I do find out how to set it, how do I make it always stay set?
>
> Oh hell.  Look at the time.  I give up.
>
> Guess I'll just do vacuum by hand for now.  Sigh.

I nearly fell off my chair laughing... You know, on my personal TODO list
there's a little item "Write crons for VACUUM", which has been there for
months, and I keep postponing it.

I hate cron. I shudder at the thought I'll have to use it, with its
environment variables (which are not read from .cshrc), its odd intervals
(how do you cron something for "every three days"?), its permissions and
current directory and whatnot...

Although I'm a Mac user, like Richard, I'm also a qualified unix sysadmin.
But whenever someone tells me "oh, just use awk/sed/cron/procmail", I
shudder.

It's like telling me, instead of having a report generator, to use perl and
write my reports. More time wasted, more debugging spent. When unix
standard tools have a user interface, this world will be a nicer place to
live.

Here we go back to the problem. The authors don't want or don't have time
to make automated vacuums. So they tell me to "DIY". "DIY" is acceptable in
free software, but isn't in commercial products. The break-even point is
not entirely on Postgres's side, all things considered.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
James Olin Oden
Date:
> >uh...isn't that what cron is for????
>
> Sure.  If:
>
> A.  The person who creates the database is allowed to add cron jobs.
>     [A host ISP could easily and quite reasonably make this not be true.]

Well,  if they have the authority to create the database but not to use cron,
then its the sysadmin job.  If the sysadmin won't do it, the organization has
problem postgres can't fix (some problems arn't computer problems).

> B.  You know that you need to run vacuum regularly.

That is a documentation issue.

> C.  You know that cron runs regularly for tasks like this.

That is another documentation issue.  When I worked for BLAST we actually had a
tech note for using our product out of cron.  If a FAQ does not exist for this,
or its not in the documentation, that would be a good one to have.  You don't
have to explain cron completely, you just need to show some basic examples of
saying kicking off a vacuum from cron, and of course point them to cron's man
page for the gory details.

> D.  The webmaster who creates the db knows enough Unix to add cron jobs.
>     [See *long* section below:]

Another documentation issue.

>
>
> For most folks creating databases, all of the above is probably true.  But,
> as PostgreSQL becomes more popular, and with the explosion of virtual
> hosting, you're going to see a whole lot of users who will fail at least
> one of the above.  Do you really want to keep answering this vacuum/cron
> question every week?

Again, a documentation issue, though I am not opposed to automatic vacuums, as
long as they are tunable, but it is pretty much standard operating procedures
under UNIX to use cron to automate things like this.

> Why should every PostgreSQL user have to set this up for every editable
> database?  Isn't it an axiom to set things up for the most common
> situtation, with over-rides for the uncommon, unless there are significant
> performance penalties or other, similar, more important reasons not to do
> so?

A good point...james


Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
Chris Johnson
Date:
On Sun, 26 Jul 1998, Herouth Maoz wrote:

> At 23:36 +0300 on 24/7/98, Richard Lynch wrote:
>
>
> > Well, let's try crontab -e and mess around to find out the answers to these
> > questions.  Or at least to see if I can just ignore this whole MAILTO thing
> > for now.  I can always do crontab -r and blow it away.
>
> I nearly fell off my chair laughing... You know, on my personal TODO list
> there's a little item "Write crons for VACUUM", which has been there for
> months, and I keep postponing it.

I'm another one of those people... so I wrote a quick & dirty c program to
vacuum ALL my databases.  I just mailed it to Marc Fournier
(scrappy@hub.org) because I didn't know what to do with it, but if anyone
else wants it just drop me a line directly.

Chris


Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
The Hermit Hacker
Date:
On Mon, 27 Jul 1998, Chris Johnson wrote:

> On Sun, 26 Jul 1998, Herouth Maoz wrote:
>
> > At 23:36 +0300 on 24/7/98, Richard Lynch wrote:
> >
> >
> > > Well, let's try crontab -e and mess around to find out the answers to these
> > > questions.  Or at least to see if I can just ignore this whole MAILTO thing
> > > for now.  I can always do crontab -r and blow it away.
> >
> > I nearly fell off my chair laughing... You know, on my personal TODO list
> > there's a little item "Write crons for VACUUM", which has been there for
> > months, and I keep postponing it.
>
> I'm another one of those people... so I wrote a quick & dirty c program to
> vacuum ALL my databases.  I just mailed it to Marc Fournier
> (scrappy@hub.org) because I didn't know what to do with it, but if anyone
> else wants it just drop me a line directly.

    Will merge it with the base code later tonight...


Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
Dustin Sallings
Date:
> I'm another one of those people... so I wrote a quick & dirty c program to
> vacuum ALL my databases.  I just mailed it to Marc Fournier
> (scrappy@hub.org) because I didn't know what to do with it, but if anyone
> else wants it just drop me a line directly.

    I do this to vacuum and back up all of my databases every day.

#!/bin/sh

cat <<EOF
*
* Vacuuming databasen at `date`
*
EOF

(
for host in dbserver1 dbserver2
do
    echo "Doing host $host"
    PSQL="/usr/local/pgsql/bin/psql -h $host"
    PDUMP="/usr/local/pgsql/bin/pg_dump -h $host"
    CI=/usr/local/bin/ci
    cd $HOME/bak/$host
    for i in `$PSQL -t -c "select datname from pg_database" template1`
    do
        echo "Vacuuming $i ..."
        $PSQL -c "vacuum analyze;" $i

        if [ -f RCS/$i,v.gz ]
        then
            /usr/local/bin/gzip -d RCS/$i,v.gz
        fi

        $PDUMP -f$i $i
        $CI -l -m"backup" $i
        rm $i

        if [ -f RCS/$i,v ]
        then
            ls -l RCS/$i,v
            /usr/local/bin/gzip -9 RCS/$i,v &
        else
            echo "Oops!  ci broke somehow."
        fi

    done

    echo "Waiting for all the gzips to finish..."
    df .
    wait
done
)


--
SA, software.net         My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: [GENERAL] Re: [HACKERS] [Fwd: SGVLLUG Oracle and Informix on Linux]

From
Bruce Momjian
Date:
> On Sun, 26 Jul 1998, Herouth Maoz wrote:
>
> > At 23:36 +0300 on 24/7/98, Richard Lynch wrote:
> >
> >
> > > Well, let's try crontab -e and mess around to find out the answers to these
> > > questions.  Or at least to see if I can just ignore this whole MAILTO thing
> > > for now.  I can always do crontab -r and blow it away.
> >
> > I nearly fell off my chair laughing... You know, on my personal TODO list
> > there's a little item "Write crons for VACUUM", which has been there for
> > months, and I keep postponing it.
>
> I'm another one of those people... so I wrote a quick & dirty c program to
> vacuum ALL my databases.  I just mailed it to Marc Fournier
> (scrappy@hub.org) because I didn't know what to do with it, but if anyone
> else wants it just drop me a line directly.
>

Here is a little shell script to do it.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
psql -l -A -q -t|cut -d"|" -f1 | while read DATABASE
    do
        psql -q -c vacuum $DATABASE
    done

Re: [GENERAL] How to know when to vacuum

From
Chris Johnson
Date:
OK, so there's been quite a bit of traffic about vacuuming databases as
well as more than one suggestion on how to do it.  But there really hasn't
been an answer to the question of how to know when to vacuum.

I now vacuum the databases every night, but this seems somewhat
inefficient... I know that some of the more active databases could use
some extra vacuuming, but which ones? how often?  If there is a way to ask
postgres whether a specific database could use a vacuuming I will be happy
to write something to automate it, but I see no way to do so.

Anyone have any suggestions?

Thanks,
Chris



Re: [GENERAL] How to know when to vacuum

From
The Hermit Hacker
Date:
On Tue, 28 Jul 1998, Chris Johnson wrote:

>
> OK, so there's been quite a bit of traffic about vacuuming databases as
> well as more than one suggestion on how to do it.  But there really hasn't
> been an answer to the question of how to know when to vacuum.
>
> I now vacuum the databases every night, but this seems somewhat
> inefficient... I know that some of the more active databases could use
> some extra vacuuming, but which ones? how often?  If there is a way to ask
> postgres whether a specific database could use a vacuuming I will be happy
> to write something to automate it, but I see no way to do so.
>
> Anyone have any suggestions?

There are two reasons, that I can think of, to run vacuum:

    1. update statistics used for the optimizer
    2. clean up "fragmentation"

1. the optimizer decides whether or not to use indices, and which ones it
   uses, based on a miriad of values, but one of them is based on
   statistics that vacuum generates.  ie. if the table is small, it might
   be faster to just do a sequential scan vs using an index.  As such, a
   vacuum should be performed after a large amount of inserts/deletes or
   updates have been performed, so that the optimizer has reasonably
   accurate numbers to work with.  VACUUM ANALYZE can be used for this
   one, which, in the future, will hopefully not lock down the database
   while its being performed.

2. the server currently doesn't "reuse" deleted rows, but just keeps
   appending them to the end.  running a straight VACUUM will perform a
   de-fragmentation by essentially re-writing the database and then
   performing equivalent to an 'ftruncate()' at the end to shrink the
   table size back down again.  The only time you should have to do a full
   VACUUM is after a massive amount of DELETEs to a table...and,
   hopefully, the requirement for that will decrease over time too, as
   there has been talk about adding in functionality to reuse delete
   rows..





row oids as "foreign keys" in other tables ?

From
Matt McClure
Date:
I'm relatively new to postgres and I've had a couple of questions for a
while now.  This post made me worry about them again:

> 2. the server currently doesn't "reuse" deleted rows, but just keeps
>    appending them to the end.  running a straight VACUUM will perform a
>    de-fragmentation by essentially re-writing the database and then
>    performing equivalent to an 'ftruncate()' at the end to shrink the
>    table size back down again.  The only time you should have to do a full
>    VACUUM is after a massive amount of DELETEs to a table...and,
>    hopefully, the requirement for that will decrease over time too, as
>    there has been talk about adding in functionality to reuse delete
>    rows..

I started to make a database and I wanted to simulate foreign keys
somehow.  So I decided to simply insert the oid of a row in one table into
the "foreign key" column in another table.

For example,

create table concert (
    day_of_show date,
    venue text);

create table song (
    song_name text,
    author_fname text,
    author_lname text);

create table concert_song (
    concert_oid oid,
    song_oid oid);

Then I have a perl script that does my inserts so that whenever I insert a
concert and the songs played, I take the appropriate row oids from concert
and song and insert them into concert_song.

You say that vacuum "re-writes" the database.  Does it alter row oids???
If so, my scheme completely corrupts my database whenever I do a vacuum,
since in concert and song the row oids would change, but my inserted
values would remain the same in concert_song, right?

If vacuum does not alter row oids, then I have another question.  How does
postgres re-use oids?  I've seen the numbers grow and grow, but despite
deletes, etc, I have never seen a lower oid get re-used.  How does this
work?

Thanks a bunch,
Matt


Re: [GENERAL] How to know when to vacuum

From
Chris Johnson
Date:
Yes, but... does postgres maintain some statistics that could be queried
to determine whether vacuuming would be helpful? For Case 1 I would need
to know how many records were added since the last vacuum relative to the
total number of records in each table.  For case 2 I guess you really only
need to know how many records have been deleted.

Any way to get that information?

Chris

On Tue, 28 Jul 1998, The Hermit Hacker wrote:

> On Tue, 28 Jul 1998, Chris Johnson wrote:
>
> >
> > OK, so there's been quite a bit of traffic about vacuuming databases as
> > well as more than one suggestion on how to do it.  But there really hasn't
> > been an answer to the question of how to know when to vacuum.
> >
> > I now vacuum the databases every night, but this seems somewhat
> > inefficient... I know that some of the more active databases could use
> > some extra vacuuming, but which ones? how often?  If there is a way to ask
> > postgres whether a specific database could use a vacuuming I will be happy
> > to write something to automate it, but I see no way to do so.
> >
> > Anyone have any suggestions?
>
> There are two reasons, that I can think of, to run vacuum:
>
>     1. update statistics used for the optimizer
>     2. clean up "fragmentation"
>
> 1. the optimizer decides whether or not to use indices, and which ones it
>    uses, based on a miriad of values, but one of them is based on
>    statistics that vacuum generates.  ie. if the table is small, it might
>    be faster to just do a sequential scan vs using an index.  As such, a
>    vacuum should be performed after a large amount of inserts/deletes or
>    updates have been performed, so that the optimizer has reasonably
>    accurate numbers to work with.  VACUUM ANALYZE can be used for this
>    one, which, in the future, will hopefully not lock down the database
>    while its being performed.
>
> 2. the server currently doesn't "reuse" deleted rows, but just keeps
>    appending them to the end.  running a straight VACUUM will perform a
>    de-fragmentation by essentially re-writing the database and then
>    performing equivalent to an 'ftruncate()' at the end to shrink the
>    table size back down again.  The only time you should have to do a full
>    VACUUM is after a massive amount of DELETEs to a table...and,
>    hopefully, the requirement for that will decrease over time too, as
>    there has been talk about adding in functionality to reuse delete
>    rows..
>
>
>
>
>


Re: [GENERAL] row oids as "foreign keys" in other tables ?

From
Vadim Mikheev
Date:
Matt McClure wrote:
>
> You say that vacuum "re-writes" the database.  Does it alter row oids???
                                                 ^^^^^^^^^^^^^^^^^^^^^^
No.

> If so, my scheme completely corrupts my database whenever I do a vacuum,
> since in concert and song the row oids would change, but my inserted
> values would remain the same in concert_song, right?
>
> If vacuum does not alter row oids, then I have another question.  How does
> postgres re-use oids?  I've seen the numbers grow and grow, but despite
  ^^^^^^^^^^^^^^^^^^^^
It doesn't.

> deletes, etc, I have never seen a lower oid get re-used.  How does this
> work?

Vadim

Re: [GENERAL] row oids as "foreign keys" in other tables ?

From
Matt McClure
Date:
On Wed, 29 Jul 1998, Vadim Mikheev wrote:

> Matt McClure wrote:
> >
> > You say that vacuum "re-writes" the database.  Does it alter row oids???
>                                                  ^^^^^^^^^^^^^^^^^^^^^^
> No.
>
> > If so, my scheme completely corrupts my database whenever I do a vacuum,
> > since in concert and song the row oids would change, but my inserted
> > values would remain the same in concert_song, right?
> >
> > If vacuum does not alter row oids, then I have another question.  How does
> > postgres re-use oids?  I've seen the numbers grow and grow, but despite
>   ^^^^^^^^^^^^^^^^^^^^
> It doesn't.
>
> > deletes, etc, I have never seen a lower oid get re-used.  How does this
> > work?
>
> Vadim
>

Thanks for the help.

Doesn't the fact that postgres never re-uses deleted (and therefore no
longer in use anywhere) oids create a problem when you reach the upper
bound?  Or is the upper bound on oids so ridiculously high that it
shouldn't be a concern?  Or does postgres have a scheme for increasing
oids without bound entirely?

In any case, using row oids from one table as values in another table
won't ever be an issue, right?

-Matt


Re: [GENERAL] row oids as "foreign keys" in other tables ?

From
Vadim Mikheev
Date:
Matt McClure wrote:
> > >
> > > If vacuum does not alter row oids, then I have another question.  How does
> > > postgres re-use oids?  I've seen the numbers grow and grow, but despite
> >   ^^^^^^^^^^^^^^^^^^^^
> > It doesn't.
>
> Doesn't the fact that postgres never re-uses deleted (and therefore no
> longer in use anywhere) oids create a problem when you reach the upper
> bound?  Or is the upper bound on oids so ridiculously high that it
> shouldn't be a concern?  Or does postgres have a scheme for increasing
> oids without bound entirely?

We have plans to make using global oid in user tables optional...

>
> In any case, using row oids from one table as values in another table
> won't ever be an issue, right?

Right. But you could also use sequences...

Vadim