Thread: Any Good Way To Do Sync DB's?

Any Good Way To Do Sync DB's?

From: Joseph Koenig
I have a project where a client has products stored in a large Progress
DB on an NT server. The web server is a FreeBSD box though, and the
client wants to try to avoid the $5,500 license for the Unlimited
Connections via OpenLink software and would like to take advantage of
the 'free' non-expiring 2 connection (concurrent) license. This wouldn't
be a huge problem, but the DB can easily reach 1 million records. Is
there any good way to pull this data out of Progress and get it into
Postgres? This is way too large of a db to do a "SELECT * FROM table"
and do an insert for each row. Any brilliant ideas? Thanks,

Joe

Re: Any Good Way To Do Sync DB's?

From: Doug McNaught
Joseph Koenig <joe@jwebmedia.com> writes:

> I have a project where a client has products stored in a large Progress
> DB on an NT server. The web server is a FreeBSD box though, and the
> client wants to try to avoid the $5,500 license for the Unlimited
> Connections via OpenLink software and would like to take advantage of
> the 'free' non-expiring 2 connection (concurrent) license. This wouldn't
> be a huge problem, but the DB can easily reach 1 million records. Is
> there any good way to pull this data out of Progress and get it into
> Postgres? This is way too large of a db to do a "SELECT * FROM table"
> and do an insert for each row. Any brilliant ideas? Thanks,

Probably the best thing to do is to export the data from Progress in a
format that the PostgreSQL COPY command can read.  See the docs for
details.
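
For example, if the Progress side can dump each table to a delimited text
file, loading it becomes a single statement per table rather than a million
INSERTs. A rough sketch (the table name and file path below are made up;
adjust to the real schema):

-- Bulk-load a tab-delimited export; COPY is one statement, so there is no
-- per-row INSERT overhead.
COPY products FROM '/tmp/products.txt';

-- If the export uses another delimiter or NULL marker:
COPY products FROM '/tmp/products.txt' USING DELIMITERS '|' WITH NULL AS '';

-- Or from psql on the web server, reading a client-side file:
-- \copy products from '/tmp/products.txt'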

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: Any Good Way To Do Sync DB's?

From: Gordan Bobic
On 12 Oct 2001, Doug McNaught wrote:

> Joseph Koenig <joe@jwebmedia.com> writes:
>
> > I have a project where a client has products stored in a large Progress
> > DB on an NT server. The web server is a FreeBSD box though, and the
> > client wants to try to avoid the $5,500 license for the Unlimited
> > Connections via OpenLink software and would like to take advantage of
> > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't
> > be a huge problem, but the DB can easily reach 1 million records. Is
> > there any good way to pull this data out of Progress and get it into
> > Postgres? This is way too large of a db to do a "SELECT * FROM table"
> > and do an insert for each row. Any brilliant ideas? Thanks,
>
> Probably the best thing to do is to export the data from Progress in a
> format that the PostgreSQL COPY command can read.  See the docs for
> details.

I'm going to have to rant now. The "dump" and "restore" which use the COPY
method are actually totally useless for large databases. The reason for
this is simple. Copying a 4 GB table with 40M rows requires over 40GB of
temporary scratch space to copy, due to the WAL temp files. That sounds
totally silly. Why doesn't pg_dump insert commits every 1000 rows or so???

Cheers.

Gordan


Re: Any Good Way To Do Sync DB's?

From: Gordan Bobic
On Fri, 12 Oct 2001, Joseph Koenig wrote:

> I have a project where a client has products stored in a large Progress
> DB on an NT server. The web server is a FreeBSD box though, and the
> client wants to try to avoid the $5,500 license for the Unlimited
> Connections via OpenLink software and would like to take advantage of
> the 'free' non-expiring 2 connection (concurrent) license. This wouldn't
> be a huge problem, but the DB can easily reach 1 million records. Is
> there any good way to pull this data out of Progress and get it into
> Postgres? This is way too large of a db to do a "SELECT * FROM table"
> and do an insert for each row. Any brilliant ideas? Thanks,

Yes. Throw away NT, and replace it with Linux or FreeBSD. They are free,
they are much more stable, and you will find that PostgreSQL on Linux is
likely to utterly blow away PostgreSQL on NT. IIRC, something to do with
the way NT handles forking and threads. Allegedly, this is particularly
noticeable in a heavy multi-user environment, where lots of connections
are spawned and killed.

Another thing you could try - Apache persistent DBI connections using
mod_perl. Set up Apache to use mod_perl and the persistent DBI database
connections through it. Then limit the number of Apache instances to 2
(i.e. up to 2 simultaneous connections). Then put an accelerating
squid proxy in front of the Apache web server, so that it can handle
static objects without wasting precious Apache back-ends.
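
Roughly, the relevant knobs would look something like this (an untested
sketch for Apache 1.3 with mod_perl and Squid 2; the port numbers and the
use of Apache::DBI for the persistent connections are assumptions):

# httpd.conf (Apache 1.3 + mod_perl): cap the pool at two back-ends, so at
# most two persistent DBI connections are ever open at once.
Port             8080
StartServers     2
MinSpareServers  1
MaxSpareServers  2
MaxClients       2
PerlModule       Apache::DBI

# squid.conf (Squid 2.x running as an accelerator on port 80 in front of Apache):
http_port 80
httpd_accel_host 127.0.0.1
httpd_accel_port 8080
httpd_accel_single_host on
httpd_accel_with_proxy off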

All this seems like a hugely complicated solution when you could just
replace NT with Linux or FreeBSD and have a cheaper and MUCH faster
solution, not to mention more stable.

Just my $0.02

Regards.

Gordan


Re: Any Good Way To Do Sync DB's?

From: Tom Lane
Gordan Bobic <gordan@bobich.net> writes:
> I'm going to have to rant now. The "dump" and "restore" which use the COPY
> method are actually totally useless for large databases. The reason for
> this is simple. Copying a 4 GB table with 40M rows requires over 40GB of
> temporary scratch space to copy, due to the WAL temp files. That sounds
> totally silly. Why doesn't pg_dump insert commits every 1000 rows or so???

Yup, "rant" is the correct term.  The WAL file growth problem has been
fixed as of 7.1.3, so your premise is false.

            regards, tom lane

Re: Any Good Way To Do Sync DB's?

From: Gordan Bobic
On Saturday 13 Oct 2001 15:54, you wrote:
> Gordan Bobic <gordan@bobich.net> writes:
> > I'm going to have to rant now. The "dump" and "restore" which use the
> > COPY method are actually totally useless for large databases. The reason
> > for this is simple. Copying a 4 GB table with 40M rows requires over 40GB
> > of temporary scratch space to copy, due to the WAL temp files. That
> > sounds totally silly. Why doesn't pg_dump insert commits every 1000 rows
> > or so???
>
> Yup, "rant" is the correct term.  The WAL file growth problem has been
> fixed as of 7.1.3, so your premise is false.

D'oh! OK, I'm off to upgrade... I'll just shut up, shall I... ;-)

Thanks for the tip. :-)

Regards.

Gordan

Re: Any Good Way To Do Sync DB's?

From: "Gurunandan R. Bhat"
On 12 Oct 2001, Doug McNaught wrote:

> Probably the best thing to do is to export the data from Progress in a
> format that the PostgreSQL COPY command can read.  See the docs for
> details.

Hi,

    I wrote a quick and dirty function/trigger to sync two DBs - one
local and the other on the web. The method is quite simple. Any insert or
update fires a trigger that "serialises" the entry and stores it in a log
table along with the table name, the primary key and a timestamp. When an
entry is deleted, the same happens except that the serialised column
contains a null. So when I sync, I just need to upload the changes rather
than the entire dump. I think this is a good opportunity to get some
feedback on the code, so here it is:

--------------------------------------------------------------------------------------------
drop function setuptriggers();
create function setuptriggers() returns int as '
       declare
        fb        text;
        tb        text;
        tresult        record;
        cresult        record;
        pkeyname    name;
    begin
        for tresult in select * from pg_class
                  where relkind = ''r''
                  and   relname !~ ''^pg_''
                  and   relname !~ ''^Inv''
                  and   relname !~ ''^pga_''
                  order by relname
        loop
            select into pkeyname c.attname from pg_class a, pg_index b, pg_attribute c
            where a.relname = tresult.relname and
                  a.oid = b.indrelid and
                  a.oid = c.attrelid and
                  b.indkey[0] = c.attnum and
                  b.indisprimary=''t'';
            if pkeyname is not null and tresult.relname != ''logtable'' then
               fb := ''
                  create function logchange_'' || quote_ident(tresult.relname) || ''() returns opaque as ''''
                  declare
                     serialized text;
                     updatetime timestamp;
                     separator text;
                  begin
                     updatetime := ''''''''now'''''''';
                     separator := chr(178);
                     serialized := '''''''''''''''';
                  '';
                  for cresult in select * from pg_class a, pg_attribute b
                       where a.relname = tresult.relname and
                                       a.oid = b.attrelid and
                                       b.attnum > 0
                       order by b.attnum
                  loop
                fb := fb || ''   if NEW.'' || quote_ident(cresult.attname) || '' is not null then
                      serialized := serialized || separator || '''''''''' || quote_ident(cresult.attname) || ''=''''''''||  NEW.'' || quote_ident(cresult.attname) || '';
                      end if;
                   '';
                  end loop;
                  fb := fb || ''   insert into logtable values (NEW.'' || quote_ident(pkeyname) || '', '''''''''' || quote_ident(tresult.relname) || '''''''''', serialized, updatetime);
                     return new;
                  end;''''
                  language ''''plpgsql'''';'';
                  execute fb;
                  tb := ''create trigger fireon_'' || quote_ident(tresult.relname) || '' before insert or update on '' || quote_ident(tresult.relname) || ''
                        for each row execute procedure logchange_'' || quote_ident(tresult.relname) || ''();'';
                  execute tb;
            end if;
        end loop;
        return 1;
    end;'
language 'plpgsql';
-------------------------------------------------------------------------------------------------
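
To try it out, the supporting pieces might look something like this (a
guess: the logtable layout below is only inferred from the INSERT that the
generated triggers perform, and the column names and types are invented):

-- hypothetical log table; the column order matches the generated insert
create table logtable (
    pkeyvalue   text,       -- primary key value of the changed row
    tablename   text,       -- table the change was made in
    serialized  text,       -- col=value pairs separated by chr(178); null on deletes
    updatetime  timestamp   -- when the change was logged
);

-- install the triggers on all user tables, then sync by shipping only the
-- rows logged since the last run instead of a full dump
select setuptriggers();
select * from logtable where updatetime > '2001-10-14 00:00:00';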
I hope this is useful.


Re: Any Good Way To Do Sync DB's?

From: Joseph Koenig
Your solution sounds very interesting (not the "throw away NT"
part). Does anyone else have any input on this? Would it work well?
Any idea as to what amount of traffic it would be capable of handling?
If Apache is only running in two instances, would that really keep the
number of ODBC connections to 2 at most? By the way, I fully agree with
throwing away NT -- unfortunately this DB is part of their inventory and
POS system that another company set up for them. They just finished
investing about $200K in the system, so just telling them to get rid of
it isn't quite an answer. Thanks for all of the input from everyone.

Joe

Gordan Bobic wrote:
>
> On Fri, 12 Oct 2001, Joseph Koenig wrote:
>
> > I have a project where a client has products stored in a large Progress
> > DB on an NT server. The web server is a FreeBSD box though, and the
> > client wants to try to avoid the $5,500 license for the Unlimited
> > Connections via OpenLink software and would like to take advantage of
> > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't
> > be a huge problem, but the DB can easily reach 1 million records. Is
> > there any good way to pull this data out of Progress and get it into
> > Postgres? This is way too large of a db to do a "SELECT * FROM table"
> > and do an insert for each row. Any brilliant ideas? Thanks,
>
> Yes. Throw away NT, and replace it with Linux or FreeBSD. They are free,
> they are much more stable, and you will find that PostgreSQL on Linux is
> likely to utterly blow away PostgreSQL on NT. IIRC, something to do with
> the way NT handles forking and threads. Allegedly, this is particularly
> noticeable in a heavy multi-user environment, where lots of connections
> are spawned and killed.
>
> Another thing you could try - Apache persistent DBI connections using
> mod_perl. Set up Apache to use mod_perl and the persistent DBI database
> connections through it. Then limit the number of Apache instances to 2
> (i.e. up to 2 simultaneous connections). Then put an accelerating
> squid proxy in front of the Apache web server, so that it can handle
> static objects without wasting precious Apache back-ends.
>
> All this seems like a hugely complicated solution when you could just
> replace NT with Linux or FreeBSD and have a cheaper and MUCH faster
> solution, not to mention more stable.
>
> Just my $0.02
>
> Regards.
>
> Gordan
>

Re: Any Good Way To Do Sync DB's?

From: Gordan Bobic
On Monday 15 Oct 2001 13:35, Joseph Koenig wrote:
> Your solution sounds very interesting (not the "throw away NT"
> part).

That is where a significant part of the performance improvement would come
from, if performance was what you were after...

> ...does anyone else have any input on this? Would it work well?

It works great - I am using it on a number of servers. It also gives you a
much more secure setup, because you can pipe everything through a single
firewall with a squid accelerating proxy. That means that a hacker would have
to breach your firewall before they would have even a remote opportunity to
penetrate deeper into your network and attack the web server.

> Any idea as to what amount of traffic it would be capable of handling?

Any amount of traffic; it just means that only two connections would be
served at any one time. If your query return and CGI page construction times
are very low (which they should be if you are using mod_perl on a decent
server), then you are unlikely to notice any slowdown. The excess hits would
just get queued and processed when back-ends become available. It would also
be a good idea to look through your scripts and attempt to connect to the
database in Perl DBI as late as possible (not at the beginning of the script
as a global variable, but just before the query is issued), and disconnect as
early as possible (as soon as the query is executed and the data retrieved).
That way the scripts will not hog the connections on the PostgreSQL end as much.

Of course, this is all provided you use perl CGIs and mod_perl on Apache.
Otherwise, persistent connections don't come into the picture at all.

> If apache is only running in two instances, would that really keep the
> number of ODBC connections to 2 at max?

No, that would only keep the number of connections from the WEB SERVER
to two at most. ODBC connections from the rest of your network would be a
completely separate issue. Basically, if Apache only ever has two instances
running, then it can only ever serve up to 2 requests at any one time, and
hence the database server will never receive more than 2 simultaneous requests
from the web server - UNLESS your CGIs make multiple database connections in
parallel. If they do, then you're fighting a losing battle, and you might as
well give up.

If your CGIs only ever use one connection, then putting the web server behind
an accelerating squid proxy would actually help further, by not using the
Apache back-ends to serve static documents, such as frame pages, gif/jpg
background or button images, etc. This means that only the actual CGI
requests would go to Apache. I have implemented this sort of load-reduction
solution with a reduced number of active Apache servers in my last
consultancy contract, and it worked very well.

> By the way, I fully agree with throwing away NT --

Understandable...

> unfortunately this DB is part of their inventory and
> POS system that another company set up for them. They just finished
> investing about $200K in the system, so just telling them to get rid of
> it isn't quite an answer. Thanks for all of the input from everyone.

Hang on - if they have just invested $200K into this "solution", then why
exactly is an additional $5,500 for another licence a problem all of a
sudden???

Regards.

Gordan