Thread: Moving my business to PostgreSQL

Moving my business to PostgreSQL

From
"Jason Watkins"
Date:
This post is long, if you'd like to offer quick point advice, I specificly
am looking for live replication/fallover for PostgreSQL, daily
synchronization with Access, and data import/restructuring from the D3
nested relational database to PostgreSQL.

I also am looking for more general sage like advice on the varried
challanges I will likely face in the next 6 months. Some of these are not
Postgre specific, but since I hope to make Postgre the core of this
business, I hope you'll indulge me and offer any advice you might have.

I'm the new manager of operations of a mail order retailer. I was hired by
new ownership, which purchased the business assets from the previous owners
in Nov. We went to the application vendor, and had them build us a system
preconfigured with the same version of their software the previous owners
used. For a variety of reasons, this system _barely_ works at the moment,
and most likely will not support our future plans for the business.

There is, quite simply, no budget to buy a more capable off the shelf
system.

So, where that brings me is trying to engineer a better system myself. I've
worked as a CRM implimenter, and done a variety of web dev. Over the last
year, I've become quite sold on postgre's capabilities. Unless your needs
are esoteric, it's Oracle at a FreeBSD price point :).

The basic process of our business is we handle orders through a call center.
Orders are exported from their system by flat file and imported by our
system several times daily. Payment is processed in batch by modem daily. We
authorize before shipping, then capture deposits after shipping.

Orders are sent to our shipping facility 1 day ahead of each shipping day.
They're exported as a flat file from our system, moved over *cough* PC
Anywhere to the shipping facility, and I execute some DTS scripts on the
remote machine via PC Anywhere that verify and insert the flat file into
their Access database.

At the shipping facility, a bar code system from ADSI manages picking
products, generating shipping invoices, etc.

The day after each shipping data, a flat file is built on that system, and
again by PC Anywhere, moved back to our system. The order id's and FedEx
tracking #'s are used to capture payments, again batch processed by modem.

All of these process steps are manually initiated.

The existing system uses the D3 nested relational database.

The general outline of what I'm planning is to move most everything to 2
hosted servers. A web application will be used by the call center agents to
enter orders directly to our system. Payment processing will be done over
ssl by a schedualed cron job. Postgre will of course, be the database.

Some of the bigger problems I'm facing:

1.) Exporting 12 years worth of data from a nested relational database,
restructuring that to a sql database to get it into Postgre. The current
database is ~500meg. I'm not that familiar with D3, but a large portion of
that seems to be history table data. I'd estimate the business has done
50,000 orders. There's information relating to 10 warehouses, and probibly a
few thousand SKU's over the years. It's vital that the new system have this
data, since forcasting and modeling are the life or death of a mail order
business.

2.) Providing live fallover between the servers. As we internalize what used
to be a web application hosted at the call center to our own system, we take
on the responsibility of providing 24/7/365 uptime on that system.

3.) Integrating my new Posgre based system with the barcode/shipping system.
This comes down to 2 way sync of a few tables between Postgre and Access.

Smaller stuff:

1.) Reporting tool: right now I'm thinking of having my web application just
generate simple spreadsheet files. Anything more snazzy will need to be
accessable to business persons who uses mac and pc systems. We also use
linux and irix desktops, but I'm fine with ignoring those for reporting.

2.) Monitoring and backup schemes for Postgre. I plan to host the servers at
someplace like rackspace. I've found that co-location facilities don't offer
the backup frequency I'd like, so I'd like to get updates to backups daily.
These can just be generated on the remote servers and ftp'd by my job to our
location where I can put them on tape and in the fire safe. I also need some
way of monitoring the whole system, and quick recovery/reinstallation in the
face of a failure.

Stuff not related to Postgre (off list response please):
1.) Payment processing: we currently use Paymentech. They say they're
working on an SSL gateway, but so far they only offer a one item at a time
SOAP-like interface. Our products are requested for delivery on particular
days, and often far in advance of when an authorization done on the day of
ordering would expire. That means while it would be *very* nice to do AVS on
order entry, we have to have a schedualed job that looks for orders soon
shipping that need authorization, or orders just shipped that need to
capture deposit. I've found getting real answers from payment processors a
real PITA, so any anicdotal advice here I'd really welcome.
2.) Anything that speeds developing the web application. This will be done
entirely by me, with about 20hrs/week to dedicate to it. I need to have the
system tested, running and be 100% confident of it by Nov 1 to handle the
xmas rush. I know, I know, but there simply isn't any other option available
to me... I'm workin' with what I got. I imagine there are a few eCom systems
compatable with Postgre that might be worth looking at. However, you can see
these will need signifigant customization, and the ones I've looked at so
far were to inflexible to handle what I need.
3.) At this point, I'm not ready to consider changing the shipping system.
However, once the front half of the business is moved to Postgre, I would
like to entertain thoughts about moving those pieces to open source as well.
Any info on barcode hardware and FedEx software interfaces would be helpful
when I come around to that.

A lot of what I'm facing only bears partly on Postgre, but, assuming I find
this can be done, I want to do a detailed write up (*cough* "case study" for
you MBA's). I think Postgre could really benifit if people can read how a
small company like us was able to move our entire business to open source
infrastructure, with Postgre at the core. I'd love to do whatever I can to
show people that this is a viable option to build your business is around. I
appreciate any advice or anicdotes you might offer. If not PostgreSQL
specific, or you feel they'd not be of interest to the list as a whole,
please respond directly.

jason watkins


Re: Moving my business to PostgreSQL

From
Jeff Davis
Date:
On Sunday 27 January 2002 10:20 pm, you wrote:
> This post is long, if you'd like to offer quick point advice, I specificly
> am looking for live replication/fallover for PostgreSQL, daily
> synchronization with Access, and data import/restructuring from the D3
> nested relational database to PostgreSQL.
>

Sorry I can't answer too many of your questions, but one thing I am quite
sure of is that Postgres does not offer replication/failover. I inferred this
information from the postgres developer TODO
(http://developer.postgresql.org/todo.php), which still lists both features
(and has for some time, I might add).

I really think that postgres is a great database; it certainly has many
advanced features, and is also developing quite rapidly. However, it sounds
as though it doesn't have every capability you require. Replication/failover
seem important to what you're doing, and unless I'm mistaken, they don't
exist in postgres. You may be able to make use of what work has been done in
that regard.

I guess the question now is: Is there something out there that does meet your
requirements? Postgres may be your best bet if you can't afford the costs
associated with oracle. Oracle may be best if the replication/failover really
is a requirement. Or maybe you can find a cheaper commercial database that
offers more of what you need than postgres.

Note that my experience is somewhat limited, so don't take what I say for
fact.

Regards, and good luck,

    Jeff


Re: Moving my business to PostgreSQL

From
"Jason Watkins"
Date:
> Sorry I can't answer too many of your questions, but one thing I am quite
> sure of is that Postgres does not offer replication/failover. I inferred
this

I did some more hunting tonight. It seems postgresql.com recently released a
beta of their eRServ, which looks to do exactly what I need in that reguard.

I know that it was high on the todo list, and then I heard something about
Great Bridge closing its doors so the replication work was loosing it's
sponsor. I also remember that someone out in academia had code that worked
with an older postgre version and was working on porting it to the current
codebase.

I was hoping that there was new information I hadn't come across :(.

> I really think that postgres is a great database; it certainly has many
> advanced features, and is also developing quite rapidly. However, it
sounds
> as though it doesn't have every capability you require.
Replication/failover
> seem important to what you're doing, and unless I'm mistaken, they don't
> exist in postgres. You may be able to make use of what work has been done
in
> that regard.

Indeed, it is important, simply because this business operates 24/7. I
simply have to have some way to run 2 servers at once so that everything can
fall over if one fails. There seem to be a couple middle ware things out
there designed to do connection pooling, but so far I can't find out for
sure if they support failover.

> I guess the question now is: Is there something out there that does meet
your
> requirements? Postgres may be your best bet if you can't afford the costs
> associated with oracle. Oracle may be best if the replication/failover
really
> is a requirement. Or maybe you can find a cheaper commercial database that
> offers more of what you need than postgres.

My options seem to be Postgre with a commercial replication product,
Interbase, building my own system with the sleepycat lib and some sort of
distributed que lib, or buying oracle. I don't like the idea of running MS
Sql on a remote host. I don't know oracle, and frankly, we'd spend more on
oracle liscensing than the business assests in their entirety cost us. We
anticipate around $1.2 mil in revenue this year, as the previous ownership
had been in cost freeze for a year, letting the business decline. We will
turn an ok profit accounting for the cost of the purchase, but what profit
we do achive needs to go back into growing the business, not software
liscenses.

So, really I guess I'm just in the position of praying Postgre gets
repliation soon. I'd like to think my c skills were hot enough I could grab
the tarball, add failover and get it into the cvs, but I'm not that arrogant
or naieve :P. Same argument applies to crufting something out of the Berkley
DB libs as well.

jason watkins


Re: Moving my business to PostgreSQL

From
Helge Bahmann
Date:
About replication:

I know that some people are running Postgres on top of DRBD. This is
a Linux virtual block-device driver that allows replicating complete
disks across the network.

The idea is to have Postgres running on one (active) node with all disk
write operations transparently replicated to a second node (passive, no
Postgres running here). If the first node fails, the second node will
start up Postgres which will then detect an unclean shutdown of the
database and perform recovery.

I was going to try this out myself, but due to lack of time have not, but
perhaps this option is viable.

Regards
--
Helge Bahmann <bahmann@math.tu-freiberg.de>             /| \__
Network admin, systems programmer                      /_|____\
                                                     _/\ |   __)
$ ./configure                                        \\ \|__/__|
checking whether build environment is sane... yes     \\/___/ |
checking for AIX... no (we already did this)            |


Re: Moving my business to PostgreSQL

From
Brent Verner
Date:
[2002-01-28 02:52] Jason Watkins said:

| Indeed, it is important, simply because this business operates 24/7. I
| simply have to have some way to run 2 servers at once so that everything can
| fall over if one fails. There seem to be a couple middle ware things out
| there designed to do connection pooling, but so far I can't find out for
| sure if they support failover.

Take a look at this:
  http://www.mclx.com/products/cluster.php

I've not set one of these up, but if I was tasked with making a
failover-safe postgresql server, I'd certainly look toward having
failover at the OS/environment level instead of application level.

cheers.
  brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Re: Moving my business to PostgreSQL

From
"Marc G. Fournier"
Date:
On Mon, 28 Jan 2002, Jason Watkins wrote:

>
> > Sorry I can't answer too many of your questions, but one thing I am quite
> > sure of is that Postgres does not offer replication/failover. I inferred
> this
>
> I did some more hunting tonight. It seems postgresql.com recently released a
> beta of their eRServ, which looks to do exactly what I need in that reguard.

There is a complete version available to PgSQL.Com's clients ...
LibertyRMS is using it to provide redundancy for the, I believe, .info(?)
domain registry they are running ... the full version is not open source
at this time ...


Re: Moving my business to PostgreSQL

From
"Jason Watkins"
Date:
While none are ideal, it looks like I can meet my postgre failover needs by
a variety of methods. What about occasional (a few times daily) sync between
access and postgre? I could write my own scripts that spit tuples back and
forth over ssl, but there's gotta be something out there that can facilitate
this.

jason


Re: Moving my business to PostgreSQL

From
"Peter Darley"
Date:
Jason,
    You should be able to connect to PostgreSQL as an ODBC data source, which
will allow you to use the Access File->Get External Data->Import menu item
to select your PostgreSQL database, select the tables you want to import and
bring them into Access.  This can be automated using VB or (I think) Access
macros.
    Additionally, you could use File->Get External Data->Link Tables to set up
a link to your database, and use it to pass only certain records back and
forth to your PostgreSQL database, which can also be automated.
    Without knowing exactly what you're doing, the best solution would probably
not be to synchronize the data, but to just link the PostgreSQL tables into
Access as tables and use them natively in Access, which is what we do where
I work.  This allows you to use the Access reporting system, it's creepy
query builder, Access forms, etc. directly against your main database
without having to worry about synchronization.
    If you are going to link the tables into Access you might want to let this
inform your table design decisions; for example the Text type in PostgreSQL
maps to Memo in Access, which isn't linkable or sortable under some
conditions, meaning that if you need it linkable or sortable it's best to
use varchar(255).
    Just my 2c.
Thanks,
Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Jason Watkins
Sent: Monday, January 28, 2002 9:29 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Moving my business to PostgreSQL


While none are ideal, it looks like I can meet my postgre failover needs by
a variety of methods. What about occasional (a few times daily) sync between
access and postgre? I could write my own scripts that spit tuples back and
forth over ssl, but there's gotta be something out there that can facilitate
this.

jason


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: Moving my business to PostgreSQL

From
Andrew Sullivan
Date:
On Mon, Jan 28, 2002 at 09:56:11AM -0400, Marc G. Fournier wrote:

> There is a complete version available to PgSQL.Com's clients ...
> LibertyRMS is using it to provide redundancy for the, I believe, .info(?)
> domain registry they are running ... the full version is not open source
> at this time ...

Yep, that's the one.  We are about to move another small ccTLD onto
the same machines, and replicate it as well.  The PgSQL.com version
works very nicely, in my view.  There were a few initial "gotchas" to
it in operation, but once we got everything sorted out, it just
works.

Some of our internal programmers did some more development with it,
and produced a Java interface which uses the underlying code (there
was something about using Java that they wanted, but I forget what it
was).  Anyway, we find the system very reliable.

One thing we _have_ noticed, by the way, is that dropping the RI
triggers on the slave is extremely helpful for performance.  (I had
some discussion about this with Tom Lane, who was puzzled; but I
haven't had time to do any more tests to explain the phenomenon).  If
your replication is stopped for a long time, leaving the RI triggers
on the slave sometimes prevents replication from ever catching up on
a loaded system.  I still haven't figured out why, though.

If anyone wants any more information about rserv, please feel free to
contact me.

Naturally, opinions expressed here (or anywhere else for that matter)
are my own, and may not be those of my employer.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Moving my business to PostgreSQL

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> If your replication is stopped for a long time, leaving the RI triggers
> on the slave sometimes prevents replication from ever catching up on
> a loaded system.  I still haven't figured out why, though.

I suspect the issue has to do with the infamous "triggered data change
violation" code in 7.1.  That has O(N^2) behavior if a single
transaction makes a lot of repeated updates to the same rows.  Which
I think is exactly the scenario for Andrew's system when the replicator
gets behind and is trying to propagate a lot of history in one batch.

The error check and the O(N^2) behavior are both gone in 7.2.

            regards, tom lane

Re: Moving my business to PostgreSQL

From
Andrew Sullivan
Date:
On Tue, Jan 29, 2002 at 12:12:55PM -0500, Tom Lane wrote:

> The error check and the O(N^2) behavior are both gone in 7.2.

Which reminds me of one other limitation of the eRserver: it doesn't
(yet) work with 7.2, because of a function call that, I gather, needs
another argument.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Moving my business to PostgreSQL

From
"Jason Watkins"
Date:
Thanks Peter. I just found out today that we've landed in FedEx's
discontinueing support window for that system. Given it's current
performance, I don't think we'll be going back to the same vendor for the
bar code/ship management stuff, so I may be able to totally ditch access
(yay!).

Of course, until then, I may try exactly what you mentioned. I don't know
access well enough to have thought of just referncing external tables.

jason


Re: Moving my business to PostgreSQL

From
Francisco Reyes
Date:
On Mon, 28 Jan 2002, Jason Watkins wrote:

> Indeed, it is important, simply because this business operates 24/7. I
> simply have to have some way to run 2 servers at once so that everything can
> fall over if one fails. There seem to be a couple middle ware things out
> there designed to do connection pooling, but so far I can't find out for
> sure if they support failover.

One more issue regarding 24x7. Until 7.2 is out there are certain
operations , vacuum analyze , vacuum(?), which lock the entire database.
So be aware of this.

I believe 7.2 is around the corner so you may start develoment with 7.1.3
and go production when 7.2 is out.

Have you talked to the folks at pgsql.com about their replication? It may
be good to email/call them to see if what they have satisfies your needs.