Thread: Moving my business to PostgreSQL
This post is long; if you'd like to offer quick pointed advice, I am specifically looking for live replication/failover for PostgreSQL, daily synchronization with Access, and data import/restructuring from the D3 nested relational database to PostgreSQL. I'm also looking for more general sage-like advice on the varied challenges I will likely face in the next 6 months. Some of these are not Postgre specific, but since I hope to make Postgre the core of this business, I hope you'll indulge me and offer any advice you might have.

I'm the new manager of operations of a mail order retailer. I was hired by new ownership, which purchased the business assets from the previous owners in Nov. We went to the application vendor and had them build us a system preconfigured with the same version of their software the previous owners used. For a variety of reasons, this system _barely_ works at the moment, and most likely will not support our future plans for the business. There is, quite simply, no budget to buy a more capable off-the-shelf system. So that brings me to trying to engineer a better system myself. I've worked as a CRM implementer and done a variety of web dev. Over the last year, I've become quite sold on Postgre's capabilities. Unless your needs are esoteric, it's Oracle at a FreeBSD price point :).

The basic process of our business: we handle orders through a call center. Orders are exported from their system as a flat file and imported into our system several times daily. Payment is processed in batch by modem daily. We authorize before shipping, then capture deposits after shipping. Orders are sent to our shipping facility 1 day ahead of each shipping day. They're exported as a flat file from our system, moved over *cough* PC Anywhere to the shipping facility, and I execute some DTS scripts on the remote machine via PC Anywhere that verify and insert the flat file into their Access database.
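To make that flat-file import step concrete, here's a minimal Python sketch of the parsing side. The pipe-delimited field layout (order id, customer, SKU, qty, ship date) is purely my assumption -- the vendor's export spec would dictate the real one -- and the parsed rows would then feed a single-transaction COPY or batched INSERT into PostgreSQL:

```python
import csv
import io
from datetime import datetime

# Hypothetical field layout for the call center's export -- the real
# layout would come from the vendor's documentation.
FIELDS = ["order_id", "customer_id", "sku", "qty", "ship_date"]

def parse_order_export(text):
    """Parse a pipe-delimited order export into row dicts, validating as we go."""
    rows = []
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="|")
    for rec in reader:
        rec["qty"] = int(rec["qty"])  # reject non-numeric quantities early
        rec["ship_date"] = datetime.strptime(rec["ship_date"], "%Y%m%d").date()
        rows.append(rec)
    return rows
```

The point of validating up front and loading inside one transaction is that a malformed file never half-imports; the batch either lands whole or not at all.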
At the shipping facility, a bar code system from ADSI manages picking products, generating shipping invoices, etc. The day after each shipping day, a flat file is built on that system and, again by PC Anywhere, moved back to our system. The order IDs and FedEx tracking #'s are used to capture payments, again batch processed by modem. All of these process steps are manually initiated. The existing system uses the D3 nested relational database.

The general outline of what I'm planning is to move most everything to 2 hosted servers. A web application will be used by the call center agents to enter orders directly into our system. Payment processing will be done over SSL by a scheduled cron job. Postgre will, of course, be the database.

Some of the bigger problems I'm facing:

1.) Exporting 12 years' worth of data from a nested relational database and restructuring it for a SQL database to get it into Postgre. The current database is ~500 meg. I'm not that familiar with D3, but a large portion of that seems to be history table data. I'd estimate the business has done 50,000 orders. There's information relating to 10 warehouses, and probably a few thousand SKUs over the years. It's vital that the new system have this data, since forecasting and modeling are the life or death of a mail order business.

2.) Providing live failover between the servers. As we internalize what used to be a web application hosted at the call center into our own system, we take on the responsibility of providing 24/7/365 uptime on that system.

3.) Integrating my new Postgre-based system with the barcode/shipping system. This comes down to 2-way sync of a few tables between Postgre and Access.

Smaller stuff:

1.) Reporting tool: right now I'm thinking of having my web application just generate simple spreadsheet files. Anything more snazzy will need to be accessible to business people who use Mac and PC systems. We also use Linux and IRIX desktops, but I'm fine with ignoring those for reporting.

2.)
Monitoring and backup schemes for Postgre. I plan to host the servers at someplace like Rackspace. I've found that co-location facilities don't offer the backup frequency I'd like, so I'd like to get updates to backups daily. These can just be generated on the remote servers and FTP'd by a scheduled job to our location, where I can put them on tape and in the fire safe. I also need some way of monitoring the whole system, and quick recovery/reinstallation in the face of a failure.

Stuff not related to Postgre (off-list response please):

1.) Payment processing: we currently use Paymentech. They say they're working on an SSL gateway, but so far they only offer a one-item-at-a-time SOAP-like interface. Our products are requested for delivery on particular days, often far in advance of when an authorization done on the day of ordering would expire. That means while it would be *very* nice to do AVS on order entry, we have to have a scheduled job that looks for orders soon shipping that need authorization, or orders just shipped that need to capture deposit. I've found getting real answers from payment processors a real PITA, so any anecdotal advice here I'd really welcome.

2.) Anything that speeds developing the web application. This will be done entirely by me, with about 20 hrs/week to dedicate to it. I need to have the system tested, running, and be 100% confident of it by Nov 1 to handle the xmas rush. I know, I know, but there simply isn't any other option available to me... I'm workin' with what I got. I imagine there are a few eCom systems compatible with Postgre that might be worth looking at. However, you can see these will need significant customization, and the ones I've looked at so far were too inflexible to handle what I need.

3.) At this point, I'm not ready to consider changing the shipping system. However, once the front half of the business is moved to Postgre, I would like to entertain thoughts about moving those pieces to open source as well.
Any info on barcode hardware and FedEx software interfaces would be helpful when I come around to that.

A lot of what I'm facing only bears partly on Postgre, but, assuming I find this can be done, I want to do a detailed write-up (*cough* "case study" for you MBAs). I think Postgre could really benefit if people can read how a small company like us was able to move our entire business to open source infrastructure, with Postgre at the core. I'd love to do whatever I can to show people that this is a viable option to build your business around.

I appreciate any advice or anecdotes you might offer. If not PostgreSQL specific, or you feel they'd not be of interest to the list as a whole, please respond directly.

jason watkins
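On the first big problem above: the crux of the D3-to-SQL restructuring is that a Pick-style item can hold parallel multivalued attributes that a relational schema splits into child tables. A toy Python sketch of that flattening, with an invented record shape standing in for the real D3 export:

```python
# A D3/Pick item is roughly a dict whose attributes can be parallel lists
# (multivalues). This invented order record stands in for a real export.
def flatten_order(item):
    """Split one nested order item into an orders row plus order_lines rows."""
    order_row = {"order_id": item["order_id"], "customer": item["customer"]}
    # Parallel multivalue fields become one child-table row per position.
    lines = [
        {"order_id": item["order_id"], "sku": sku, "qty": qty}
        for sku, qty in zip(item["skus"], item["qtys"])
    ]
    return order_row, lines

order = {"order_id": 1001, "customer": "C42",
         "skus": ["SKU-9", "SKU-12"], "qtys": [2, 1]}
parent, children = flatten_order(order)
# parent feeds an INSERT into an orders table; children feed order_lines
```

The real migration would need one such flattening per multivalued attribute group, plus foreign keys tying the child rows back to their parent, but the shape of the transformation is the same.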
On Sunday 27 January 2002 10:20 pm, you wrote:
> This post is long, if you'd like to offer quick point advice, I specifically
> am looking for live replication/failover for PostgreSQL, daily
> synchronization with Access, and data import/restructuring from the D3
> nested relational database to PostgreSQL.
>

Sorry I can't answer too many of your questions, but one thing I am quite sure of is that Postgres does not offer replication/failover. I inferred this from the Postgres developer TODO (http://developer.postgresql.org/todo.php), which still lists both features (and has for some time, I might add).

I really think that Postgres is a great database; it certainly has many advanced features, and is also developing quite rapidly. However, it sounds as though it doesn't have every capability you require. Replication/failover seem important to what you're doing, and unless I'm mistaken, they don't exist in Postgres. You may be able to make use of what work has been done in that regard.

I guess the question now is: is there something out there that does meet your requirements? Postgres may be your best bet if you can't afford the costs associated with Oracle. Oracle may be best if the replication/failover really is a requirement. Or maybe you can find a cheaper commercial database that offers more of what you need than Postgres.

Note that my experience is somewhat limited, so don't take what I say for fact.

Regards, and good luck,
Jeff
> Sorry I can't answer too many of your questions, but one thing I am quite
> sure of is that Postgres does not offer replication/failover. I inferred this

I did some more hunting tonight. It seems postgresql.com recently released a beta of their eRServ, which looks to do exactly what I need in that regard. I know that it was high on the TODO list, and then I heard something about Great Bridge closing its doors, so the replication work was losing its sponsor. I also remember that someone out in academia had code that worked with an older Postgre version and was working on porting it to the current codebase. I was hoping that there was new information I hadn't come across :(.

> I really think that postgres is a great database; it certainly has many
> advanced features, and is also developing quite rapidly. However, it sounds
> as though it doesn't have every capability you require. Replication/failover
> seem important to what you're doing, and unless I'm mistaken, they don't
> exist in postgres. You may be able to make use of what work has been done in
> that regard.

Indeed, it is important, simply because this business operates 24/7. I simply have to have some way to run 2 servers at once so that everything can fail over if one fails. There seem to be a couple of middleware things out there designed to do connection pooling, but so far I can't find out for sure if they support failover.

> I guess the question now is: Is there something out there that does meet your
> requirements? Postgres may be your best bet if you can't afford the costs
> associated with oracle. Oracle may be best if the replication/failover really
> is a requirement. Or maybe you can find a cheaper commercial database that
> offers more of what you need than postgres.

My options seem to be Postgre with a commercial replication product, Interbase, building my own system with the Sleepycat lib and some sort of distributed queue lib, or buying Oracle.
I don't like the idea of running MS SQL on a remote host. I don't know Oracle, and frankly, we'd spend more on Oracle licensing than the business assets in their entirety cost us. We anticipate around $1.2 mil in revenue this year, as the previous ownership had been in a cost freeze for a year, letting the business decline. We will turn an OK profit accounting for the cost of the purchase, but what profit we do achieve needs to go back into growing the business, not software licenses.

So, really I guess I'm just in the position of praying Postgre gets replication soon. I'd like to think my C skills were hot enough that I could grab the tarball, add failover and get it into the CVS, but I'm not that arrogant or naive :P. The same argument applies to crufting something out of the Berkeley DB libs as well.

jason watkins
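On the connection-pooling middleware question above: even without middleware, a client can get crude failover by trying a list of hosts in order. A Python sketch of that idea -- the connect functions here are stand-ins for a real database driver's connect call:

```python
def connect_with_failover(hosts, connect):
    """Try each host in order; return (host, connection) for the first that answers.

    `connect` is whatever function opens a database session for a host --
    a stand-in here for your driver's actual connect call.
    """
    last_error = None
    for host in hosts:
        try:
            return host, connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember why, then try the next (standby) host
    raise RuntimeError("all database hosts unreachable") from last_error

# Demonstration with a fake connect that says the primary is down:
def fake_connect(host):
    if host == "primary":
        raise ConnectionError("primary down")
    return "session-on-" + host

host, conn = connect_with_failover(["primary", "standby"], fake_connect)
# host is now "standby"
```

Note this only covers getting a session onto a live server; keeping the standby's data current is the replication problem itself, which is exactly what this thread is about.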
About replication: I know that some people are running Postgres on top of DRBD. This is a Linux virtual block-device driver that allows replicating complete disks across the network. The idea is to have Postgres running on one (active) node, with all disk write operations transparently replicated to a second node (passive, no Postgres running there). If the first node fails, the second node starts up Postgres, which then detects an unclean shutdown of the database and performs recovery.

I was going to try this out myself, but due to lack of time have not; perhaps this option is viable for you.

Regards
--
Helge Bahmann <bahmann@math.tu-freiberg.de>
 /| \__      Network admin, systems programmer
/_|____\
_/\ |   __)  $ ./configure
\\ \|__/__|  checking whether build environment is sane... yes
 \\/___/ |   checking for AIX... no (we already did this)
         |
[2002-01-28 02:52] Jason Watkins said:
| Indeed, it is important, simply because this business operates 24/7. I
| simply have to have some way to run 2 servers at once so that everything can
| fall over if one fails. There seem to be a couple middleware things out
| there designed to do connection pooling, but so far I can't find out for
| sure if they support failover.

Take a look at this: http://www.mclx.com/products/cluster.php

I've not set one of these up, but if I were tasked with making a failover-safe PostgreSQL server, I'd certainly look toward having failover at the OS/environment level instead of the application level.

cheers.
brent
--
"Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
On Mon, 28 Jan 2002, Jason Watkins wrote:
> > Sorry I can't answer too many of your questions, but one thing I am quite
> > sure of is that Postgres does not offer replication/failover. I inferred this
>
> I did some more hunting tonight. It seems postgresql.com recently released a
> beta of their eRServ, which looks to do exactly what I need in that regard.

There is a complete version available to PgSQL.Com's clients ... LibertyRMS is using it to provide redundancy for the, I believe, .info(?) domain registry they are running ... the full version is not open source at this time ...
While none are ideal, it looks like I can meet my Postgre failover needs by a variety of methods. What about occasional (a few times daily) sync between Access and Postgre? I could write my own scripts that spit tuples back and forth over SSL, but there's gotta be something out there that can facilitate this.

jason
Jason,

You should be able to connect to PostgreSQL as an ODBC data source, which will allow you to use the Access File->Get External Data->Import menu item to select your PostgreSQL database, select the tables you want to import, and bring them into Access. This can be automated using VB or (I think) Access macros. Additionally, you could use File->Get External Data->Link Tables to set up a link to your database and use it to pass only certain records back and forth to your PostgreSQL database, which can also be automated.

Without knowing exactly what you're doing, the best solution would probably not be to synchronize the data, but to just link the PostgreSQL tables into Access and use them natively in Access, which is what we do where I work. This allows you to use the Access reporting system, its creepy query builder, Access forms, etc. directly against your main database without having to worry about synchronization.

If you are going to link the tables into Access, you might want to let this inform your table design decisions; for example, the Text type in PostgreSQL maps to Memo in Access, which isn't linkable or sortable under some conditions, meaning that if you need it linkable or sortable it's best to use varchar(255).

Just my 2c.

Thanks,
Peter Darley
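Whichever transport ends up carrying the data (linked tables, ODBC scripts over SSL), a periodic sync still has to compute which rows to push. A Python sketch of that diff step over keyed snapshots -- the table shape and column names are invented for illustration:

```python
def diff_rows(source, target, key="order_id"):
    """Return (to_insert, to_update) taking `source` as authoritative.

    Both arguments are lists of row dicts -- e.g. one fetched from
    PostgreSQL and one from Access via ODBC; the column names here
    are invented for illustration.
    """
    target_by_key = {row[key]: row for row in target}
    to_insert, to_update = [], []
    for row in source:
        existing = target_by_key.get(row[key])
        if existing is None:
            to_insert.append(row)      # row is new on the target side
        elif existing != row:
            to_update.append(row)      # row exists but has drifted
    return to_insert, to_update

pg_rows = [{"order_id": 1, "status": "shipped"}, {"order_id": 2, "status": "open"}]
access_rows = [{"order_id": 1, "status": "picked"}]
ins, upd = diff_rows(pg_rows, access_rows)
# ins carries order 2; upd carries order 1 with its new status
```

Running this in each direction (with a rule for which side wins per table) gives the few-times-daily two-way sync; the genuinely hard part is deciding conflict ownership, not moving the tuples.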
On Mon, Jan 28, 2002 at 09:56:11AM -0400, Marc G. Fournier wrote:
> There is a complete version available to PgSQL.Com's clients ...
> LibertyRMS is using it to provide redundancy for the, I believe, .info(?)
> domain registry they are running ... the full version is not open source
> at this time ...

Yep, that's the one. We are about to move another small ccTLD onto the same machines and replicate it as well. The PgSQL.com version works very nicely, in my view. There were a few initial "gotchas" to it in operation, but once we got everything sorted out, it just works. Some of our internal programmers did some more development with it and produced a Java interface which uses the underlying code (there was something about using Java that they wanted, but I forget what it was). Anyway, we find the system very reliable.

One thing we _have_ noticed, by the way, is that dropping the RI triggers on the slave is extremely helpful for performance. (I had some discussion about this with Tom Lane, who was puzzled, but I haven't had time to do any more tests to explain the phenomenon.) If your replication is stopped for a long time, leaving the RI triggers on the slave sometimes prevents replication from ever catching up on a loaded system. I still haven't figured out why, though.

If anyone wants any more information about rserv, please feel free to contact me. Naturally, opinions expressed here (or anywhere else for that matter) are my own, and may not be those of my employer.

A
--
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes:
> If your replication is stopped for a long time, leaving the RI triggers
> on the slave sometimes prevents replication from ever catching up on
> a loaded system. I still haven't figured out why, though.

I suspect the issue has to do with the infamous "triggered data change violation" code in 7.1. That has O(N^2) behavior if a single transaction makes a lot of repeated updates to the same rows, which I think is exactly the scenario for Andrew's system when the replicator gets behind and is trying to propagate a lot of history in one batch. The error check and the O(N^2) behavior are both gone in 7.2.

regards, tom lane
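A toy model of the quadratic behavior described here (not Postgres's actual trigger code): if each update in a transaction re-checks every row touched so far, total work grows as N^2, which is why a large catch-up batch can stall while a stream of small ones sails through.

```python
def rechecks_per_batch(n_updates):
    """Count row checks if update i re-examines all i rows touched so far."""
    checks = 0
    touched = 0
    for _ in range(n_updates):
        touched += 1
        checks += touched  # the "triggered data change" scan over prior rows
    return checks

# Doubling the batch roughly quadruples the work:
small = rechecks_per_batch(100)  # 5050
large = rechecks_per_batch(200)  # 20100
```

So a replicator that falls behind and replays its backlog in one transaction pays quadratically for the delay, matching the "never catches up on a loaded system" symptom.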
On Tue, Jan 29, 2002 at 12:12:55PM -0500, Tom Lane wrote:
> The error check and the O(N^2) behavior are both gone in 7.2.

Which reminds me of one other limitation of the eRserver: it doesn't (yet) work with 7.2, because of a function call that, I gather, needs another argument.

A
--
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
Thanks Peter. I just found out today that we've landed in FedEx's discontinuing-support window for that system. Given its current performance, I don't think we'll be going back to the same vendor for the bar code/ship management stuff, so I may be able to totally ditch Access (yay!). Of course, until then, I may try exactly what you mentioned. I don't know Access well enough to have thought of just referencing external tables.

jason
On Mon, 28 Jan 2002, Jason Watkins wrote:
> Indeed, it is important, simply because this business operates 24/7. I
> simply have to have some way to run 2 servers at once so that everything can
> fall over if one fails. There seem to be a couple middleware things out
> there designed to do connection pooling, but so far I can't find out for
> sure if they support failover.

One more issue regarding 24x7: until 7.2 is out, there are certain operations, vacuum analyze, vacuum(?), which lock the entire database, so be aware of this. I believe 7.2 is around the corner, so you may start development with 7.1.3 and go production when 7.2 is out.

Have you talked to the folks at pgsql.com about their replication? It may be good to email/call them to see if what they have satisfies your needs.