Thread: unorthodox use of PG for a customer
Hi Everyone:
I'm going to throw this internal customer request out for ideas, even though I think it's a bit crazy. I'm on the brink of telling him it's impractical and/or inadvisable. But maybe someone has a solution.
He's writing a script/program that runs on a workstation and needs to write data to a DB. This process also sends work to a batch system on a server farm external to the workstation, which creates multiple parallel jobs/processes that also have to write to the DB. The workstation may have many of these jobs running at the same time. And there are 58 workstations, which all have/use locally mounted disks for this work.
At first blush, this is easy. Just create a DB on a server and have all those clients work with it. But he's also adamant about having the DB on the same server(s) that run the script AND on the locally mounted disk. He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc. He also wants this to be fast.
My first thought was SQLite. Apparently, they now have some sort of multiple, concurrent write ability. But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation. So I dismissed that idea. Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation. That could work. But 58 DB instances? If he didn't like the idea of one DBA, 58 can't be good. Still, the DB would be on the workstation, which seems to be what he wants.
I can't think of anything better. Does anyone have any ideas?
Thanks in Advance !
> he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc... . He also wants this to be fast.
So they're trading consistency concerns for ... not having a central db? Even if your shop requires a DBA for any DB, it sounds like a really bad deal.
Jim
On 08/24/2018 11:18 AM, David Gauthier wrote:
> [...]

So are the 58 databases (stores) on the workstations going to be working with data independent of each other, or is the data shared/synced between instances?

--
Adrian Klaver
adrian.klaver@aklaver.com
On 24/08/2018 15:18, David Gauthier wrote:
> [...]

I'm no expert, but I've had dozens of PostgreSQL databases running mostly without manual maintenance for years; just do the backups and you are fine. If you need any kind of maintenance, you can program it in your app (even backup, restore and vacuum) - it is easy to issue administrative commands through the available interfaces. And if the database becomes inaccessible, it doesn't matter whether it is centralized or remote: you will need someone physically there to fix it. AFAIK, you don't even need the PostgreSQL installer - you can run it embedded if you wish.

Just my 2c,

Edson
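For illustration, a minimal sketch of the kind of routine maintenance an application could issue itself over an ordinary connection (standard PostgreSQL commands; the job_results table name is made up):

    -- reclaim dead rows and refresh planner statistics across the database
    VACUUM (VERBOSE, ANALYZE);
    -- rebuild the indexes of a hypothetical table if they ever bloat
    REINDEX TABLE job_results;

A file-level or pg_dump backup would still be scripted outside the database, but nothing there requires a dedicated DBA either.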
I tried to convince him of the wisdom of one central DB. I'll try again.
>> So are the 58 databases (stores) on the workstations going to be working with data independent of each other, or is the data shared/synced between instances?
No, 58 workstations, each with its own DB. There's a concept of a "workarea" (really a dir with a lot of stuff in it) where the script runs. He wants to tie all the runs for any one workarea together and is stuck on the idea that there should be a separate DB per workarea. I told him you could just stick all the data in the same table, with a "workarea" column to distinguish between the workareas. He likes the idea of a separate DB per workarea. He just doesn't get it.
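For what it's worth, a minimal sketch of that single-table idea (all names here are made up); each run just tags its rows with the workarea it belongs to:

    CREATE TABLE job_runs (
        run_id      bigserial   PRIMARY KEY,
        workarea    text        NOT NULL,  -- the workarea (directory) this run belongs to
        workstation text        NOT NULL,
        started_at  timestamptz NOT NULL DEFAULT now(),
        payload     jsonb
    );

    -- queries scoped to one workarea stay cheap with an index on the tag column
    CREATE INDEX ON job_runs (workarea);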
>> I'm no expert, but I've had dozens of PostgreSQL databases running mostly without manual maintenance for years.
Ya, I've sort of had the same experience with PG DBs. Like the Energizer bunny, they just keep on running. But these workstations are pretty volatile; they keep overloading them and crashing them. Of course, any DB running there would die too and have to be restarted/recovered. So the place for the DB is really elsewhere, on an external server that wouldn't be subject to this volatility and crashing. I told him about transactions and how you could prevent partial writing of data sets.
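As a sketch of the transaction point, reusing the hypothetical job_runs table above: either every row of a data set becomes visible or none of it does, even if the job dies mid-write.

    BEGIN;
    INSERT INTO job_runs (workarea, workstation, payload) VALUES ('wa_001', 'ws17', '{"step": 1}');
    INSERT INTO job_runs (workarea, workstation, payload) VALUES ('wa_001', 'ws17', '{"step": 2}');
    -- if the client crashes before this point, readers never see the partial data set
    COMMIT;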
So far, I'm not hearing of anything that looks like a solution given the constraints he's put on this. Don't get me wrong, he's a very smart and sharp software engineer. Very smart. But for some reason, he doesn't like the client/server DB model which would work so nicely here. I'm just trying to make sure I didn't miss some sort of solution, PG or not, that would work here.
Thanks for your interest and input everyone !
On 08/24/2018 12:07 PM, David Gauthier wrote:
> ... He likes the idea of a separate DB per workarea. He just doesn't get it.

Then a SQLite db per work area, and as part of the batch job rsync the db file to the remote server, do the work on the external server, and rsync it back to the work area.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 08/24/2018 02:07 PM, David Gauthier wrote:
> ... He likes the idea of a separate DB per workarea. He just doesn't get it.

Well, there are advantages to that.

> But for some reason, he doesn't like the client/server DB model which would work so nicely here. I'm just trying to make sure I didn't miss some sort of solution, PG or not, that would work here.

What you should tell him is that he can't have it both ways. Either it's multiple worker nodes concurrently writing to the same workstation -- in which case he needs "a server" on the workstation to accept incoming connections and all that overhead, with enough oomph to handle the concurrency he expects. Or it's a beefed-up central server that everybody writes to, which every workstation can poll to maintain its own state databases (i.e. worker nodes don't talk back to submit nodes).

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
On 24/08/2018 16:07, David Gauthier wrote:
> [...]

I think it's worth adding that, PG or not PG, if the workstation crashes you will be in trouble with ANY database or file solution you choose - but with PG you can minimize the risk by fine-tuning the flush to disk (both in PG and in the OS). When correctly tuned, it works like a tank and is hard to defeat.

Regards,

Edson.
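One hedged sketch of what that kind of tuning could look like on PostgreSQL 9.4 or later, assuming the workload can tolerate losing the last few commits after a crash (the database itself stays consistent either way):

    -- trade commit durability for speed; a crash may lose the most recent transactions
    ALTER SYSTEM SET synchronous_commit = off;
    SELECT pg_reload_conf();  -- apply without a restart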
Unless I am missing something, it sounds like you might be able to do this with an nfs export shared to each workstation. But I am not sure if I understood what you were describing either.
--
Andrew W. Kerber
'If at first you dont succeed, dont take up skydiving.'
On 08/24/2018 02:35 PM, Andrew Kerber wrote:
> Unless I am missing something, it sounds like you might be able to do this with an nfs export shared to each workstation.

That's no different from polling the central database, though; you're just using an nfs server and files instead of a db server and queries.

A compute cluster has to have a manager node that holds all the state. The way to do what he wants is to query that manager from each workstation and keep the results in the local state database. Whether that is actually feasible/doable in his particular case is another question.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
David Gauthier <davegauthierpg@gmail.com> writes:
> [...]

I would agree the customer's proposed architecture has problems. It is likely to be fragile and difficult to maintain. At some point, there will likely be a need to consolidate the data in all these separate databases, which could lead to other problems.

It sounds like there is some previous experience which has caused problems for your customer, and they are trying to avoid a repeat by dictating the technical solution rather than asking for a solution. The first step is to spend more time talking to your customer and getting to understand the underlying reasons why he is proposing those technical/architecture constraints. Once you have full details regarding his requirements and the risks as he perceives them, you will likely be able to come up with a much more workable solution which both addresses his concerns and is an architecture which is solid and maintainable.

There is a good chance not all the reasons will be purely technical. My wild guess is that previously there have been problems with central IT services - probably bureaucracy, poor response times or communication, or misaligned expectations. I often encounter this type of problem working with researchers who are very smart and informed in their local area of expertise, but less so when it comes to understanding the complexities, maintenance overheads and other activities associated with providing reliable services (backups, upgrades, tuning, etc.). The two groups (IT and customer) frequently speak different languages even when using the same words. It can be extremely challenging to get clear, consistent and agreed requirements.

For example, what does 'fast' mean? The 'fast' requirement and the desire to have things run locally could indicate a concern regarding network performance. I find performance is often blamed on the network, but this is less often the case in modern networks. More often than not it is poor algorithm design, badly tuned databases, badly designed database schemas, or CPU and memory limits. Pointing out the maintenance overheads and possible failure points in his proposed architecture may help. Being able to collect real metrics to demonstrate where bottlenecks and performance issues reside will also help going forward - good metrics and hard numbers can often circumvent circular arguments regarding problem causes.

Also consider your internal business processes. I've seen too many good architecture solutions become perceived as failures because the other, non-technical stuff failed - poor communication, failure to align technical maintenance with business requirements, unnecessary delays or poor responses to enquiries and questions, and inability to adapt to changing business requirements in a timely manner. This is often the most frustrating part - you can be an excellent technical person able to define and implement really good technical solutions, but if the customer is unable to use the solution effectively, it will be seen as a technical failure.

Tim

--
Tim Cross
Hmmm, I usually don't participate in forums and accidentally sent my reply to David Gauthier directly. This issue sounds interesting, so I'll give it another try and send the reply to the group.
I prefer these analytical "rules of 10" as a starting point to guide alternatives.
From my simple, engineering-hat perspective, I feel this question is difficult to answer without hard numbers, which include:
1) The amount of data stored in the DB, including indexes
2) Expected peak number and type of DB operations per second
3) Max latency the operations can tolerate
4) Number and type of CPUs on each workstation
5) Available RAM on each workstation
6) Data storage utilization (MB/s, IOPS)
7) Data storage capacity and type (SSD/spinning/network)
8) Type of network (number of networks and bandwidth)
9) Network bandwidth utilization
10) Swap space utilization during operation
From these numbers the approach and architecture can be determined.
For example, if the available RAM has sufficient capacity, the file system cache and PG cache can be tuned to keep all the data in RAM, which is likely to significantly improve write and query performance. Another option is to have a dedicated SSD for data and/or swap space.
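As a rough sketch only, with placeholder values for a workstation with around 16 GB of RAM (ALTER SYSTEM needs PostgreSQL 9.4+):

    ALTER SYSTEM SET shared_buffers = '4GB';         -- PostgreSQL's own cache; takes effect after a restart
    ALTER SYSTEM SET effective_cache_size = '12GB';  -- planner hint about the OS file-system cache; a reload is enough
    SELECT pg_reload_conf();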
b/r
gary
On Fri, Aug 24, 2018 at 1:19 PM David Gauthier <davegauthierpg@gmail.com> wrote:
> Hi Everyone: I'm going to throw this internal customer request out for ideas, even though I think it's a bit crazy. I'm on the brink of telling him it's impractical and/or inadvisable. But maybe someone has a solution.
Reading below, I think you're right about it being inadvisable. It seems to me that your user, like most, just has "needs" and doesn't really want to worry about "details" or much of anything else. We have programmers where I work with this attitude: we list what we want -- you supply it and maintain it. Because we only code (we don't even design much because we're AGILE!), we don't worry about those little details (availability, reliability, security -- not our concern!). One thing missing from your post is the OS involved. Linux? Windows? Other? Multiple different ones? E.g. some users are Linux while others are Windows. {ouch}
> He's writing a script/program that runs on a workstation and needs to write data to a DB. This process also sends work to a batch system on a server farm external to the workstation that will create multiple, parallel jobs/processes that also have to write to the DB. The workstation may have many of these jobs running at the same time. And there are 58 workstations which all have/use locally mounted disks for this work.
First question: you say DB. Do you need an SQL-based database, or do you just need a shared data store that is easy to use in a script? Have you considered any NoSQL-type data stores such as CouchDB or MongoDB?
Second question: which scripting language and what programming language? Examples might be: Windows PowerShell, Linux bash, Windows bash, Python, Perl, surely not some AWK variant, R, Google Go, Rust, C/C++, and so on.
> At first blush, this is easy. Just create a DB on a server and have all those clients work with it. But he's also adamant about having the DB on the same server(s) that run the script AND on the locally mounted disk. He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc. He also wants this to be fast.
So, if I understand, he wants a single shared data store for 58 clients. He also wants the disk holding the data to be "locally mounted" on every workstation. Depending on what "locally mounted" means to the user, the only way that I know of to do something like this is to have the actual disk / filesystem directly attached to a single server. All the workstations would need to use a communication protocol (IP?) to send their I/O against the "shared data" to this server, which does the physical I/O. So we're talking some protocol like NFS or CIFS (a Windows "share"). The easiest way that I know to do this sharing is to have a NAS box, such as NetApp, which really is a server+disk "in a box" and which implements these protocols in an easy-to-manage manner.
Or can each workstation have a separate, unrelated data store for its processing? The thought then would be some way to have the application write locally into its data store. Something else, perhaps "in line", would replicate the data to a central store on a server. That server would then distribute the changes back out to all 58 workstations and ??? servers so that each will have, eventually, an identical copy of the current data; but the stores might have inconsistencies until they synchronise.
> My first thought was SQLite. Apparently, they now have some sort of multiple, concurrent write ability. But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation. So I dismissed that idea. Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation. That could work. But 58 DB instances? If he didn't like the idea of one DBA, 58 can't be good. Still, the DB would be on the workstation, which seems to be what he wants. I can't think of anything better. Does anyone have any ideas?
Yes, but the civil authorities would prosecute you if they found the body.
> Thanks in Advance!
Between infinite and short there is a big difference. -- G.H. Gonnet
Maranatha! <><
John McKown
Hi Dave, lots of nice inputs but let's try to summarize your user's needs.
On Fri, Aug 24, 2018 at 3:18 PM, David Gauthier <davegauthierpg@gmail.com> wrote:
> [...] He's writing a script/program that runs on a workstation and needs to write data to a DB. This process also sends work to a batch system on a server farm external to the workstation that will create multiple, parallel jobs/processes that also have to write to the DB. The workstation may have many of these jobs running at the same time. And there are 58 workstations which all have/use locally mounted disks for this work.
Do the workstations work independently or do they need to share information? If they share information, does it need to be ACID or is "eventually consistent" enough? Also, what is the size of the database? A local database is viable as long as it does not overload the workstation, but if you have many instances of the job per workstation and 58 of them, I would rely on an optimized central DB.
Are the jobs mainly INSERT or a mix of INSERT and SELECT/UPDATE?
Things to consider: amount of RAM required, CPU, disk I/O... The cost factor should be considered.
> At first blush, this is easy. Just create a DB on a server and have all those clients work with it. But he's also adamant about having the DB on the same server(s) that run the script AND on the locally mounted disk. He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc. He also wants this to be fast.
If he insists on having a local copy of the database, suggest PG10 with logical replication - ideally bidirectional replication. Once he realizes that every INSERT causes 57 replications and will most likely kill the network, he may reconsider the requirement... Queries to a central DB are likely to cause less stress and will be easier to handle.
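A minimal sketch of what one direction of that replication looks like on PG10 (table, subscription names and connection string are hypothetical; built-in logical replication in PG10 is one-directional, so a true bidirectional setup needs extra tooling):

    -- on the workstation (publisher)
    CREATE PUBLICATION workarea_pub FOR TABLE job_runs;

    -- on the central/consolidating server (subscriber)
    CREATE SUBSCRIPTION ws17_sub
        CONNECTION 'host=ws17 dbname=jobs user=repl'
        PUBLICATION workarea_pub;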
Now, if speed is critical, you may also consider an in-memory DB like Redis. There are schemes running in multi-master.
> My first thought was SQLite. Apparently, they now have some sort of multiple, concurrent write ability. But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation. So I dismissed that idea. Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation. That could work. But 58 DB instances? If he didn't like the idea of one DBA, 58 can't be good. Still, the DB would be on the workstation, which seems to be what he wants.
A distributed database is likely to cause failures at some point if not handled properly - and it sounds like there won't be any DBA at all. I would be extremely cautious in this case.
On 8/27/2018 8:20 AM, John McKown wrote:
> On Fri, Aug 24, 2018 at 1:19 PM David Gauthier <davegauthierpg@gmail.com> wrote:
>> I can't think of anything better. Does anyone have any ideas?
>
> Yes, but the civil authorities would prosecute you if they found the body.

And that's why we have raised floors and locked doors in the server rooms.

Dima