Thread: unorthodox use of PG for a customer

unorthodox use of PG for a customer

From: David Gauthier
Hi Everyone:

I'm going to throw this internal customer request out for ideas, even though I think it's a bit crazy.  I'm on the brink of telling him it's impractical and/or inadvisable.  But maybe someone has a solution.

He's writing a script/program that runs on a workstation and needs to write data to a DB.  This process also sends work to a batch system on a server farm external to the workstation, which creates multiple, parallel jobs/processes that also have to write to the DB.  The workstation may have many of these jobs running at the same time.  And there are 58 workstations, all of which use locally mounted disks for this work.

At first blush, this is easy.  Just create a DB on a server and have all those clients work with it.  But he's also adamant about having the DB on the same server(s) that ran the script AND on the locally mounted disk.  He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc.  He also wants this to be fast.
 
My first thought was SQLite.  Apparently, they now have some sort of multiple, concurrent write ability.  But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation, so I dismissed that idea.  Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation.  That could work.  But 58 DB instances?  If he didn't like the idea of one DBA, 58 can't be good.  Still, the DB would be on the workstation, which seems to be what he wants.
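(Side note: the concurrent-write ability I'd read about is, I believe, SQLite's WAL journal mode -- readers can proceed while a single writer commits, but writes are still serialized. A rough sketch of what I was picturing, with a made-up path and table:)

    import sqlite3

    # WAL mode: readers proceed while one writer commits; writes still
    # queue behind SQLite's single write lock.
    conn = sqlite3.connect("/local/workarea/results.db", timeout=30)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS results (job_id TEXT, payload TEXT)")
    with conn:  # one atomic transaction
        conn.execute("INSERT INTO results VALUES (?, ?)", ("job-001", "..."))
    conn.close()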
 
I can't think of anything better.  Does anyone have any ideas?

Thanks in Advance !

Re: unorthodox use of PG for a customer

From: James Keener
> he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc.  He also wants this to be fast.

So they're trading consistency concerns for ... not having a central db? Even if your shop requires a DBA for any DB, it sounds like a really bad deal.

Jim




Re: unorthodox use of PG for a customer

From: Adrian Klaver
On 08/24/2018 11:18 AM, David Gauthier wrote:
> [...]

So are the 58 databases (stores) on the workstations going to be working 
with data independent of each other, or is the data shared/synced between 
instances?




-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: unorthodox use of PG for a customer

From: Edson Carlos Ericksson Richter
Em 24/08/2018 15:18, David Gauthier escreveu:
> [...]

I'm no expert, but I've had dozens of PostgreSQL databases running mostly 
without manual maintenance for years; just do the backups and you are fine.
In any case, if you need any kind of maintenance, you can program it into 
your app (even backup, restore and vacuum) - it is easy to issue 
administrative commands through the available interfaces.
And if the database becomes inaccessible, it doesn't matter whether it is 
centralized or remote: you will need someone physically there to fix it.
AFAIK, you don't even need the PostgreSQL installer - you can run it 
embedded if you wish.

Just my2c,

Edson



Re: unorthodox use of PG for a customer

From: David Gauthier
I tried to convince him of the wisdom of one central DB.  I'll try again.

>>So are the 58 databases (stores) on the workstations going to be working with data independent of each other, or is the data shared/synced between instances?

No, 58 workstations, each with its own DB.  There's a concept of a "workarea" (really a dir with a lot of stuff in it) where the script runs.  He wants to tie all the runs for any one workarea together and is stuck on the idea that there should be a separate DB per workarea.  I told him he could just stick all the data in the same table with a "workarea" column to distinguish between the workareas.  He likes the idea of a separate DB per workarea.  He just doesn't get it.
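To be concrete, what I proposed looks roughly like this (all names invented for illustration):

    import psycopg2  # assuming one central PG server, which is my preference

    conn = psycopg2.connect("host=central-db dbname=flows user=runner")
    cur = conn.cursor()
    # One shared table; the workarea column replaces one-DB-per-workarea.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS runs (
            run_id   serial PRIMARY KEY,
            workarea text NOT NULL,  -- the dir the script ran in
            payload  text
        )""")
    conn.commit()
    # Everything tied to one workarea is then just a WHERE clause:
    cur.execute("SELECT * FROM runs WHERE workarea = %s", ("/proj/wa42",))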

 
>>I'm no expert, but I've dozens of PostgreSQL databases running mostly 
without manual maintenance for years.

Ya, I've sort of had the same experience with PG DBs.  Like the Energizer Bunny, they just keep on running.  But these workstations are pretty volatile, as people keep overloading and crashing them.  Of course any DB running there would die too and have to be restarted/recovered.  So the place for the DB is really elsewhere, on an external server that wouldn't be subject to this volatility and crashing.  I told him about transactions and how you could prevent partial writing of data sets.
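(Concretely, something like this -- either the whole data set commits or none of it; names again invented:)

    import psycopg2

    dataset = [("/proj/wa42", "row 1"), ("/proj/wa42", "row 2")]  # stand-in rows

    conn = psycopg2.connect("host=central-db dbname=flows user=runner")
    try:
        with conn:  # commits on success, rolls back on any exception
            with conn.cursor() as cur:
                for workarea, payload in dataset:
                    cur.execute(
                        "INSERT INTO runs (workarea, payload) VALUES (%s, %s)",
                        (workarea, payload))
    finally:
        conn.close()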

So far, I'm not hearing of anything that looks like a solution given the constraints he's put on this.  Don't get me wrong, he's a very smart and sharp software engineer.  Very smart.  But for some reason, he doesn't like the client/server DB model which would work so nicely here.  I'm just trying to make sure I didn't miss some sort of solution, PG or not, that would work here.  

Thanks for your interest and input everyone !






Re: unorthodox use of PG for a customer

From: Adrian Klaver
On 08/24/2018 12:07 PM, David Gauthier wrote:
> [...]

Then use a SQLite db per work area: as part of the batch submission, rsync the 
db file to the remote server, do the work on the external server, and rsync it 
back to the work area.
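Roughly (host and paths made up):

    import subprocess

    DB = "/workarea/jobs.db"
    REMOTE = "batchhost:/scratch/jobs.db"

    # Push the SQLite file to the batch host, let the jobs write to it
    # there, then pull it back. Assumes the work area sits idle meanwhile.
    subprocess.run(["rsync", "-a", DB, REMOTE], check=True)
    # ... batch jobs run on batchhost against /scratch/jobs.db ...
    subprocess.run(["rsync", "-a", REMOTE, DB], check=True)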


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: unorthodox use of PG for a customer

From: Dimitri Maziuk
On 08/24/2018 02:07 PM, David Gauthier wrote:
>
> ...  He likes the idea of a
> separate DB per workarea.  He just doesn't get it.

Well there are advantages to that.

> But for some reason, he doesn't like
> the client/server DB model which would work so nicely here.  I'm just
> trying to make sure I didn't miss some sort of solution, PG or not, that
> would work here.

What you should tell him is that he can't have it both ways. Either it's
multiple worker nodes concurrently writing to the same workstation -- in
which case he needs "a server" on the workstation to accept incoming
connections, with all that overhead and enough oomph to handle the
concurrency he expects. Or it's a beefed-up central server that
everybody writes to, which every workstation can poll to maintain its
own state database (i.e. worker nodes don't talk back to submit nodes).
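A sketch of the second option as I picture it (host, table and column names invented):

    import psycopg2, sqlite3

    # Worker nodes write only to the central manager; each workstation
    # polls the manager and mirrors its own rows into a local state db.
    pg = psycopg2.connect("host=manager dbname=state user=poller")
    local = sqlite3.connect("/local/state.db")
    local.execute(
        "CREATE TABLE IF NOT EXISTS runs (run_id INTEGER PRIMARY KEY, payload TEXT)")

    last = local.execute("SELECT coalesce(max(run_id), 0) FROM runs").fetchone()[0]
    with pg.cursor() as cur:
        cur.execute(
            "SELECT run_id, payload FROM runs WHERE workstation = %s AND run_id > %s",
            ("ws07", last))
        with local:  # one local transaction per polling pass
            local.executemany("INSERT INTO runs VALUES (?, ?)", cur.fetchall())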

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Re: unorthodox use of PG for a customer

From: Edson Carlos Ericksson Richter
Em 24/08/2018 16:07, David Gauthier escreveu:
> [...]
I think it's worth adding that, PG or no PG, if the workstation crashes you 
will be in trouble with ANY database or file solution you choose - but 
with PG you can minimize the risk by fine-tuning the flush to disk 
(both in PG and in the OS). When correctly tuned, it works like a tank 
and is hard to defeat.
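For example, asynchronous commit trades bounded loss of the most recent transactions on a crash for faster writes, without risking corruption as long as fsync stays on (a sketch; connection details invented):

    import psycopg2

    conn = psycopg2.connect("dbname=flows user=postgres")
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    cur = conn.cursor()
    # Commits return before the WAL is flushed to disk; a crash can lose
    # the last moments of work, but never corrupts the database.
    cur.execute("ALTER SYSTEM SET synchronous_commit = 'off'")
    cur.execute("SELECT pg_reload_conf()")  # picked up without a restart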

Regards,

Edson.


Re: unorthodox use of PG for a customer

From: Andrew Kerber
Unless I am missing something, it sounds like you might be able to do this with an NFS export shared to each workstation. But I am not sure I understood what you were describing either.




--
Andrew W. Kerber

'If at first you don't succeed, don't take up skydiving.'

Re: unorthodox use of PG for a customer

From: Dimitri Maziuk
On 08/24/2018 02:35 PM, Andrew Kerber wrote:
> Unless I am missing something, it sounds like you might be able to do this
> with an nfs export shared to each workstation.

That's no different from polling the central database though; you're
just using an NFS server and files instead of a DB server and queries.

A compute cluster has to have a manager node that holds all the state. The
way to do what he wants is to query that manager from each workstation
and keep the results in the local state database. Whether that is
actually feasible/doable in his particular case is another question.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Re: unorthodox use of PG for a customer

From: Tim Cross
David Gauthier <davegauthierpg@gmail.com> writes:

> [...]
>

I would agree the customer's proposed architecture has problems. It is
likely to be fragile and difficult to maintain. At some point, there
will likely be a need to consolidate the data in all these separate
databases, which could lead to other problems.

It sounds like there is some previous experience which has caused
problems for your customer, and they are trying to avoid a repeat by
prescribing the technical solution rather than asking for one. The
first step is to spend more time talking to your customer and getting to
understand the underlying reasons why he is proposing those
technical/architecture constraints. I think once you have full details
regarding his requirements and the risks as he perceives them, you will
likely be able to come up with a much more workable solution which will
both address his/her concerns and be an architecture which is solid and
maintainable. There is a good chance not all the reasons will be purely
technical.

My wild guess is that previously there have been problems with central IT
services - probably bureaucracy, poor response times or communication,
or misaligned expectations. I often encounter this type of problem
working with researchers who are very smart and informed in their local
area of expertise, but less so when it comes to understanding the
complexities, maintenance overheads and other activities associated with
providing reliable services (backups, upgrades, tuning etc). The two
areas (IT and customer) frequently speak different languages even when
using the same words. It can be extremely challenging to get clear,
consistent and agreed requirements. For example, what does 'fast' mean?

The 'fast' requirement and the desire to have things run locally could
indicate a concern regarding network performance. I find performance is
often blamed on the network, but this is less often the case in modern
networks. More often than not it is poor algorithm design, badly tuned
databases, badly designed database schemas, or CPU and memory limits.

Pointing out the maintenance overheads and possible failure points in
his proposed architecture may help. Being able to collect real metrics
to demonstrate where bottlenecks and performance issues reside will also
help going forward - good metrics and hard numbers can often circumvent
circular arguments regarding problem causes.

Also consider your internal business processes. I've seen too many good
architecture solutions become perceived as failures because the other,
non-technical stuff failed - poor communications, failure to align
technical maintenance with business requirements, unnecessary delays or
poor responses to enquiries and questions, and an inability to adapt to
changing business requirements in a timely manner. This is often the
most frustrating part - you can be an excellent technical person able to
define and implement really good technical solutions, but if the
customer is unable to use the solution effectively, it will be seen as a
technical failure.

Tim
--
Tim Cross


Re: unorthodox use of PG for a customer

From: Gary M
Hmmm, I usually don't participate in forums, and I accidentally sent my reply to David Gauthier directly. This issue sounds interesting, so I'll give it another try and send the reply to the group.
 
From my simple engineering-hat perspective, I feel this question is difficult to answer without hard numbers, which include:

1) The amount of data stored in the db, including indexes
2) Expected peak number and type of db operations per second
3) Max latency the operations can tolerate
4) Number and type of CPUs on each workstation
5) Available RAM on each workstation
6) Data storage utilization (MB/s, IOPS)
7) Data storage capacity and type (SSD/spinning/network)
8) Type of network (number of networks and bandwidth)
9) Network bandwidth utilization
10) Swap space utilization during operation

From these numbers the approach and architecture can be determined.

For example, if the available RAM has sufficient capacity, the file system cache and PG cache can be tuned to keep all the data in RAM, which is likely to significantly improve write and query performance.  Another option is to have a dedicated SSD for data and/or swap space.

I prefer these analytical "rules of 10" as a starting point to guide alternatives.

b/r
gary 




Re: unorthodox use of PG for a customer

From: John McKown
On Fri, Aug 24, 2018 at 1:19 PM David Gauthier <davegauthierpg@gmail.com> wrote:
> Hi Everyone:
>
> I'm going to throw this internal customer request out for ideas, even though I think it's a bit crazy.  I'm on the brink of telling him it's impractical and/or inadvisable.  But maybe someone has a solution.

Reading below, I think you're right about it being inadvisable. It seems to me that your user, like most, just has "needs" and doesn't really want to worry about "details" or much of anything else. We have programmers where I work with this attitude: we list what we want -- you supply it and maintain it. Because we only code (we don't even design much, because we're AGILE!), we don't worry about those little details (availability, reliability, security -- not our concern!). One thing missing from your post is the OS involved. Linux? Windows? Other? Multiple different ones? E.g. some users are Linux while others are Windows. {ouch}

 

> He's writing a script/program that runs on a workstation and needs to write data to a DB.  This process also sends work to a batch system on a server farm external to the workstation, which creates multiple, parallel jobs/processes that also have to write to the DB.  The workstation may have many of these jobs running at the same time.  And there are 58 workstations, all of which use locally mounted disks for this work.

First question: you say DB. Do you need an SQL-based database, or do you just need a shared data store which is easy to use from a script? Have you considered any NoSQL-type data stores such as CouchDB or MongoDB?

Second question: which scripting language and what programming language? Examples might be: Windows PowerShell, Linux BASH, Windows BASH, Python, Perl, surely not some AWK variant, R, Google Go, Rust, C/C++, and so on.

 

> At first blush, this is easy.  Just create a DB on a server and have all those clients work with it.  But he's also adamant about having the DB on the same server(s) that ran the script AND on the locally mounted disk.  He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc.  He also wants this to be fast.

So, if I understand, he wants a single shared data store for 58 clients. He also wants the disk holding the data to be "locally mounted" on every workstation. Depending on what "locally mounted" means to the user, the only way that I know of to do something like this is to have the actual disk / filesystem directly attached to a single server. All the workstations would then use a communication protocol (IP?) to send their I/O against the "shared data" to this server, which does the physical I/O. So we're talking some protocol like NFS or CIFS (a Windows "share"). The easiest way that I know of to do this sharing is to have a NAS box, such as NetApp, which really is a server+disk "in a box" and which implements these protocols in an easy-to-manage manner.

Or can each workstation have a separate, unrelated data store for its processing? The thought then would be some way to have the application write locally into its data store. Something else, perhaps "in line", would replicate the data to a central store on a server. That server would then distribute the changes back out to all 58 workstations and ??? servers so that each would, eventually, have an identical copy of the current data; the stores might have inconsistencies until they synchronise.

 
 
> My first thought was SQLite.  Apparently, they now have some sort of multiple, concurrent write ability.  But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation, so I dismissed that idea.  Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation.  That could work.  But 58 DB instances?  If he didn't like the idea of one DBA, 58 can't be good.  Still, the DB would be on the workstation, which seems to be what he wants.
>
> I can't think of anything better.  Does anyone have any ideas?

Yes, but the civil authorities would prosecute you if they found the body.

 

> Thanks in Advance !



--
Between infinite and short there is a big difference. -- G.H. Gonnet

Maranatha! <><
John McKown

Re: unorthodox use of PG for a customer

From: Olivier Gautherot
Hi Dave, lots of nice inputs but let's try to summarize your user's needs.

On Fri, Aug 24, 2018 at 3:18 PM, David Gauthier <davegauthierpg@gmail.com> wrote:
[...]

> He's writing a script/program that runs on a workstation and needs to write data to a DB.  This process also sends work to a batch system on a server farm external to the workstation, which creates multiple, parallel jobs/processes that also have to write to the DB.  The workstation may have many of these jobs running at the same time.  And there are 58 workstations, all of which use locally mounted disks for this work.

Do the workstations work independently, or do they need to share information? If they share information, does it need to be ACID, or is "eventually consistent" enough? Also, what is the size of the database? A local database is viable as long as it does not overload the workstation, but if you have many instances of the job per workstation, and 58 of them, I would rely on an optimized central DB.

Are the jobs mainly INSERT or a mix of INSERT and SELECT/UPDATE?

Things to consider: amount of RAM required, CPU, disk I/O... Cost should also be factored in.

> At first blush, this is easy.  Just create a DB on a server and have all those clients work with it.  But he's also adamant about having the DB on the same server(s) that ran the script AND on the locally mounted disk.  He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc.  He also wants this to be fast.

If he insists on having a local copy of the database, suggest PG10 with logical replication - ideally bidirectional replication. Once he realizes that every INSERT causes 57 replications, and that this will most likely kill the network, he may reconsider the requirement... Queries to a central DB are likely to cause less stress and will be easier to handle.
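For reference, a one-directional sketch (each workstation publishes its runs to a central aggregator; bidirectional multi-master is where the 57-fold traffic would come from; all names are invented, and each workstation needs wal_level = logical):

    import psycopg2

    # On each workstation (publisher), PG10+:
    ws = psycopg2.connect("host=ws07 dbname=flows user=postgres")
    ws.autocommit = True
    ws.cursor().execute("CREATE PUBLICATION runs_pub FOR TABLE runs")

    # On the central server, one subscription per workstation:
    hub = psycopg2.connect("host=central-db dbname=flows user=postgres")
    hub.autocommit = True  # CREATE SUBSCRIPTION cannot run in a transaction
    hub.cursor().execute(
        "CREATE SUBSCRIPTION ws07_sub "
        "CONNECTION 'host=ws07 dbname=flows user=replicator' "
        "PUBLICATION runs_pub")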

Now, if speed is critical, you may also consider an in-memory DB like Redis. There are schemes running in multi-master.

> My first thought was SQLite.  Apparently, they now have some sort of multiple, concurrent write ability.  But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation, so I dismissed that idea.  Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation.  That could work.  But 58 DB instances?  If he didn't like the idea of one DBA, 58 can't be good.  Still, the DB would be on the workstation, which seems to be what he wants.

A distributed database is likely to cause failures at some point if not handled properly - and it sounds like there won't be any DBA at all. I would be extremely cautious in this case.

Re: unorthodox use of PG for a customer

From: Dimitri Maziuk
On 8/27/2018 8:20 AM, John McKown wrote:
> On Fri, Aug 24, 2018 at 1:19 PM David Gauthier <davegauthierpg@gmail.com> wrote:

>     I can't think of anything better.  Does anyone have any ideas?
> 
> Yes, but the civil authorities would prosecute you if they found the body.

And that's why we have raised floors and locked doors in the server rooms.

Dima