Thread: Where **not** to use PostgreSQL?

Where **not** to use PostgreSQL?

From
Thomas Güttler
Date:
Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?


Regards,
   Thomas Güttler


-- 
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines


Re: Where **not** to use PostgreSQL?

From
Pavel Stehule
Date:
Hi

čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:
Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Don't use Postgres like cache, don't use Postgres for non transactional short life often updated data.

Use inmemory databases instead

Pavel



Regards,
   Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines

Re: Where **not** to use PostgreSQL?

From
Chris Travers
Date:


On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:
Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Hard question.  There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful weighing of pros and cons would need to be made before deciding to use it.

Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and near-real-time OLAP workloads, I am fairly aware of how hard it can be pushed.

So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives."

 

Why would you do this?

Replacing with "What would you consider to be the tradeoffs?" 

What alternative would you suggest instead?

So a few possibilities:

1.  a) TB-scale full text search systems.
     b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems.  So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL.  Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
     c) Solr or ElasticSearch

2.  a) High performance job queues
     b) PostgreSQL index and table structures are not well suited to large numbers of ordered deletes.  There are ways around these problems and again if other data is in PostgreSQL, the tradeoff is around transactional behavior and complexity there vs ease of scaling performance.
     c) Redis if the job queue easily fits into a small enough amount of memory, or Kafka if it does not

On to where you actually should never use PostgreSQL:

Don't use PostgreSQL for things where you do not want or cannot guarantee transactional atomicity.  
While it is possible to have untrusted languages have side effects in the real world, the fact is that mixing transactions and non-transactional behavior in this way adds a lot of really ugly complexity.
Use another development environment instead.
 

Don't use Postgres like cache, don't use Postgres for non transactional short life often updated data.

Use inmemory databases instead

Pavel



Regards,
   Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Where **not** to use PostgreSQL?

From
Nicolas Grilly
Date:
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:
1.  a) TB-scale full text search systems.
     b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems.  So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL.  Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
     c) Solr or ElasticSearch

One question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.

I'd be curious to know a bit more about your experience in this regard.

Regards,

Nicolas Grilly

PS: A potential solution to the performance issue I mentioned is this PG extension: https://github.com/postgrespro/rum

Re: Where **not** to use PostgreSQL?

From
Chris Travers
Date:


On Thu, Feb 28, 2019 at 1:50 PM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:
1.  a) TB-scale full text search systems.
     b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems.  So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL.  Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
     c) Solr or ElasticSearch


One question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.

I'd be curious to know a bit more about your experience in this regard.


Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)  

Regards,

Nicolas Grilly

PS: A potential solution to the performance issue I mentioned is this PG extension: https://github.com/postgrespro/rum



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Where **not** to use PostgreSQL?

From
Nicolas Grilly
Date:
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com> wrote:
Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)  

Thanks!

Re: Where **not** to use PostgreSQL?

From
Mark Moellering
Date:
I wish more people would ask this question, to me, it is the true mark of experience.  In general, I think of PostgreSQL as the leading Relational Database.  The farther you get away from relational data and relational queries, the more I would say, you should look for other products or solutions.  But if you want to store relational data and then run queries over it, then stick with PostgreSQL.

My 2 scents..

Mark

On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com> wrote:
Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)  

Thanks!

Re: Where **not** to use PostgreSQL?

From
Ron
Date:
On 2/28/19 5:47 AM, Thomas Güttler wrote:
> Hi experts,
>
> where would you suggest someone to **not** use PostgreSQL?

1. Small embedded systems.  SQLite is great for that.

2. Easy-to-implement Master-Master replication.  (The Percona fork of MySQL 
does that really well, if you can handle MySQL's limitations.)



-- 
Angular momentum makes the world go 'round.


Re: Where **not** to use PostgreSQL?

From
Michel Pelletier
Date:
On Thu, Feb 28, 2019 at 4:50 AM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:
1.  a) TB-scale full text search systems.
     b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems.  So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL.  Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
     c) Solr or ElasticSearch

One question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.


Check out the RUM index extension, it adds ranking information to indexes to speed up exactly the problem you pointed out:


 

Re: Where **not** to use PostgreSQL?

From
Raymond Brinzer
Date:
I often avoid PostgreSQL when using software for which PostgreSQL
support is secondary. Usually this is the case where MySQL is the
default, but PostgreSQL is on the "also supported" list.  "Also" is
too often a synonym for "badly", here, and unless I really want to be
able to approach the underlying database *as a database*, the better
choice is to go with the flow.  Even when I do need that, I'll
consider whether the features I need are exclusive to, or much better
on, PostgreSQL.

When developing something myself, I've also chosen MySQL because the
other technical people likely to be involved are at least somewhat
familiar with it.  A person who is marginally competent with databases
doesn't need the added difficulty of learning a new DBMS while
learning whatever I created.

It's always a pleasure when I don't have such issues, and I can use PostgreSQL.

On Thu, Feb 28, 2019 at 6:47 AM Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
>
> Hi experts,
>
> where would you suggest someone to **not** use PostgreSQL?
>
> Why would you do this?
>
> What alternative would you suggest instead?
>
>
> Regards,
>    Thomas Güttler
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
> I am looking for feedback: https://github.com/guettli/programming-guidelines
>


--
Ray Brinzer


Re: Where **not** to use PostgreSQL?

From
Michel Pelletier
Date:
Doh, sorry I missed your postscript!

On Thu, Feb 28, 2019 at 8:02 AM Nicolas Grilly <nicolas@garden-paris.com> wrote:
Le jeu. 28 févr. 2019 à 16:48, Michel Pelletier <pelletier.michel@gmail.com> a écrit :
Check out the RUM index extension, it adds ranking information to indexes to speed up exactly the problem you pointed out:


I mentioned it at the end of my message:-)

It would be great to have it integrated in the standard distribution.

Thank you. Was: Where **not** to use PostgreSQL?

From
Thomas Güttler Lists
Date:
Thank you very much for your friendly answers.


I added some parts to my guidlines:

  https://github.com/guettli/programming-guidelines/blob/master/README.rst#where-to-not-use-postgresql


Feedback is welcome.


Regards,

   Thomas Güttler


Am 28.02.19 um 12:47 schrieb Thomas Güttler:
> Hi experts,
>
> where would you suggest someone to **not** use PostgreSQL?
>
> Why would you do this?
>
> What alternative would you suggest instead?
>
>
> Regards,
>   Thomas Güttler
>
>
-- 
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines



Re: Where **not** to use PostgreSQL?

From
Merlin Moncure
Date:
On Thu, Feb 28, 2019 at 6:24 AM Chris Travers <chris.travers@gmail.com> wrote:
>
> On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> Hi
>>
>> čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:
>>>
>>> Hi experts,
>>>
>>> where would you suggest someone to **not** use PostgreSQL?
>
>
> Hard question.  There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful
weighingof pros and cons would need to be made before deciding to use it. 
>
> Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and
near-real-timeOLAP workloads, I am fairly aware of how hard it can be pushed. 
>
> So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives."
>
>
>>>
>>>
>>> Why would you do this?
>
>
> Replacing with "What would you consider to be the tradeoffs?"
>>>
>>>
>>> What alternative would you suggest instead?
>
>
> So a few possibilities:
>
> 1.  a) TB-scale full text search systems.
>      b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace
Lucene-basedsystems.  So you have to consider complexity vs functionality if you are tying with other data that is
alreadyin PostgreSQL.  Note further that my experience with at least ElasticSearch is that it is easier to scale
somethingbuilt on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range. 
>      c) Solr or ElasticSearch

In my company we had to swap out solr for postgres.  The main
challenge was that solr's limited query language was not able to deal
with complex authorization use cases that were coming in after the
original project was deployed.  Our only solution was to  heavily
denormalize the documents so that when simple changes happened on the
OLTP side we had to push large amounts of data into SOLR.  In addition
to being slow, solr (or I guess the underlying lucene) started to
develop weird failure modes; there were unpredictable swings in memory
and/or disk usage, underlying system resource exhaustion (especially
fd).  The whole thing felt unstable; we had tested heavily with the
original requirements but the tech did not evolve with the project.

The solution was to junk the whole thing and replace it with an API
compatible version of solr in the database.  To index the document we
use a special search string with upper case keys and lower case values
in a tab delimited text string; pg_trgm/gin does the rest of the
lifting.   It can't compete with solr on best case behavior but give
much better worst case behavior, and, since we don't have to
denormalize, the system fits within memory making scaling a snap.

The moral of the story here is 'Just use postgres'.  This is not
zealotry; if I were a microsoft inclined person, I might be advising
use of sql server.  If you are not totally and completely aware of the
limits of the system you are probably operating within them.  The
database is evolving rapidly and insanely powerful servers,
supercomputers even, from the perspective of even 10 years ago, can be
had for a mouse click on the cheap.

There of course a few use cases were postgres is not optimal tech;
highly unstructured data...super high transaction rate master master
loss tolerant data archiving, warm log storage, etc.  These problems
show up quite rarely in the world of data which is generally directed
towards systems support of business applications.  If you haven't
mastered the database first, you probably shouldn't be building out
complex systems in non-database technology since you don't know what
you don't know (for example, the immense value that transactions bring
to the table).

merlin


Re: Where **not** to use PostgreSQL?

From
Steven Lembark
Date:
For small, lightweight, portable SQLite is quite nice for times you
don't want to create a full PG installation and still prefer SQL for
lookups.

-- 
Steven Lembark                                     3920 10th Ave South
Workhorse Computing                               Birmingham, AL 35222
lembark@wrkhors.com                                    +1 888 359 3508


Re: Where **not** to use PostgreSQL?

From
Thomas Kellerer
Date:
Thomas Güttler schrieb am 28.02.2019 um 12:47:
> where would you suggest someone to **not** use PostgreSQL?
> 
> Why would you do this?
> 
> What alternative would you suggest instead?

Due to the MVCC implementation, I would hesitate to use Postgres in environments that have an extremely high and
constantrate of DELETE an UPDATE statements, e.g. several thousands or even tens of thousands transactions per second
withoutany "quiet" times where vacuum could "catch up" clean out dead tuples. 
 

Tuning autovacuum to cope with that is very challenging 
(at least until the new UNDO log implementation is ready ;)