Thread: Where **not** to use PostgreSQL?
Hi experts, where would you suggest someone to **not** use PostgreSQL? Why would you do this? What alternative would you suggest instead? Regards, Thomas Güttler -- Thomas Guettler http://www.thomas-guettler.de/ I am looking for feedback: https://github.com/guettli/programming-guidelines
Hi
čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
Don't use Postgres like cache, don't use Postgres for non transactional short life often updated data.
Use inmemory databases instead
Pavel
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines
On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hičt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Hard question. There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful weighing of pros and cons would need to be made before deciding to use it.
Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and near-real-time OLAP workloads, I am fairly aware of how hard it can be pushed.
So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives."
Why would you do this?
Replacing with "What would you consider to be the tradeoffs?"
What alternative would you suggest instead?
So a few possibilities:
1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
c) Solr or ElasticSearch
2. a) High performance job queues
b) PostgreSQL index and table structures are not well suited to large numbers of ordered deletes. There are ways around these problems and again if other data is in PostgreSQL, the tradeoff is around transactional behavior and complexity there vs ease of scaling performance.
c) Redis if the job queue easily fits into a small enough amount of memory, or Kafka if it does not
On to where you actually should never use PostgreSQL:
Don't use PostgreSQL for things where you do not want or cannot guarantee transactional atomicity.
While it is possible to have untrusted languages have side effects in the real world, the fact is that mixing transactions and non-transactional behavior in this way adds a lot of really ugly complexity.
Use another development environment instead.
Don't use Postgres like cache, don't use Postgres for non transactional short life often updated data.Use inmemory databases insteadPavel
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:
1. a) TB-scale full text search systems.b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.c) Solr or ElasticSearch
One question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.
I'd be curious to know a bit more about your experience in this regard.
Regards,
Nicolas Grilly
PS: A potential solution to the performance issue I mentioned is this PG extension: https://github.com/postgrespro/rum
On Thu, Feb 28, 2019 at 1:50 PM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:1. a) TB-scale full text search systems.b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.c) Solr or ElasticSearch
One question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.I'd be curious to know a bit more about your experience in this regard.
Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.
On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)
Regards,Nicolas GrillyPS: A potential solution to the performance issue I mentioned is this PG extension: https://github.com/postgrespro/rum
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com> wrote:
Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)
Thanks!
I wish more people would ask this question, to me, it is the true mark of experience. In general, I think of PostgreSQL as the leading Relational Database. The farther you get away from relational data and relational queries, the more I would say, you should look for other products or solutions. But if you want to store relational data and then run queries over it, then stick with PostgreSQL.
My 2 scents..
Mark
On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com> wrote:Where I did this on the TB scale, we had some sort of ranking but it was not based on ts_rank.On the PB scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this I guess)Thanks!
On 2/28/19 5:47 AM, Thomas Güttler wrote: > Hi experts, > > where would you suggest someone to **not** use PostgreSQL? 1. Small embedded systems. SQLite is great for that. 2. Easy-to-implement Master-Master replication. (The Percona fork of MySQL does that really well, if you can handle MySQL's limitations.) -- Angular momentum makes the world go 'round.
On Thu, Feb 28, 2019 at 4:50 AM Nicolas Grilly <nicolas@gardentechno.com> wrote:
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com> wrote:1. a) TB-scale full text search systems.b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.c) Solr or ElasticSearchOne question about your use of PostgreSQL for a TB-scale full-text search system: Did you order search results using ts_rank or ts_rank_cd? I'm asking because in my experience, PostgreSQL full-text search is extremely efficient, until you need ranking. It's because the indexes don't contain the necessary information for ranking, and because of this the heap has to be consulted, which implies a lot of random IO.
Check out the RUM index extension, it adds ranking information to indexes to speed up exactly the problem you pointed out:
I often avoid PostgreSQL when using software for which PostgreSQL support is secondary. Usually this is the case where MySQL is the default, but PostgreSQL is on the "also supported" list. "Also" is too often a synonym for "badly", here, and unless I really want to be able to approach the underlying database *as a database*, the better choice is to go with the flow. Even when I do need that, I'll consider whether the features I need are exclusive to, or much better on, PostgreSQL. When developing something myself, I've also chosen MySQL because the other technical people likely to be involved are at least somewhat familiar with it. A person who is marginally competent with databases doesn't need the added difficulty of learning a new DBMS while learning whatever I created. It's always a pleasure when I don't have such issues, and I can use PostgreSQL. On Thu, Feb 28, 2019 at 6:47 AM Thomas Güttler <guettliml@thomas-guettler.de> wrote: > > Hi experts, > > where would you suggest someone to **not** use PostgreSQL? > > Why would you do this? > > What alternative would you suggest instead? > > > Regards, > Thomas Güttler > > > -- > Thomas Guettler http://www.thomas-guettler.de/ > I am looking for feedback: https://github.com/guettli/programming-guidelines > -- Ray Brinzer
Doh, sorry I missed your postscript!
On Thu, Feb 28, 2019 at 8:02 AM Nicolas Grilly <nicolas@garden-paris.com> wrote:
Le jeu. 28 févr. 2019 à 16:48, Michel Pelletier <pelletier.michel@gmail.com> a écrit :Check out the RUM index extension, it adds ranking information to indexes to speed up exactly the problem you pointed out:I mentioned it at the end of my message:-)It would be great to have it integrated in the standard distribution.
Thank you very much for your friendly answers. I added some parts to my guidlines: https://github.com/guettli/programming-guidelines/blob/master/README.rst#where-to-not-use-postgresql Feedback is welcome. Regards, Thomas Güttler Am 28.02.19 um 12:47 schrieb Thomas Güttler: > Hi experts, > > where would you suggest someone to **not** use PostgreSQL? > > Why would you do this? > > What alternative would you suggest instead? > > > Regards, > Thomas Güttler > > -- Thomas Guettler http://www.thomas-guettler.de/ I am looking for feedback: https://github.com/guettli/programming-guidelines
On Thu, Feb 28, 2019 at 6:24 AM Chris Travers <chris.travers@gmail.com> wrote: > > On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> Hi >> >> čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal: >>> >>> Hi experts, >>> >>> where would you suggest someone to **not** use PostgreSQL? > > > Hard question. There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful weighingof pros and cons would need to be made before deciding to use it. > > Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and near-real-timeOLAP workloads, I am fairly aware of how hard it can be pushed. > > So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives." > > >>> >>> >>> Why would you do this? > > > Replacing with "What would you consider to be the tradeoffs?" >>> >>> >>> What alternative would you suggest instead? > > > So a few possibilities: > > 1. a) TB-scale full text search systems. > b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-basedsystems. So you have to consider complexity vs functionality if you are tying with other data that is alreadyin PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale somethingbuilt on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range. > c) Solr or ElasticSearch In my company we had to swap out solr for postgres. The main challenge was that solr's limited query language was not able to deal with complex authorization use cases that were coming in after the original project was deployed. Our only solution was to heavily denormalize the documents so that when simple changes happened on the OLTP side we had to push large amounts of data into SOLR. In addition to being slow, solr (or I guess the underlying lucene) started to develop weird failure modes; there were unpredictable swings in memory and/or disk usage, underlying system resource exhaustion (especially fd). The whole thing felt unstable; we had tested heavily with the original requirements but the tech did not evolve with the project. The solution was to junk the whole thing and replace it with an API compatible version of solr in the database. To index the document we use a special search string with upper case keys and lower case values in a tab delimited text string; pg_trgm/gin does the rest of the lifting. It can't compete with solr on best case behavior but give much better worst case behavior, and, since we don't have to denormalize, the system fits within memory making scaling a snap. The moral of the story here is 'Just use postgres'. This is not zealotry; if I were a microsoft inclined person, I might be advising use of sql server. If you are not totally and completely aware of the limits of the system you are probably operating within them. The database is evolving rapidly and insanely powerful servers, supercomputers even, from the perspective of even 10 years ago, can be had for a mouse click on the cheap. There of course a few use cases were postgres is not optimal tech; highly unstructured data...super high transaction rate master master loss tolerant data archiving, warm log storage, etc. These problems show up quite rarely in the world of data which is generally directed towards systems support of business applications. If you haven't mastered the database first, you probably shouldn't be building out complex systems in non-database technology since you don't know what you don't know (for example, the immense value that transactions bring to the table). merlin
For small, lightweight, portable SQLite is quite nice for times you don't want to create a full PG installation and still prefer SQL for lookups. -- Steven Lembark 3920 10th Ave South Workhorse Computing Birmingham, AL 35222 lembark@wrkhors.com +1 888 359 3508
Thomas Güttler schrieb am 28.02.2019 um 12:47: > where would you suggest someone to **not** use PostgreSQL? > > Why would you do this? > > What alternative would you suggest instead? Due to the MVCC implementation, I would hesitate to use Postgres in environments that have an extremely high and constantrate of DELETE an UPDATE statements, e.g. several thousands or even tens of thousands transactions per second withoutany "quiet" times where vacuum could "catch up" clean out dead tuples. Tuning autovacuum to cope with that is very challenging (at least until the new UNDO log implementation is ready ;)