Re: PostgreSQL VS MongoDB: a use case comparison - Mailing list pgsql-performance

From	Stephen Frost
Subject	Re: PostgreSQL VS MongoDB: a use case comparison
Date	November 20, 2018 16:34:20
Msg-id	20181120133420.GA3415@tamriel.snowman.net
In response to	Re: PostgreSQL VS MongoDB: a use case comparison (Fabio Pardi <f.pardi@portavita.eu>)
Responses	Re: PostgreSQL VS MongoDB: a use case comparison (Fabio Pardi <f.pardi@portavita.eu>) Re: PostgreSQL VS MongoDB: a use case comparison (Nicolas Paris <nicolas.paris@riseup.net>)
List	pgsql-performance

Greetings,

* Fabio Pardi (f.pardi@portavita.eu) wrote:
> thanks for your feedback.

On these mailing lists we prefer that people not top-post but instead
reply inline, as I'm doing here.  This helps the conversation by
eliminating unnecessary dialogue and making it possible to comment
clearly on specific points.

> I agree with you the compression is playing a role in the comparison.
> Probably there is a toll to pay when the load is high and the CPU
> stressed from de/compressing data. If we will be able to bring our
> studies that further, this is definitely something we would like to measure.

I was actually thinking of the compression as having more of an impact
with regard to the 'cold' cases because you're pulling fewer blocks when
it's compressed.  The decompression cost on CPU is typically much, much
less than the cost to pull the data off of the storage medium.  When
things are 'hot' and in cache then it might be interesting to question
if the compression/decompression is worth the cost.
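A rough way to see the 'fewer blocks' point is to count how many
fixed-size blocks a payload occupies before and after compression.  The
Python sketch below is only an illustration: zlib is a stand-in for
whatever compression either engine actually uses, the documents are
made up, and 8192 bytes is PostgreSQL's default block size.

```python
import json
import math
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default block size, in bytes


def blocks_needed(num_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks needed to hold num_bytes."""
    return math.ceil(num_bytes / block_size)


# Hypothetical documents: repetitive JSON, which tends to compress well.
docs = [
    {"patient_id": i, "status": "active", "notes": "n/a " * 20}
    for i in range(2000)
]
raw = json.dumps(docs).encode("utf-8")
compressed = zlib.compress(raw)  # zlib is an assumption, used as a stand-in

raw_blocks = blocks_needed(len(raw))
compressed_blocks = blocks_needed(len(compressed))
print(f"raw:        {len(raw)} bytes, {raw_blocks} blocks")
print(f"compressed: {len(compressed)} bytes, {compressed_blocks} blocks")
```

Fewer blocks means fewer reads from storage in the 'cold' case; whether
the CPU spent decompressing pays off once everything is cached is a
separate measurement.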

> I also agree with you that at the moment Postgres really shines on
> relational data. To be honest, after seeing the outcome of our research,
> we are actually considering decoupling some (or all) fields from their
> JSON structure. There will be a toll to be paid there too, since we are
> receiving data in JSON format.

PostgreSQL has tools to help with this; you might look into
'json_to_record' and friends.
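In SQL, json_to_record is called with a column definition list, along
the lines of SELECT * FROM json_to_record('{"a":1,"b":"x"}') AS r(a int,
b text).  The Python sketch below only mimics that idea outside the
database so the mapping is easy to see: the column names and the
json_to_row helper are hypothetical, and the handling of missing keys
(NULL) and extra keys (ignored) follows my understanding of the
PostgreSQL behaviour rather than anything in the article.

```python
import json

# Hypothetical column list: name -> conversion function, playing the
# role of the "AS r(id int, name text, weight_kg float8)" definition.
COLUMNS = {"id": int, "name": str, "weight_kg": float}


def json_to_row(doc: str, columns=COLUMNS) -> tuple:
    """Project one JSON object onto a fixed, typed column list."""
    obj = json.loads(doc)
    return tuple(
        # Missing or null keys become None; keys not in `columns` are ignored.
        convert(obj[name]) if obj.get(name) is not None else None
        for name, convert in columns.items()
    )


row = json_to_row('{"id": "7", "name": "example", "extra": true}')
print(row)  # (7, 'example', None)
```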

> And the toll will be in time spent to deliver such a solution, and
> indeed time spent by the engine in doing the conversion. It might not be
> that convenient after all.

Oh, the kind of reduction you'd see in space from both an on-disk and
in-memory footprint would almost certainly be worth the tiny amount of
CPU overhead from this.

> Anyway, to bring data from JSON to a relational model is out of topic
> for the current discussion, since we are actually questioning if
> Postgres is a good replacement for Mongo when handling JSON data.

This narrow viewpoint isn't really sensible, though: what you should be
thinking about is what's appropriate for your *data*.  JSON is just a
data format, and while it's alright as a system inter-exchange format,
it's rather terrible as a storage format.

> As per sharing the dataset, as mentioned in the post we are handling
> medical data. Even if the content is anonymized, we are not keen to
> share the data structure too for security reasons.

If you really want people to take your analysis seriously, others must
be able to reproduce your results.  I certainly appreciate that there
are very good reasons that you can't share this actual data, but your
testing could be done with completely generated data which happens to be
similar in structure to your data and have similar frequency of values.

The way to approach generating such a data set would be to aggregate up
the actual data to a point where the appropriate committee/board agree
that it can be shared publicly, and then you build a randomly generated
set of data which aggregates to the same result and then use that for
testing.
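As a minimal sketch of that approach in Python, assuming the approved
aggregate is simply a count per value: the counts below are invented
and the synthesize helper is hypothetical.  Real data would need the
same treatment per field, and probably for combinations of fields.

```python
import random
from collections import Counter


def synthesize(counts: dict, seed: int = 0) -> list:
    """Build a shuffled list of values whose frequencies equal `counts`."""
    rng = random.Random(seed)  # fixed seed so others can regenerate the set
    values = [value for value, n in counts.items() for _ in range(n)]
    rng.shuffle(values)
    return values


# Hypothetical aggregate, standing in for whatever summary the
# committee/board has agreed can be shared publicly.
approved_counts = {"A": 500, "B": 300, "C": 200}

synthetic = synthesize(approved_counts)

# The generated data aggregates back to exactly the approved summary.
assert Counter(synthetic) == Counter(approved_counts)
```

Publishing the aggregate, the generator, and the seed lets anyone
rebuild the identical test set without ever seeing the real records.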

> That's a pity, I know, but I cannot do anything about it.
> The queries we ran and the commands we used are mentioned in the blog
> post but if you see gaps, feel free to ask.

There were a lot of gaps that I saw when I looked through the article,
starting with things like the actual CREATE TABLE command you used, and
the complete size/structure of the JSON object, but really what a paper
like this should include is a full script which creates all the tables,
loads all the data, runs the analysis, calculates the results, etc.
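The shape of such a script might look like the Python skeleton below.
Everything in it is an assumption made for illustration: the table, the
generated rows, and the query are invented, and an in-memory SQLite
database stands in for the real systems only so the skeleton runs
anywhere.  An actual benchmark would connect to PostgreSQL and MongoDB
and use the real schema and queries.

```python
import json
import random
import sqlite3
import statistics
import time


def create_tables(conn):
    # Hypothetical schema; the real CREATE TABLE belongs here.
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def load_data(conn, n_rows, seed=0):
    rng = random.Random(seed)  # fixed seed so every run loads identical data
    rows = [
        (i, json.dumps({"id": i, "value": rng.randint(0, 99)}))
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO docs (id, body) VALUES (?, ?)", rows)
    conn.commit()


def run_analysis(conn, repeats=5):
    # Time the same (placeholder) query several times.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute("SELECT count(*) FROM docs WHERE id % 7 = 0").fetchone()
        timings.append(time.perf_counter() - start)
    return timings


def main():
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    create_tables(conn)
    load_data(conn, n_rows=10_000)
    timings = run_analysis(conn)
    print(f"median: {statistics.median(timings):.6f}s over {len(timings)} runs")
    return timings


if __name__ == "__main__":
    main()
```

The point is less the code than the property: one command takes an
empty system to the published numbers, so anyone can rerun it.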

Thanks!

Stephen
