Thread: GSOC 2018 ideas
Hi Aleksander,
This is Yan from Columbia University. I saw PostgreSQL is selected in GSOC 2018 and am pretty interested in the idea of Thrift data types support that you proposed. So, I want to prepare a proposal based on this idea. Can I have more detailed information on what documents or code I need to understand? Also, if this idea is allocated to another student (or in other words, you prefer some student to work on it), do let me know, so that I can pick some other project in PostgreSQL. Any comments or suggestions are welcome!
Hoping for your reply!
Thanks,
Charles
Hello Charles,
> I saw PostgreSQL is selected in GSOC 2018 and am pretty interested in the
> idea of Thrift data types support that you proposed. So, I want to
> prepare a proposal based on this idea.
Glad you are interested in this project!
> Can I have more detailed information on what documents or code I
> need to understand?
I would recommend the following documents and code:
* Source code of pg_protobuf
  https://github.com/afiskon/pg_protobuf
* "Writing Postgres Extensions" tutorial series by Manuel Kniep
  http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/
* "So you want to make an extension?" talk by Keith Fiske
  http://slides.keithf4.com/extension_dev/#/
* Apache Thrift official website
  https://thrift.apache.org/
* Also a great explanation of the Thrift format can be found in the
  book "Designing Data-Intensive Applications" by Martin Kleppmann
  http://dataintensive.net/
> Also, if this idea is allocated to another student (or in other words,
> you prefer some student to work on it), do let me know, so that I can
> pick some other project in PostgreSQL. Any comments or suggestions are
> welcome!
To the best of my knowledge, there are currently no other students
interested in this particular work.
--
Best regards,
Aleksander Alekseev
Got it, Aleksander! Will study these documents carefully!
2018-02-26 4:21 GMT-08:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> Hello Charles,
> > I saw PostgreSQL is selected in GSOC 2018 and am pretty interested in the
> > idea of Thrift data types support that you proposed. So, I want to
> > prepare a proposal based on this idea.
> Glad you are interested in this project!
> > Can I have more detailed information on what documents or code I
> > need to understand?
> I would recommend the following documents and code:
> * Source code of pg_protobuf
>   https://github.com/afiskon/pg_protobuf
> * "Writing Postgres Extensions" tutorial series by Manuel Kniep
>   http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/
> * "So you want to make an extension?" talk by Keith Fiske
>   http://slides.keithf4.com/extension_dev/#/
> * Apache Thrift official website
>   https://thrift.apache.org/
> * Also a great explanation of the Thrift format can be found in the
>   book "Designing Data-Intensive Applications" by Martin Kleppmann
>   http://dataintensive.net/
> > Also, if this idea is allocated to another student (or in other words,
> > you prefer some student to work on it), do let me know, so that I can
> > pick some other project in PostgreSQL. Any comments or suggestions are
> > welcome!
> To the best of my knowledge, there are currently no other students
> interested in this particular work.
> --
> Best regards,
> Aleksander Alekseev
Hi Aleksander,
I went through the documents you listed, and they are helpful!
It seems the main purpose of the pg_protobuf extension is to parse
a protobuf struct and return the decoded field. May I ask how these kinds
of extensions are used in PostgreSQL (or in other words, in which scenarios
these plugins are used)?
Thanks,
Charles
2018-03-02 21:11 GMT-08:00 Charles Cui <charles.cui1984@gmail.com>:
> Got it, Aleksander! Will study these documents carefully!
Hello Charles,
> I went through the documents you listed, and they are helpful!
> It seems the main purpose of the pg_protobuf extension is to parse
> a protobuf struct and return the decoded field. May I ask how these kinds
> of extensions are used in PostgreSQL (or in other words, in which scenarios
> these plugins are used)?
There are a few ideas behind all of this.
1) Sometimes people are not quite happy with a strict relational schema for
various reasons and prefer something more agile, like XML or JSON. These
formats are indeed more convenient under certain circumstances, for
instance in terms of ease of changing and migrating the schema.
2) One drawback of JSON is redundancy. For instance, you have to store
the names of all document fields. These names don't carry much
information but consume disk space and RAM, thus affecting the overall
performance. The ZSON extension [1] partially solved this issue. However, I
wouldn't call it particularly convenient, and the whole approach of
compressing JSON seems to me more like a dirty hack, not a solution. The
problem appeared because of using the wrong data format in the first
place.
3) Unlike JSON, formats like Protobuf or Thrift are binary formats and,
most importantly, don't store any field names. Thus they don't create the
problem described above. However, PostgreSQL is not capable of accessing
Protobuf fields out of the box, for instance to index these fields. This
is what pg_protobuf is for.
Hopefully this answers your question. If you have other questions, please
don't hesitate to ask!
[1]: https://github.com/postgrespro/zson
--
Best regards,
Aleksander Alekseev
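As a rough illustration of points 2) and 3) above, the following Python sketch encodes the same small record once as compact JSON and once in a Protobuf-style tagged binary layout, then compares the sizes. The record contents and the helper names (encode_varint, encode_record) are made up for this example; the binary layout follows the general Protobuf wire format (field number plus wire type, then the value) and is not taken from pg_protobuf or ZSON.

```python
import json


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_record(name: str, age: int) -> bytes:
    """Hypothetical message: field 1 is a string, field 2 is an integer."""
    raw = name.encode("utf-8")
    # Each key is (field_number << 3) | wire_type.
    # Wire type 2 = length-delimited, wire type 0 = varint.
    return (
        encode_varint((1 << 3) | 2) + encode_varint(len(raw)) + raw
        + encode_varint((2 << 3) | 0) + encode_varint(age)
    )


record = {"name": "Alice", "age": 30}

# JSON repeats the field names "name" and "age" in every stored document.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# The binary form stores only small numeric field tags instead of names.
as_binary = encode_record(record["name"], record["age"])

print(len(as_json), len(as_binary))
```

For this record the JSON form takes 25 bytes and the tagged binary form takes 9 bytes. The gap comes mostly from the field names and punctuation, which is the redundancy described in point 2).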
2018-03-05 1:42 GMT-08:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> Hello Charles,
> > I went through the documents you listed, and they are helpful!
> > It seems the main purpose of the pg_protobuf extension is to parse
> > a protobuf struct and return the decoded field. May I ask how these kinds
> > of extensions are used in PostgreSQL (or in other words, in which scenarios
> > these plugins are used)?
> There are a few ideas behind all of this.
> 1) Sometimes people are not quite happy with a strict relational schema for
> various reasons and prefer something more agile, like XML or JSON. These
> formats are indeed more convenient under certain circumstances, for
> instance in terms of ease of changing and migrating the schema.
> 2) One drawback of JSON is redundancy. For instance, you have to store
> the names of all document fields. These names don't carry much
> information but consume disk space and RAM, thus affecting the overall
> performance. The ZSON extension [1] partially solved this issue. However, I
> wouldn't call it particularly convenient, and the whole approach of
> compressing JSON seems to me more like a dirty hack, not a solution. The
> problem appeared because of using the wrong data format in the first
> place.
> 3) Unlike JSON, formats like Protobuf or Thrift are binary formats and,
> most importantly, don't store any field names. Thus they don't create the
> problem described above. However, PostgreSQL is not capable of accessing
> Protobuf fields out of the box, for instance to index these fields. This
> is what pg_protobuf is for.
The idea of using a flexible schema and building indexes on top of it is awesome!
I will definitely submit a proposal and focus on this if I get selected.
Thanks for answering my questions.
> Hopefully this answers your question. If you have other questions, please
> don't hesitate to ask!
> [1]: https://github.com/postgrespro/zson
> --
> Best regards,
> Aleksander Alekseev
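To make "accessing a Protobuf field" from point 3) more concrete, here is a minimal Python sketch of pulling one field out of an encoded message by its field number. It is an illustration only and is not taken from pg_protobuf: the function names (read_varint, get_field) and the sample message are hypothetical, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
from typing import Optional, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def get_field(buf: bytes, wanted: int) -> Optional[Union[int, bytes]]:
    """Return the first value stored under field number `wanted`, or None."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint payload
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited payload (strings, bytes)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number == wanted:
            return value
    return None


# Hypothetical message: field 1 = "Alice" (string), field 2 = 30 (integer).
message = bytes.fromhex("0a05416c696365101e")

print(get_field(message, 1))
print(get_field(message, 2))
print(get_field(message, 3))
```

A function of this shape, once exposed to SQL by an extension, is the kind of thing that could in principle be used in an expression index over a bytea column. That is an assumption about usage, not a description of any particular extension's API.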