
Last updated: August 9, 2023

A discussion on the size of msgpack data after compression


A long time ago I opened this topic under the msgpack/msgpack repository on GitHub, and today it suddenly got a reply.

Link: https://github.com/msgpack/msgpack/issues/328


My personal conclusions are:

  • Length-prefixed serialization formats are indeed less friendly to compressors than delimiter-separated serialization formats
  • More fine-grained research can be done by taking equivalent structures in both JSON and MessagePack, compressing them with the same algorithm, and then studying the resulting compressed encodings in detail to understand exactly what tradeoffs the compressor made
  • Web developers should consider JSON first, now and in the future, because compression comes at almost zero cost
  • This is not to say that length-prefixed formats are of no use. I think they suit representing larger data, such as a header followed by a 100 KB body. Their speed and RAM usage are also still huge advantages
  • Finally, when pursuing ultimate size efficiency, I think the best practice for transmitting data is an IDL like protobuf. IDLs remove both the separators of JSON and the per-element header that MsgPack uses to mark each basic type. If you really want the data to be self-describing (interpretable without additional type definitions), you can transmit the type definition before the data 😂

The same points in Chinese:

  • 压缩后,长度前缀数据的空间效率比分隔符分隔数据低
  • 可以通过更细致的研究来进行取JSON和MessagePack中的等效结构,用相同的算法压缩它们,然后详细研究结果压缩编码,以理解压缩器做出的具体权衡。
  • 网页开发者现在和未来都应该首先考虑JSON,因为压缩几乎是零成本
  • 这并不是说长度前缀数据没有用。我认为它的用途是表示更大的数据,比如后面跟着100KB体的头。在速度和RAM使用上的性能仍然是长度前缀数据的巨大优势
  • 最后,我认为在追求终极大小效率时传输数据的最佳实践,你需要的是像protobuf这样的IDLs。IDLs去除了JSON的分隔符或MsgPack中表示基本类型的基本元素的头。如果你真的想要自解释(你希望数据可以在没有额外类型定义的情况下被解释),你可以在数据之前传输类型定义😂
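The experiment suggested in the second point can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the sample records are made up for illustration, zlib stands in for whatever compressor is actually used, and `pack` is a hand-written encoder that covers just a small subset of the MessagePack format (None, bool, int, float, str, list, dict) so the example needs nothing beyond the standard library. It is not the official msgpack library.

```python
import json
import struct
import zlib

_U = ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF),
      (0xCE, ">I", 0xFFFFFFFF), (0xCF, ">Q", 2**64 - 1))
_S = ((0xD0, ">b", -2**7), (0xD1, ">h", -2**15),
      (0xD2, ">i", -2**31), (0xD3, ">q", -2**63))
_STR = ((0xD9, ">B", 0xFF), (0xDA, ">H", 0xFFFF), (0xDB, ">I", 0xFFFFFFFF))
_ARR = ((0xDC, ">H", 0xFFFF), (0xDD, ">I", 0xFFFFFFFF))
_MAP = ((0xDE, ">H", 0xFFFF), (0xDF, ">I", 0xFFFFFFFF))


def _length_header(n, fix_tag, fix_max, tags):
    """Length prefix: one 'fix' byte for short values, else tag + big-endian length."""
    if n <= fix_max:
        return bytes([fix_tag | n])
    for tag, fmt, limit in tags:
        if n <= limit:
            return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError("value too long for MessagePack")


def pack(obj):
    """Encode obj using a small subset of the MessagePack format."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # must come before int: bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])           # positive fixint
        if -32 <= obj < 0:
            return struct.pack("b", obj)  # negative fixint
        for tag, fmt, limit in (_U if obj > 0 else _S):
            if (obj <= limit) if obj > 0 else (obj >= limit):
                return bytes([tag]) + struct.pack(fmt, obj)
        raise OverflowError("integer out of MessagePack range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _length_header(len(data), 0xA0, 31, _STR) + data
    if isinstance(obj, list):
        return _length_header(len(obj), 0x90, 15, _ARR) + b"".join(map(pack, obj))
    if isinstance(obj, dict):
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return _length_header(len(obj), 0x80, 15, _MAP) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def compare(data, level=9):
    """Return raw and zlib-compressed sizes for the JSON and MessagePack encodings."""
    encoded = {
        "json": json.dumps(data, separators=(",", ":")).encode("utf-8"),
        "msgpack": pack(data),
    }
    return {name: {"raw": len(raw), "zlib": len(zlib.compress(raw, level))}
            for name, raw in encoded.items()}


# Hypothetical sample data: many small, similar records.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0, "tags": ["a", "b"]}
           for i in range(1000)]

for name, sizes in compare(records).items():
    print(f"{name:8s} raw={sizes['raw']:6d}  zlib={sizes['zlib']:6d}")
```

The numbers depend heavily on the shape of the data and on the compressor, so it is worth running this against payloads that resemble real traffic before drawing conclusions from it.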
