
Hornox-Bson

Hornox is a fast, simple-stupid BSON serializer, deserializer and node extractor for the JVM.

Features

  • Full implementation of the BSON Specification with serialization and deserialization
  • Implemented in pure Kotlin, should work for all JVM languages
  • DOM elements implement the Jakarta JSON API (former javax.json)
  • Binary format is byte-by-byte identical to the output produced by the BSON implementation in the Java MongoDB Driver
  • Extract individual paths from a byte array in BSON format without deserializing the whole document

Installation

Please check the latest release on our Maven Distribution. It contains a wide variety of instructions for different package managers. As an example, here's the maven dependency:

<dependency>
    <groupId>io.txture</groupId>
    <artifactId>hornox-bson</artifactId>
    <version>{please check for latest version}</version>
</dependency>

Performance

Hornox is fast. Its parser outperforms other popular JVM-based BSON parsers by almost 50% (or more).

[Chart: performance comparison between Hornox and other JVM-based BSON parsers]

The dataset consists of ~3 million individual documents with a wide variety of node types and sizes. The collection totals 207 MB of binary BSON data on disk. The benchmark was repeated 10 times on a pre-warmed JVM with sufficient heap space. We have observed similar results for smaller datasets.

Hornox owes this speed advantage primarily to one thing: its simplicity.

Building from Source

Hornox uses a standard Gradle project structure. It requires Java 17 as well as Kotlin 1.6.

# build the binary
./gradlew build

# run the tests
./gradlew test

Serialization

val doc = DocumentNode()
doc["firstName"] = TextNode("John")
doc["lastName"] = TextNode("Doe")
doc["age"] = Int32Node(42)

val byteArray = BsonSerializer.serializeBsonDocument(doc)
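
To give a feeling for what ends up in such a byte array, here is a minimal sketch that hand-encodes the single-field document {"age": 42} according to the BSON Specification. It deliberately does not use Hornox; the function name encodeAgeDocument is made up for this illustration.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hand-encodes {"age": <age>} following the BSON specification.
// Illustration of the wire format only; this is not Hornox code.
fun encodeAgeDocument(age: Int): ByteArray {
    val key = "age".toByteArray(Charsets.UTF_8)
    // int32 size + type byte + key + key terminator + int32 value + document terminator
    val size = 4 + 1 + key.size + 1 + 4 + 1
    val buffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN)
    buffer.putInt(size)          // total document size, including these four bytes
    buffer.put(0x10.toByte())    // element type 0x10: 32-bit integer
    buffer.put(key)              // element name as a C string ...
    buffer.put(0x00.toByte())    // ... terminated by a zero byte
    buffer.putInt(age)           // little-endian int32 value
    buffer.put(0x00.toByte())    // document terminator
    return buffer.array()
}

fun main() {
    val bytes = encodeAgeDocument(42)
    // prints: 0e 00 00 00 10 61 67 65 00 2a 00 00 00 00
    println(bytes.joinToString(" ") { "%02x".format(it) })
}
```

The first four bytes are the document's size field, which is the kind of size marker discussed further below.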

Deserialization

val bytes: ByteArray = ... // e.g. load it from a file, from an HTTP response...
val document = BsonDeserializer.deserializeBsonDocument(bytes)

Hornox currently supports ByteArrays, ByteBuffers and InputStreams as inputs to the parser / path extractor. All three are equivalent in terms of parser / extractor functionality, but ByteArrays always give the best parser performance, at the price that the whole input must be present in memory. InputStreams are a ubiquitous interface in the Java ecosystem and allow you to load your input data lazily if necessary, but they are generally slower to parse.
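
If your data arrives as an InputStream but comfortably fits into memory, one option is to read it into a ByteArray first and parse that. The sketch below uses only the Kotlin standard library; the helper name readFully is made up for this example, and whether the extra copy pays off depends on your workload.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Reads the whole stream into memory and closes it afterwards.
fun readFully(input: InputStream): ByteArray = input.use { it.readBytes() }

fun main() {
    // 05 00 00 00 00 is an empty BSON document (size field plus terminator).
    val source: InputStream = ByteArrayInputStream(byteArrayOf(5, 0, 0, 0, 0))
    val bytes = readFully(source)
    println(bytes.size) // prints 5
}
```

The resulting ByteArray can then be handed to BsonDeserializer.deserializeBsonDocument as shown above.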

Individual Node Extraction from BSON Byte Arrays

Hornox allows you to extract a path from the serialized (binary) BSON format. If you're interested only in a single node from the document, this extraction is generally much faster than loading the entire document DOM and then navigating through it.

val byteArray: ByteArray = ... // e.g. load it from a file, from an HTTP response...
/*
  For the example, let's assume the following BSON structure:
  {
      "name": "Txture",
      "addresses": [
          {
              "country": "Austria",
              "city": "Innsbruck",
              "zipCode": "6020"
          }
      ]
  }
*/
val path = listOf("addresses", "0", "city")
val city = BsonDeserializer.extractBsonNode(byteArray, path)
// city will be a TextNode containing "Innsbruck"

A note on Size Markers

The BSON Specification specifies "size" fields in several places. The node extraction algorithm can make use of those indicators in order to significantly speed up the search when it needs to skip over document entries. However, some serializers do not write proper size values into those fields (in order to make the serialization process faster). You can specify with a boolean whether or not Hornox should trust the size fields in the BSON, or if it should ignore them and take the "safe route" through the binary document (at the cost of performance):

// by default, Hornox will not trust the size markers.
val city = BsonDeserializer.extractBsonNode(byteArray, path, trustSizeMarkers = false)
// if you are sure that your binary BSON contains valid size markers, 
// Hornox can use them for enhanced extraction performance.
val city2 = BsonDeserializer.extractBsonNode(byteArray, path, trustSizeMarkers = true)

If a size marker contains a negative value, Hornox will always ignore it. The size markers of Binary Data Nodes always need to be accurate, because there is no other delimiter.
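
To illustrate why trusted size markers help, here is a minimal, self-contained sketch of a scanner that looks up a top-level int32 field and skips everything else. It is not Hornox's actual implementation: all function names are made up, only a handful of BSON types are covered, and string length prefixes are always trusted for brevity.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Reads a little-endian int32 at the given offset.
fun readInt32(bytes: ByteArray, offset: Int): Int =
    ByteBuffer.wrap(bytes, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt()

// Returns the offset just past the zero-terminated string starting at 'offset'.
fun skipCString(bytes: ByteArray, offset: Int): Int {
    var pos = offset
    while (bytes[pos] != 0.toByte()) pos++
    return pos + 1
}

// Returns the offset just past a value of the given BSON type.
fun skipValue(bytes: ByteArray, type: Int, offset: Int, trustSizeMarkers: Boolean): Int = when (type) {
    0x01, 0x09, 0x11, 0x12 -> offset + 8                 // double, datetime, timestamp, int64
    0x10 -> offset + 4                                   // int32
    0x08 -> offset + 1                                   // boolean
    0x0A -> offset                                       // null has no payload
    0x02 -> offset + 4 + readInt32(bytes, offset)        // string: length prefix + bytes incl. terminator
    0x03, 0x04 -> skipDocument(bytes, offset, trustSizeMarkers) // embedded document or array
    else -> throw IllegalArgumentException("type not covered by this sketch: $type")
}

// Skips a document: one jump if the size marker is trusted and positive,
// otherwise an entry-by-entry walk until the terminator byte.
fun skipDocument(bytes: ByteArray, offset: Int, trustSizeMarkers: Boolean): Int {
    val size = readInt32(bytes, offset)
    if (trustSizeMarkers && size > 0) return offset + size
    var pos = offset + 4
    while (bytes[pos] != 0.toByte()) {
        val type = bytes[pos].toInt() and 0xFF
        pos = skipValue(bytes, type, skipCString(bytes, pos + 1), trustSizeMarkers)
    }
    return pos + 1
}

// Finds a top-level int32 field by name, skipping all other entries.
fun findTopLevelInt32(bytes: ByteArray, name: String, trustSizeMarkers: Boolean): Int? {
    var pos = 4
    while (bytes[pos] != 0.toByte()) {
        val type = bytes[pos].toInt() and 0xFF
        val nameStart = pos + 1
        val valueStart = skipCString(bytes, nameStart)
        val key = String(bytes, nameStart, valueStart - nameStart - 1, Charsets.UTF_8)
        if (key == name && type == 0x10) return readInt32(bytes, valueStart)
        pos = skipValue(bytes, type, valueStart, trustSizeMarkers)
    }
    return null
}

// Builds {"a": {"b": 1}, "c": 2}; the inner size marker is either valid (12) or -1.
fun sampleDocument(validInnerSize: Boolean): ByteArray {
    val innerSize = if (validInnerSize) byteArrayOf(12, 0, 0, 0) else ByteArray(4) { 0xFF.toByte() }
    return byteArrayOf(27, 0, 0, 0, 0x03, 0x61, 0) + innerSize +
        byteArrayOf(0x10, 0x62, 0, 1, 0, 0, 0, 0) +   // "b": 1, then the inner terminator
        byteArrayOf(0x10, 0x63, 0, 2, 0, 0, 0, 0)     // "c": 2, then the outer terminator
}

fun main() {
    println(findTopLevelInt32(sampleDocument(true), "c", trustSizeMarkers = true))   // prints 2
    println(findTopLevelInt32(sampleDocument(false), "c", trustSizeMarkers = false)) // prints 2
    println(findTopLevelInt32(sampleDocument(false), "c", trustSizeMarkers = true))  // prints 2
}
```

With a valid marker, the embedded document "a" is skipped with a single addition. With a marker of -1, the scanner has to walk through every entry of "a" before it reaches "c", even when trustSizeMarkers is true.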

When serializing a document, Hornox offers three options when it comes to size markers:

  • USE_MINUS_1: Always write -1 in all size fields, effectively invalidating them. This is faster to write than recomputing the accurate sizes, but slower to scan through later.
  • TRUST_DOCUMENT: Trust the length field which is present in the DocumentNode or ArrayNode, and write its contents into the byte array. This is generally not recommended, as the length field is not automatically updated when the content of the document or array changes. However, it can be used to quickly serialize a document with known size fields.
  • RECOMPUTE: Recompute all sizes prior to serialization. This is the default. Please note that this also changes the length fields of the DocumentNodes and ArrayNodes as a side-effect.
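
To show what recomputing sizes involves, here is a rough sketch that calculates the BSON size of a document bottom-up. It does not use Hornox's node classes: plain Kotlin maps and lists stand in for documents and arrays, and the function names are made up for this illustration.

```kotlin
// Size in bytes of a value's payload. Maps model documents, lists model arrays.
fun valueSize(value: Any?): Int = when (value) {
    null -> 0
    is Boolean -> 1
    is Int -> 4
    is Long, is Double -> 8
    is String -> 4 + value.toByteArray(Charsets.UTF_8).size + 1   // length prefix + bytes + terminator
    is Map<*, *> -> documentSize(value.entries.map { it.key.toString() to it.value })
    // BSON arrays are documents whose keys are the element indices "0", "1", ...
    is List<*> -> documentSize(value.mapIndexed { index, element -> index.toString() to element })
    else -> throw IllegalArgumentException("type not covered by this sketch: $value")
}

// Size field (4) + entries (type byte + key + key terminator + value) + document terminator (1).
fun documentSize(entries: List<Pair<String, Any?>>): Int =
    4 + entries.sumOf { (key, value) -> 1 + key.toByteArray(Charsets.UTF_8).size + 1 + valueSize(value) } + 1

fun main() {
    println(valueSize(mapOf("age" to 42))) // prints 14
}
```

Every nested document and array has to be visited before the outermost size is known, which is the extra work that USE_MINUS_1 and TRUST_DOCUMENT avoid.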

Bring your own DOM classes

Since version 1.1, Hornox supports parsing / serializing custom DOM tree nodes. Your classes do not need to implement any particular interfaces for this to work; all you have to do is to provide a valid implementation of the BsonDomModule interface, and pass this module to the BsonSerializer or BsonDeserializer method of your choice. Of course, you can still use the DOM nodes that come with Hornox itself. For a reference implementation of the interface, please have a look at HornoxDomModule.

Releases(1.1)
  • 1.1(May 19, 2022)

    New Features in this release:

    • Even better read performance thanks to good old bit manipulation techniques.
    • Support for more types of inputs (e.g. ByteArray, ByteBuffer and InputStream).
      • Please note that read performance may vary between inputs, even if they deliver the same data. Raw ByteArrays are always fastest to process.
    • New BsonDomModule API decouples the serializer / deserializer API from the actual DOM nodes. You can create your own DOM classes by implementing the BsonDomModule interface and still use the other features of Hornox.

    Fixes:

    • Some BsonNodes had wrong or misleading toString implementations.
  • 1.0(May 13, 2022)

Owner
Txture