A tool to convert & query Apache Calcite data sources as GraphQL API's

Overview

Apache Calcite <-> Distributed, Federated GraphQL API

This repo contains a work-in-progress prototype and research project on using Apache Calcite as the backbone of GraphQL services.

Similar work has been done by LinkedIn on Coral, though the GraphQL implementation is not yet publically available and Coral uses an internal form of IR that is slightly modified from Calcite's. Additionally, the goals and usecase of Coral are somewhat different from those of this project.

Goals

As a user I want:

  • To be able to access and modify data stored anywhere, given it has a well-defined structure (schema)
    • This includes Relational Databases, Non-relational Databases, data stored on-disk (CSV, JSON, etc)
  • To be able to query data from multiple databases in a single query (distributed/federated querying)
  • Ability to join across data sources by defining virtual relationships
  • Performance
    • Low latency, p90 for queries competitive with a hand-written implementation
    • Federated queries should complete quickly enough to be usable in day-to-day clientside operations

As an engineer, I want:

  • An industrial-grade query planner and optimizer backing the query execution
  • Extensibility. Ability to easily write new engines/adapters to run operations on
    • IE: Exposing Airtable's API as a GraphQL/SQL-queryable source
  • To work with a widely-used and mature set of tools.
  • Standardization. To not invent anything myself, because:
    • A) I'm probably not qualified to do so
    • B) There are likely people who have spent the length of my lifetime thinking about and solving a similar problem

Roadmap and Current Progress

As of today, given a Calcite schema (IE from a JDBC connection, or any other adapter), this project can generate the corresponding GraphQL schema and execute queries successfully. (See "Walkthrough of Current Progress" below)

GraphQL queries against this schema are able to be converted into their corresponding Calcite Relational Algebra expression, then executed against the Calcite adapter giving the proper results.

The Roadmap

  • Convert a Calcite Schema into corresponding GraphQL Schema types
    • Generate GraphQL object types for tables
    • Generate where boolean expression type to allow filtering
  • Convert a GraphQL query AST into matching Calcite Relational Algebra expression (RelNode) for the table
    • Where (critical/most important)
    • Limit
    • Offset
    • Distinct
    • Group By/Aggregations (nice to have)
  • Execute the converted Relational Algebra expression against data source, returning correct results
  • Automatically generate resolvers for the generated GraphQL query operations that perform the execution of the query (current execution is invoked manually)
  • Support Queries
  • Support Mutations
  • Figure out whether it is possible to support Subscriptions with Calcite's adapter model
    • Support subscriptions if so (nice to have)
  • Support JOIN / nested GraphQL field access and queries
  • Design system for dynamic Calcite schema registration and modification while program is running
  • Figure out how to let users implement their own data sources via HTTP (or similar)

Walkthrough of Current Progress

To give a sense of what exactly the above translates into, here's an illustration of the current functionality.

Given the schema of some Calcite data source table, like:

CREATE TABLE "EMPS"
(
    EMPNO    int,
    DEPTNO   int,
    ENAME    text,
    HIREDATE timestamptz,
    JOB      text,
    MGR      int,
    SAL      numeric,
    COMM     numeric
);

A GraphQL schema is generated:

type Query {
    EMP(limit: Int, offset: Int, order_by: String, where: EMP_bool_exp): [EMP!]
}

type EMP {
    EMPNO: Int
    # other fields...
}

input EMP_bool_exp {
    EMPNO: Int_comparison_exp
    # other fields...
    _and: [EMP_bool_exp!]
    _not: EMP_bool_exp
    _or: [EMP_bool_exp!]
}

Now, if we write a GraphQL query against this generated schema, something like below:

query {
    EMP(
        limit: 2,
        offset: 1,
        where: {
            _or: [
                { DEPTNO: { _eq: 20 } },
                { DEPTNO: { _eq: 30 } }
            ]
            _and: [
                { SAL: { _gte: 1500 } }
                {
                    _or: [
                        { JOB: { _eq: "SALESMAN" } },
                        { JOB: { _eq: "MANAGER" } }
                    ]
                }
            ]
        }
    ) {
        ... columns
    }
}

We can execute it. Calcite allows us to do a lot of things now. For instance, here is the query plan, at various stages of planning:

-- Logical Plan
LogicalSort(offset=[1], fetch=[2])
  LogicalFilter(condition=[AND(SEARCH($7, Sarg[20, 30]), >=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])
    JdbcTableScan(table=[[JDBC_SCOTT, EMP]])

-- Mid Plan
LogicalSort(subset=[rel#9:RelSubset#2.ENUMERABLE.[]], offset=[1], fetch=[2])
  LogicalFilter(subset=[rel#6:RelSubset#1.NONE.[]], condition=[AND(SEARCH($7, Sarg[20, 30]), >=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])
    JdbcTableScan(subset=[rel#4:RelSubset#0.JDBC.JDBC_SCOTT.[]], table=[[JDBC_SCOTT, EMP]])

-- Best Plan
EnumerableLimit(offset=[1], fetch=[2])
  JdbcToEnumerableConverter
    JdbcFilter(condition=[AND(SEARCH($7, Sarg[20, 30]), >=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])
      JdbcTableScan(table=[[JDBC_SCOTT, EMP]])

And here is the relational expression built out of the GraphQL query, represented as a (simplified + optimized) SQL query. This is immensely useful for debugging purposes and understanding what's happening:

SELECT *
FROM "SCOTT"."EMP"
WHERE "DEPTNO" IN (20, 30)
  AND "SAL" >= 1500
  AND "JOB" IN ('MANAGER', 'SALESMAN')
OFFSET 1 ROWS FETCH NEXT 2 ROWS ONLY

And finally, the results of our query:

EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO

7566,JONES,MANAGER,7839,1981-02-04,2975.00,null,20,
7698,BLAKE,MANAGER,7839,1981-01-05,2850.00,null,30,

We can use Calcite's query plan visualizer to understand what's going on:

calcite query plan web visualizer sample

There is still much left to prove out, but hopefully this should give some insight as to "what the heck is it you're trying to build"?

Technical Architecture

Approach and Design

The high level approach that has been pursued here can be broken down by some explanatory images.

Note: These images are taken from Stamatis Zampetakis fantastic presentation on YouTube

"An introduction to query processing & Apache Calcite"

The following image gives a high-level overview of the pieces of most query processors:

query-processor-general-architecture

Below, we can see how these high-level pieces map to Calcite's API classes and interfaces, as well as the boundaries between the "core" pieces, and which pieces are open to being written by developers as extensions.

Circled in blue are the two areas we are most interested in:

  • The region depicting the SqlParser and SqlToRelConverter shows how regular SQL queries are converted/translated into Relational Algebra expressions. We should in theory (and in practice) be able to do a similar thing to convert GraphQL queries into relational expressions.
  • The region on the righthand side, containing Schema and CatalogReader have been circled to call attention to how the Server's GraphQL API is auto-generated. We can ask Calcite to give us the Schema for any of it's data sources, and we are able to use the metadata from it to generate GraphQL types and resolvers.

query-processor-calcite-interfaces

With these pieces in place, you can see below how rather than SQL query, we might be able to write a GraphQL query with identical semantics, and continue using Calcite as though we were a "plain-old SQL query"

sql-query-to-calcite-ast-example

Some restrictions and assumptions made about the shape of the GraphQL API and corresponding queries, and this is what lets this entire thing be possible.

IE, the generated GraphQL schema operations only allow queries which have behaviors/semantics that can be mapped 1-to-1 to SQL.

With this, we can restrict the "domain" of GraphQL to the "domain" of standard SQL, and then our work is just one of writing the facades/conversions.

This is a 10,000ft view of the technical architecture and approach taken to the problem in this project. For details, see code.

Why Kotlin

This project is written in Kotlin.

If you were to check the commit history, you would find that there were, at one point, functioning prototypes in both Java and Scala 3 too.

Ultimately, Kotlin struck the best balance between language features and tooling + support + ecosystem. It is an incredibly productive and pragmatic language.

This project is performance-critical, and benchmarks showed that the latest Kotlin (1.6.10) is competitive with the latest Java (JDK 17) in terms of performance, and in some cases idiomatic code would perform somewhat better than the Java equivalent.

If further developments show negative performance impact from using a language other than Java, I will rewrite it all in Java.

Related Projects and Reference Material

You might also like...
A tool to convert unit tests from one standard to another

unit-test-exchange-maven-plugin This plugin works as a tool to convert unit test

A CLI tool to convert multi-module Jetpack Compose compiler metrics into beautiful HTML reports
A CLI tool to convert multi-module Jetpack Compose compiler metrics into beautiful HTML reports

A CLI tool to convert multi-module Jetpack Compose compiler metrics into beautiful HTML reports 1. What are Jetpack Compose compiler metrics? The Comp

Trying to play with Jetpack compose low level animations APIs, which are animate*AsState APIs.
Trying to play with Jetpack compose low level animations APIs, which are animate*AsState APIs.

ComposeSimpleAnimation Trying to play with Jetpack compose low level animations APIs, which are animate*AsState APIs that I needed in another project.

Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.
Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.
Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

requery - modern SQL based query & persistence for Java / Kotlin / Android
requery - modern SQL based query & persistence for Java / Kotlin / Android

A light but powerful object mapping and SQL generator for Java/Kotlin/Android with RxJava and Java 8 support. Easily map to or create databases, perfo

Androidicons is a professional, handcrafted icon set with 250 icons in 14 different colors and 5 sizes, including an icon font and all sources.

Android Developer Icons Android Developer Icons is a custom icon set, created by Opoloo. Included are: 250 hand-crafted, pixel-perfect icons in 5 size

🚀 A very customizable library that allows you to present menu items (from menu resource and/or other sources) to users as a bottom sheet.
🚀 A very customizable library that allows you to present menu items (from menu resource and/or other sources) to users as a bottom sheet.

SlidingUpMenu A library that allows you to present menu items (from menu resource and/or other sources) to users as a bottom sheet. Gradle Dependency

Noice is an android app that allows you to create your own set of background sounds by mixing clips from environmental sources.
Noice is an android app that allows you to create your own set of background sounds by mixing clips from environmental sources.

A native Android app to relax, improve focus and boost productivity with minimal background noise.

Sources for the LiveBoot app for rooted Android devices

This is the sauce for the LiveBoot app. License Copyright © 2011-2020 Jorrit Chainfire Jongma This code is released under the GPLv3. LICENSE, COPYING.

Telegram client based on official Android sources

Telegram messenger for Android Telegram is a messaging app with a focus on speed and security. It’s superfast, simple and free. This repo contains the

App for searching unsplash photos library with query.
App for searching unsplash photos library with query.

android-imagesearch-app App for searching unsplash photos library with query. Used API for images: Unsplash Libraries used in the project: Paging 3 fo

NewsFeedApp - An android news-feed-app which shows latest news of various sources
NewsFeedApp - An android news-feed-app which shows latest news of various sources

NewsFeedApp An android news-feed-app which shows latest news of various sources

Project allowing to query products (languages, libraries, databases, etc) by their properties.

Products app This project allow to search products (mostly software products for now such as languages, libraries etc) based on their properties. For

Convert your YouTube channel into a native Android app using YouTube Data API v3.
Convert your YouTube channel into a native Android app using YouTube Data API v3.

Convert your YouTube channel into an app. Screenshots • Description • Features • Configuration • Documentation Screenshots Description Channelify is a

A Common RecyclerView.Adapter  implementation which supports all kind of items and has useful data operating APIs such as remove,add,etc.
A Common RecyclerView.Adapter implementation which supports all kind of items and has useful data operating APIs such as remove,add,etc.

##PowerfulRecyclerViewAdapter A Common RecyclerView.Adapter implementation which supports any kind of items and has useful data operating APIs such as

Access and process various types of personal data in Android with a set of easy, uniform, and privacy-friendly APIs.
Access and process various types of personal data in Android with a set of easy, uniform, and privacy-friendly APIs.

PrivacyStreams PrivacyStreams is an Android library for easy and privacy-friendly personal data access and processing. It offers a functional programm

Scrobble is a wip music tracking and browsing app. It uses the Lastf.fm and spotify APIs to deliver data. The whole UI is created using Jetpack compose.
Scrobble is a wip music tracking and browsing app. It uses the Lastf.fm and spotify APIs to deliver data. The whole UI is created using Jetpack compose.

Scrobble (WIP, name not final) Scrobble is a wip music tracking and browsing app. It uses the Lastf.fm API to realize music tracking and browsing and

Weatherapp: a simple weather forecast app, which uses some APIs to fetch 5 day / 3 hour forecast data from the OpenWeatherMap
Owner
Gavin Ray
Technical Evangelist @hasura
Gavin Ray
Collection of Kotlin APIs/tools to make using Realm Mobile database easier

Compass Kotlin API and tools to make working with Realm easier Components Compass is designed to make working with Realm easier through collection of

Arunkumar 16 Oct 4, 2022
Core Data for Android

NexusData Core Data for Android NexusData is an object graph and persistence framework for Android. It allows for organizing and managing relational d

Dia Kharrat 71 Nov 11, 2022
Implementation of MVVM , Live Data and Room DAO for a robust materialistic design

Someday App to manage Weekly tasks Because who needs to remind you every week to do Samething Preview Main Layout Light Dark Main Layout (Expanded) Li

Anshul Saraf 2 May 13, 2021
Spring-graphql-getting-started - Spring for GraphQL provides support for Spring applications built on GraphQL Java

Getting Started with GraphQL and Spring Boot Spring for GraphQL provides support

Shinya 0 Feb 2, 2022
Speech-Text Converter is a simple task that enable the user to convert the speech to text or convert text to speech (by Mic)

Speech-Text Converter About Speech-Text Converter is a simple task that enable the user to convert the speech to text or convert text to speech (by Mi

Kareem Saeed 1 Oct 21, 2021
GraphQL Jetpack - A collection of packages for easily writing Java GraphQL server implementations

GraphQL Jetpack A collection of packages for easily writing Java GraphQL server

Ryan Yang 18 Dec 2, 2022
Kotlin-GraphQL-Apollo - Sencillo cliente para consumir una API GraphQL con Apollo usando Kotlin

Kotlin GraphQL Apollo Sencillo cliente para consumir una API GraphQL con Apollo

José Luis González Sánchez 2 Jan 25, 2022
Victor Hugo 1 Feb 2, 2022
Apache Cordova Android

Cordova Android Cordova Android is an Android application library that allows for Cordova-based projects to be built for the Android Platform. Cordova

The Apache Software Foundation 3.4k Dec 27, 2022
KoltinPulsar - A collection of experiments using Kotlin with Apache Pulsar

Some Experiments of using Kotlin and Apache Pulsar This is a collection of exper

Wayne Ellis 0 Jan 11, 2022