Naive one-pass recursive descent, scannerless parser framework for Kotlin

Overview

ParserKt

ParserKt
versionkotlinexamplescomparison

Introduction

ParserKt is a naive one-pass recursive descent, scannerless parser framework for Kotlin (mainly JVM, compatible for JS)

A parser is a kind of program extracting data from input sequences:

NOTE: Using REPL from command line is not a good practice, you can create a project with dependency on JitPack or use IDEA Kotlin REPL instead.

git clone https://github.com/ParserKt/ParserKt.git && cd ParserKt
gradle shadowJar
kotlinc -cp build/libs/ParserKt-*.jar

Don't be scared! ParserKt is just a small library, even full-build won't take too much time. (or you can download pre-built jars here)

// Pattern.read(String), CharInput.STDIN, ...
// item(x), elementIn('a'..'z'), satisfy<Int> { it % 2 == 0 }, ...
// asList(), asString(), ...
import org.parserkt.*
import org.parserkt.util.*
import org.parserkt.pat.*

Import parserkt, util, pat, then we can implement our own combined pattern! (for a more in depth sample, see link at header)

val digits = Repeat(asList(), elementIn('0'..'9')) //{[0-9]}
digits.read("123") //[1, 2, 3]
digits.read("12a3") //[1, 2]
digits.read("a3") //null //(= notParsed)

A parser mainly recognize, and transform inputs into a simpler form (like AST) for post-processing (preetify, evaluate, type-check, ...), and it can be used to create (DSL) language tools / compilers / transpilers / interpreters.

// Generic parsing for [1, a, b] where (b > a)
val ints = Seq(::IntTuple, item(1), *Contextual(item<Int>()) { i -> satisfy<Int> { it > i } }.flatten().items() )
ints.rebuild(1,2,3) //[1, 2, 3]
ints.rebuild(1,2,1) //notParsed
i.rebuild(1,0,1) //[1, 0, 1]
// Using Convert patterns
fun digitItem(digit: SatisfyPattern<Char>) = Convert(digit, {it-'0'}, {'0'+it})
val digit = digitItem(elementIn('0'..'9'))
val digitsInt = Repeat(JoinFold(0) { this*10 + it }, digit)
digitsInt.read("233") //233

(click here if you want to try out more)

ParserKt divides complex grammar into simple subpart val definitions. "entry rule" could be all public structure — like item, elementIn, Repeat, and thus it's very easy to debug, and to reuse implemented syntax.

What means "one-pass"?

See FeedModel.kt:

interface Feed<out T> {
  val peek: T; fun consume(): T
  class End: NoSuchElementException("no more")
}

That's all one subparser can see, no mark/reset, not even one character supported for lookahead.

What means "recursive descent"?

Take a look of these expression: a + b, a + (b + c)

sealed class PlusAst {
  data class Plus(val left: PlusAst, val right: PlusAst): PlusAst()
  data class IntLiteral(val n: Int): PlusAst()
}

(sealed class are just classes with determinable subclass, in their inner name scope)

This data structure implies, every symbol of a + b could be an integer (like 0, 9, 16, ...IntLiteral(n)), or other a + b (1 + 2, 9 + 0, ...Plus(left, right))

PlusAst is recursive, so it's best to implement it's parser in recurse function form, that's "recursive".

Plus := (Plus '+' Plus) | Int  ; left associative
Int := {[0-9]}

({a} denotes repeat, a|b denotes alternative)

When reading input "1+2+3" by rule Plus, results Plus(Plus(Int(1), Int(2)), Int(3)).

Parser keep going to the state of reading substructure Int from reading Plus.

From rule Plus to its subrule Int, that's "descent".

——

ParserKt cannot make any lookaheads, so it's looks like impossible to parse syntax like //, /*, or 0x 0b (and error when 0(octal)).

In fact, "lookahead" can be stored in call stack of Pattern.read by pattern Contextual, so write parser for such syntax is possible (but much more complicated, so it's better to create new Pattern subclass, or fall back to tokenizer-parser LexerFeed, or use TriePattern)

What is the difference between "parser combinator" and "parser compiler"?

Let me clarify first: I really don't know what "combinator" is :(

I think combinators must be related to "functional programming", or at least "declarative programming" — Define what it is, not how computers actually solve it.

Parser combinator is mainly about code reuse, parser compiler is mainly about pattern matching algorithms.

Parser compilers and parser combinators solves the same problem in different ways. A parser compiler have better portability, and better performance (I'm not sure). A parser combinator integrates better with the host language, and it's really easy to write/debug, since it's just hand-wrote program.

For example, keywords about parser compilers: LL(k), LR(k), LALR, NFA, DFA, KMP, Aho–Corasick

😨 Scared? Recursive descent method are the most popular practice of handwriting parser used by many well-known projects like Lua, and it's also the best choice for code quality — source files generated by parser compilers offen a mess!

Features

  • Minimal boilerplate code & high reusability
  • DSL style & fully object-oriented library design
  • Rebuild parse result back to input (REPL friendly)
  • Generic parsing input (Char, String, XXXToken, ...)
  • One-pass input stream design without hasNext
  • ReaderFeed for implementating interactive shells (JVM only)
  • The framework manages parsed data in a flexible and extensible way (Tuple2, Sized-Tuple, Fold)
  • Supports complex contextual syntax structure that cannot be supported directly in parser compilers (LayoutPattern, NumUnitPattern, ...)
  • Extensible error messages using clam {"message"}, clamWhile(pat, defaultValue) {"message"}
  • Parser input stream Feed can have state storage: withState(T), stateAs<T>()
  • <500K compiled JVM library, no extra runtime needed (except kotlin-stdlib, Kotlin sequences are optional)
  • No magics, all code has been rewritten at least 9 times by the author, ugly/ambiguous/disordered codes are removed

Great thanks to Kotlin, for its strong expressibility and amazing type inference.

Provided combinators

Modules

  • :parserkt for Feed/Input model and basic (org.parserkt.pat) / complex (org.parserkt.pat.complex) patterns
  • :parserkt-util Fundamental data types (Slice, Tuple, Trie, ...) and helper class (Preety, AnyBy, ...) used/prepared for ParserKt and friends
  • :parserkt-ext Extension library for real-world parsers, eliminating boilerplate code (LayoutPattern, LexicalBasics, ...)

abbreviations

  • POPCorn (Pattern, OptionalPattern, PatternWrapper, ConstantPattern)
  • SURDIES (Seq, Until, Repeat, Decide, item, elementIn, satisfy)
  • CCDP (Convert, Contextual, Deferred, Piped)
  • SJIT (SurroundBy, JoinBy, InfixPattern, TriePattern)

More runnable REPL snippets

Note that for frequently-used pattern combinations, we have org.parserkt.pat.ext:

// Using pat.ext and LexicalBasics
import org.parserkt.pat.ext.*
val digitsInt = Repeat(asInt(), LexicalBasics.digitFor('0'..'9'))
digitsInt.read("180") //180
// Implementing complex pattern
import org.parserkt.pat.complex.*
// Patterns with constant values are OK to perform rebuild, even ignored in parse result
val letter = elementIn('A'..'Z', 'a'..'z', '0'..'9') or item('_')
val sep = elementIn(',',':').toConstant(',') named "sep"
val xsv = JoinBy(sep, Repeat(asString(), letter)) // try replace letter using !sep
xsv.read("monkey,banana,melon") //Tuple2(first=[monkey, banana, melon], second=[,, ,])

xsv.concatCharJoin().read("猴:雀:瓜") //Tuple2(first=[猴, 雀, 瓜], second=::)
val goodXsv = xsv.mergeConstantJoin()
goodXsv.read("she,her,herself") //[she, her, herself]
goodXsv.rebuild("she,her,herself") //she,her,herself

ParserKt provides mergeFirst/Second, discardFirst/Second, flatten for Tuple2 pattern converting 😘

import org.parserkt.*
import org.parserkt.util.*
import org.parserkt.pat.*
import org.parserkt.pat.complex.*

val dict = TriePattern<Char, String>().apply {
  mergeStrings("hello" to "こんにちは")
  mergeStrings("world" to "世界")
}
val noun = Repeat(asList(), dict)
noun.read("helloworld") //[こんにちは, 世界]

val pharse = JoinBy(Decide(elementIn('0'..'9'), elementIn(' ', '\t', '\n', '\r')), dict)
pharse.read("hello6world hello") //Tuple2(first=[こんにちは, 世界, こんにちは], second=[Tuple2(first=0, second=6), Tuple2(first=1, second= )])

It's OK (and also easy) to read recursive structure, just defined lateinit var and use Deferred{name} to reference:

// Back converts (third argument for Convert) are optional
sealed class Sexp { data class Term(val name: String): Sexp(); data class Nest(val list: List<Sexp>): Sexp() }
lateinit var sexp: Pattern<Char, Sexp>
val str = Repeat(asString(), !elementIn(' ', '(', ')'))
val atom = Convert(str, { Sexp.Term(it) }, { it.name })

val nestItems = SurroundBy(parens.toCharPat(), JoinBy(item(' '), Deferred{sexp}).mergeConstantJoin())
val nest = Convert(nestItems, { Sexp.Nest(it) }, { it.list })
sexp = Decide(nest, atom).mergeFirst { if (it is Sexp.Nest) 0 else 1 }

sexp.read("((monkey banana) (deep parser))") //Nest(list=[Nest(list=[Term(name=monkey), Term(name=banana)]), Nest(list=[Term(name=deep), Term(name=parser)])])
sexp.rebuild("((monkey banana) (deep parser))")  //((monkey banana) (deep parser))

NOTE: when using pattern Until, think if it can be expressed by Repeat(..., !SatisfPattern) first

References

References on Historic

(script for GitHub pages) <script src="https://parserkt.github.io/scripts/github_corner.js"></script>

You might also like...
Cross-platform framework for building truly native mobile apps with Java or Kotlin. Write Once Run Anywhere support for iOS, Android, Desktop & Web.
Cross-platform framework for building truly native mobile apps with Java or Kotlin. Write Once Run Anywhere support for iOS, Android, Desktop & Web.

Codename One - Cross Platform Native Apps with Java or Kotlin Codename One is a mobile first cross platform environment for Java and Kotlin developers

Android MVVM framework write in kotlin, develop Android has never been so fun.

KBinding 中文版 Android MVVM framework write in kotlin, base on anko, simple but powerful. It depends on my another project AutoAdapter(A library for sim

A framework for writing composable parsers based on Kotlin Coroutines.

Parsus A framework for writing composable parsers based on Kotlin Coroutines. val booleanGrammar = object : GrammarBooleanExpression() { val ws

Kotlin Multiplatform lifecycle-aware business logic components (aka BLoCs) with routing functionality and pluggable UI (Jetpack Compose, SwiftUI, JS React, etc.), inspired by Badoos RIBs fork of the Uber RIBs framework
Kotlin Multiplatform lifecycle-aware business logic components (aka BLoCs) with routing functionality and pluggable UI (Jetpack Compose, SwiftUI, JS React, etc.), inspired by Badoos RIBs fork of the Uber RIBs framework

Decompose Please see the project website for documentation and APIs. Decompose is a Kotlin Multiplatform library for breaking down your code into life

Kotlin Multiplatform Mobile + Mobile Declarative UI Framework (Jetpack Compose and SwiftUI)

Kotlin Multiplatform Mobile + Mobile Declarative UI Framework (Jetpack Compose and SwiftUI)

Framework for quickly creating connected applications in Kotlin with minimal effort
Framework for quickly creating connected applications in Kotlin with minimal effort

Ktor is an asynchronous framework for creating microservices, web applications and more. Written in Kotlin from the ground up. import io.ktor.server.n

A modular object storage framework for Kotlin multiplatform projects.

ObjectStore A modular object storage framework for Kotlin multiplatform projects. Usage ObjectStore provides a simple key/value storage interface whic

This is powerful android framework
This is powerful android framework

FlairFramework This is an android framework for build complex application with different architectures (MVC ready/MVP/MVVM/MVI ets). It's create on to

Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.
Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

Comments
  • SourceLocation.toPreetyDoc acts wrong & add checks for tupleOf

    SourceLocation.toPreetyDoc acts wrong & add checks for tupleOf

    >>> elementIn('a'..'c').clam().read(input)
    res111: kotlin.Char? = null
    >>> es
    res112: kotlin.collections.List<org.parserkt.pat.LocatedError /* = kotlin.Pair<org.parserkt.SourceLocation, org.parserkt.ErrorMessage /* = kotlin.String */> */> = [(<string>10#0, expecting [a-c])]
    
    >>> tupleOf(::Point, 3, 4, 5)
    java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
    	at org.parserkt.util.Tuple.set(ArrangeModel.kt:48)
    
    
    bug enhancement 
    opened by duangsuse 4
  • Celebrate for publishing on OSSRH & move Feed/Input into parserkt-util

    Celebrate for publishing on OSSRH & move Feed/Input into parserkt-util

    here

    :thinking: expected to create a new library with better performance & less code (strip all code unrelated to parsing), Feed and Input should be reused extending from parserkt-util

    and parserkt-util can be renamed into parserkt-base after that

    opened by duangsuse 0
Releases(2.6)
Owner
Naive one-pass recursive descent parser framework for Kotlin
null
A TOML 1.0 parser library for Kotlin

4koma A small, stand-alone, easy to use TOML parser library for Kotlin. 4koma supports an array of convenient features, such as full TOML 1.0 complian

Anton Ekblad 53 Dec 12, 2022
Create kotlin android project with one line of command.

README This is an android application template project built with kotlin language and some useful libraries. It provides a creator script to quickly c

nekocode 1.6k Dec 20, 2022
app conversor de moedas/cambio com Kotlin, no Bootcamp Carrefour Android Developer na plataforma da Digital Innovation One

Status do Projeto: ✔️ concluído a proposta de criação um app conversor de moedas/cambio com Kotlin, no Bootcamp Carrefour Android Developer na plataforma da Digital Innovation One

Luiz Correa 5 Dec 23, 2022
One-stop-shop for Social Network integrations

Roguin One stop shop for Social Network integrations Use the same code for Google, Facebook and Twitter What is Roguin Social Network integrations can

Fanis Veizis 17 Aug 22, 2022
Clean MVVM with eliminating the usage of context from view models by introducing hilt for DI and sealed classes for displaying Errors in views using shared flows (one time event), and Stateflow for data

Clean ViewModel with Sealed Classes Following are the purposes of this repo Showing how you can remove the need of context in ViewModels. I. By using

Kashif Mehmood 22 Oct 26, 2022
Main goal of this project is to find the best route from one country to another

Route-service Main goal of this project is to find the best route from one country to another. Data is presented as json format. I've implemented A* p

Teyyihan Aksu 4 Aug 2, 2022
WorkManager ,One time,Sequential Execution, Periodic time Execution

WokManagerSample WorkManager ,One time,Sequential Execution, Periodic time Execu

Chhote Lal Pal 0 Dec 21, 2021
A pair of applications provide a direct means of integrating with one another via application programming interfaces (APIs)

What is a native integration? It's when a pair of applications provide a direct means of integrating with one another via application programming interfaces (APIs). Once integrated, data can flow between the apps and become more readily available to your employees.

Behruz Hurramov 2 Jan 17, 2022
Mars is an all-in-one plugin for PGM servers

Mars Mars is an all-in-one plugin for PGM servers. Using Mars Contributing to Mars Why? Mars was built to handle everything that PGM isn't supposed to

Warzone 6 Feb 13, 2022
A native android app that shows how much calories one must consume based on their profile

Healtify is a native android app which allows the user to track the amout of Calories they are consuming. It not only tracks the calories but also shows how much of fat, protein and carbs they have consumed and how much they should be doing.

Anindya Ray 9 Aug 20, 2022