xls2json
Read in Excel file (.xls, .xlsx, .xlsm) and output JSON. Evaluates formulas where possible. Preserve type information from Excel via JSON types.
❯ xls2json --help
Open an xls(x|m) file and transform to json.
Usage: xls2json [-hlmsvV] [-D=<dtfmt>] [-T=<tfmt>] [-t=<tables>]... [<files>...]
If no `--table`s are provided, then all
tables will be extracted.
All files will be processed one by one,
each outputting on line of json.
[<files>...] xls(x|m)-file(s) to transform
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
-m, --memory Show memory usage information.
-v, --verbose Show more information.
--pretty Pretty print the JSON.
-l, --list-tables List all tables.
-t, --table=<tables> Specify the tables to transform
-p, --password=<password> Password for opening the input file(s).
-s, --strip Strip empty columns and empty rows.
-D, --datetime-format=<dtfmt>
The datetime format.
[default: 'yyyy-MM-dd'T'HH:mm:ss.SSS']
-T, --time-format=<tfmt> The time format.
[default: 'HH:mm:ss.SSS']
By Tammo Ippen <[email protected]>
Issues: https://github.com/tammoippen/xls2json/issues
Usage
# read a XLS file
❯ ./dist/xls2json-amd64 src/test/resources/empty.xls
{"Sheet1":[]}
# read a XLS file with content
❯ xls2json src/test/resources/sample.xls
{"Sheet1":[["empty",null],["String","hello",null],["StringNumber","14.8",null],["Int",1234,null,null,null,null],["bool",true,null,null,null,null],["bool",false,null,null,null,null],["float",23.12345,null,null,null,null],["datetime","2021-05-18T21:19:53.040",null,null,null,null],["time","21:19:32.000",null],["formulars",null],["string","hello",null],["float",-0.34678748622465627,null],["int",5,null],["datetime","2021-06-03T16:45:56.709",null],["time","13:37:00.000",null],[],[],[],[null,null,null],[null,null,null],[null,null,null]]}
# strip empty cells from from the end of columns and empty rows from the bottom
❯ xls2json -s src/test/resources/sample.xls
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:47:22.789"],["time","13:37:00.000"]]}
# pretty print the output
❯ xls2json -s --pretty src/test/resources/empty.xls
{
"Sheet1": []
}
# use jq (https://stedolan.github.io/jq/)
# or gojq (https://github.com/itchyny/gojq)
# to get some nice output and / or to process the json
❯ xls2json -s src/test/resources/sample.xls | jq
{
# we have a dict sheetname -> list of rows
"Sheet1": [
# each row is a list of cell-values
["empty"],
["String", "hello"],
["StringNumber", "14.8"],
["Int", 1234],
["bool", true],
["bool", false],
["float", 23.12345],
# format for datetime and time can be specified using
# the -D and -T options. The format-string documentation:
# https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/format/DateTimeFormatter.html
# Please be aware, that Excel dates have no timezone attached,
# so if you provide a format with timezone, your locally configured
# timezone will be used.
["datetime", "2021-05-18T21:19:53.040"],
["time", "21:19:32.000"],
# formulars are evaluated
["formulars"],
["string", "hello"],
["float", -0.34678748622465627],
["int", 5],
["datetime", "2021-06-03T16:48:26.710"],
["time", "13:37:00.000"]
]
}
# (format is not true to the jq-output for brevity)
# XLSX works the same
❯ xls2json -s src/test/resources/sample.xlsx
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:59:06.466"],["time","13:37:00.000"]]}
# works on multiple files as well
❯ xls2json -s src/test/resources/sample.xls*
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:59:06.466"],["time","13:37:00.000"]]}
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:47:22.789"],["time","13:37:00.000"]]}
# list the available tables
❯ xls2json -l src/test/resources/sampleTwoSheets.xls
["Sheet1","Sheet2"]
# only output some table(s)
❯ xls2json -t Sheet2 -s src/test/resources/sampleTwoSheets.xls | jq
{
"Sheet2": [
["empty"],
["String", "hello"],
["StringNumber", "14.8"],
["Int", 1234],
["bool", true],
["bool", false],
["float", 23.12345],
["datetime", "2021-05-18T21:19:53.040"],
["time", "21:19:32.000"],
["formulars"],
["string", "hello"],
["float", -0.34678748622465627],
["int", 5],
["datetime", "2021-06-03T16:48:26.710"],
["time", "13:37:00.000"]
]
}
# if you are running the native-image executable, you can configure the
# garbage collector with the usual options
❯ xls2json -Xmx1k --help
Exception in thread "main" java.lang.OutOfMemoryError: Garbage-collected heap size exceeded.
# 1kb is a bit low
# if you are running with the installable distribution, you can set
# the environment variable XLS2JSON_OPTS
❯ XLS2JSON_OPTS="-Xmx1k" xls2json --help
Error occurred during initialization of VM
Too small maximum heap
# still 1kb is too small :D
Output JsonSchema
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"minProperties": 1,
"patternProperties": {
".*": {
"type": "array",
"items": {
"type": "array",
"items": {
"type": ["number", "string", "integer", "boolean", "null"]
}
}
}
}
}
Installation
The Releases contain various formats for installation:
-
An executable fat jar (build with the shadowJar plugin). Put the jar wherever you like and call it like:
❯ java -jar path/to/xls2json-1.0.0-all.jar --help
-
An installable distribution as .zip / .tar.gz. Extract it, put it wherever you like and update your
PATH
, e.g.:❯ tar xf xls2json-1.0.0.tar ❯ mv xls2json-1.0.0 "$HOME/.local/share" ❯ export PATH="$PATH:$HOME/.local/share/xls2json-1.0.0/bin" ❯ xls2json --help
-
Native executable build with native-image from GraalVM using the native-image plugin from mike_neck. Put the executable for your operating system into your path and call it like:
xls2json --help
.No JAVA installation needed.
Another Excel Reader?
There are already some great Excel readers out there, but most do not satisfy my requirements:
- commandline program with (in the best case) no external dependencies
- XLS (2003 and earlier) and XLSX (2007 and later) support
- evaluate formulas within the spreadsheet
- keep type information from Excel
- open source
- (fast and multi-platform)
The existing programs / libraries I found and why I think they do not satisfy my needs:
- pyexcel-cli: can do much more in and output formats, but lacks formula evaluation. Same for the corresponding pyexcel library.
- unoconv and libreoffice: can do much more in and output formats, can do formula evaluation, but requires a LibreOffice installation.
- ssconvert: requires a gnumeric installation, I am not sure about the capabilities.
- Spreadsheet::XLSX: no cli afaik, xlsx only. Also perl is not a language I am eager to learn.
- PhpSpreadsheet: no cli afaik, but from the docs looks as if you can evaluate formulas. But again, PHP is not a language I am eager to learn.
I am already using Apache POI for quite some time and found it quite complete for my needs. We still have the Java VM dependency, but with GraalVM native-image we can build self-contained executables
Standing on the shoulders of giants
- Apache POI: the Java API for Microsoft Documents
- Picocli: a mighty tiny command line interface
- gson: A Java serialization/deserialization library to convert Java Objects into JSON and back
- Gradle Build Tool
- kotlin: A modern programming language that makes developers happier.
- GraalVM native-image: ahead-of-time compile Java code to a standalone executable
Some linkes I found invaluable for this project:
Building and Development
Prerequisits:
- Make sure you have a Java (11+) SDK in your path.
- If you want to build the native binary, use the GraalVM as your Java SDK and install native-image:
gu install native-image
Building:
❯ git clone https://github.com/tammoippen/xls2json.git
❯ cd xls2json
# build all java related targets
# => build/distributions/xls2json-{version}.[tar|zip]
❯ ./gradlew build
# build the native executable for your system
# => build/executable/xls2json
❯ ./gradlew nativeImage
# build the fat jar only (already in build)
# => build/libs/xls2json-{version}-all.jar
❯ ./gradlew shadowJar
# tests and run
❯ ./gradlew test
❯ ./gradlew run --args="--help"
Issues with the native-image executables
Apache POI uses many resources and reflections when working with the excel files and the native-image executable needs to be configured during building to include those resources and reflections.
When issues arise reading an Excel file, please try to use the fat-jar or the installable distribution and run the Excel file with that. If it works, please open an issue with the failing Excel file (or a minimal reproducing Excel file). I will generate the configuration then. If it still does not work, please again open an issue including the Excel file, stack trace, xls2json and java version.
If you want to generate the configuration, consider running:
❯ ./gradlew shadowJar
❯ $(JAVA_HOME)/bin/java -agentlib:native-image-agent=config-merge-dir=native-image-config \
-jar build/libs/xls2json-1.0.0-all.jar \
the-problematic-excel-file.xls(x)