GitHub - coady/graphique: GraphQL service for python dataframes and parquet datasets.

GraphQL service for ibis dataframes, arrow tables, and parquet datasets. The schema for a query API is derived automatically.

Roadmap

When this project started, there was no out-of-core execution engine with performance comparable to PyArrow, so graphique effectively included one, built on PyArrow datasets and Acero.

Since then the ecosystem has grown considerably, with DuckDB, DataFusion, and Ibis now available. The next major version plans to reuse Ibis, because it provides a common expression API for multiple backends. Graphique can similarly offer a default but configurable backend.

Usage

There is an example app which reads a parquet dataset.

env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/ to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.

env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema

outputs the GraphQL schema.
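Besides GraphiQL, the running service can be queried by any HTTP client. The following is a minimal sketch using only the standard library. It assumes the service accepts standard GraphQL-over-HTTP POST requests at the same URL that serves GraphiQL, and it uses the `count` field listed under API; `build_request` and `execute` are hypothetical helper names, not part of graphique.

```python
import json
import urllib.request


def build_request(query, url="http://localhost:8000/"):
    """Build a GraphQL-over-HTTP POST request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body, headers=headers)


def execute(query, url="http://localhost:8000/"):
    """Send the query to a running service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


# Assumption: `count` is available on the root query type, as listed under API.
request = build_request("{ count }")
```

With the example app running, `execute("{ count }")` should return a JSON object with a `data` key, as for any GraphQL response.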

Configuration

The example app uses Starlette's config, which reads settings from environment variables or a .env file.

PARQUET_PATH: path to the parquet directory or file
FEDERATED = '': field name to extend type Query with a federated Table
METRICS = False: include timings from apollo tracing extension
COLUMNS = None: list of names, or mapping of aliases, of columns to select
FILTERS = None: json filter query for which rows to read at startup

Configuration options exist to provide a convenient no-code solution, but are subject to change in the future. Using a custom app is recommended for production usage.
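To make the defaults above concrete, here is a standard-library sketch of reading the same options from a mapping such as `os.environ`. It does not use Starlette and is not the example app's actual parsing. In particular, it assumes that COLUMNS and FILTERS are JSON-encoded and that METRICS accepts "true" or "1"; `load_settings` is a hypothetical name.

```python
import json
import os


def load_settings(environ=os.environ):
    """Read the documented options from a mapping, applying the documented defaults."""
    columns = environ.get("COLUMNS")
    filters = environ.get("FILTERS")
    return {
        # Required: no default is documented for the dataset path.
        "PARQUET_PATH": environ["PARQUET_PATH"],
        "FEDERATED": environ.get("FEDERATED", ""),
        # Assumption: boolean values are spelled "true" or "1".
        "METRICS": environ.get("METRICS", "false").lower() in ("1", "true"),
        # Assumption: both options are JSON-encoded strings.
        "COLUMNS": json.loads(columns) if columns else None,
        "FILTERS": json.loads(filters) if filters else None,
    }


# The path and column names here are placeholders for illustration.
settings = load_settings({"PARQUET_PATH": "data.parquet", "COLUMNS": '["a", "b"]'})
```

Unset options fall back to the defaults listed above, so `settings["METRICS"]` is `False` and `settings["FILTERS"]` is `None` in this example.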

App

For more options create a custom ASGI app. Call graphique's GraphQL on an ibis Table or arrow Dataset. Supply a mapping of names to datasets for multiple roots, and to enable federation.

import ibis
from graphique import GraphQL

source = ibis.read_*(...)  # or ibis.connect(...).table(...) or pyarrow.dataset.dataset(...)
# apply initial projections or filters to `source`
app = GraphQL(source)  # Table is root query type
app = GraphQL.federated({<name>: source, ...}, keys={<name>: [], ...})  # Tables on federated fields

Start like any ASGI app.

uvicorn <module>:app
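As one possible layout for a custom app, the sketch below wraps the construction in a factory function so that the data source is opened only when the server starts. It assumes a parquet source read with `ibis.read_parquet`; the `create_app` name and the reuse of the PARQUET_PATH variable are illustrative choices, not something graphique requires.

```python
import os


def create_app():
    """Application factory: build the ASGI app when the server starts."""
    # Imported lazily so that importing this module does not need the optional dependencies.
    import ibis
    from graphique import GraphQL

    # Assumption: the dataset location is passed in the PARQUET_PATH environment variable.
    source = ibis.read_parquet(os.environ["PARQUET_PATH"])
    # Apply initial projections or filters to `source` here.
    return GraphQL(source)  # Table is the root query type
```

Uvicorn can call a factory directly with its `--factory` flag, for example `env PARQUET_PATH=... uvicorn <module>:create_app --factory`.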

API

types

Dataset: interface for an ibis table or arrow dataset.
Table: implements the Dataset interface. Adds typed row, columns, and filter fields from introspecting the schema.
Column: interface for an ibis column. Each data type has a corresponding column implementation: Boolean, Int, BigInt, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, Array, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
Row: scalar fields. Tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.

selection

slice: contiguous selection of rows
filter: select rows by predicates
join: join tables by key columns
take: rows by index
dropNull: remove rows with nulls

projection

project: project columns with expressions
columns: provides a field for every Column in the schema
column: access a column of any type by name
row: provides a field for each scalar of a single row
cast: cast column types
fillNull: fill null values

aggregation

group: group by given columns, and aggregate the others
distinct: group with all columns
runs: provisionally group by adjacency
unnest: unnest an array column
count: number of rows
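The difference between `group` and `runs` is easiest to see on a small example. The sketch below is a plain-Python illustration of the two grouping semantics only. It is not how graphique computes them (the service avoids Python loops), and the function names merely mirror the field names.

```python
from itertools import groupby


def runs(values):
    """Group adjacent equal values: one (value, count) pair per run."""
    return [(key, len(list(group))) for key, group in groupby(values)]


def group(values):
    """Group all equal values regardless of position, in first-seen order."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return list(counts.items())


states = ["CA", "CA", "NY", "CA"]
print(runs(states))   # [('CA', 2), ('NY', 1), ('CA', 1)]
print(group(states))  # [('CA', 3), ('NY', 1)]
```

Grouping by adjacency yields the same result as a full group only when equal values are already contiguous, for example when the data is sorted by the grouping column.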

ordering

order: sort table by given columns
options limit and dense: select rows with smallest or largest values

Performance

Performance depends on the ibis backend, which defaults to DuckDB. There are no internal Python loops. Scalars do not become Python types until serialized.

PyArrow is also used for partitioned dataset optimizations, and for any feature which ibis does not support. Table fields are lazily evaluated up until scalars are reached, and automatically cached as needed for multiple fields.

Installation

pip install graphique[server]

Dependencies

ibis-framework (with duckdb or other backend)
strawberry-graphql[asgi,cli]
pyarrow
isodate
uvicorn (or other ASGI server)

Tests

100% branch coverage.

pytest [--cov]
