Features¶
For Infrastructure/Ops¶
Decentralized worker deployment¶
Scripts are deployed from the web UI
Scripts are versioned; multiple versions can coexist at once
If you can publish it to PyPI, it can be deployed
Real-time monitoring¶
Jobs send their status (success / fail) and logs in real time
Notifications can be configured per job, triggered on failure or on a missed mark (a job that should be running but is not)
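The "missed mark" condition above amounts to comparing the time since the last successful run against the expected interval. A minimal sketch in plain Python (the names `last_success`, `interval`, and `grace` are illustrative, not Kirby's API):

```python
from datetime import datetime, timedelta

def is_missed(last_success: datetime, interval: timedelta,
              now: datetime, grace: timedelta = timedelta(minutes=1)) -> bool:
    """Return True when a job that should have run again has not reported success."""
    return now - last_success > interval + grace

# A job expected every 5 minutes, last successful 12 minutes ago, has missed its mark:
now = datetime(2024, 1, 1, 12, 0)
print(is_missed(now - timedelta(minutes=12), timedelta(minutes=5), now))  # True
print(is_missed(now - timedelta(minutes=3), timedelta(minutes=5), now))   # False
```

The grace period avoids flapping alerts when a run is merely a few seconds late.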
Centralized secrets management¶
Scripts pull their configuration from the “vault”
Secrets are stored in a central place
Secrets are never persisted on disk
ACLs apply to secret access
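To make the vault model concrete, here is an in-memory sketch of a central secret store with per-secret ACLs (illustrative only; class and method names are hypothetical, not Kirby's actual vault API):

```python
class Vault:
    """Sketch of a central secret store with per-secret ACLs.
    Secrets live only in memory; nothing is persisted to disk."""

    def __init__(self):
        self._secrets = {}  # key -> secret value
        self._acl = {}      # key -> set of identities allowed to read it

    def put(self, key, value, allowed):
        self._secrets[key] = value
        self._acl[key] = set(allowed)

    def get(self, key, identity):
        if identity not in self._acl.get(key, set()):
            raise PermissionError(f"{identity} may not read {key}")
        return self._secrets[key]

vault = Vault()
vault.put("db/password", "s3cret", allowed={"etl-worker"})
print(vault.get("db/password", "etl-worker"))  # s3cret
```

An identity outside the ACL gets a `PermissionError` instead of the secret.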
Multiple authentication backends (local, LDAP, …)¶
Authentication
  LDAP
  Okta
  Local accounts
Authorization
  Groups
  Roles
For Developers¶
Workers as simple python scripts¶
Scripts are simple Python scripts
You can use whatever library you want, as long as it can be pip-installed
Scripts have no hard dependency on Kirby itself
Configuration is passed through env variables
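Since configuration arrives through environment variables, a worker only needs the standard library to read it. A minimal sketch (`DATABASE_URL` and `BATCH_SIZE` are illustrative names, not variables Kirby mandates):

```python
import os

# Provide a value for the demo; in production the deployment sets this.
os.environ.setdefault("DATABASE_URL", "postgresql://localhost/demo")

def load_config() -> dict:
    """Read the worker's configuration from environment variables at startup."""
    return {
        "database_url": os.environ["DATABASE_URL"],          # required
        "batch_size": int(os.environ.get("BATCH_SIZE", "100")),  # optional, defaulted
    }

config = load_config()
print(config["database_url"])
```

Because the script reads only `os.environ`, it runs unchanged inside or outside the scheduler.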
File-based local unit-testing¶
Kirby is meant to be testable
All input / output in Kirby is a message
Messages are simulated using folders and files
Comes with integrated testing helpers
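The file-based simulation idea can be sketched without any framework: each topic is a folder, each message a file, and offsets come from file names. This is an illustrative stand-in for the real helpers, not Kirby's testing API:

```python
import json
import tempfile
from pathlib import Path

def write_message(topic_dir: Path, offset: int, payload: dict) -> None:
    """Simulate producing to a topic: one JSON file per message."""
    (topic_dir / f"{offset:08d}.json").write_text(json.dumps(payload))

def read_messages(topic_dir: Path):
    """Simulate consuming a topic: read files back in offset order."""
    for path in sorted(topic_dir.glob("*.json")):
        yield json.loads(path.read_text())

with tempfile.TemporaryDirectory() as tmp:
    topic = Path(tmp) / "orders"
    topic.mkdir()
    write_message(topic, 0, {"order_id": 1})
    write_message(topic, 1, {"order_id": 2})
    order_ids = [m["order_id"] for m in read_messages(topic)]
print(order_ids)  # [1, 2]
```

A unit test can then assert on the files a script produced, with no broker running.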
Simple integration to web frameworks¶
Designed to be a drop-in replacement for Celery
For example: produce data in an ETL script and use it directly in your app, or the other way around
Chord / group logic is replaced by stream piping
Higher throughput and parallelism
Simpler status management and failure recovery
Integrations are not limited to running jobs; they can also serve as a view over all Kirby-managed data
Can be used in templates, views, etc.
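The contrast with Celery's chord/group coordination can be illustrated with plain generator piping: each stage consumes one stream and produces another, so there is no fan-in bookkeeping to manage. A sketch under those assumptions (not Kirby's actual stream API):

```python
def extract():
    """Source stage: yield raw records as a stream."""
    yield from [{"value": 1}, {"value": 2}, {"value": 3}]

def transform(records):
    """Pipe stage: consume one stream, produce another, lazily."""
    for record in records:
        yield {"value": record["value"] * 10}

def load(records):
    """Sink stage: collect the final stream."""
    return [r["value"] for r in records]

# Stages compose by piping streams rather than coordinating task groups:
result = load(transform(extract()))
print(result)  # [10, 20, 30]
```

A failed record interrupts only its own stream position, which is what makes status management and recovery simpler than group-wide synchronization.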
For Data Engineers¶
Scheduled tasks with exceptions¶
Schedule can be configured with cron-like syntax
Add exceptions to pause a job or slow down its schedule (for instance, when a data source is broken)
The normal schedule resumes automatically when the exception ends; there is nothing to undo
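To show what cron-like matching means, here is a tiny five-field matcher in plain Python (illustrative only, supporting `*`, `*/n`, and comma lists; real cron also maps weekday 0 to Sunday, which is simplified to Python's 0 = Monday here):

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field ('*', '*/n', 'a,b,c', or a number) against a value."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}

def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression: minute hour day month weekday."""
    minute, hour, day, month, weekday = expr.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, when.weekday()))

# "every 15 minutes" matches 09:45 but not 09:50:
print(cron_matches("*/15 * * * *", datetime(2024, 1, 1, 9, 45)))  # True
print(cron_matches("*/15 * * * *", datetime(2024, 1, 1, 9, 50)))  # False
```

An exception window then simply suppresses or stretches these matches until it expires.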
Rewind/replay of tasks¶
Rewind a data source and all downstream processing flows naturally (the data is flagged as the result of a re-run, in case it matters)
Replay production data in preprod, or even on a developer laptop; testing with production data makes development and debugging easier
Data traceability with data dependency graph documentation¶
Each process adds itself in the message headers, to allow end-to-end data traceability
Visualize data flows in the web UI to understand
  where your data comes from
  when it was processed
  what transformations were applied
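The header mechanism behind this traceability can be sketched as each processing step appending its own name to the message's headers before passing it on (an illustrative model; field names are hypothetical, not Kirby's message format):

```python
def process(message: dict, step_name: str, transform) -> dict:
    """Apply a transformation and record the step in the message headers."""
    headers = message.get("headers", [])
    return {
        "headers": headers + [step_name],  # lineage grows step by step
        "body": transform(message["body"]),
    }

msg = {"headers": ["source:orders"], "body": 21}
msg = process(msg, "doubler", lambda x: x * 2)
print(msg["headers"])  # ['source:orders', 'doubler']
print(msg["body"])     # 42
```

Reading the headers of any message then reconstructs its full path through the pipeline.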
Computed tables¶
Kirby State objects let you maintain a persistent object that scripts can update
By default a state exposes only its latest version, but every mutation is versioned and can be recovered (limited only by disk space)
Useful for “hot” data with expensive computation costs
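The versioning behavior described above can be modeled as an append-only history where reads default to the latest entry. A minimal in-memory sketch (illustrative only; `State` here is not Kirby's actual class):

```python
class State:
    """Sketch of a versioned state: reads see the latest value,
    but every mutation is kept and any version can be recovered."""

    def __init__(self, initial):
        self._versions = [initial]  # append-only history

    @property
    def value(self):
        """The latest version, which is what readers see by default."""
        return self._versions[-1]

    def update(self, new_value):
        """Record a mutation as a new version; older versions remain."""
        self._versions.append(new_value)

    def at_version(self, n):
        """Recover any earlier version by index."""
        return self._versions[n]

state = State({"total": 0})
state.update({"total": 10})
state.update({"total": 25})
print(state.value)          # {'total': 25}
print(state.at_version(0))  # {'total': 0}
```

For expensive computations, scripts update the state once and every consumer reads the precomputed latest value instead of recomputing it.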