There is an excellent discussion in apache#30 about the future of the project, and we encourage you to participate and add your feedback there if you are interested in using or contributing to Ballista.
The current focus is on the following items:
- Make Ballista production ready
  - Shuffle file cleanup
    - Periodic cleanup (#185)
    - Add gRPC and REST interfaces so that clients and the UI can trigger cleanup for a single job or for the whole system
  - Fill functional gaps between DataFusion and Ballista
  - Improve task scheduling and data exchange efficiency
  - Better error handling
    - Scheduler restart
  - Improve monitoring, logging, and metrics
  - Auto scaling support
  - Better configuration management
  - Support multi-scheduler deployments, initially for resiliency and fault tolerance, and ultimately to enable sharding for scalability and more efficient caching
- Shuffle improvement
- Scheduler improvements
  - All-at-once job task scheduling
  - Executor deployment grouping based on resource allocation
- Cloud support
- Performance and scalability
- Python support
  - Support Python UDFs (#173)