CDAP 6.1.2
Released by ajainarayanan on 14 Apr, 18:23
Summary
This release focuses primarily on bug fixes and performance improvements. Highlights include:
- Performance improvements
  - Improved preview performance and limited concurrent preview runs to 10 by default
  - Moved polling logic to the UI to avoid polling leaks in the Node.js server
  - Adopted batch APIs in the UI to reduce load on backend services
- Pipeline and plugin fixes
  - Added support for Field Level Lineage in streaming pipelines
  - Improved the Field Level Lineage computation algorithm
  - Added support for Spark 2.4
  - Reduced memory consumption during pipeline execution
New Features
- Added the ability for SparkCompute and SparkSink to record field lineage. (CDAP-15579)
- Added support for Spark 2.4. (CDAP-16107)
- Added the ability to record field lineage for streaming pipelines. (CDAP-13643)
Bug Fixes
- Fixed a bug that caused errors when Wrangler's parse-as-csv directive with a header was used to read multiple small files. (CDAP-16002)
- Fixed the BigQuery sink to properly allow certain types as clustering fields. (CDAP-16526)
- Fixed a bug that could cause zombie processes when using the Remote Hadoop Provisioner. (CDAP-16471)
- Fixed a bug where getSchema did not work for database plugins. (CDAP-16472)
- Fixed a bug that caused the DBSource plugin to fail in preview mode. (CDAP-16453)
- Fixed a race condition that could cause Spark programs to fail. (CDAP-16309)
Improvements
- Added an option to skip the header row in delimited, CSV, TSV, and text files. (CDAP-16517)
- Added an option for the database source to replace characters in field names. (CDAP-16525)
- Reduced preview startup time by 60% and added a limit on the number of concurrent preview runs (10 by default). (CDAP-16308)
- Reduced the memory footprint of StructuredRecord, which improves overall memory consumption during pipeline execution. (CDAP-16509)
- Introduced a new REST endpoint for fetching the scheduled time of multiple programs. (CDAP-16339)