CDAP 6.1.2

@ajainarayanan ajainarayanan released this 14 Apr 18:23
· 466 commits to release/6.1 since this release
e9bf1f6

Summary

This release focuses primarily on bug fixes and performance improvements. Highlights include:

  1. Performance improvements

    • Improved preview performance and limited concurrent preview runs to 10 by default
    • Moved polling logic to the UI to avoid polling leaks in the Node.js server
    • Used batch APIs in the UI to reduce the load on backend services
  2. Pipeline and Plugin fixes

    • Added Field Level Lineage support for streaming pipelines
    • Improved the Field Level Lineage computation algorithm
    • Added support for Spark 2.4
    • Reduced memory consumption during pipeline execution
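
As an illustration of the concurrent preview cap mentioned above, here is a minimal sketch of a fixed-size run limiter. It is not CDAP's implementation: the class name `PreviewLimiter`, the constant `MAX_CONCURRENT_PREVIEWS`, and the choice to reject (rather than queue) runs over the cap are all assumptions made for the example. Only the default of 10 comes from these notes.

```python
import threading

# Default taken from the release notes; the constant name is hypothetical.
MAX_CONCURRENT_PREVIEWS = 10


class PreviewLimiter:
    """Caps how many preview runs may execute at the same time."""

    def __init__(self, limit=MAX_CONCURRENT_PREVIEWS):
        self._slots = threading.BoundedSemaphore(limit)

    def try_run(self, preview):
        """Run `preview` if a slot is free; return None when the cap is reached.

        Rejecting instead of queueing is an assumption of this sketch.
        """
        # Non-blocking acquire: fail fast when all slots are taken.
        if not self._slots.acquire(blocking=False):
            return None
        try:
            return preview()
        finally:
            # Always free the slot, even if the preview raised.
            self._slots.release()
```

A caller would wrap each preview in `limiter.try_run(...)` and surface a "too many previews" message when it returns `None`.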

New Features

  • Added the ability for SparkCompute and SparkSink to record field lineage. (CDAP-15579)
  • Added support for Spark 2.4. (CDAP-16107)
  • Added the ability to record field lineage for streaming pipelines. (CDAP-13643)

Bug Fixes

  • Fixed a bug that caused errors when Wrangler's parse-as-csv directive with the header option was used to read multiple small files. (CDAP-16002)
  • Fixed the BigQuery sink to properly allow certain types as clustering fields. (CDAP-16526)
  • Fixed a bug that caused zombie processes when using the Remote Hadoop Provisioner. (CDAP-16471)
  • Fixed a bug that prevented getSchema from working for database plugins. (CDAP-16472)
  • Fixed a bug that made the DBSource plugin fail in preview mode. (CDAP-16453)
  • Fixed a race condition that could cause failures when running Spark programs. (CDAP-16309)

Improvements

  • Added an option to skip the header in files in delimited, csv, tsv, and text formats. (CDAP-16517)
  • Added an option for the database source to replace characters in field names. (CDAP-16525)
  • Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default). (CDAP-16308)
  • Reduced the memory footprint of StructuredRecord, which improves overall memory consumption during pipeline execution. (CDAP-16509)
  • Introduced a new REST endpoint for fetching the scheduled time for multiple programs. (CDAP-16339)
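
To show why a skip-header option matters when many small files are read together, the sketch below drops the first line of every input file rather than only the first line of the combined stream. It is a standalone illustration, not the plugin's code: the function `read_delimited` and its `skip_header` parameter are hypothetical names and do not reflect the actual plugin property.

```python
import csv
import io


def read_delimited(files, delimiter=",", skip_header=False):
    """Yield rows from several delimited files.

    `files` is an iterable of file contents as strings. When `skip_header`
    is true, the first line of each file is dropped, so repeated headers
    never show up as data rows.
    """
    for text in files:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
        if skip_header:
            next(reader, None)  # drop the header; tolerate empty files
        yield from reader


small_files = ["id,name\n1,a\n", "id,name\n2,b\n"]
print(list(read_delimited(small_files, skip_header=True)))
# prints [['1', 'a'], ['2', 'b']]
```

Without `skip_header=True`, the same call returns four rows, two of which are the repeated `id,name` header.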