
Cask Data Application Platform v2.6.0

@jwang47 released this 10 Jan 23:18

API Changes

  • The API for specifying Services and MapReduce jobs has been changed to a "configurer"
    style; user classes implementing either MapReduce or Service will require modification,
    as the interfaces have changed (CDAP-335).
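
As an illustration of the configurer style, a Service might now be defined roughly as
follows. This is a minimal sketch assuming CDAP's `AbstractService` base class; the
`CatalogService` and `CatalogHandler` names are hypothetical, not part of the release:

```java
import co.cask.cdap.api.service.AbstractService;

// Hypothetical Service in the configurer style: instead of implementing the
// Service interface directly, the class extends AbstractService and declares
// its name, description, and handlers inside configure().
public class CatalogService extends AbstractService {
  @Override
  protected void configure() {
    setName("CatalogService");
    setDescription("Serves catalog lookups over HTTP");
    addHandler(new CatalogHandler()); // CatalogHandler is a hypothetical handler class
  }
}
```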

New Features

General

  • Health checks are now available for CDAP system services
    (CDAP-663).

Applications

  • Jar deployment now uses a chunked request and writes to a local temp file
    (CDAP-91).

MapReduce

  • MapReduce jobs can now read binary stream data
    (CDAP-331).

Datasets

  • Added FileSet, a new core dataset type for working with sets of files
    (CDAP-1).

Spark

  • Spark programs now emit system and custom user metrics
    (CDAP-346).
  • Services can be called from Spark programs and their worker nodes
    (CDAP-348).
  • Spark programs can now read from Streams
    (CDAP-403).
  • Added Spark support to the CDAP CLI (Command-line Interface)
    (CDAP-425).
  • Improved speed of Spark unit tests
    (CDAP-600).
  • Spark programs now display system metrics in the CDAP Console
    (CDAP-652).

Procedures

  • Procedures have been deprecated in favor of Services
    (CDAP-413).

Services

  • Added an HTTP endpoint that returns the endpoints a particular Service exposes
    (CDAP-412).
  • Added an HTTP endpoint that lists all Services
    (CDAP-469).
  • Default metrics for Services have been added to the CDAP Console
    (CDAP-512).
  • The annotations @QueryParam and @DefaultValue are now supported in custom Service handlers
    (CDAP-664).
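
For example, `@QueryParam` and `@DefaultValue` can appear in a handler method signature
roughly as below. This is a sketch with a hypothetical endpoint and parameter name,
assuming CDAP's HTTP service handler API and the standard `javax.ws.rs` annotations:

```java
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;

import co.cask.cdap.api.service.http.AbstractHttpServiceHandler;
import co.cask.cdap.api.service.http.HttpServiceRequest;
import co.cask.cdap.api.service.http.HttpServiceResponder;

// Hypothetical handler: the "limit" query parameter is optional and
// falls back to "10" when the caller omits it.
public class CatalogHandler extends AbstractHttpServiceHandler {
  @GET
  @Path("items")
  public void list(HttpServiceRequest request, HttpServiceResponder responder,
                   @QueryParam("limit") @DefaultValue("10") String limit) {
    responder.sendString("returning up to " + limit + " items");
  }
}
```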

Metrics

  • System and User Metrics now support gauge metrics
    (CDAP-484).
  • Metrics can be queried using a Program’s run-ID
    (CDAP-620).
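
A gauge differs from a count in that it records an absolute value at a point in time
rather than an increment. In a program element with a `Metrics` field, this might look
roughly like the following sketch; the metric names are hypothetical:

```java
import co.cask.cdap.api.metrics.Metrics;

public class QueueMonitor {
  // In CDAP programs a Metrics instance is typically injected by declaring a
  // field of this type; it is shown here as a plain field for illustration.
  private Metrics metrics;

  void report(int queueSize) {
    metrics.count("events.processed", 1);    // incremental counter
    metrics.gauge("queue.size", queueSize);  // absolute value at this moment
  }
}
```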

CDAP Bug Fixes

  • Fixed a problem with readless increments not being used when they were enabled in a Dataset
    (CDAP-383).
  • Fixed a problem with applications failing with a class loading error when their Spark
    or Scala user classes did not extend either JavaSparkProgram or ScalaSparkProgram
    (CDAP-599).
  • Fixed a problem with the CDAP upgrade tool not preserving—for
    tables with readless increments enabled—the coprocessor configuration during an upgrade
    (CDAP-1044).
  • Fixed a problem with the readless increment implementation dropping increment cells when
    a region flush or compaction occurred (CDAP-1062).

Known Issues

  • When running secure Hadoop clusters, metrics and debug logs from MapReduce programs are
    not available (CDAP-64, CDAP-797).

  • When upgrading a cluster from an earlier version of CDAP, warning messages may appear in
    the master log indicating that in-transit (emitted, but not yet processed) metrics
    system messages could not be decoded ("Failed to decode message to MetricsRecord").
    This is because of a change in the format of emitted metrics, and can result in a small
    number of metrics data points being lost (CDAP-745).

  • Writing to datasets through Hive is not supported in CDH4.x
    (CDAP-988).

  • A race condition resulting in a deadlock can occur when a TwillRunnable container
    shuts down while it still has ZooKeeper events to process. This occasionally surfaces
    when running with OpenJDK or JDK7, though not with Oracle JDK6; it is caused by a change
    in the ThreadPoolExecutor implementation between Oracle JDK6 and OpenJDK/JDK7. Until
    Twill is updated in a future version of CDAP, the workaround is to kill the errant
    process. The YARN command to list all running applications and their app-ids is:

    yarn application -list -appStates RUNNING
    

    The command to kill a process is:

    yarn application -kill <app-id>
    

    All versions of CDAP running Twill version 0.4.0 with this configuration can exhibit this
    problem (TWILL-110).