Cask Data Application Platform v2.6.0
API Changes
- API for specifying Services and MapReduce Jobs has been changed to use a "configurer"
style; this will require modification of user classes implementing either MapReduce
or Service as the interfaces have changed (CDAP-335).
New Features
General
- Health checks are now available for CDAP system services (CDAP-663).
Applications
- Jar deployment now uses a chunked request and writes to a local temp file (CDAP-91).
MapReduce
- MapReduce jobs can now read binary stream data (CDAP-331).
Spark
- Spark programs now emit system and custom user metrics (CDAP-346).
- Services can be called from Spark programs and their worker nodes (CDAP-348).
- Spark programs can now read from Streams (CDAP-403).
- Added Spark support to the CDAP CLI (Command-line Interface) (CDAP-425).
- Improved the speed of Spark unit tests (CDAP-600).
- Spark programs now display system metrics in the CDAP Console (CDAP-652).
Procedures
- Procedures have been deprecated in favor of Services (CDAP-413).
Services
- Added an HTTP endpoint that returns the endpoints a particular Service exposes (CDAP-412).
- Added an HTTP endpoint that lists all Services (CDAP-469).
- Default metrics for Services have been added to the CDAP Console (CDAP-512).
- The annotations @QueryParam and @DefaultValue are now supported in custom Service handlers (CDAP-664).
Metrics
- System and User Metrics now support gauge metrics (CDAP-484).
- Metrics can be queried using a Program's run-ID (CDAP-620).
Documentation
- A Quick Start Guide has been added to the CDAP Administration Manual (CDAP-695).
CDAP Bug Fixes
- Fixed a problem with readless increments not being used when they were enabled in a Dataset (CDAP-383).
- Fixed a problem with applications failing with a class loading error when their Spark or Scala user classes did not extend either JavaSparkProgram or ScalaSparkProgram (CDAP-599).
- Fixed a problem with the CDAP upgrade tool not preserving the coprocessor configuration during an upgrade for tables with readless increments enabled (CDAP-1044).
- Fixed a problem with the readless increment implementation dropping increment cells when a region flush or compaction occurred (CDAP-1062).
Known Issues
- When running secure Hadoop clusters, metrics and debug logs from MapReduce programs are not available (CDAP-64, CDAP-797).
- When upgrading a cluster from an earlier version of CDAP, warning messages may appear in the master log indicating that in-transit (emitted, but not yet processed) metrics system messages could not be decoded ("Failed to decode message to MetricsRecord"). This is caused by a change in the format of emitted metrics and can result in a small number of metrics data points being lost (CDAP-745).
- Writing to datasets through Hive is not supported in CDH 4.x (CDAP-988).
- A race condition resulting in a deadlock can occur when a TwillRunnable container shuts down while it still has ZooKeeper events to process. This occasionally surfaces when running with OpenJDK or JDK 7, though not with Oracle JDK 6. It is caused by a change in the ThreadPoolExecutor implementation between Oracle JDK 6 and OpenJDK/JDK 7. Until Twill is updated in a future version of CDAP, a workaround is to kill the errant process. The YARN command to list all running applications and their app-ids is:

      yarn application -list -appStates RUNNING

  The command to kill a process is:

      yarn application -kill <app-id>

  All versions of CDAP running Twill version 0.4.0 with this configuration can exhibit this problem (TWILL-110).
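The listing step of the workaround above can be scripted to pull out just the app-ids before killing the errant process. A minimal sketch, assuming the typical tabular output of `yarn application -list`; the sample output and application id below are hypothetical, not taken from a real cluster:

```shell
# Hypothetical sample of `yarn application -list -appStates RUNNING` output;
# on a real cluster, replace the heredoc with the actual command invocation.
list_output=$(cat <<'EOF'
Total number of applications (application-types: [] and states: [RUNNING]):1
                Application-Id      Application-Name
application_1416000000000_0001      cdap.master
EOF
)

# Extract the app-ids (YARN ids have the form application_<timestamp>_<seq>).
app_ids=$(echo "$list_output" | grep -o 'application_[0-9_]*')
echo "$app_ids"

# Each extracted id could then be passed to: yarn application -kill <app-id>
```

On a live cluster the heredoc would be replaced by the real `yarn application -list -appStates RUNNING` call; the grep pattern reflects the usual YARN id format and should be verified against your cluster's output.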