PipeOp Specifications
mb706 edited this page Aug 12, 2019 · 4 revisions
General rules:

- Inherit from `PipeOp` for general pipeops, from `PipeOpTaskPreproc` for preprocessing pipeops that have one task input and one task output, and from `PipeOpTaskPreprocSimple` for the subset of these that perform exactly the same operation during training and prediction.
- Overwrite the `train_internal()` and `predict_internal()` functions when inheriting from `PipeOp`. Overwrite the `train_task()` / `train_dt()` and `predict_task()` / `predict_dt()` functions, and possibly `select_cols()` (for the `..._dt()` variants), when inheriting from `PipeOpTaskPreproc`. Overwrite the `get_state()` / `get_state_dt()` and `transform()` / `transform_dt()` functions, and possibly `select_cols()` (for the `..._dt()` variants), when inheriting from `PipeOpTaskPreprocSimple`.
- Set the `train` and `predict` columns of `$input` and `$output` to the acceptable types for these operations. Do not check input values for types that are already specified in the `$input` and `$output` tables.

  Ok:

  ```r
  train_internal = function(inputs) {
    # checks a property that the type declaration cannot express
    if (inputs[[1]]$nrow < 1) stop("Input too small")
    # ...
  }
  ```

  Bad (because the input type `"Task"` is already checked by the `train()` function):

  ```r
  train_internal = function(inputs) {
    # redundant: the declared input type is already enforced
    assert_task(inputs[[1]])
    # ...
  }
  ```
- Inputs in `train_internal()` / `predict_internal()` are always given by reference, so if any R6 objects are modified, they must be cloned first. This is not the case for `train_task`, `train_dt`, ... in `PipeOpTaskPreproc[Simple]`: `PipeOpTaskPreproc[Simple]` takes care of cloning, so Tasks / data.tables can be modified in-place.
- The `$state` of a `PipeOpTaskPreproc[Simple]` must always be a named list. The machinery in `PipeOpTaskPreproc[Simple]` adds a few slots: `$affected_cols`, `$intasklayout`, `$outtasklayout`, `$dt_columns` (the last one only if `train_task` / `predict_task` / `get_state` / `transform` are not overwritten). Therefore, these names are "reserved" and should not be set by the class inheriting from `PipeOpTaskPreproc[Simple]`. Even though the `$state` of a `PipeOp` can be anything, it is recommended to also keep it a named list.
- Every change done by the `$train()` method must be reflected by the `$state` variable. I.e.

  ```r
  po2 = po1$clone(deep = TRUE)
  po1$train(input)
  po2$state = po1$state
  po1 = po1$clone(deep = TRUE)
  ```

  must leave `po1` and `po2` identical. (The last `clone` call is necessary to mirror effects done by `po2 = po1$clone()`.)
- `$predict()` must be idempotent, i.e.

  ```r
  po2 = po1$clone(deep = TRUE)
  po1$predict(input1)
  po1$predict(input2)
  po2$predict(input3)
  po1 = po1$clone(deep = TRUE)
  ```

  must leave `po1` and `po2` identical. (The last `clone` call is there for the same reason as above.)
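
The cloning, `$state`, and idempotence rules above can be tried out on a toy example. The following is only a sketch built on the `R6` package: `ToyTask` and `ToyCenter` are hypothetical stand-ins invented for illustration, not classes from the pipeops package, and their `train()` / `predict()` methods only mimic the general shape of a pipeop (a list of inputs in, a list of outputs out).

```r
library(R6)

# Hypothetical stand-in for a Task: an R6 object wrapping a numeric vector.
ToyTask = R6Class("ToyTask",
  public = list(
    x = NULL,
    initialize = function(x) self$x = x
  )
)

# Hypothetical stand-in for a pipeop that centers `x` by the training mean.
ToyCenter = R6Class("ToyCenter",
  public = list(
    state = NULL,
    train = function(inputs) {
      # Inputs arrive by reference, so clone before modifying.
      task = inputs[[1]]$clone(deep = TRUE)
      # Everything learned during training goes into the named-list state.
      self$state = list(center = mean(task$x))
      task$x = task$x - self$state$center
      list(task)
    },
    predict = function(inputs) {
      task = inputs[[1]]$clone(deep = TRUE)
      # Only reads the state; never writes to `self`.
      task$x = task$x - self$state$center
      list(task)
    }
  )
)

task_train = ToyTask$new(c(1, 2, 3))
task_new = ToyTask$new(c(10, 20))

# State rule: copying `$state` onto an untrained clone must reproduce
# the behavior of the trained object.
po1 = ToyCenter$new()
po2 = po1$clone(deep = TRUE)
po1$train(list(task_train))
po2$state = po1$state
state_ok = identical(
  po1$predict(list(task_new))[[1]]$x,
  po2$predict(list(task_new))[[1]]$x
)

# Idempotence rule: repeated predictions leave the state untouched.
state_before = po1$state
po1$predict(list(task_new))
po1$predict(list(task_train))
idempotent_ok = identical(po1$state, state_before)

# Cloning rule: the caller's tasks were not modified.
inputs_ok = identical(task_train$x, c(1, 2, 3)) &&
  identical(task_new$x, c(10, 20))
```

Dropping the `$clone(deep = TRUE)` calls in the toy class would make `inputs_ok` come out `FALSE`, because R6 objects have reference semantics; writing to `self` inside `predict()` would be the analogous way to break the idempotence check.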