-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrapping Final Model #716
base: main
Are you sure you want to change the base?
Conversation
@S-Karnik please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
@microsoft-github-policy-service agree company="Microsoft Research" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the progress here is great. However, there are a number of small items to address.
econml/_ortho_learner.py
Outdated
return arg | ||
if arg is None: | ||
return None | ||
if isinstance(arg, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why might we need to handle tuples instead of just arrays? Should the same change be made to _bootstrap.py?
More generally, it would be good to have a single implementation used in both _bootstrap.py and here rather than defining one (very similar) implementation in each; I think _bootstrap.py is the most appropriate home for such a function, but it could also go into our utilities if it seems generally useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the handling of tuples is that the nuisances are passed in as tuples, which means that the individual nuisances in the tuple need to also be indexed.
econml/inference/_bootstrap.py
Outdated
""" | ||
For DynamicDML only | ||
Take n_bootstrap sets of samples of length n_panels among arange(n_panels) and then each sample corresponds with the chunk | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a comment rather than a docstring, and put it with the corresponding logic.
econml/tests/test_bootstrap.py
Outdated
elapsed_time = end_time - start_time | ||
final_bootstrapping_time = elapsed_time | ||
|
||
assert (all_bootstrapping_time > final_bootstrapping_time) or np.isclose(all_bootstrapping_time, final_bootstrapping_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For numeric results, we might tolerate near-equality rather than forcing a strict comparison, but here we really do expect the times to be significantly faster.
assert (all_bootstrapping_time > final_bootstrapping_time) or np.isclose(all_bootstrapping_time, final_bootstrapping_time) | |
assert (all_bootstrapping_time > final_bootstrapping_time) |
econml/inference/_bootstrap.py
Outdated
def convertArg(arg, inds): | ||
def convertArg_(arg, inds): | ||
arr = np.asarray(arg) | ||
if arr.ndim > 0: | ||
return arr[inds] | ||
else: # arg was a scalar, so we shouldn't have converted it | ||
return arg | ||
if arg is None: | ||
return None | ||
arr = np.asarray(arg) | ||
if arr.ndim > 0: | ||
return arr[inds] | ||
else: # arg was a scalar, so we shouldn't have converted it | ||
return arg | ||
|
||
self._instances = Parallel(n_jobs=self._n_jobs, prefer='threads', verbose=self._verbose)( | ||
delayed(fit)(obj, | ||
*[convertArg(arg, inds) for arg in args], | ||
**{arg: convertArg(named_args[arg], inds) for arg in named_args}) | ||
for obj, inds in zip(self._instances, indices) | ||
) | ||
if isinstance(arg, tuple): | ||
converted_arg = [] | ||
for arg_param in arg: | ||
converted_arg.append(convertArg_(arg_param, inds)) | ||
return tuple(converted_arg) | ||
return convertArg_(arg, inds) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably worth reworking the API here a bit to pull this and some of the neighboring code out to a a new top-level function in this module (say, fit_with_subsets(clones, strata, *args, **kwargs)
) that calls fit on each clone in parallel with the correctly modified arguments. This could then be used both here as well as directly within OrthoLearner for the final_only case.
for i in range(self._n_bootstrap_samples): | ||
if self._only_final: | ||
obj._set_current_cloned_ortho_learner_model_final(i) | ||
else: | ||
obj = self._instances[i] | ||
instance_results.append(f(obj, name)) | ||
instance_results = np.array(instance_results) | ||
results = instance_results, f(self._wrapped, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to continue to use parallelism here if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our discussion, it seems like keeping this in _bootstrap.py
and using parallel processes or threads will not be possible. In the future, we would like to make this call in _ortho_learner.py
, just as we do with fit.
@microsoft-github-policy-service agree company="Microsoft Research" |
This PR will add the capability to bootstrap the final model only for estimators with final models.