Big Data testing #309

zachmayer · 2024-08-10T19:23:44Z

This is a placeholder for someone (myself or a volunteer) to do a small project to test and improve caretEnsemble with large datasets.

Functions to test:

caretList
caretStack
caretEnsemble

At least 3 test cases:

Tall data: 1,000,000+ rows
Wide data: 10,000+ columns
Many models: caretList of 1,000+ models
Others optional
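As a starting point while real datasets are being chosen, the tall and wide shapes could be simulated. This is only a sketch: it swaps in synthetic data for the Git LFS data the issue asks for, `make_data` is a hypothetical helper (not part of caretEnsemble), and the sizes are scaled far down so the script runs quickly.

```r
# Hypothetical helper: simulate a regression dataset of a given shape.
make_data <- function(n_rows, n_cols, seed = 42) {
  set.seed(seed)
  x <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
  colnames(x) <- paste0("x", seq_len(n_cols))
  # The response depends on the first column plus noise.
  y <- x[, 1] + rnorm(n_rows)
  list(x = x, y = y)
}

# Scaled-down shapes for a smoke test. The targets in this issue are
# 1,000,000+ rows (tall) and 10,000+ columns (wide).
tall <- make_data(n_rows = 10000, n_cols = 5)
wide <- make_data(n_rows = 50, n_cols = 2000)
```

Raising `n_rows` and `n_cols` toward the targets above would turn the same script into the real test, memory permitting.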

These tests should be run via a script stored somewhere in this repo, and the data should be added via Git LFS. The test results should be analyzed to identify bottlenecks in:

RAM
run time
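One rough way to capture both numbers uses only base R. This is a sketch under assumptions: `profile_call` is a hypothetical helper, and the `lm` fit is a stand-in for a `caretList` or `caretStack` call.

```r
# Hypothetical helper: evaluate an expression and report elapsed time,
# the size of the result, and peak memory as seen by R's garbage collector.
profile_call <- function(expr) {
  gc(reset = TRUE)                       # reset the "max used" counters
  timing <- system.time(result <- expr)  # the lazy argument is forced here
  mem <- gc()
  list(
    result = result,
    elapsed_sec = unname(timing["elapsed"]),
    object_mb = as.numeric(object.size(result)) / 1024^2,
    peak_mb = sum(mem[, ncol(mem)])      # final column: max used, in Mb
  )
}

# Stand-in workload; in the real script this would be the model-fitting call.
df <- data.frame(x = rnorm(1000), y = rnorm(1000))
out <- profile_call(lm(y ~ x, data = df))
out$elapsed_sec
out$object_mb
out$peak_mb
```

`object_mb` is the number to watch when deciding whether to trim more data out of the model object, while `peak_mb` and `elapsed_sec` cover RAM and run time during fitting.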

Based on those results, we may do things like replace do.call, use data.table in more places, or trim more data out of the model object, but it is premature to decide what to do until we've done some analysis.
