Replies: 1 comment
-
I don't think thread pooling policies are really batch loader specific. As you say, it always depends on your load profile.

I think the style of your application architecture matters, however. Will the calls during the batch load be reactive or imperative? That is, will they block during the batch loader call and hence tie up the thread until it completes?

If you are reactive then you should not have to worry about thread pools much at all. Use the "Schedulers" of your reactive framework (Spring Project Reactor / Quarkus Mutiny) and the event loop thread pools are set up for you. Typically they end up with thread pools of

If you are using an imperative architecture and the thread blocks to get data then you need more threads. The number will be driven by the amount of work that needs to get done. You haven't outlined how many concurrent loads will be going on.

Each thread in the JVM costs something like 1MB of memory - so as you add more threads to the pool to get more concurrency you incur a fixed memory cost per thread. If you add 1000s of threads it will hurt you in memory terms even if you get better concurrency.

So unfortunately this typically means you need a testing mechanism to discover the right balance. Adding more threads increases your concurrency but it also increases memory usage and context switching overhead. You have to discover via testing the point that gives the best return for the least resources.

JDK 21 just came out and nominally this allows you to use virtual threads, which are significantly cheaper to create, and hence nominally each batch load could run on its own virtual thread. You would NOT use thread pooling in JDK 21 - you would use the new https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newVirtualThreadPerTaskExecutor()
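To make the imperative case concrete, here is a minimal sketch, not a definitive implementation. It assumes a blocking lookup (`fetchUsersBlocking`, a hypothetical stand-in for your database or HTTP call) that is pushed onto an executor and exposed as a `CompletionStage<List<V>>`, which is the shape a `BatchLoader` returns. The class name, the pool size of 16 and the 100 batches are illustrative assumptions only; the right pool size is what you would find by testing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchLoadSketch {

    // Hypothetical blocking lookup standing in for a database or HTTP call.
    static List<String> fetchUsersBlocking(List<Integer> ids) {
        try {
            Thread.sleep(50); // simulated I/O: the calling thread is held while it waits
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> users = new ArrayList<>();
        for (Integer id : ids) {
            users.add("user-" + id); // one value per key, in key order
        }
        return users;
    }

    // Runs the blocking call on the supplied executor so the caller is not blocked.
    static CompletionStage<List<String>> loadBatch(List<Integer> ids, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> fetchUsersBlocking(ids), executor);
    }

    public static void main(String[] args) throws Exception {
        // Assumed size: at most 16 batches block at once, the rest wait in the queue.
        ExecutorService pool = Executors.newFixedThreadPool(16);
        try {
            List<CompletionStage<List<String>>> pending = new ArrayList<>();
            for (int batch = 0; batch < 100; batch++) {
                pending.add(loadBatch(List.of(batch * 2, batch * 2 + 1), pool));
            }
            int loaded = 0;
            for (CompletionStage<List<String>> stage : pending) {
                loaded += stage.toCompletableFuture().get().size();
            }
            System.out.println("Loaded " + loaded + " values");
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `loadBatch` takes the executor as a parameter, you can rerun the same load test with different pool sizes, or on JDK 21 pass `Executors.newVirtualThreadPerTaskExecutor()` instead of the fixed pool, and compare throughput and memory use.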
-
Are there suggestions for general best practices when configuring/choosing a thread pool(s) to be used by batch loaders?
I understand that thread pool selection and tuning is very much project specific, but any common gotchas or tips for common situations and use cases would be very welcome. In particular, I am interested in scenarios where many batches (e.g. 100+ or 1000+) are expected to be run for a double data loader while just as many are run for other data loaders that are running simultaneously.