Fair performance comparison with QuantLib #80
Comments
Thanks for reaching out, Dmitri! I think the point was to demonstrate the GPU speed-up rather than a direct comparison to QL. We would very much welcome a contribution for a better CPU benchmark!
Sounds good! I'll come back to you later on a CPU benchmark for this. Also, you do not include graph optimization time in your reporting. I know it doesn't depend on the number of paths, but it's still part of the total pricing time, and for QL's CPU execution it's zero. Shouldn't you report this separately?
On second thought, if you simulate 100 time steps and only apply one exp() at the end, you don't really do many calculations per path, so your problem is basically reduced to a competition between RNG algorithms. You should somehow increase the complexity of your SDE. Perhaps use the Heston local vol model to make this benchmark more relevant to the real world. With flat vols, flat rates, and a simple normal process, I don't know how relevant this benchmark is for practitioners.
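To illustrate the "more work per path" point, here is a minimal numpy sketch (not the library's code, and all parameter values are illustrative) of a full-truncation Euler scheme for the Heston model. Unlike the flat-vol log-space shortcut, this scheme genuinely needs per-step floating-point work on every path:

```python
# Hypothetical sketch: full-truncation Euler for the Heston model.
# Parameter names (kappa, theta, sigma, rho) follow standard Heston notation.
import numpy as np

def heston_euler_paths(s0=100.0, v0=0.04, kappa=1.5, theta=0.04,
                       sigma=0.3, rho=-0.7, r=0.0, t=1.0,
                       n_steps=100, n_paths=10_000, seed=42):
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    s = np.full(n_paths, s0)
    v = np.full(n_paths, v0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # Correlated second driver for the variance process.
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_plus = np.maximum(v, 0.0)  # full truncation: floor variance at zero
        # Log-Euler step for the spot -- note the per-step exp() here.
        s = s * np.exp((r - 0.5 * v_plus) * dt + np.sqrt(v_plus * dt) * z1)
        v = v + kappa * (theta - v_plus) * dt + sigma * np.sqrt(v_plus * dt) * z2
    return s

paths = heston_euler_paths()
```

With r = 0 the spot is a martingale, so `paths.mean()` should land near `s0 = 100` up to Monte Carlo noise.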
What random generator is used if "PSEUDO_ANTITHETIC" is set? |
There is an additional question about memory consumption, especially when run with XLA optimization. |
To answer @DmitriGoloubentsev's question: yes, antithetic sampling does use fewer samples. I think we could measure the time it takes to simulate the random numbers and then subtract that from the runtime. At the time of writing that colab I was mainly motivated by the GPU speed-up rather than by comparing CPU performance. The colab can be extended to sample from the Heston model as well (just need to update …).

As for graph compilation time, normally you would deploy a TensorFlow graph to avoid any compilation-time overhead.

@SergK13GH, the samples are precomputed for vectorization purposes. You could switch to …
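For readers unfamiliar with the antithetic trick being discussed: here is a hypothetical numpy illustration (not tf-quant-finance code) of why `PSEUDO_ANTITHETIC` draws fewer random numbers — only half the normals are generated and the other half are their negations, which also tends to reduce estimator variance:

```python
# Hypothetical sketch of antithetic sampling for a flat-vol lognormal
# call-option price (r = 0). Function name and parameters are illustrative.
import numpy as np

def price_call_mc(s0, k, sigma, t, n_paths, antithetic, seed=0):
    rng = np.random.default_rng(seed)
    if antithetic:
        z = rng.standard_normal(n_paths // 2)
        z = np.concatenate([z, -z])  # half the RNG work: reuse negated draws
    else:
        z = rng.standard_normal(n_paths)
    st = s0 * np.exp(-0.5 * sigma**2 * t + sigma * np.sqrt(t) * z)
    return np.maximum(st - k, 0.0).mean()

price = price_call_mc(100.0, 100.0, 0.2, 1.0, 100_000, antithetic=True)
```

For these parameters the Black–Scholes value is about 7.97, and the antithetic estimate should land close to it.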
Sorry, can you please elaborate on what "deploy a TensorFlow graph" means? Do you assume you can compile the graph once and use it for all valuations in the future?
I can see how that may work for a simple case (flat model parameters and the same number of time steps). But am I right that in real problems you need to recompile the graph every day for all models and all trades? I think you can only reuse a valuation graph within the same trading day, and it's still a good idea to report how much time and memory this step needs. Simulating a normal process with an Euler scheme for 1,000 time steps is a very basic problem. What happens when you have 1,000 IR swaps to price for xVA? Your graph is going to be huge and compilation time significant, regardless of whether you use a GPU or a CPU.
Hi guys,
In the "Monte Carlo via Euler Scheme" example you compare TF with QuantLib pricing and conclude that TF Quant Finance is 100x faster (or more).
I want to note that in QL you evolve 100 time steps of a lognormal process, while in TF you work in log space and only apply exp() at the end.
I agree QL may not be very fast, but in this example you compare 100 exponents per path in QL to just one exponent per path in TF...
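A small numpy sketch of the point (illustrative, not either library's actual code): for a flat-parameter lognormal process, accumulating the increments in log space and calling exp() once per path gives the same terminal values as exponentiating at every time step, while doing 100x fewer exp() calls.

```python
# Compare "exp per step" vs "one exp at the end" for flat-vol GBM.
# All parameter values are illustrative.
import numpy as np

n_steps, n_paths = 100, 1_000
mu, sigma, s0 = 0.0, 0.2, 100.0
dt = 1.0 / n_steps
z = np.random.default_rng(7).standard_normal((n_steps, n_paths))
incr = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z

# "QL-style": exponentiate at every step -> n_steps * n_paths exp() calls.
s = np.full(n_paths, s0)
for i in range(n_steps):
    s = s * np.exp(incr[i])

# "Log-space style": accumulate in log space, one exp() per path.
s_log = s0 * np.exp(incr.sum(axis=0))

print(np.allclose(s, s_log))  # True, up to floating-point roundoff
```

The two results agree to floating-point roundoff, so the benchmark as written is measuring very different amounts of arithmetic for the same mathematical problem.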
Thank you!