Indeed, there is a lot of confusion around AD, AAD, automatic/algorithmic/adjoint forward/reverse diff, backprop, and so forth, and all the names don't help. You may find Leif Andersen's preface to my book (Modern Computational Finance: AAD and Parallel Simulations (Table of Contents and Preface) by Antoine Savine, Leif Andersen :: SSRN) entertaining and informative on that point.
It doesn't change the fact that it is only adjoint differentiation (also called reverse mode, backprop and probably many other names) that makes a tremendous difference, not only in finance, but also in deep learning, meteorology, and probably many other fields, with its seemingly magical, impossibly fast computation of thousands of differentials: one backward sweep over the recorded computation yields the sensitivities to all inputs at once, at a small constant multiple of the cost of one evaluation.
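To make that concrete, here is a minimal sketch of a tape-based adjoint differentiation in C++. All the names (Tape, Number, adjoints) and the design are made up for illustration, not taken from my book or from any library; a production tape looks quite different, but the backward sweep is the essential idea.

```cpp
// Minimal sketch of tape-based adjoint (reverse-mode) differentiation.
// Illustrative only: names and design are hypothetical, not a real library.
#include <cmath>
#include <cstdio>
#include <vector>

struct Tape {
    // Each recorded operation stores up to two parents
    // and the local partial derivatives with respect to them.
    struct Node { int p1, p2; double d1, d2; };
    std::vector<Node> nodes;

    int record(int p1, double d1, int p2 = -1, double d2 = 0.0) {
        nodes.push_back({p1, p2, d1, d2});
        return static_cast<int>(nodes.size()) - 1;
    }
    void clear() { nodes.clear(); }  // wipe the tape, e.g. between MC paths
};

static Tape tape;

struct Number {            // an "instrumented" double
    double val;            // value
    int idx;               // index of its node on the tape

    Number(double v) : val(v), idx(tape.record(-1, 0.0)) {}  // input leaf
    Number(double v, int i) : val(v), idx(i) {}
};

Number operator*(const Number& a, const Number& b) {
    // d(ab)/da = b, d(ab)/db = a
    return {a.val * b.val, tape.record(a.idx, b.val, b.idx, a.val)};
}
Number operator+(const Number& a, const Number& b) {
    return {a.val + b.val, tape.record(a.idx, 1.0, b.idx, 1.0)};
}
Number log(const Number& a) {
    return {std::log(a.val), tape.record(a.idx, 1.0 / a.val)};
}

// ONE backward sweep propagates adjoints from the result to ALL inputs,
// so the whole gradient costs a small constant multiple of one evaluation.
std::vector<double> adjoints(const Number& result) {
    std::vector<double> adj(tape.nodes.size(), 0.0);
    adj[result.idx] = 1.0;
    for (int i = result.idx; i >= 0; --i) {
        const auto& n = tape.nodes[i];
        if (n.p1 >= 0) adj[n.p1] += n.d1 * adj[i];
        if (n.p2 >= 0) adj[n.p2] += n.d2 * adj[i];
    }
    return adj;
}

int main() {
    Number x(2.0), y(3.0);
    Number f = x * y + log(x);          // f = xy + log x
    auto adj = adjoints(f);
    // df/dx = y + 1/x = 3.5 and df/dy = x = 2, from a single sweep
    std::printf("df/dx = %f, df/dy = %f\n", adj[x.idx], adj[y.idx]);
}
```

Note that adding a thousand more inputs would not add a thousand more sweeps: the same single backward pass fills the whole adjoint vector. That is where the "impossibly fast" part comes from.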
Yes, the entire computation graph must be in memory, but this is not as bad as it sounds, and it is easily mitigated by checkpointing. With Monte-Carlo risks, the memory load is typically under 100MB: the differentials are computed pathwise, and the tape is wiped between paths.
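A sketch of that pathwise pattern, reusing the illustrative Tape/Number/adjoints from above (again, hypothetical names, and a toy one-step lognormal path rather than a real model): the tape only ever holds one path's graph, which is why the memory footprint stays modest no matter how many paths you run.

```cpp
// Pathwise Monte-Carlo risk sketch; replaces the main() of the previous
// sketch and reuses its Tape/Number/adjoints. Toy model for illustration.
#include <cstdio>
#include <random>

int main() {
    std::mt19937 gen(42);
    std::normal_distribution<double> gauss;

    const int nPaths = 10000;
    double price = 0.0, delta = 0.0;     // accumulated value and risk

    for (int p = 0; p < nPaths; ++p) {
        tape.clear();                    // wipe the tape between paths

        Number S0(100.0);                // spot: the input we differentiate by
        // toy one-step lognormal path: S = S0 * exp(-0.5*sig^2 + sig*Z)
        double sig = 0.2, z = gauss(gen);
        Number S = S0 * Number(std::exp(-0.5 * sig * sig + sig * z));

        Number payoff = S;               // toy payoff (a forward)

        auto adj = adjoints(payoff);     // backward sweep over this path only
        price  += payoff.val   / nPaths;
        delta  += adj[S0.idx]  / nPaths; // pathwise delta, accumulated
    }
    // for this toy forward, delta should come out close to 1
    std::printf("price = %f, delta = %f\n", price, delta);
}
```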
On the contrary, we have experimented with many kinds of AD for years and found forward mode to be absolutely useless. It may be an elegant construction, but, like bumping, it requires one pass per input, so the cost grows with the number of differentials; with the overloading overhead on top, it is even slower than bumping, harder to implement and debug, and more prone to error.
Maybe my 15-minute video above can help navigate the main ideas and programming considerations? Note that I don't even cover forward mode, considering it a waste of time and attention, and an unnecessary source of additional confusion.
My experience with Boost is very different from yours. I initiated discussions with them to develop Boost.AAD (with proper reverse-mode differentiation, in a general and efficient implementation, platform independent and header only, in good Boost order) and never received a reply.
I suppose I contacted the wrong people. Perhaps you would be so kind as to introduce me to the right people to discuss these things?
Many thanks in advance,
Kind regards,
Antoine