
C++ techniques for $600K high frequency trading jobs

Jobs involving C++ programming for high-frequency trading (HFT) firms and hedge funds can be highly lucrative. Recruiters estimate that total compensation, including salary and bonus, for these positions can reach over $600k, a figure that remains relevant today. However, simply being proficient in C++ isn’t sufficient. While the language is inherently fast, optimizing it for low-latency trading requires advanced expertise to achieve maximum performance.

An article on efinancialcareers quoted Paul Bilokon (a former director at Deutsche Bank, now teaching on Imperial College's quant master's programme) as saying that if you want an integral role as a C++ developer on an HFT team, familiarity with low-latency C++ is usually mandatory. Paul and his student recently released a paper listing 12 techniques for reducing latency in C++ code.
  1. Lock-free programming: a concurrent programming paradigm in which multi-threaded algorithms, unlike their traditional counterparts, do not use mutual exclusion mechanisms such as locks to arbitrate access to shared resources (see the first sketch after this list).
  2. Single instruction, multiple data (SIMD) instructions: instructions that exploit the data-level parallelism of contemporary CPUs, applying the same operation to multiple data elements simultaneously.
  3. Mixing data types: when a computation mixes float and double, implicit conversions are required; sticking to a single type (e.g., float only) avoids the conversions and improves performance.
  4. Signed vs unsigned: ensuring consistent signedness in comparisons to avoid implicit conversions.
  5. Prefetching: explicitly loading data into cache before it is needed to reduce data-fetch delays, particularly in memory-bound applications.
  6. Branch reduction: predicting conditional branch outcomes to allow speculative execution of code.
  7. Slowpath removal: minimizing execution of rarely executed code paths.
  8. Short-circuiting: logical expressions cease evaluation as soon as the final result is determined.
  9. Inlining: incorporating the body of a function at each point the function is called, reducing function-call overhead and enabling further optimisation by the compiler.
  10. Constexpr: computations marked constexpr are evaluated at compile time, enabling constant folding and eliminating runtime calculations (see the second sketch after this list).
  11. Compile-time dispatch: techniques such as template specialization or function overloading so that optimised code paths are chosen at compile time based on type or value, avoiding runtime dispatch and making optimisation decisions early (also illustrated in the second sketch).
  12. Cache warming: data is preloaded into the CPU cache before it is needed, minimizing memory access time and boosting program responsiveness.
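
To make two of these concrete, here is a minimal sketch of technique 1, lock-free programming. This is my own illustration, not code from the paper: a shared counter updated through std::atomic, using fetch_add and the standard compare_exchange_weak retry loop (both C++11) instead of a mutex.

```cpp
#include <atomic>
#include <cstdint>

// A minimal lock-free sketch: a shared counter arbitrated by std::atomic
// operations rather than a mutex, so no thread can block another.
class LockFreeCounter {
public:
    // A single atomic read-modify-write; progress is guaranteed
    // regardless of how threads are scheduled.
    void increment() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // The canonical compare-and-swap retry loop, for updates that are
    // not expressible as one atomic instruction.
    void double_value() noexcept {
        std::uint64_t expected = count_.load(std::memory_order_relaxed);
        while (!count_.compare_exchange_weak(expected, expected * 2,
                                             std::memory_order_relaxed)) {
            // On failure, expected is refreshed; retry with the new value.
        }
    }

    std::uint64_t value() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> count_{0};
};
```

And a second sketch covering techniques 10 and 11 together (C++17; again my own example, with an invented tick-to-price conversion): a constexpr function whose result is folded at compile time, and if constexpr selecting a code path by type so that no dispatch survives to runtime.

```cpp
#include <type_traits>

// Technique 10: evaluated at compile time when the argument is a
// compile-time constant, so no runtime calculation remains.
constexpr long fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
static_assert(fib(10) == 55, "folded at compile time");

// Technique 11: compile-time dispatch. The branch not taken for a given
// T is discarded during compilation; there is no runtime check.
template <typename T>
double to_price(T raw) {
    if constexpr (std::is_integral_v<T>) {
        return raw * 0.01;                // hypothetical: integer ticks
    } else {
        return static_cast<double>(raw);  // already a floating-point price
    }
}
```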
 
Per the authors' paper:
Prefetching
Prefetching is a technique used by computer processors to boost execution performance by fetching data and instructions from main memory into the cache before they are actually needed for execution [22, 59]. It anticipates the data needed ahead of time, so that when the processor needs the data, it is readily available from the cache rather than having to be fetched from the slower main memory. Prefetching reduces latency as it minimises the time spent waiting for data-fetch operations, allowing more efficient use of the processor's execution capabilities. The technique is beneficial in scenarios where data access patterns are predictable, such as traversing arrays, processing matrices, or accessing data structures sequentially. In these scenarios, prefetching can significantly lower latency, resulting in faster code.
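
A minimal sketch of explicit software prefetching, assuming GCC or Clang: __builtin_prefetch is a compiler builtin on those toolchains, not standard C++, and the 16-element lookahead is an illustrative guess that would need tuning per platform.

```cpp
#include <cstddef>

// Sums an array while prefetching a later element on each iteration, so
// the cache line is (ideally) resident by the time the loop reaches it.
// Helps mainly in predictable, memory-bound scans like this one.
double sum_with_prefetch(const double* data, std::size_t n) {
    constexpr std::size_t kLookahead = 16;  // illustrative; tune per platform
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kLookahead < n) {
            // Arguments: address, 0 = prefetch for read, 3 = keep in cache.
            __builtin_prefetch(&data[i + kLookahead], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```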

Cache Warming
Cache warming, in computing, refers to the process of pre-loading data into a cache memory from its original storage, with the intent of accelerating data retrieval times. The logic behind this practice is that accessing data directly from the cache, which is considerably faster than a hard disk or even a solid-state drive, significantly reduces latency and improves overall system performance. By pre-loading or "warming up" the cache, the system is prepared to serve requests […]
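
In an HFT setting the idea is usually applied to the CPU caches rather than disk: keep the rarely-but-critically executed trading path warm between market-data events. The sketch below is a common illustration of that pattern, not code from the paper; FeedHandler, the warm flag, and send_order are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical quote and handler, used only to illustrate cache warming.
struct Quote { std::int64_t price; std::int32_t size; };

class FeedHandler {
public:
    // The hot path. With warm == true we execute the same instructions and
    // touch the same data, but suppress the real side effect.
    void on_quote(const Quote& q, bool warm) {
        book_[next_slot_ % book_.size()] = q;  // touch hot data
        ++next_slot_;
        if (!warm && q.size > 0) {
            send_order(q);                     // real order flow only
        }
    }

    // Called in idle periods: replays a dummy quote through the hot path so
    // its instructions and data stay resident in cache ("cache warming").
    void warm_cache() {
        const Quote dummy{0, 0};
        on_quote(dummy, /*warm=*/true);
    }

private:
    void send_order(const Quote&) { /* wire protocol elided */ }

    std::array<Quote, 1024> book_{};
    std::size_t next_slot_ = 0;
};
```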
 
How do you set up a scalable software architecture and avoid a software "ball of mud"?

It's one thing to have building blocks, but you need some kind of design blueprint; otherwise there's no point.

  1. Douglas C. Schmidt, Michael Stal, Hans Rohnert, Frank Buschmann, "Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects", Wiley, 2000.

Friendly advice from real life

“A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.” (Brian Foote and Joseph Yoder, "Big Ball of Mud")
 
Here is a catalogue of reusable software solutions for all kinds of applications.

Domain classification is like ornithology applied to design blueprints: does my application look like a familiar species? Ask yourself this question before jumping headfirst into code!
 


Some background on parallel and distributed computing + Advanced Networks is always helpful.
 