C++ techniques for $600K high frequency trading jobs

Jobs involving C++ programming for high-frequency trading (HFT) firms and hedge funds can be highly lucrative. Recruiters estimate that total compensation, including salary and bonus, for these positions can reach over $600k, a figure that remains relevant today. However, simply being proficient in C++ isn't sufficient: while the language is inherently fast, extracting maximum performance for low-latency trading requires advanced expertise.

An article on eFinancialCareers quoted Paul Bilokon (a former director at Deutsche Bank who teaches on Imperial College's quant master's programme) as saying that if you want an integral role as a C++ developer on an HFT team, familiarity with low-latency C++ is usually mandatory. Bilokon and his student recently released a paper listing 12 techniques for reducing latency in C++ code; minimal sketches of most of them follow the list.
  1. Lock-free programming: a concurrent programming paradigm in which multi-threaded algorithms, unlike their traditional counterparts, do not use mutual exclusion mechanisms such as locks to arbitrate access to shared resources.
  2. Single instruction, multiple data (SIMD) instructions: instructions that take advantage of the data-level parallelism of contemporary CPUs, allowing the same operation to be applied to multiple data elements simultaneously.
  3. Mixing data types: when a computation mixes float and double, implicit conversions are required; keeping the computation entirely in one type (e.g. float only) avoids them and improves performance.
  4. Signed vs unsigned: ensuring consistent signedness in comparisons to avoid implicit conversions.
  5. Prefetching: explicitly loading data into cache before it is needed to reduce data-fetch delays, particularly in memory-bound applications.
  6. Branch reduction: predicting conditional branch outcomes to allow speculative code execution.
  7. Slowpath removal: minimising the execution of rarely executed code paths.
  8. Short-circuiting: logical expressions cease evaluation as soon as the final result is determined.
  9. Inlining: incorporating the body of a function at each point the function is called, reducing function-call overhead and enabling further optimisation by the compiler.
  10. Constexpr: computations marked constexpr are evaluated at compile time, enabling constant folding and eliminating runtime calculations.
  11. Compile-time dispatch: techniques such as template specialisation or function overloading so that optimised code paths are chosen at compile time based on type or value, avoiding runtime dispatch.
  12. Cache warming: data is preloaded into the CPU cache before it is needed, minimising memory access time and boosting program responsiveness.
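
To make these concrete, below are minimal sketches for most of the techniques above; they are illustrative fragments under simplifying assumptions, not production HFT code. For technique 1 (lock-free programming), a shared counter updated through std::atomic instead of a mutex:

```cpp
// Lock-free sketch: four threads bump a shared counter with an atomic
// read-modify-write; no thread ever blocks on a lock.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<long> hits{0};

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([] {
            for (int j = 0; j < 100000; ++j)
                hits.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : workers) t.join();
    std::printf("%ld\n", hits.load());  // always 400000, and no mutex was taken
}
```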
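
For technique 2 (SIMD), the same idea as a plain loop, but adding four floats at a time with SSE intrinsics (x86-specific; SSE is baseline on x86-64):

```cpp
// SIMD sketch: add two float arrays four lanes per iteration.
#include <immintrin.h>

void add_arrays(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            // load 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); // 4 adds in one instruction
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];        // scalar tail
}
```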
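
For technique 3 (mixing data types), note that the literal 0.5 is a double, so it silently promotes a float computation:

```cpp
// Mixed-type sketch: 0.5 forces float -> double -> float conversions;
// 0.5f keeps the whole expression in single precision.
float scale_slow(float x) { return x * 0.5; }   // implicit conversions
float scale_fast(float x) { return x * 0.5f; }  // stays in float
```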
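
For technique 4 (signed vs unsigned), keeping the loop index the same signedness as the container's size avoids an implicit conversion in every comparison:

```cpp
// Signedness sketch: size_t index against size_t bound, no conversion.
#include <cstddef>
#include <vector>

long sum(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)  // both operands unsigned
        total += v[i];
    return total;
}
```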
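
For technique 6 (branch reduction), C++20 lets you hint the expected outcome so the hot side of the branch sits on the fall-through path:

```cpp
// Branch-hint sketch (C++20): [[likely]]/[[unlikely]] guide code layout.
int process(int order_qty) {
    if (order_qty > 0) [[likely]] {
        return order_qty * 2;   // hot path: almost every call lands here
    } else [[unlikely]] {
        return -1;              // cold path: rejected orders
    }
}
```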
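
For technique 7 (slowpath removal), error handling is hoisted into a separate non-inlined function so the hot path stays compact in the instruction cache ([[gnu::noinline]] is a GCC/Clang attribute):

```cpp
// Slowpath-removal sketch: the cold error path lives in its own function.
#include <stdexcept>

[[gnu::noinline]] void report_bad_price() {   // never inlined into hot code
    throw std::runtime_error("invalid price");
}

double apply_fee(double price) {
    if (price <= 0.0) report_bad_price();     // hot path: one test + rare call
    return price * 1.0002;
}
```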
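
For technique 8 (short-circuiting), order the operands of && cheapest-first; expensive_risk_check() below is a hypothetical stand-in for a costly call:

```cpp
// Short-circuit sketch: && stops at the first false operand.
bool expensive_risk_check() { return true; }  // imagine this touches cold data

bool can_send(bool throttled) {
    // When throttled is true, !throttled is false and the costly
    // check is never evaluated.
    return !throttled && expensive_risk_check();
}
```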
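
For technique 9 (inlining), a small hot function whose definition is visible at the call site can be substituted in place, removing call overhead:

```cpp
// Inlining sketch: the body of mid_price is pasted into its callers.
inline double mid_price(double bid, double ask) {
    return 0.5 * (bid + ask);
}

double spread_ratio(double bid, double ask) {
    return (ask - bid) / mid_price(bid, ask);  // typically no call emitted
}
```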
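
For technique 10 (constexpr), a table size computed at compile time leaves only a constant in the generated code:

```cpp
// Constexpr sketch: pow2(16) is folded to 65536 during compilation.
constexpr int pow2(int n) { return n == 0 ? 1 : 2 * pow2(n - 1); }

constexpr int kBuckets = pow2(16);
static_assert(kBuckets == 65536);

int bucket_of(int id) { return id & (kBuckets - 1); }  // runtime: a single AND
```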
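
For technique 11 (compile-time dispatch), if constexpr picks the code path from the type during compilation, so no runtime branch or virtual call remains (the Order type with qty and price members is hypothetical):

```cpp
// Compile-time dispatch sketch: the branch is resolved per instantiation.
#include <type_traits>

template <typename Order>
double notional(const Order& o) {
    if constexpr (std::is_floating_point_v<decltype(o.qty)>) {
        return o.qty * o.price;                       // fractional quantities
    } else {
        return static_cast<double>(o.qty) * o.price;  // integral quantities
    }
}
```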
 
Per the author's paper:
Prefetching
Prefetching is a technique used by computer processors to boost execution performance by fetching data and instructions from the main memory to the cache before it is actually needed for execution [22, 59]. It anticipates the data needed ahead of time, and therefore, when the processor needs this data, it is readily available from the cache, rather than having to fetch it from the slower main memory. Prefetching reduces latency as it minimises the time spent waiting for data fetch operations, allowing a more efficient use of the processor's execution capabilities. The technique is beneficial in scenarios where data access patterns are predictable, such as traversing arrays, processing matrices, or accessing data structures in a sequential manner. In these scenarios, prefetching can significantly lower the latency, resulting in faster code.
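
A minimal sketch of explicit prefetching with the GCC/Clang builtin __builtin_prefetch; the lookahead distance of 8 elements is an assumption to tune per workload:

```cpp
// Prefetch sketch: while summing element i, ask the cache hierarchy to
// start fetching element i + 8.
double sum_with_prefetch(const double* data, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        if (i + 8 < n)
            __builtin_prefetch(&data[i + 8]);  // hint only; no semantic effect
        total += data[i];
    }
    return total;
}
```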

Cache Warming
Cache warming, in computing, refers to the process of pre-loading data into a cache from its original storage, with the intent of accelerating data retrieval times. The logic behind this practice is that accessing data directly from the cache, which is considerably faster than a hard disk or even a solid-state drive, significantly reduces latency and improves overall system performance. By pre-loading or "warming up" the cache, the system is prepared to serve requests with minimal delay.
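
A minimal sketch of cache warming in the trading-loop sense, assuming a hypothetical hot order-book structure: during idle periods the hot data is touched with a dummy pass so it is already cached when a real signal arrives:

```cpp
// Cache-warming sketch: walk the hot data during idle periods.
#include <vector>

struct Level { double price; double qty; };
std::vector<Level> book(64);  // hypothetical hot data structure

volatile double sink;         // keeps the dummy reads from being optimised out

void warm_cache() {
    double acc = 0.0;
    for (const Level& l : book)   // pull the book's cache lines in
        acc += l.price * l.qty;
    sink = acc;
}

// Hypothetical event loop: warm whenever there is nothing to do.
// while (running) { if (!poll_market_data()) warm_cache(); }
```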
 
How do you set up a scalable software architecture to avoid a software ball of mud?

It's one thing to have building blocks, but you need some kind of design blueprint; otherwise there's no point.

  1. Douglas C. Schmidt, Michael Stal, Hans Rohnert, Frank Buschmann, "Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects", Wiley, 2000.

Friendly advice from real life

“A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.” (Brian Foote and Joseph Yoder, "Big Ball of Mud", 1997)
 
Here is a catalogue of reusable software solutions for all kinds of applications.

Domain classification is like ornithology applied to design blueprints: does my application look familiar? Ask yourself this question before jumping headfirst into code!
 


Some background on parallel and distributed computing + Advanced Networks is always helpful.
 