How do people prepare for ML/Stats?

LucasKoc · 3/13/24

I am a PhD candidate in applied math doing ML at a university that is very well-known for deep learning. I was very lucky to have been given the chance to interview at a number of top firms for their quant researcher position.

So what is wrong? After my third interview, it has become clear that I am not going to pass the technical interviews without filling my knowledge gaps in ML/Stats (ironic, considering that's supposed to be my expertise).

For context, I transitioned to deep learning at a fairly late stage: one year AFTER passing my quals. Before that, my background was in pure math.

While I managed to become a fairly competent coder (especially at DSA-type stuff) over the course of my PhD, I didn't do any "real data science", by which I mean an end-to-end pipeline for computational statistics on raw data. What we do (or what I do) is more like probability: various assumptions are made, and the focus is on deriving implications that are less obvious. That is unlike statistics, which is more like inverse probability.

Big picture aside, the heart of the matter is, I think there is a mismatch between what they expect from me and what I am actually prepared for. They expect me to be more like other CS DL graduates from my school, but that's not what I was trained for.

Without breaching the NDAs, I can say the questions that stumped me are more like classical ML, the kind of stuff covered in ESL. But the questions are more specific: the models, the set-ups, the parameters, the observables, etc. So everything is very concrete and computable (by hand). Unfortunately, these were things that I crammed in order to start doing deep learning as a late starter. And now they come back to haunt me because my entire CV is built around ML.

TL;DR: What additional references can you use to prepare for ML/Stats besides ESL? I am looking for something that's more problem-oriented, like the green book and Fifty Challenging Problems. I don't mind learning more theory, but I think ESL is already pretty comprehensive.

MikeLawrence · 3/14/24

Wasserman's All of Statistics is recommended; I think it covers a good amount of what you're talking about.
There's a sequel as well: All of Nonparametric Statistics.

LucasKoc · 3/16/24

MikeLawrence said:
Wasserman's All of Statistics is recommended; I think it covers a good amount of what you're talking about.
There's a sequel as well: All of Nonparametric Statistics.

These indeed cover a lot of relevant topics, but they are missing ML theory and ML models. I find Pattern Recognition and Machine Learning by Bishop to be a good supplement.

marcusaurelius · 3/16/24

Perhaps Applied Predictive Modeling? I found it quite complementary to ESL. Otherwise, I could share the Amazon textbook list I am planning to go through, in case it gives you additional ideas. Feel free to DM if that is of interest.

I think what would be useful, after rereading your post, is to actually apply those models to real-life data you can grab and then write up the roadblocks you encounter along the way. At least, that's what I am currently doing, and I then blend it all into Obsidian so that I can easily link my theory notes with my exercise files.

marcusaurelius · 3/16/24

Also just came across this post from Frank Nielsen (Polytechnique):

Paul Lopez · 3/16/24

Woooo that's a hell of a list!
