@Cuchullain, for me, Gradient Descent is a Swiss Army knife method. It always produces results, but it can get stuck in local minima.
Local minima? If you're lucky, that's the least of your worries. GD has a whole lot of issues. Off the top of my head:
0. Inside GD lurks a nasty Euler method.
1. The initial guess must be close to the real solution (Numerical Analysis 101).
2. No guarantee that GD is applicable in the first place (assumes cost function is smooth).
3. "Vanishing gradient syndrome" Vanishing ... nt_problem
4. Learning rate parameter: so many to choose from (an ad hoc, trial-and-error process).
5. Use Armijo and Wolfe to improve convergence.
6. Modify algorithm by adding momentum.
7. And you have to compute the gradient somehow: 1) exactly, 2) by FDM, 3) by AD, or 4) by the complex-step method.
8. Convergence to local minimum.
9. The method is iterative, so there is no truly reliable quality of service (QoS).
10. It's not very robust (cf. adversarial examples). Try regularization.
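
To make a few of these points concrete (0, 1, 5, 7 and 8), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the test function `f`, the helper names, the starting points and the step-size constants do not come from any library or from the discussion above. It compares three of the four ways of getting the gradient (AD is left out because it needs a library) and runs GD with Armijo backtracking only, without the Wolfe curvature check.

```python
# Illustrative sketch only: f, the helper names and all constants are assumptions.

def f(x):
    """Non-convex 1-D test function with two local minima (hypothetical)."""
    return x * x * x * x - 3.0 * x * x + x


def grad_exact(x):
    """Hand-derived derivative of f."""
    return 4.0 * x * x * x - 6.0 * x + 1.0


def grad_fdm(x, h=1e-6):
    """Central finite difference: truncation error O(h^2) plus round-off."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def grad_complex_step(x, h=1e-20):
    """Complex-step derivative: no subtraction, so h can be made tiny."""
    return f(complex(x, h)).imag / h


def gradient_descent(x0, grad, lr=0.1, c=1e-4, shrink=0.5, tol=1e-6, max_iter=1000):
    """Gradient descent with Armijo backtracking on the step size.

    Returns the final iterate and the number of iterations used.
    """
    x = x0
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k
        step = lr
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - step * g) > f(x) - c * step * g * g:
            step *= shrink
        # One explicit Euler step of size `step` on the gradient flow dx/dt = -f'(x).
        x = x - step * g
    return x, max_iter


if __name__ == "__main__":
    # Point 7: three ways to get the same derivative.
    x = 0.7
    print("exact       :", grad_exact(x))
    print("FDM         :", grad_fdm(x))
    print("complex step:", grad_complex_step(x))

    # Points 1 and 8: the answer depends on where you start.
    for x0 in (-1.5, 1.5):
        x_min, iters = gradient_descent(x0, grad_exact)
        print(f"start {x0:+.1f}: x = {x_min:.6f}, f(x) = {f(x_min):.6f}, iterations = {iters}")
```

With these settings the two starts should settle in different minima, and the run from +1.5 should end in the shallower one. Raising `lr` to 1.0 should let the first accepted step from +1.5 jump into the other basin, so the outcome is sensitive to point 4 as well.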