Train/Test split

kobo · 6/2/21

Hi,

I'm playing with a Random Forest Classifier and wondering what could cause the test results to be better with an 80:20 split than with a 90:10 split.

With 2000+ data points:
- with an 80:20 split, considering only the test set, the model generates 150 signals with around 55% accuracy
- with a 90:10 split, considering only the test set, the model generates 77 signals with around 49% accuracy

From the images, it seems like the more the model 'sees', the worse it gets.

[image: 80:20 split]

[image: 90:10 split]

And with 20:80:
53% accuracy with 784 generated signals

What could be the problem?

LeFed · 6/2/21

Erm... you are testing and comparing your strategy across different periods, so you can't conclude that "the more the model 'sees', the worse it gets". To test your hypothesis, train the model on different amounts of data but test it on the same period.
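A rough sketch of that idea, assuming the rows are in chronological order: keep one fixed test window at the end of the data and vary only how much history is used for training. The function name `fixed_test_splits`, the 2000-row count, and the fractions below are made up for illustration, not taken from kobo's setup.

```python
def fixed_test_splits(n_samples, test_fraction=0.1, train_fractions=(0.2, 0.5, 0.9)):
    """Build (train_range, test_range) pairs that all share one test window.

    Assumes rows are sorted by time. The test window is the last
    `test_fraction` of the rows; every training window ends exactly
    where the test window starts, so only the training length varies.
    """
    test_start = n_samples - int(round(n_samples * test_fraction))
    test_idx = range(test_start, n_samples)
    splits = []
    for frac in train_fractions:
        # Never let the training window run into the test window.
        train_len = min(int(round(n_samples * frac)), test_start)
        train_idx = range(test_start - train_len, test_start)
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical example: 2000 rows, last 10% held out for every run.
for train_idx, test_idx in fixed_test_splits(2000):
    print(
        f"train rows {train_idx.start}-{train_idx.stop - 1} "
        f"({len(train_idx)} rows), "
        f"test rows {test_idx.start}-{test_idx.stop - 1} "
        f"({len(test_idx)} rows)"
    )
```

Each training range can then be used to slice the features and labels before fitting the classifier, and the accuracies are compared on the same test rows. Here the training windows are the most recent rows before the test period; taking the oldest rows instead is an equally valid choice, as long as the test period stays the same.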
