Hi all, I have a machine learning question, specifically about the Random Forests (RF) algorithm. This may not be the right forum for it, but I've already tried the Kaggle ML forum and haven't gotten an answer (yet). If you can point me to a more appropriate forum, that'd be much appreciated, too.
So, my (perhaps limited) understanding of RF is that the original data set, S, is randomly split into a training subset, St(k), and a classification subset, Sc(k), with a different split for each tree k (k=1,2,...,M, where M is the number of trees).
St(k) is about 2/3 of S, and Sc(k), which is used to compute the so-called out-of-bag (OOB) error, is the remaining 1/3 or so. Bottom line, though: St(k) + Sc(k) = S. Hence, the entire set S is used for each tree in the forest, just split differently.
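To make the split concrete, here is a minimal sketch of how I picture it, assuming St(k) is drawn by bootstrap sampling (|S| draws with replacement, as in standard bagging) and Sc(k) is whatever was never drawn. The function name `bootstrap_split` and the sizes are just placeholders for illustration, and the data points are represented only by their indices.

```python
import random

def bootstrap_split(n, rng):
    """Return (in_bag, oob) index lists for one tree over a data set of size n."""
    # Draw n indices with replacement: the in-bag (training) sample St(k).
    in_bag = [rng.randrange(n) for _ in range(n)]
    # Indices that were never drawn form the out-of-bag set Sc(k).
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob

rng = random.Random(0)   # fixed seed so the run is repeatable
n, M = 10_000, 20        # |S| and number of trees (arbitrary values)

oob_fractions = []
for k in range(M):
    in_bag, oob = bootstrap_split(n, rng)
    oob_fractions.append(len(oob) / n)

mean_oob = sum(oob_fractions) / M
print(f"mean OOB fraction over {M} trees: {mean_oob:.3f}")
```

Under that assumption the OOB share should come out near 1/e (roughly 0.37), which matches the "about 1/3" figure, and the distinct points in St(k) together with Sc(k) cover all of S for every tree.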
My question is the following: instead of passing the entire set S (just split differently) to each tree, can I pre-partition S into M smaller buckets S(k), k=1,2,...,M, and "give" each tree a different bucket? Each S(k) would then be split further into (St(k), Sc(k)), but this time St(k) + Sc(k) = S(k) instead of the whole of S, with |S(k)| ~= |S|/M << |S| (where |.| denotes the cardinality of a set).
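In case the wording is unclear, the scheme I am asking about can be sketched as follows. This is only an illustration of the data handling, not of tree training, and it assumes a random disjoint partition followed by a bootstrap inside each bucket; `partition_into_buckets` and `bootstrap_within` are made-up helper names, and points are again represented by their indices.

```python
import random

def partition_into_buckets(n, M, rng):
    """Shuffle indices 0..n-1 and deal them into M disjoint buckets S(k)."""
    indices = list(range(n))
    rng.shuffle(indices)
    # Bucket k takes every M-th shuffled index, so sizes differ by at most one.
    return [indices[k::M] for k in range(M)]

def bootstrap_within(bucket, rng):
    """Bootstrap inside one bucket: returns (St(k), Sc(k)) drawn from S(k) only."""
    # |S(k)| draws with replacement from the bucket form St(k).
    in_bag = [rng.choice(bucket) for _ in range(len(bucket))]
    # Bucket members that were never drawn form Sc(k).
    drawn = set(in_bag)
    oob = [i for i in bucket if i not in drawn]
    return in_bag, oob

rng = random.Random(0)   # fixed seed so the run is repeatable
n, M = 10_000, 20        # |S| and number of trees (arbitrary values)

buckets = partition_into_buckets(n, M, rng)
splits = [bootstrap_within(b, rng) for b in buckets]

for k, (in_bag, oob) in enumerate(splits[:3]):
    print(f"tree {k}: |S(k)|={len(buckets[k])}, |Sc(k)|={len(oob)}")
```

Here tree k would only ever see the |S|/M points in its own bucket, both for training and for its OOB estimate, and no point is shared between trees.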
Would the underlying RF theory still hold, from a stochastic standpoint? Thank you in advance.