so, to summarize
you use random forests
you give it data like this:
age 18 - no
age 25 - yes
age 30 - no
blood pressure 100 - no
blood pressure 120 - yes
blood pressure 140 - no
female - no
male - yes (as a category?)
at the end you want to get a probability of cancer for a given data entry.
and you have data from real patients who either had cancer or didn't, and you check the performance of your forest against that real patient data
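
a rough sketch of that workflow in Python with scikit-learn, in case it helps. everything here is an assumption for illustration: the patient records are synthetic (generated from an arbitrary made-up rule, not real data), sex is encoded as a 0/1 column called `is_male`, and the 75/25 train/test split and ROC AUC metric are just one reasonable way to "check the performance", not the only one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the real patient records (made up, not real data).
age = rng.integers(18, 80, size=n)
blood_pressure = rng.integers(90, 160, size=n)
is_male = rng.integers(0, 2, size=n)  # sex encoded as a 0/1 category

# Arbitrary rule plus noise, only so the forest has a pattern to learn.
risk = 0.1 + 0.4 * (age > 50) + 0.3 * (blood_pressure > 130)
has_cancer = (rng.random(n) < risk).astype(int)

# One row per patient, one column per feature.
X = np.column_stack([age, blood_pressure, is_male])
y = has_cancer

# Hold out a quarter of the patients to check the forest on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(cancer = yes).
test_proba = forest.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_proba)
print("held-out ROC AUC:", auc)

# Estimated chance of cancer for a single new data entry.
new_patient = np.array([[25, 120, 1]])  # age 25, blood pressure 120, male
new_proba = forest.predict_proba(new_patient)[0, 1]
print("estimated chance:", new_proba)
```

note that the forest sees each patient as one full row (age, blood pressure, sex together) with one yes/no label, and the "chance" it reports is the averaged vote of the trees, so it is only as trustworthy as the held-out check says it is.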