Telegram chat of the ai group

so, you want your forest to solve a task that a human cannot do, even given a long time for analysis?

22:46

I am not sure I understand

22:47

By the way, good question, now I'm curious myself: how do you find dependencies between features?

22:47

Орхан

By the way, good question, now I'm curious myself: how do you find dependencies between features?

remove one feature at a time and check whether the model's result gets worse: the most expensive and the most accurate method
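A minimal sketch of the remove-one-feature idea, in pure Python. The model here is a toy 1-nearest-neighbour classifier scored with leave-one-out accuracy; `loo_1nn_accuracy` and the tiny dataset are hypothetical stand-ins for whatever model is actually used (the thread is about random forests). The part that matters is the loop: re-evaluate once per removed feature and record how much the score falls.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `cols`."""
    correct = 0
    for i, row in enumerate(X):
        # Nearest *other* sample by squared Euclidean distance on the chosen columns.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[c] - X[j][c]) ** 2 for c in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def drop_column_importance(X, y, score):
    """Score drop caused by removing each feature in turn (bigger = more important)."""
    all_cols = list(range(len(X[0])))
    baseline = score(X, y, all_cols)
    return {
        c: baseline - score(X, y, [k for k in all_cols if k != c])
        for c in all_cols
    }


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.0], [1.0, 0.1], [0.9, 0.4]]
y = [0, 0, 1, 1]
print(drop_column_importance(X, y, loo_1nn_accuracy))  # {0: 1.0, 1: 0.0}
```

The cost is one full re-evaluation per feature, which is why it is called the most expensive method: with 150 features that is 150 retrainings, each of which should ideally be cross-validated.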

22:48

I think that's crude, since there may be a pairwise dependency that affects prediction accuracy
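One way to see the pairwise problem, sketched under assumptions: if two features carry the same signal, removing either one alone costs nothing, so remove-one-at-a-time rates both as useless. Removing groups of features together exposes this. `toy_score` is a hypothetical scorer (accuracy in percentage points) where `bp_a` and `bp_b` are redundant copies of one signal; in practice `score` would retrain and evaluate the real model.

```python
from itertools import combinations


def group_drop_importance(features, score, k):
    """Score drop when each group of k features is removed together."""
    full = frozenset(features)
    baseline = score(full)
    return {
        group: baseline - score(full - set(group))
        for group in combinations(features, k)
    }


def toy_score(cols):
    """Hypothetical scorer: bp_a and bp_b are interchangeable copies of one signal."""
    score = 50
    if cols & {"bp_a", "bp_b"}:  # either copy is enough
        score += 20
    if "age" in cols:
        score += 10
    return score


features = ["age", "bp_a", "bp_b"]
print(group_drop_importance(features, toy_score, 1))  # each bp copy alone looks useless
print(group_drop_importance(features, toy_score, 2))  # dropping both costs 20 points
```

The catch is cost: with 150 features there are 150 · 149 / 2 = 11,175 pairs, each needing a retrain, so this is only practical on a shortlist of candidate features.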

22:48

Luke Skywalker

I am not sure I understand

you want to guess a cancer status

let's say a doctor wants to do the same

how would a doctor do this job?

22:48

Yes, deleting features one by one is also known as backward feature elimination
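A minimal sketch of greedy backward elimination, assuming a `score(features)` callable that retrains and evaluates the model on a feature subset. `toy_score` and its weights are hypothetical, chosen so the result is easy to follow. Each round tries removing every remaining feature, drops the one whose removal hurts least, and stops once every removal would cost more than `tolerance`.

```python
def backward_elimination(features, score, tolerance=0):
    """Greedily remove features while the score does not fall by more than `tolerance`."""
    selected = list(features)
    current = score(selected)
    while len(selected) > 1:
        # Score every subset that is one feature smaller than the current one.
        trials = [(f, score([g for g in selected if g != f])) for f in selected]
        weakest, best_score = max(trials, key=lambda t: t[1])
        if best_score < current - tolerance:
            break  # every single removal costs too much; stop here
        selected.remove(weakest)
        current = best_score
    return selected, current


# Hypothetical per-feature contributions to an accuracy score (percentage points).
WEIGHTS = {"age": 10, "bp": 20, "noise1": 0, "noise2": -1}


def toy_score(cols):
    return 50 + sum(WEIGHTS[c] for c in cols)


print(backward_elimination(list(WEIGHTS), toy_score))  # (['age', 'bp'], 80)
```

With `tolerance=0`, features whose removal leaves the score unchanged are also dropped; a small positive tolerance trades a little accuracy for a smaller feature set.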

22:50

I tried that already; it gave me 39 features out of 150

22:50

Defragmented Panda

you want to guess a cancer status

let's say a doctor wants to do the same

how would a doctor do this job?

Machine learning is all about finding patterns in data that humans cannot. Otherwise, why use machine learning at all if it's humanly possible?

22:51

Орхан

I think that's crude, since there may be a pairwise dependency that affects prediction accuracy

you could build an autoencoder

it solves exactly this task: reconstructing missing data when you have most of the data

the better a feature is reconstructed after you remove it, the more predictable it is from the others

this is much cheaper than training a new model for each removed feature
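The autoencoder idea measures how well a feature can be reconstructed from the rest. A cheaper linear stand-in (not an autoencoder, and it only detects linear dependencies) is to regress each feature on all the others and read off R²: close to 1 means the feature is largely redundant. This pure-Python sketch solves the least-squares normal equations directly; the data is hypothetical, built so that feature 2 is mostly feature 0 plus feature 1 and feature 3 is unrelated.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predictability(columns):
    """R^2 of each feature when linearly regressed on all the other features."""
    n = len(columns[0])
    scores = {}
    for t, target in enumerate(columns):
        # Design matrix: an intercept column plus every other feature.
        X = [[1.0] + [col[i] for k, col in enumerate(columns) if k != t] for i in range(n)]
        p = len(X[0])
        XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
        Xty = [sum(X[i][a] * target[i] for i in range(n)) for a in range(p)]
        beta = solve(XtX, Xty)
        mean = sum(target) / n
        fitted = [sum(bj * xj for bj, xj in zip(beta, X[i])) for i in range(n)]
        ss_res = sum((target[i] - fitted[i]) ** 2 for i in range(n))
        ss_tot = sum((v - mean) ** 2 for v in target)
        scores[t] = 1 - ss_res / ss_tot
    return scores


# Hypothetical columns: f2 = f0 + f1 + a smaller independent component; f3 is unrelated.
f0 = [1, -1, 1, -1, 1, -1, 1, -1]
f1 = [1, 1, -1, -1, 1, 1, -1, -1]
f3 = [1, -1, -1, 1, 1, -1, -1, 1]
extra = [1, 1, 1, 1, -1, -1, -1, -1]
f2 = [a + b + 0.5 * e for a, b, e in zip(f0, f1, extra)]
print(predictability([f0, f1, f2, f3]))
```

Here features 0 to 2 score high (about 0.8 to 0.89) and feature 3 scores about 0. Exactly collinear features make the normal equations singular, so real data may need a small ridge term added to the diagonal of `XtX`.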

22:51

Autoencoders are neural networks, right? Unfortunately, I can't use neural networks, as I have very little data.

22:52

Luke Skywalker

Machine learning is all about finding patterns in data that humans cannot. Otherwise, why use machine learning at all if it's humanly possible?

I'm not saying it's not possible

it's just a question to determine how complex your task is

22:52

There was quite a lot of good material when I googled feature selection

22:58

One I liked is Correlation Matrix with Heatmap

22:58

I tried hybrid feature selection (genetic algorithm / simulated annealing) and wrapper feature selection (recursive feature elimination), but the best accuracy I got was only 69%. The problem is that most feature selection methods don't work well with mixed data

22:59

What do you mean by mixed?

23:00

Mixed: numeric + categorical
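For mixed data, one option (not something named in the chat) is to use an association measure that fits each pair of column types. For a categorical column against a numeric one, the correlation ratio η is a common choice: the share of the numeric column's variance explained by the category means, square-rooted to the range [0, 1]. (For categorical against categorical, Cramér's V plays a similar role.) A pure-Python sketch with hypothetical values:

```python
from math import sqrt


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numeric column, in [0, 1]."""
    mean = sum(values) / len(values)
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    # Variance explained by the group means vs. total variance.
    ss_between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    ss_total = sum((v - mean) ** 2 for v in values)
    return sqrt(ss_between / ss_total) if ss_total else 0.0


print(correlation_ratio(["m", "m", "f", "f"], [1, 1, 3, 3]))  # 1.0: category fully determines value
print(correlation_ratio(["m", "f", "m", "f"], [1, 1, 3, 3]))  # 0.0: category tells you nothing
```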

23:00

so, to summarize

you use random forests
you give it data like this:
age 18 - no
age 25 - yes
age 30 - no
blood pressure 100 - no
blood pressure 120 - yes
blood pressure 140 - no
female - no
male - yes (as a category?)

at the end, you want to get the probability of cancer for this data entry.

and you have data from real patients who did or did not have cancer, and you check your forest's performance against that data
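On the "(as a category?)" question: many random forest implementations expect purely numeric input, so a categorical field such as sex is usually one-hot encoded into one 0/1 column per level before training. A minimal pure-Python sketch; the field names and patient records are hypothetical.

```python
def one_hot_encode(records, categorical):
    """Turn dict records into numeric rows: numeric fields pass through unchanged,
    each categorical field becomes one 0/1 column per observed level."""
    levels = {f: sorted({r[f] for r in records}) for f in categorical}
    numeric = sorted(k for k in records[0] if k not in categorical)
    header = numeric + [f"{f}={lvl}" for f in categorical for lvl in levels[f]]
    rows = [
        [r[k] for k in numeric]
        + [1 if r[f] == lvl else 0 for f in categorical for lvl in levels[f]]
        for r in records
    ]
    return header, rows


patients = [
    {"age": 18, "bp": 100, "sex": "female"},
    {"age": 25, "bp": 120, "sex": "male"},
]
header, rows = one_hot_encode(patients, ["sex"])
print(header)  # ['age', 'bp', 'sex=female', 'sex=male']
print(rows)    # [[18, 100, 1, 0], [25, 120, 0, 1]]
```

Levels are learned from the records passed in, so in a real pipeline they should be fixed on the training set and reused for new patients.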

23:00

Can I share a photo of my data here?

23:01

I think so