Supervised and Unsupervised Learning

ARTIFICIAL INTELLIGENCE

by 스터디로거 2019. 12. 27. 13:55

Supervised Learning (notes)

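The key question is whether a problem should be solved as a regression problem or as a classification problem.

Predicting how many units of a large inventory will sell is a regression problem.

Deciding whether or not an account has been hacked is a classification problem.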

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
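Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sizes, prices, and "sold above asking" labels are made up, the regression is a plain least-squares line, and the classifier is a simple nearest-neighbor lookup chosen for brevity. The helper names `fit_line` and `nearest_neighbor_label` are hypothetical and do not come from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_label(xs, labels, query):
    """Return the label of the training example closest to the query."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return labels[nearest]


# Made-up training data: house size (m^2) and sale price (thousands).
sizes = [50, 70, 90, 110, 130]
prices = [155, 205, 275, 325, 390]

# Regression: the output is a continuous number (a price).
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept

# Classification: the output is a discrete category.
# Made-up labels: 1 = sold above asking price, 0 = sold at or below it.
sold_above_asking = [0, 0, 0, 1, 1]
label_large = nearest_neighbor_label(sizes, sold_above_asking, 120)
label_small = nearest_neighbor_label(sizes, sold_above_asking, 60)

print(predicted_price, label_large, label_small)
```

The same input (house size) feeds both models; what changes is only the type of the output, a number in one case and a category in the other.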

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

====================================================================

Unsupervised Learning (notes)

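Clustering algorithms are the typical example of unsupervised learning.

Google News works this way: it groups thousands of articles by how related they are.

People can be clustered based on their genetic maps.

It is also used in data center operations, market segmentation, social network analysis, and astronomical data analysis.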

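The cocktail party problem

Suppose two people are talking in a noisy party room that has two microphones. Each microphone records a mix of both voices. The recordings are given to an unsupervised learning algorithm, which is asked to find structure in them; it can then separate the two voices, or pick a person's voice out of background music.

This is easy to try out in Octave or MATLAB.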

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.
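As a rough illustration of clustering without labels, here is a minimal k-means sketch in Python. The source does not name a specific clustering algorithm, so k-means is an assumption made for this example, as are the made-up 2-D points and the naive initialization that takes the first k points as starting centroids. The helper names are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def kmeans(points, k, iterations=20):
    # Naive, deterministic initialization: use the first k points.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    assignments = [
        min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
    ]
    return centroids, assignments


# Made-up data: two loose groups of points, with no labels attached.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

Nothing tells the algorithm which group a point belongs to; the grouping comes only from distances between the points themselves, which is the sense in which there is no feedback on the results.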

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e., identifying individual voices and music from a mesh of sounds at a cocktail party).
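The cocktail party idea can be tried on toy data. The sketch below is not the course's algorithm, which the text does not specify. It assumes a very small form of independent component analysis (ICA) for exactly two signals: whiten the two recordings, then search for the rotation that makes the outputs as non-Gaussian as possible, measured by kurtosis. The two "voices" are synthetic waveforms, and the mixing weights are invented for the example.

```python
import math


def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def rotate(xs, ys, angle):
    c, s = math.cos(angle), math.sin(angle)
    return ([c * x + s * y for x, y in zip(xs, ys)],
            [-s * x + c * y for x, y in zip(xs, ys)])


def whiten(xs, ys):
    """Decorrelate two signals and scale each to unit variance."""
    xs, ys = center(xs), center(ys)
    n = len(xs)
    a = sum(x * x for x in xs) / n
    b = sum(x * y for x, y in zip(xs, ys)) / n
    c = sum(y * y for y in ys) / n
    phi = 0.5 * math.atan2(2 * b, a - c)  # principal-axis angle of the covariance
    us, vs = rotate(xs, ys, phi)
    su = math.sqrt(sum(u * u for u in us) / n)
    sv = math.sqrt(sum(v * v for v in vs) / n)
    return [u / su for u in us], [v / sv for v in vs]


def excess_kurtosis(xs):
    """Excess kurtosis of a zero-mean signal (0 for a Gaussian)."""
    n = len(xs)
    var = sum(x * x for x in xs) / n
    fourth = sum(x ** 4 for x in xs) / n
    return fourth / (var * var) - 3.0


def separate(mix1, mix2, steps=180):
    """Grid-search the rotation that maximizes non-Gaussianity."""
    w1, w2 = whiten(mix1, mix2)
    best = None
    for k in range(steps):
        y1, y2 = rotate(w1, w2, (math.pi / 2) * k / steps)
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def abs_corr(xs, ys):
    xs, ys = center(xs), center(ys)
    num = sum(x * y for x, y in zip(xs, ys))
    return abs(num) / math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))


# Synthetic "voices": a sine wave and a sawtooth wave (assumed stand-ins).
n = 2000
t = [i / n for i in range(n)]
voice_a = [math.sin(2 * math.pi * 5 * x) for x in t]
voice_b = [2 * ((13 * x) % 1.0) - 1 for x in t]

# Each microphone hears a different blend of both voices (invented weights).
mic1 = [0.7 * a + 0.3 * b for a, b in zip(voice_a, voice_b)]
mic2 = [0.4 * a + 0.6 * b for a, b in zip(voice_a, voice_b)]

est1, est2 = separate(mic1, mic2)

# Recovered signals may come back swapped or sign-flipped, so check both pairings.
quality = max(
    min(abs_corr(est1, voice_a), abs_corr(est2, voice_b)),
    min(abs_corr(est1, voice_b), abs_corr(est2, voice_a)),
)
print(quality)
```

The algorithm never sees `voice_a` or `voice_b`; they are used only at the end to check how well the separation worked. Real recordings involve echoes, delays, and noise, so practical systems need considerably more than this sketch.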