Supervised Learning
In this post, we are going to discuss an interesting topic in computer science, and in Artificial Intelligence in particular. Machine Learning is a concept that changes the way we look at problems. In the past, people dreamed of creating expert systems that could solve problems without any human intervention. At first, scientists thought they could create an expert system by simulating the human brain, but at that time there was not enough scientific knowledge to do so. Several years later, with the development of technology and the emergence of various intelligent devices and social media, the idea of creating an expert system came back to scientists’ minds, this time not by simulating the human brain but with a new concept called Big Data!
Big data
Big data is defined as large pools of data that can be captured, communicated, aggregated, stored, and analyzed.
Data continues to grow: in mid-2010, the information universe held 1.2 zettabytes, and predictions for 2020 expect nearly 44 times more, about 53 zettabytes. Applications are also becoming data-intensive.
- The spread of smart devices has caused rapid growth in the amount of data produced.
- There are approximately 6.4 billion smart devices that are used all around the world and their number and capabilities keep growing rapidly.
- According to Gartner (http://www.gartner.com/newsroom/id/3165317), the number of IoT devices will reach 20.8 billion by 2020, and, by then, IoT service spending will reach $1,534 billion and hardware spending $1,477 billion.
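The growth figures quoted in this section can be checked with a few lines of arithmetic. The snippet below is only a sanity check of the numbers from the text; the variable names are made up for illustration.

```python
# Sanity check of the figures quoted above (all values are taken from the text).
ZB_2010 = 1.2            # zettabytes in mid-2010
GROWTH_FACTOR = 44       # "nearly 44 times more" by 2020

zb_2020 = ZB_2010 * GROWTH_FACTOR
print(f"Projected 2020 volume: {zb_2020:.1f} ZB")  # 52.8, i.e. roughly 53 ZB

DEVICES_NOW = 6.4        # billion smart devices in use
DEVICES_2020 = 20.8      # billion IoT devices predicted for 2020

device_growth = DEVICES_2020 / DEVICES_NOW
print(f"Device count grows by a factor of about {device_growth:.2f}")
```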
The World of Data
According to an IBM report:
- 2.9 million emails sent every second
- 375 megabytes of data consumed by households each day
- 20 hours of video uploaded to YouTube every minute
- 24 petabytes of data processed by Google per day
- 50 million tweets per day
- 700 billion total minutes spent on Facebook each month
- 1.3 exabytes of data sent and received by mobile internet users
- 72.9 products ordered on Amazon per second
How will we manage our data?
- Manage it ourselves? Personal, but time-consuming.
- How would you get access to your data wherever you are? Would you keep it on your devices, or would you keep it online?
- What if it is managed by someone else, and you can get this “service” for free or with a subscription?
Supervised Learning
As we said in previous sections, nowadays, due to the rapid growth of data around us, scientists no longer try to create intelligent systems by imitating the human brain. Instead, they strive to build models that can be trained on the numerous and varied kinds of data that surround us. Besides supervised learning, there are other ways of learning, such as unsupervised learning, semi-supervised learning, deep learning, reinforcement learning, and deep reinforcement learning.
How does supervised learning work?
As we said earlier, an important prerequisite of supervised learning is data. In this type of learning, we have a collection of data from which our model has to learn.
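To make "learning from data" concrete, here is a minimal sketch, assuming each example pairs an input (the number of bedrooms) with a known answer (the price). The data values and the function names `fit_mean_model` and `predict` are invented for illustration. The "model" is the simplest possible one: it remembers the average price and ignores the input.

```python
# Hypothetical labeled examples: (number of bedrooms, price in dollars).
# The numbers are invented for illustration only.
training_data = [(1, 120_000), (2, 150_000), (3, 210_000), (4, 260_000)]

def fit_mean_model(examples):
    """'Learn' by remembering the average label of the training examples."""
    labels = [label for _, label in examples]
    return sum(labels) / len(labels)

def predict(model, bedrooms):
    """This baseline ignores the input and always returns the learned mean."""
    return model

model = fit_mean_model(training_data)
print(predict(model, 3))  # 185000.0
```

A baseline like this is useful mainly as a yardstick: any model that actually looks at the input should do better than it.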
One of the best-known Machine Learning models is the decision tree. There are other, fancier models, but the decision tree is the simplest one, so it is the one we will explain.
As you can see in the diagram above, this prediction is based on previously captured data about the value of houses, which are divided into two categories. We use the data to put houses into two groups. This step of capturing patterns from the data is called fitting or training the model.
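The fitting step for a tree with a single split can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the house data is invented, and the names `fit_stump` and `predict` are hypothetical. It tries every possible split of the form "bedrooms <= t", keeps the one whose two groups have the lowest squared error, and predicts the average price of the group a house falls into.

```python
# Hypothetical (bedrooms, price) pairs; the values are invented for illustration.
houses = [(1, 110_000), (2, 130_000), (2, 125_000),
          (3, 180_000), (4, 200_000), (5, 210_000)]

def mean(values):
    return sum(values) / len(values)

def squared_error(prices):
    """Sum of squared distances from the group's mean price."""
    m = mean(prices)
    return sum((p - m) ** 2 for p in prices)

def fit_stump(data):
    """Try every split 'bedrooms <= t' and keep the one with the lowest error."""
    best = None
    # Skip the largest value: splitting there would leave the right group empty.
    thresholds = sorted({b for b, _ in data})[:-1]
    for t in thresholds:
        left = [p for b, p in data if b <= t]
        right = [p for b, p in data if b > t]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best["error"]:
            best = {"threshold": t, "left": mean(left),
                    "right": mean(right), "error": error}
    return best

def predict(stump, bedrooms):
    """Return the mean price of the group the house falls into."""
    return stump["left"] if bedrooms <= stump["threshold"] else stump["right"]

stump = fit_stump(houses)
print("split at bedrooms <=", stump["threshold"])
print(predict(stump, 2), predict(stump, 4))
```

With this made-up data the chosen split separates houses with at most two bedrooms from the rest, which matches the two-group picture described above.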
Now look at the following decision trees:
Which one makes sense? Decision tree 1 probably makes more sense than decision tree 2, because it captures the fact that a house with more bedrooms is likely to be more expensive. Decision tree 2 may rely on other factors for predicting the price of a house, such as the number of bathrooms, lot size, location, etc. If we want to consider more factors when predicting the house price, the decision tree needs more splits, as in the picture below:
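A tree that considers more than one factor can be sketched by applying the same idea recursively. The code below is a simplified illustration under stated assumptions: the rows, the two features (bedrooms and lot size), and the helper names `best_split`, `fit_tree`, and `predict` are all invented here. Real libraries add many refinements that this sketch leaves out.

```python
# Hypothetical rows: ((bedrooms, lot_size_m2), price). Invented values.
rows = [
    ((2, 300), 120_000), ((2, 800), 160_000),
    ((3, 350), 170_000), ((3, 900), 230_000),
    ((4, 400), 220_000), ((4, 950), 300_000),
]
FEATURE_NAMES = ("bedrooms", "lot_size_m2")

def mean(values):
    return sum(values) / len(values)

def sse(prices):
    """Sum of squared errors around the mean price."""
    m = mean(prices)
    return sum((p - m) ** 2 for p in prices)

def best_split(data):
    """Return (error, feature_index, threshold, left, right), or None if no split exists."""
    best = None
    for f in range(len(FEATURE_NAMES)):
        for t in sorted({x[f] for x, _ in data})[:-1]:
            left = [(x, y) for x, y in data if x[f] <= t]
            right = [(x, y) for x, y in data if x[f] > t]
            error = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or error < best[0]:
                best = (error, f, t, left, right)
    return best

def fit_tree(data, max_depth):
    """Split recursively; a leaf stores the mean price of its rows."""
    split = best_split(data) if max_depth > 0 else None
    if split is None:
        return {"leaf": mean([y for _, y in data])}
    _, f, t, left, right = split
    return {"feature": f, "threshold": t,
            "left": fit_tree(left, max_depth - 1),
            "right": fit_tree(right, max_depth - 1)}

def predict(node, x):
    """Walk from the root to a leaf, following the split at each node."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

tree = fit_tree(rows, max_depth=2)
print("root splits on", FEATURE_NAMES[tree["feature"]], "<=", tree["threshold"])
print(predict(tree, (3, 900)))
```

The `max_depth` parameter limits how many times the tree may split; a larger value lets the tree consider more factor combinations, at the risk of memorizing the training data.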
Supervised learning algorithms
- Support Vector Machines
- Linear regression
- Logistic regression
- Naive Bayes
- Linear discriminant analysis
- Decision trees
- K-nearest neighbor algorithm
- Neural Networks (Multilayer perceptron)
- Similarity learning
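As one example from this list, the k-nearest neighbor algorithm can be sketched in a few lines. This is a minimal sketch with invented data and a hypothetical function name `knn_predict`: to classify a new house, it looks at the k closest labeled houses and takes a majority vote.

```python
from collections import Counter
import math

# Hypothetical labeled points: ((bedrooms, bathrooms), label). Invented values.
training = [
    ((1, 1), "cheap"), ((2, 1), "cheap"), ((2, 2), "cheap"),
    ((4, 3), "expensive"), ((5, 3), "expensive"), ((5, 4), "expensive"),
]

def knn_predict(examples, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(training, (2, 1)))  # cheap
print(knn_predict(training, (5, 3)))  # expensive
```

Unlike the decision tree, this approach has no separate fitting step: the training data itself is the model, and all the work happens at prediction time.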
Supervised learning applications
- Bioinformatics
- Cheminformatics
- Database marketing
- Handwriting recognition
- Information retrieval
- Information extraction
- Object recognition in computer vision
- Optical character recognition
- Spam detection
- Pattern recognition
- Speech recognition
- Supervised learning is a special case of Downward causation in biological systems
- Landform classification using satellite imagery