How To Get Started With Machine Learning Using Python

Machine learning is a specialized field of artificial intelligence in which programs improve at a task by learning from data rather than following explicitly programmed rules. Frontiers in Big Data notes that both AI and ML are concerned with developing complex behaviors in machines. We can define a complex behavior as one that mirrors how a human thinks and reasons. Thus, machine learning seeks to bridge the gap between machine thinking and human thinking. One of the most remarkable things to come out of modern computer science is that you can build one of these machine learning programs using a relatively approachable coding language – Python.

Introducing Python

Net Guru informs us that Python is a modern, interpreted language that forms the basis for a wide range of applications. Python is both modular and open-source, meaning that users can use it for free and extend it as they see fit. Part of what makes it such a flexible language is that it’s interpreted. For the non-specialist, this means that code is executed by an interpreter as it runs rather than being compiled into a machine-code binary ahead of time, which makes for a fast, responsive edit-and-run workflow.

Machine Learning With Python

To start developing in Python, you’ll first need to install the Python interpreter. Real Python has a handy step-by-step installation guide to get you up to speed with Python 3.x. If you already have Python installed, you should be ready to go. These instructions assume Python 3.6 or newer; the library versions used below no longer support Python 2.7, so if your installation is older, upgrade it before continuing with this guide.

Step 1: Install libraries

Python builds on code that has already been developed, allowing you to pull directly from pre-built libraries. The libraries you’ll need to install for this project are:

- numpy
- scipy
- matplotlib
- sklearn (published under the package name scikit-learn)
- pandas

Installing libraries can be a bit complicated for a beginner, but SciPy offers a decent guide on its website to help newcomers to the language install them.
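If you use pip, a single command run in your terminal will usually install the whole stack at once (a sketch, assuming pip is available with your Python installation):

```shell
# Install all five libraries; pip resolves compatible versions.
# Note that the sklearn library is installed as "scikit-learn".
python -m pip install numpy scipy matplotlib scikit-learn pandas
```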

Once you start Python, you’ll probably want to check your versions to make sure everything is compatible. A handy code snippet you can use to do that is:


# Check the versions of libraries

# Python version
import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))


The output versions that you’ll need for the following sections are:

scipy: 1.5.2

numpy: 1.19.1

matplotlib: 3.3.0

pandas: 1.1.0

sklearn: 0.23.2

Once these match or you have higher versions than the ones stated, you’re ready to start loading data to teach your ML agent.

Step 2: Loading Data

The dataset we’ll use here is the Iris flowers dataset, a small table of sepal and petal measurements for three iris species. Working through it is akin to writing your first “Hello World” program in other languages. Before loading the data itself, we’ll import the libraries we’ll need throughout the rest of this guide. For that, we’ll use this code snippet:


# Load libraries
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


If any of these imports raises an error, stop here: you’ll need a working SciPy stack to continue. If no errors pop up, then the requisite libraries are all loaded and ready to go. Next, we’ll load up the dataset:


# Load dataset
url = ''  # fill in the CSV URL or local file path for the Iris dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)


The dataset should load without any problems. Loading the dataset allows us to move on to summarizing the data contained inside of it.
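If you don’t have a CSV copy of the data handy, a fallback (a sketch, assuming scikit-learn 0.23 or newer for the as_frame option) is to build the same DataFrame from the Iris data bundled with scikit-learn:

```python
# Alternative: build the same Iris DataFrame from scikit-learn's bundled copy.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)  # as_frame requires scikit-learn >= 0.23
dataset = iris.frame.rename(columns={
    'sepal length (cm)': 'sepal-length',
    'sepal width (cm)': 'sepal-width',
    'petal length (cm)': 'petal-length',
    'petal width (cm)': 'petal-width',
    'target': 'class',
})
# Replace the numeric labels with the species names used in this guide
dataset['class'] = ['Iris-' + n for n in iris.target_names[iris.target]]
print(dataset.shape)  # (150, 5)
```

The resulting table has the same five columns and 150 rows as the CSV version, so the summarization commands in the next step work unchanged.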

Step 3: Summarizing Data In the Dataset

Here, we’ll look at the data from several angles. First, we’ll look at the shape of the data with the command:

print(dataset.shape)

This command should give you the output:

(150, 5)

which we interpret as there being one hundred and fifty records with five attributes.

We can also “peek” at the data using the command:

print(dataset.head(20))
A table showing twenty records with each of the five attributes should be your result for this command.

Summarizing the data is a simple matter of running the command:

print(dataset.describe())
This command creates output in a table:

       sepal-length  sepal-width  petal-length  petal-width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Class distribution is our next summarization, and to get this we’ll use:

print(dataset.groupby('class').size())
This statement will give us the output:


Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50

Grouping this way shows that the records divide evenly into three easily understandable classes, which the machine can use later in its decision-making processes.
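Before fitting any model, it’s standard practice to hold out part of the data as a validation set. This sketch uses scikit-learn’s bundled copy of the Iris data so it runs on its own; with the CSV-loaded dataset above, the feature matrix and labels would instead be dataset.values[:, 0:4] and dataset.values[:, 4].

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold back 20% of the rows for validation; stratify keeps the
# even class balance we saw above in both halves of the split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=1, stratify=y)
print(X_train.shape, X_val.shape)  # (120, 4) (30, 4)
```

The random_state argument makes the split reproducible, so reruns score models against the same validation rows.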

Where To Go From Here

Loading and summarizing the dataset allows you to delve into the world of machine learning by applying several algorithms. Among the most common machine learning algorithms you’re likely to encounter are:

- Support Vector Machines (SVM)
- Gaussian Naive Bayes (NB)
- Classification and Regression Trees (CART)
- K-Nearest Neighbors (KNN)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)

These may sound complicated, but they’re just different ways of modeling machine learning data and strengthening the model. All machine learning projects go through several stages. From here, you should learn how to visualize the data using box plots and scatter plots so you can get a feel for what the data says in a visual sense. The algorithms can then be applied to the data to make some predictions; based on the output, you’d refine your methods and try again. This sort of machine learning is known as supervised learning, where the model learns from labeled examples and a human evaluates the machine’s ability to understand the dataset. Unsupervised learning lets the machine find structure in unlabeled data on its own and is a bit more complex to implement. For now, this introduction should at least allow you to load and inspect datasets in preparation for visualizing and training your own machine learning agent.
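As a sketch of that refine-and-retry loop (again using scikit-learn’s bundled Iris data so the snippet is self-contained), you can score each algorithm from the list above with 10-fold cross-validation and compare mean accuracies:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = [
    ('LR', LogisticRegression(max_iter=1000)),  # raise max_iter so lbfgs converges
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier()),
    ('NB', GaussianNB()),
    ('SVM', SVC()),
]
for name, model in models:
    # 10-fold stratified cross-validation: each fold keeps the class balance
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
    scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
    print('%s: %.3f (%.3f)' % (name, scores.mean(), scores.std()))
```

Each line prints a model’s mean accuracy and standard deviation across the ten folds, giving you a first basis for choosing which algorithm to refine.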