How To Get Started With Machine Learning Using Python

Machine learning is a specialized field of artificial intelligence in which programs improve at a task by learning from data rather than following explicitly programmed rules. Frontiers in Big Data notes that both AI and ML are concerned with developing complex behaviors in machines. We can define a complex behavior as one that mirrors how a human thinks and reasons. Thus, machine learning seeks to bridge the gap between machine thinking and human thinking. One of the most remarkable things to come out of modern computer science is that you can build one of these machine learning programs using a relatively approachable coding language – Python.

Introducing Python

Net Guru informs us that Python is a modern, interpreted language that forms the basis for a wide range of applications. Python is both modular and open-source, meaning that users can use it for free and extend it as they see fit. Part of what makes it such a flexible language is that it’s interpreted. For the non-specialist, this means that code is executed by an interpreter as it runs rather than being compiled into a machine-code binary ahead of time, which makes for a fast, responsive edit-and-run workflow.

Machine Learning With Python

To start developing in Python, you’ll first need to install the Python interpreter. Real Python has a handy step-by-step installation guide to get you up to speed with Python 3.x. If you already have Python installed, you should be ready to go. These instructions assume Python 3.6 or newer; the library versions used below no longer support Python 2.7, so if your installation is older, upgrade it before continuing with this guide.

Step 1: Install libraries

Python builds on code that has already been developed, allowing you to pull directly from pre-built libraries. The libraries you’ll need to install for this project are:

- numpy
- scipy
- matplotlib
- sklearn (published under the package name scikit-learn)
- pandas

Installing libraries can be a bit complicated for a beginner, but SciPy offers a decent guide on its website to help newcomers to the language install them.
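If you use pip, a single command run in your terminal will usually install the whole stack at once (a sketch, assuming pip is available with your Python installation):

```shell
# Install all five libraries; pip resolves compatible versions.
# Note that the sklearn library is installed as "scikit-learn".
python -m pip install numpy scipy matplotlib scikit-learn pandas
```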

Once you start Python, you’ll probably want to check your versions to make sure everything is compatible. A handy code snippet you can use to do that is:


# Check the versions of libraries

# Python version
import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))


The output versions that you’ll need for the following sections are:

scipy: 1.5.2

numpy: 1.19.1

matplotlib: 3.3.0

pandas: 1.1.0

sklearn: 0.23.2

Once these match or you have higher versions than the ones stated, you’re ready to start loading data to teach your ML agent.

Step 2: Loading Data

The dataset we’ll use here is the Iris flowers dataset, a small table of sepal and petal measurements for three iris species. Working through it is akin to writing your first “Hello World” program in other languages. Before loading the data itself, we’ll import the libraries we’ll need throughout the rest of this guide. For that, we’ll use this code snippet:


# Load libraries
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


If any of these imports raises an error, stop here: you’ll need a working SciPy stack to continue. If no errors pop up, then the requisite libraries are all loaded and ready to go. Next, we’ll load up the dataset:


# Load dataset
url = ''  # fill in the CSV URL or local file path for the Iris dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)


The dataset should load without any problems. Loading the dataset allows us to move on to summarizing the data contained inside of it.
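If you don’t have a CSV copy of the data handy, a fallback (a sketch, assuming scikit-learn 0.23 or newer for the as_frame option) is to build the same DataFrame from the Iris data bundled with scikit-learn:

```python
# Alternative: build the same Iris DataFrame from scikit-learn's bundled copy.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)  # as_frame requires scikit-learn >= 0.23
dataset = iris.frame.rename(columns={
    'sepal length (cm)': 'sepal-length',
    'sepal width (cm)': 'sepal-width',
    'petal length (cm)': 'petal-length',
    'petal width (cm)': 'petal-width',
    'target': 'class',
})
# Replace the numeric labels with the species names used in this guide
dataset['class'] = ['Iris-' + n for n in iris.target_names[iris.target]]
print(dataset.shape)  # (150, 5)
```

The resulting table has the same five columns and 150 rows as the CSV version, so the summarization commands in the next step work unchanged.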

Step 3: Summarizing Data In the Dataset

Here, we’ll look at the data from several angles. First, we’ll look at the shape of the data with the command:

print(dataset.shape)

This command should give you the output:

(150, 5)

which we interpret as there being one hundred and fifty records with five attributes.

We can also “peek” at the data using the command:

print(dataset.head(20))
A table showing twenty records with each of the five attributes should be your result for this command.

Summarizing the data is a simple matter of running the command:

print(dataset.describe())
This command creates output in a table:

       sepal-length  sepal-width  petal-length  petal-width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Class distribution is our next summarization, and to get this we’ll use:

print(dataset.groupby('class').size())
This statement will give us the output:


Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50

Grouping this way shows that the records divide evenly into three easily understandable classes, which the machine can use later in its decision-making processes.
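Before fitting any model, it’s standard practice to hold out part of the data as a validation set. This sketch uses scikit-learn’s bundled copy of the Iris data so it runs on its own; with the CSV-loaded dataset above, the feature matrix and labels would instead be dataset.values[:, 0:4] and dataset.values[:, 4].

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold back 20% of the rows for validation; stratify keeps the
# even class balance we saw above in both halves of the split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=1, stratify=y)
print(X_train.shape, X_val.shape)  # (120, 4) (30, 4)
```

The random_state argument makes the split reproducible, so reruns score models against the same validation rows.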

Where To Go From Here

Loading and summarizing the dataset allows you to delve into the world of machine learning by applying several algorithms. Among the most common machine learning algorithms you’re likely to encounter are:

- Support Vector Machines (SVM)
- Gaussian Naive Bayes (NB)
- Classification and Regression Trees (CART)
- K-Nearest Neighbors (KNN)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)

These may sound complicated, but they’re just different ways of modeling machine learning data and strengthening the model. All machine learning projects go through several stages. From here, you should learn how to visualize the data using box plots and scatter plots so you can get a feel for what the data says in a visual sense. The algorithms can then be applied to the data to make some predictions; based on the output, you’d refine your methods and try again. This sort of machine learning is known as supervised learning, where the model learns from labeled examples and a human evaluates the machine’s ability to understand the dataset. Unsupervised learning lets the machine find structure in unlabeled data on its own and is a bit more complex to implement. For now, this introduction should at least allow you to load and inspect datasets in preparation for visualizing and training your own machine learning agent.
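As a sketch of that refine-and-retry loop (again using scikit-learn’s bundled Iris data so the snippet is self-contained), you can score each algorithm from the list above with 10-fold cross-validation and compare mean accuracies:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = [
    ('LR', LogisticRegression(max_iter=1000)),  # raise max_iter so lbfgs converges
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier()),
    ('NB', GaussianNB()),
    ('SVM', SVC()),
]
for name, model in models:
    # 10-fold stratified cross-validation: each fold keeps the class balance
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
    scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
    print('%s: %.3f (%.3f)' % (name, scores.mean(), scores.std()))
```

Each line prints a model’s mean accuracy and standard deviation across the ten folds, giving you a first basis for choosing which algorithm to refine.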