Documenting my journey of learning machine learning, starting with a mini project I found online. The goal is to train a program to identify the species of an iris based on its morphological traits, classifying each flower as Iris-setosa, Iris-versicolor, or Iris-virginica.

Step 1: Make sure all modules are installed and import them. Modules I use:

- sklearn
- scipy
- pandas
- matplotlib
- numpy

import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# and so forth with the other modules

Step 2: Load Dataset.

from pandas import read_csv

# url should point to the Iris dataset CSV file
names = ['sepal-length', 'sepal-width',
         'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)

Step 3: Visualize Data:

from pandas.plotting import scatter_matrix
from matplotlib import pyplot

scatter_matrix(dataset)
pyplot.show()

Which will appear as a grid of pairwise scatter plots (image omitted).

Step 4: Split the Data Set into a training and test set.

from sklearn.model_selection import train_test_split

array = dataset.values
x = array[:, 0:4]  # the four measurements
y = array[:, 4]    # the class label
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=1)

Step 5: Create different models

**Gaussian Naive Bayes (NB)** applies Bayes' theorem, combining prior probabilities with the naive assumption that features are independent and normally distributed to estimate the probability of each class.

**Logistic Regression** is conceptually similar to linear regression, but a logistic function is fit to produce a categorical output.

**Linear Discriminant Analysis (LDA)** is similar to PCA, but instead of maximizing overall variance, it maximizes the separation between known categories.

**K-nearest Neighbors (KNN)** uses the k closest training points to generate a prediction.

**Decision Trees (CART)** employ a tree of decision nodes to create a prediction.

**Support Vector Machines (SVM)** find the hyperplane that best separates the classes, with the largest margin, and predict based on which side of it a point falls.
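The six models above can be built as a list of (name, model) pairs in sklearn. This is one possible setup; the constructor settings (e.g. the 'liblinear' solver, the 'auto' gamma) are assumptions, not necessarily those used in the original tutorial.

```python
# One possible way to construct the six models (settings are assumptions).
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

models = [
    ('NB', GaussianNB()),
    ('LR', LogisticRegression(solver='liblinear')),
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier()),
    ('SVM', SVC(gamma='auto')),
]
```

Keeping the models in a single list makes it easy to loop over them and score each one the same way in the next step.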

Step 6: Test Models
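A sketch of how each model might be scored with 10-fold stratified cross-validation on the training split, producing the mean-accuracy (standard-deviation) format shown in the results below. The setup here is an assumption: it uses sklearn's built-in copy of the Iris data in place of the CSV, and only two of the six models, to stay self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Stand-in data and models (assumptions for this sketch); in the post,
# x_train/y_train come from Step 4 and models is the full Step 5 list.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=1)
models = [('LDA', LinearDiscriminantAnalysis()), ('SVM', SVC(gamma='auto'))]

results = {}
for name, model in models:
    # 10-fold stratified cross-validation, scored on accuracy
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    scores = cross_val_score(model, x_train, y_train,
                             cv=kfold, scoring='accuracy')
    results[name] = (scores.mean(), scores.std())
    print('%s: %f (%f)' % (name, scores.mean(), scores.std()))
```

Cross-validation scores each model on folds it was not trained on, which gives a fairer comparison than accuracy on the training data itself.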

Results (mean accuracy with standard deviation in parentheses):

`NB: 0.951166 (0.052812)`

`LR: 0.955909 (0.044337)`

`LDA: 0.975641 (0.037246)`

`KNN: 0.950524 (0.040563)`

`CART: 0.958858 (0.053754)`

`SVM: 0.983333 (0.033333)`