Biology Cluster Analysis
This project uses the Iris dataset, which dates back to 1936. It was introduced by the biologist and statistician Ronald Fisher and consists of measurements of various iris plants. It contains 150 instances, each with four measurements (sepal length, sepal width, petal length and petal width) and a class attribute named Species. The class attribute can take three values: Iris setosa, Iris versicolor and Iris virginica.
Any data analysis should start with a question that we want the data to answer. In this case, the question was: “Is there a pattern in the data by which we can group the three species of iris, so that when we see a new sample we can identify the species to which it belongs?”
To identify and inspect the clusters, I went through the following steps:
1. Environment setup and load data;
2. View the data;
3. Build the K-Means model;
4. View cluster results.
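
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not necessarily the setup used in this project: it assumes pandas and scikit-learn are installed, loads the copy of the Iris dataset bundled with scikit-learn rather than a CSV file, and picks `n_clusters=3`, `n_init=10` and `random_state=0` as illustrative parameter values.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# 1. Environment setup and load data.
# Assumption: we use scikit-learn's bundled copy of the Iris dataset.
iris = load_iris(as_frame=True)
features = iris.data  # 150 rows, 4 measurement columns
species = iris.target.map(dict(enumerate(iris.target_names)))

# 2. View the data.
print(features.head())
print(species.value_counts())

# 3. Build the K-Means model, with one cluster per expected species.
# The parameter values here are illustrative choices.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = model.fit_predict(features)

# 4. View cluster results: cross-tabulate known species against cluster labels.
results = pd.crosstab(species, clusters, rownames=["species"], colnames=["cluster"])
print(results)

# Assign a new sample to a cluster (hypothetical measurements, in cm).
new_sample = pd.DataFrame([[5.0, 3.4, 1.5, 0.2]], columns=features.columns)
new_cluster = model.predict(new_sample)[0]
print("New sample falls in cluster", new_cluster)
```

The cluster numbers produced by K-Means are arbitrary, so the cross-tabulation is what shows how each cluster lines up with a species. Note that K-Means never sees the Species column; it groups the samples using only the four measurements.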