NLP Cats Vs Dogs: Clustering

AngelCakePie


As an experiment, I compared cats and dogs using NLP with unsupervised learning. My data came from the COCO (Common Objects in Context) dataset, which is mainly used for image processing projects with tools like OpenCV. In my case, I wanted to use its annotations for an NLP project. Looking specifically at cats and dogs, I had 3610 annotation sentences in total. I used TF-IDF vectorization to turn the sentences into numeric data. To start, I wanted to see how many clusters I should use by checking for a dip (an elbow) in the sum of squared errors graph. There was no dip, so I decided to try 24 clusters first.
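The vectorization and cluster-count check above can be sketched roughly as follows. This is a minimal sketch assuming scikit-learn, not the exact code from my project: the `captions` list is a hypothetical stand-in for the 3610 COCO sentences, and `sse_by_k` is a helper name made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in captions; the real project used 3610 COCO annotations.
captions = [
    "a dog catches a frisbee on the grass",
    "a dog plays with a ball in the park",
    "two dogs run across a grassy field",
    "a dog jumps to catch a frisbee",
    "a cat sits on top of a couch",
    "a cat lays on a bed next to a laptop",
    "a kitten sleeps on a windowsill",
    "a cat sits on a chair by the window",
]


def sse_by_k(texts, k_values, seed=0):
    """Vectorize texts with TF-IDF and return (matrix, {k: SSE}) for each k."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(texts)
    sse = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_ is the sum of squared distances to the nearest centroid
        sse[k] = model.inertia_
    return X, sse


X, sse = sse_by_k(captions, range(1, 6))
for k, err in sse.items():
    print(f"k={k}: SSE={err:.3f}")
```

Plotting `sse` against `k` gives the graph where you look for a dip. On the full dataset the range of `k` would be much wider than the 1 to 5 used in this toy example.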

I used PCA and t-SNE to create my graphs. The t-SNE plot showed very clear clustering. Each cluster had a recognizable theme, such as being outdoors or having plants in the scene. Using NMF with 2 topics, I also got a clear split between cats and dogs. This was very interesting to see, because the general theme of cats versus dogs emerged on its own. Dogs came across as more active than cats, with keywords like frisbee, grass, and play. Cats came across as more relaxed, with keywords like sit, ontop, and lay.
