I often see aspiring Data Scientists jump straight to scaled-up algorithms to infer the relationships between many different features simultaneously, and I notice the field as a whole is somewhat indifferent towards this approach. "If your only tool is a hammer, then every problem better be a nail." What's wrong with just using a giant hammer? After all, we can scale up our algorithm on large virtual instances and then wrap it in a meta-learning function that tries every combination of feature set and parameter configuration. Let me compare and contrast the two approaches:
| | Supervised learning | Unsupervised learning |
|---|---|---|
| Current use | In its glory days. | Seems unpopular at the moment. |
| History | Has seen serious improvements and use within the last couple of decades. | Has been in use for the better part of four decades. |
| Popularity | Popular because it can solve many different types of problems without much thought about the correct domain model. | Unpopular because it can take time to discover the domain model underlying a business problem, and we assume that we already know the domain model. |
| Basic idea | Every new example contributes both to the structure of the model and to the information within the model. | Every new example offers as much new information as is present within the example. |
| Role of domain expertise | Domain models have minimal say in the final model. | The structure of the model is determined both by a domain model and by mutual co-occurrences within examples. |
| Robustness | Fragile to unseen use cases. | Robust to unseen use cases. |
| Training data | Does not work well where there is no training data. | Can operate on any data. |
| Class learning | Output is discrete; we are forced to teach these models using labels. | Output is probabilistic; these models can also learn from labelled data. |
| Final thoughts | Problematic, since we often do not want to have to learn from a horrible event before being able to prevent it from happening. | Later, upon seeing a set of labels, they can immediately perform inferences on the new class. |
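To make the "giant hammer" concrete, here is a minimal, hypothetical sketch of a meta-learning loop that tries every feature subset and every parameter configuration. The `score_fn` argument is a stand-in I have invented for illustration; in practice it would train and evaluate an actual model, which only makes the explosion below more painful:

```python
from itertools import combinations, product

def exhaustive_search(features, param_grid, score_fn):
    """Try every feature subset x parameter combination -- the 'giant hammer'.

    score_fn(subset, params) is a hypothetical stand-in for training and
    scoring a real model; here we mainly count how much work this creates.
    """
    best_score, best_config, trials = float("-inf"), None, 0
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            for values in product(*param_grid.values()):
                trials += 1
                config = dict(zip(param_grid.keys(), values))
                score = score_fn(subset, config)
                if score > best_score:
                    best_score, best_config = score, (subset, config)
    return best_config, trials

# Even a toy problem explodes: 10 features and a 3x3 parameter grid.
features = [f"f{i}" for i in range(10)]
param_grid = {"depth": [2, 4, 8], "lr": [0.1, 0.01, 0.001]}
_, trials = exhaustive_search(features, param_grid, lambda s, p: len(s))
print(trials)  # (2**10 - 1) subsets * 9 parameter combos = 9207
```

Nine thousand model fits for ten features; double the feature count to twenty and the same grid demands roughly 9.4 million. That is what "just scale it up" actually buys.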
Supervised learning is like putting your model in a classroom where you have a teacher with a lesson plan. The teacher has their own perspective. The lesson plan has its own scope. The student's knowledge may not be very broad, but it will be quite deep in the areas where you have prepared the lessons.
Unsupervised learning is like putting your student (the model) on the streets. The student is forced to learn things as a part of its daily exploration. At first it wanders around, unsure of its surroundings. Over time it is able to guide its own exploration.
At the end of the day, the two students meet. The student from the classroom knows a lot about esoteric concepts and taunts the other student. The student from the street has become a grade A bad ass. Upon hearing the taunt, the student from the street hits the student from the classroom in the stomach and steals his lunch money. Apparently the student from the classroom never had a training example that prepared him for that eventuality.
At first glance, we believe that it is easier to perform supervised learning. However, upon inspecting the trade-offs, I argue that unsupervised learning is not only easier, it is more philosophically grounded. At this point the reader may be asking: "So, are you opposed to supervised learning?" No! I'm amazed by the capabilities of some of the modern supervised approaches. I'm merely offering an exposé on what I see as a problem in the approach commonly used by Data Science practitioners.
I believe it is better to start with an unsupervised approach and to take the learning from that approach as input for the supervised approach. We can even use our unsupervised model to generate faux data for our supervised learner. The powerful use-case for supervised learning systems is to manifest those inferences which we have already successfully grounded through the unsupervised stage. Using the two together in this staged fashion represents a sort of synergy: it decreases the fragility of supervised models while offering inferencing capabilities beyond what an unsupervised model could provide alone.
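The staged approach can be sketched in a few lines. This is a toy illustration, not a production pipeline: stage one is a tiny hand-rolled k-means (a real project would use a proper library) that discovers structure in unlabelled data, and stage two shows how only a couple of labels are then needed to name the discovered clusters and classify new points:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Stage 1 (unsupervised): discover structure with a minimal 1-D k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def label_clusters(centroids, labelled_examples):
    """Stage 2 (supervised): a handful of labels is enough to name each cluster."""
    names = {}
    for value, label in labelled_examples:
        idx = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
        names[idx] = label
    return names

def classify(x, centroids, names):
    idx = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
    return names.get(idx)

# Unlabelled data with two obvious groups; the model learns the structure first.
data = [1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.8, 9.1]
centroids = kmeans_1d(data, k=2)
# Two labelled examples now suffice to name both clusters.
names = label_clusters(centroids, [(1.0, "low"), (9.0, "high")])
print(classify(8.7, centroids, names))  # -> high
```

The grounding work happens before any label arrives; the labels merely attach names to structure the model already knows, which is exactly the "immediate inference on a new class" behaviour from the table above.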