A Data Scientist's Definition of Data Science

Data Science can either be a discipline or it can be a profession. The choice is up to the practitioners. It is convenient to move in the direction of a discipline because it is easy. I intend to pursue it as a profession.

Data science is a combination of three approaches: research, domain, and computation

We have to choose. We can perform the same repetitive tasks, the same cookie-cutter analysis using the same algorithms over and over again in an effort to provide the fastest, most generic results possible. Or we can do something different. We can treat it as a profession worthy of the responsibility and accountability that other professions maintain. As a profession, it is more about the practice, the methods, and the contribution to the field. As a discipline it is more about consistent, repetitive results. Disciplines have a history of becoming obsolete. We used to outsource mechanical workers' disciplines to machines. Then we outsourced engineering workers' disciplines to foreign countries. Now we are starting to outsource knowledge workers' disciplines to algorithms. While there is nothing wrong with powerful algorithms, they only have power when contextualized with meaning. For example, the mechanical workers' discipline can only be outsourced successfully if the resultant machine performs a specific task that has meaning based upon the activities of the worker whom it is augmenting.

I recently read an article that contained collaboration from a Physicist, a Computational Biologist, a Virologist, and a Computer Scientist. At first I thought this should be surprising. But, in realizing my own lack of surprise, I began to think about the direction that the field of applied research has been moving. We are witnessing the merging of what used to be three distinct areas of expertise. Each area has its own foundation and each has several examples of occupational specialties.

  • Computational expertise: These fields focus upon establishing a foundation based upon the axioms and certainties of formal logic systems. Reasoning is powerful and certain yet difficult to generalize. Examples disciplines include: computational theory, data integration, database administration, information systems, mathematics, electrical engineering, mechanical engineering, etc.
Computational approach
  • Research design skills: These fields focus upon establishing a foundation based upon the notion of social consensus and causal inference. Reasoning is grounded and generalizes well to external, unseen events. However, reasoning is also limited in expressive power. Examples disciplines include: psychology, industrial engineering, sociology, anthropology, archaeology, civil engineering, etc.
Research approach
  • Domain knowledge: These fields focus upon establishing a foundation based upon the notion of practical value and political will. Reasoning is clear and cohesive but risks becomes dogmatic. Examples disciplines include: business, accounting, education, technology, manufacturing, politics, etc.
Domain approach

The elegance of merging these approach comes, I argue, from the power of merging the established basis upon which each approach is based. This blending allows for the strengths of each to be combined while tending to ameliorate the weaknesses that any one approach offers.
Computation + Research + Domain
When we combine these approach, there are hybrid approach that form at the edges between any two approaches. These interesting combinations of approaches are also becoming more common:

  • Technical specialist: At the edge between domain knowledge and computational expertise, this approach utilizes the advantages of reasoning as it relates to accomplishing a specific, known or defined goal. Example disciplines include: financial analysis, security analysts, DBA, computational biology, applied machine learning, etc.
  • Statistician: At the edge between research design skills and computational expertise, this approach utilizes the advantages of reasoning and computational inference to infer causality and to predict. Example disciplines include: actuarial sciences, machine learning,
  • Applied scientist: At the edge between domain knowledge and research design skills, this approach utilizes the advantages of clearly defined problems and research methods to discover new phenomenon. Example disciplines include: climatology, consumer behavior, marketing research

At this point the reader is of course wondering what about the intersection of all three approaches. This interesting intersection is one which we have seen before, but which has become increasingly common lately.

  • Data Sciences: At the intersection between domain knowledge, computational expertise, and research design skills this approach utilizes the advantages of reasoning, problem definition, and research design to loosen the stranglehold that hard problems. Examples include: human factors engineering, human machine systems engineering (HMSE), human-computer interaction (HCI). Previously we may have called these individuals applied scientists or research engineers.

As a profession Data Science offers to unite disparate fields. This is not a unique phenomenon. Other professions merge these three approaches. For example, the medical profession merges research, science, and business to treat patients. These three approaches are synergistic. Merging all three approaches will yield unexpected results that are more than the simple summation of the three parts. This means that in the long run the Data Science discipline can become transcendent of any one predecessor approach. When this happens it becomes its own profession. As a profession, it should begin to receive special recognition as its own part of the cultural estate. It should become its own area of research along with honored degrees and esteemed forerunners.

Data science is a combination of three approaches: research, domain, and computation

This whole post may seem a little (or a lot) self-serving since I am a Data Scientist. However, that is not my intent. Instead, I am trying to illustrate how this field is different than how I have recently seen it characterized. It is not simply applying machine learning to a business’ data. It is different than creating a self-organizing algorithm. It is not something that can be taught through the use of one or a few online courses. It is not a 10-credit certificate.

Data Science is the science and practice of applying computational expertise, domain knowledge, and research design skills to define, model, and discover in the context of ill-defined, complex problems. The challenge of course is the complexity and breadth of knowledge required to pursue this discipline. If it really requires someone to become an expert in three different fields, then how can we expect to fulfill the need for Data Scientists? As the profession advances, the required skillsets will solidify and it will become clearer how one can pursue this profession.