Data Scientist?? What’s that?

This is a question I have had to answer not once or twice but almost every time I introduce myself. I have since developed a short explanation, though more often than not it leaves my audience with even more questions. My default response: it is the science of learning from data, and it requires skills in maths, statistics, programming and databases.

Data scientist, "the sexiest job of the 21st century" according to the Harvard Business Review (Davenport and Patil, 2012), is a profession that has evolved from a combination of well-known professions. It merges what we would presently call a business analyst, a statistician and a data engineer, and it sits at the intersection of mathematics, information technology, statistics and domain expertise. The data scientist's job description will therefore differ from one field to another. Some domains, such as business, may require more programming skills than traditional scientific skills, while others, such as healthcare, require more scientific and statistical skills like modelling, estimation and prediction.

A data scientist’s mandate is to find value in data, which means that a deep understanding of how data is extracted, transformed and stored is a basic requirement. With the ongoing information revolution, the world’s digital data is huge (big data) and still exploding (Marr, 2018), very recent, with about 90% of it generated in the preceding two years (ScienceDaily, 2013), and mostly unstructured (Rizkallah, 2017). A lot of data is collected for formalities and referencing and not necessarily for analysis or modelling. As a result, most of the scientist’s time is consumed in structuring the data (converting it into usable formats) for analysis, a process commonly known as data cleaning and preparation. This rigorous step requires intensive programming skills. Once the data has been cleaned and prepared, it is explored, and scientific methods are then applied depending on the value being mined or the insights being sought.
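To make the cleaning step concrete, here is a minimal sketch in Python. The records, field names and date formats are all invented for illustration; real data will be messier, and the right handling of missing values depends on the project.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from a form or a log export.
RAW_RECORDS = [
    {"customer": " Alice ", "amount": "1,200", "paid_on": "2018-07-01"},
    {"customer": "BOB", "amount": "", "paid_on": "01/07/2018"},
    {"customer": "alice", "amount": "1200", "paid_on": "2018-07-01"},  # same as the first
    {"customer": "Carol", "amount": "350.50", "paid_on": "not recorded"},
]

# Assumed date formats; extend this tuple as new formats turn up.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return a date for any known format, or None if the text is unusable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return the amount as a float, or None when the field is empty or invalid."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def clean(records):
    """Normalise names, amounts and dates, and drop rows that become identical."""
    seen = set()
    cleaned = []
    for record in records:
        row = (
            record["customer"].strip().title(),
            parse_amount(record["amount"]),
            parse_date(record["paid_on"]),
        )
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


CLEANED = clean(RAW_RECORDS)
```

Here the four raw records collapse to three structured rows: the two spellings of Alice turn out to be the same payment, and unusable fields become `None` rather than being silently guessed.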

Machine learning is the most common of these techniques: the scientist trains an algorithm directly from the available data rather than writing the rules by hand. These algorithms learn mostly by picking up trends and repeated associations in the data, loosely comparable to classical conditioning (a learning process that occurs when two stimuli are repeatedly paired), and once exposed to new data they predict the outcome of interest. Machine learning forms the basis of predictive, business and behavioural analytics. Businesses and other institutions are leveraging the power of machine learning to make optimal decisions or predict the future. Here are some examples:
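Before turning to the examples, a toy sketch may help show what "training an algorithm rather than writing the rules by hand" means. It assumes a single numeric feature and boolean labels, and the data is made up. The learner simply picks the cut-off that misclassifies the fewest training examples, which is about the simplest form of learning from data and far simpler than anything used in practice.

```python
def hand_written_rule(days_late):
    """A rule written by hand: the cut-off of 48 is somebody's guess."""
    return days_late > 48


def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labelled examples.

    `examples` is a list of (value, label) pairs with boolean labels and
    at least two distinct values.
    """
    values = sorted({value for value, _ in examples})
    # Candidate cut-offs: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Hypothetical training data: (days late on a payment, did the customer default?)
TRAINING = [(2, False), (5, False), (9, False), (14, True), (20, True), (30, True)]

THRESHOLD = learn_threshold(TRAINING)


def learned_rule(days_late):
    """The rule recovered from the data rather than written by hand."""
    return days_late > THRESHOLD
```

On this data the learned cut-off separates the defaulters from the rest, while the hand-written guess of 48 would miss every one of them. With that picture in mind, the real-world examples follow.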

1. Donald J. Trump 2016 Election Campaign

The current US President was considered a dark horse during his campaign. He surprised many when he won the primaries despite resistance among Republican party leaders. His success has been largely attributed to Cambridge Analytica, a political and marketing firm that provided the ‘Donald J. Trump for President’ campaign with the expertise and insights that helped him win the White House. Cambridge Analytica used data science to identify persuadable voters as well as the issues they cared most about. It then used this information to shape custom marketing and political strategies that systematically reached out to potential voters, either to win their favour or to communicate the campaign’s messages. The Cambridge Analytica data science team reports having built 20 custom predictive analytics models to forecast voter behaviour. Using these insights, the team could place voters into different categories and determine the best way to influence them through targeted marketing messages.

2. Recommender Engines for e-Commerce

Do you like it when your online shopping is easy and straightforward? It’s like Amazon knows my type or what I intend to buy. Online shopping has been revolutionized by the power of data analytics. As a customer skims through the products on sale, the system tracks which items the customer clicks on and spends time viewing. A profile of the items the customer is likely to buy is then generated from what they bought before and what is in their wish lists and search histories.

Amazon is a perfect example of how e-commerce has succeeded with data. The company had a rough start as a bookseller. Jeff Bezos, the CEO, contended from the start that the site was not just a retailer of consumer products but a technology company whose business was simplifying online transactions for consumers. His strategy was met with scepticism, and many believed Amazon would ultimately lose in the marketplace to established bookselling chains, such as Borders and Barnes & Noble, once they had launched competing e-commerce sites. In his article on Amazon’s history, Mark Hall recalls the firm’s struggle to turn a profit until towards the end of 2001 (Hall, 2018), which seemed to validate the scepticism about its business model and long-term viability.

However, as the variety of goods and services Amazon offered expanded, so too did the number of loyal customers who relied on the merchant. With this customer growth, the firm was able to collect and store massive amounts of rich data about its customers, sellers, purchases and transactions. With this data, Amazon started developing personalization tools that made it possible to recommend additional products to customers based on their buying behaviour and purchase history. More recently, the firm has been launching new product categories and private labels based on an intimate knowledge of consumer trends and preferences.
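The idea of recommending products from purchase history can be sketched with simple co-purchase counts: items that often appear in the same customer's history as something you already own get suggested first. This is only an illustration with invented customers and items, not a description of how Amazon's own system works.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: one set of items per customer.
HISTORIES = {
    "customer_a": {"novel", "bookmark", "reading lamp"},
    "customer_b": {"novel", "bookmark"},
    "customer_c": {"novel", "reading lamp", "notebook"},
    "customer_d": {"notebook", "pen"},
}


def build_co_purchases(histories):
    """Count how often each ordered pair of items appears in the same history."""
    counts = Counter()
    for items in histories.values():
        counts.update(permutations(sorted(items), 2))
    return counts


def recommend(owned, co_purchases, limit=3):
    """Rank items the customer does not own by co-purchase count with owned items."""
    scores = Counter()
    for (item, other), count in co_purchases.items():
        if item in owned and other not in owned:
            scores[other] += count
    # Sort by score (highest first), then by name so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


CO_PURCHASES = build_co_purchases(HISTORIES)
SUGGESTIONS = recommend({"novel"}, CO_PURCHASES)
```

For a customer who has only bought the novel, the bookmark and the reading lamp come out ahead of the notebook, because more of the other novel buyers also bought them.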

3. SuperFluid Labs

There are limitless applications and problems that data science can help solve. I personally work for SuperFluid Labs, based out of Kenya and Ghana. We’re a pioneering African data analytics company that leverages the power of data, machine learning and AI to help businesses harness untapped potential through predictive analytics, business intelligence and dynamic customer insights, using both traditional and alternative data sources. For the past year, I have led a data science project with one of the largest energy companies serving multiple countries in East Africa. The client wanted to predict customers’ future payments and utilization rates (the proportion of time the customer has paid and is using the service) over a specified period of time. Our team developed algorithms to predict customers’ future payment behaviour 1, 2 and 3 months ahead. The models are currently in the testing phase, with very promising accuracy, precision and recall.

These predictions will guide actions such as timely repossession if a defaulting customer is predicted to continue deteriorating over time. Alternatively, suitable interventions such as text messages, sales calls or even technician visits to struggling customers could help them recover or remind them to make timely payments. Finally, customers who are predicted to be suitable candidates for cross-selling or upselling can be better targeted by the sales and marketing teams, making optimal use of available personnel and time.
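For readers unfamiliar with the terms, the sketch below shows how a utilization rate and the three evaluation metrics are usually computed. The numbers are invented and do not come from the project described above, and the function names are only illustrative.

```python
def utilization_rate(days_active, days_in_period):
    """Proportion of the period in which the customer had paid and was using the service."""
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    return days_active / days_in_period


def evaluate(actual, predicted):
    """Return accuracy, precision and recall for boolean labels (True = default)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # defaults correctly predicted
    fp = sum((not a) and p for a, p in pairs)        # false alarms
    fn = sum(a and (not p) for a, p in pairs)        # defaults that were missed
    tn = sum((not a) and (not p) for a, p in pairs)  # non-defaults correctly predicted
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical outcomes for eight customers: did they default the following month?
ACTUAL = [True, True, True, False, False, False, False, True]
PREDICTED = [True, True, False, False, False, True, False, True]

METRICS = evaluate(ACTUAL, PREDICTED)
RATE = utilization_rate(days_active=24, days_in_period=30)
```

Accuracy is the share of all predictions that were right, precision is the share of predicted defaulters who really defaulted, and recall is the share of real defaulters the model caught. Which of the three matters most depends on the action being guided: a wrongful repossession and a missed default carry very different costs.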


Think of how perfect business or life would be if we were able to predict the future and take the best actions today. Instead of reacting to circumstances, data science gives us a chance to be proactive with a considerable degree of accuracy and confidence. Whatever business or line of work we find ourselves in, we are no longer consigned to sitting and hoping for better revenues, good weather, loyal customers or even election victories. We have the chance to use data to unearth insights and influence the future in our favour. Listen to the data, heed its often hidden words: that’s data science.


I would like to thank my colleagues at SuperFluid Labs, especially Erastus Kirui, Winifred Kotin and Timothy Kotin, for their comments and feedback in putting together this piece.

Lucy Gathoni Ng’ang’a works for SuperFluid Labs as a Data Scientist. She is also an Adjunct Lecturer at the University of Nairobi in Nairobi, Kenya, where she lectures in Statistics and Mathematics. She holds a Bachelor’s Degree in Statistics and Computing and a Master of Science in Mathematical Statistics from the University of Nairobi.


  1. Hall, M. (2018). | History & Facts. [online] Encyclopedia Britannica. Available at: [Accessed 11 Jul. 2018].
  2. Rizkallah, J. (2017). The Big (Unstructured) Data Problem. [online] Available at: [Accessed 11 Jul. 2018].
  3. Marr, B. (2018). How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read. [online] Available at: [Accessed 11 Jul. 2018].
  4. Davenport, T. and Patil, D. (2012). Data Scientist: The Sexiest Job of the 21st Century. [online] Harvard Business Review. Available at: [Accessed 11 Jul. 2018].
  5. ScienceDaily. (2013). Big Data, for better or worse: 90% of world’s data generated over last two years. [online] Available at: [Accessed 11 Jul. 2018].
