According to the World Economic Forum report, data scientist is one of the most sought-after professions, it is on the rise in 2020/2021, and it will stay that way for a while.
Harvard Business Review published an article in 2012 calling it the sexiest job of the 21st century, so is it strange for someone to say they do not want to be a data scientist?
Why not, if it is so hot, so cool, and pays so much?
I have worked as a Data Scientist. I had the Data Scientist title and had this label stuck on my back. I worked in a team of data scientists, developed data science models with real-world data, and have reasonable knowledge of the subject and the credentials to perform the role of a Data Scientist.
Is there something wrong with me?
I will explain why there is nothing wrong with me and why it is OK not to want to be a Data Scientist. I also hope this will help you reflect on whether data science is really your career path.
Do not get me wrong. I think data science is a very interesting subject and I still find it fascinating. This is what got me interested in it, and I believe that data science and AI can do a lot to make the world a better place in so many ways. It is also really interesting to study and understand how it works. I really enjoyed the process, as I love learning new things.
I do not want to ramble too much more, so I will just list the reasons and develop each point further.
- Kaggle competitions never got me hooked
Ever since I first became interested in data science I have heard about this thing called Kaggle. I explain my journey into data here. If you are reading this post you probably know what Kaggle is. It is a data science community where data scientists can share their knowledge by publishing notebooks with data analysis and, most importantly, a place where you can compete with other data scientists to solve data problems involving predictive analytics. It is a great place to learn and develop your data science career. I have to confess one thing: to be honest, I never felt compelled to participate in any of those competitions to build a model to predict something. I should have listened to this big warning at the start, before getting too involved in the field and in the theory. I mean, if I am not drawn to and excited by this type of problem solving, it is a clear sign that I did not have much interest in becoming a data scientist. Even doing data science tutorials or browsing notebooks to learn the data science code did not excite me much.
- There is too much ground to cover and eventually it will lead to burnout
Not just once but many times, I simply got overwhelmed by the amount of knowledge you need to accumulate to be up to the challenge of being a really good data scientist. The theory alone is huge: types of models and algorithms, statistics, probability, maths, linear algebra, Python libraries, evaluation techniques, sampling, data munging and so on. It is not just linear regression; there are random forests, clustering algorithms, deep learning and reinforcement learning, and in each of these areas you can go deep, from doing things from scratch to using various Python and R libraries. Maybe the problem is trying to be a generalist, but I think there is too much you have to know just to keep up. On top of that, there is basically a paper published every day on whatever subject you may be working on.
I just did not have the drive to stay on top of the latest tech. For example, I learned Keras with a TensorFlow backend and could scratch the surface and build and train some models. But no, this was not enough. There was this cool new library called PyTorch that I felt I had to learn to stay on top of things. Then, if a project involved NLP, I learned NLTK, and then there was this thing called spaCy, which I also learned. I realized I was just happy doing some modelling and playing around with scikit-learn on the side as a hobby, to satisfy a curiosity that I really enjoyed.
I know people who really enjoyed data science and put in the hours to excel at it. They were really invested in it, had the energy to do a good job, had the passion to keep pushing, and enjoyed the process. If this is your case, by all means go for it and pursue that dream. I once thought it was mine, but after going through it I realized that this aspect played a big part in making me reflect on whether I really wanted to be a data scientist.
- I don’t enjoy the detailed process of modelling and doing data science
On top of having to keep up and learn all the theory, I really did not enjoy the process involved in being a data scientist: building the model, finding the patterns, training the model over and over, and experimenting with and evaluating parameters. I am also not too invested in the mathematics of it, and the knowledge I had was much more an intuition for it. That was OK for an applied data scientist, as I was not doing any research or building anything from scratch. I just needed to understand the API, which in the end was very high level. However, I think that to be a really good data scientist you need to understand the maths behind what is going on, and that did not spark my interest at all.
- I enjoy working on automation and building more
In the process of learning Python to do data science, I realized that I enjoyed building applications and infrastructure much more. I was involved in a project using Flask, where the team was building an API to serve a few machine learning models, and I found this much more interesting than building the models themselves. I also got involved in a project where I was kind of the DevOps Engineer and had to deploy infrastructure as code using GitLab CI/CD pipelines, Kubernetes and Docker containers. I also built an instance of Airflow running in Kubernetes and built pipelines in pure Python. I find solving this kind of software and data engineering problem not only more interesting but also more useful for an organisation than pure data science and model building.
- Most companies do not need a data scientist
From experience, I realized that most companies not only cannot afford a data scientist, they just do not need one. There are so many other problems with data that need to be solved before doing any data science with it. To give a few examples, companies would be better off spending money on analyzing data for process improvement. That requires data analysts, business analysts and, of course, data engineers to create pipelines and build a basic data warehouse or data platform and a self-service BI infrastructure to monitor performance and business processes. This in itself already brings a tremendous return on investment with zero modelling, machine learning or AI.
- Most data science projects stay either in a ppt or in a Jupyter notebook
I have seen a lot of money invested in data science and in models that end up only presented in PowerPoint and never go to production. I see great frustration among data scientists who never see their model in production, or who find that the business does not trust the model's output to be any better than human judgement. At least in my experience, the only AI and machine learning I have seen are those running at FAANG (Facebook, Apple, Amazon, Netflix and Google) companies, or at some specialist AI company focused on a niche in a particular subject matter. Those really good systems take years, lots of manpower, experimenting, failure and, obviously, millions of dollars in investment.
- Integration and operations are more important than in-house development
Instead of developing data science in-house, I do see a trend where a company purchases a pre-trained model to solve a particular problem and integrates it with its systems. To give an example, a company that wants to optimize its rostering can buy an AI roster optimization system that can be called through an API. It is then just a matter of integration work: feeding in the company data and doing something internally with the API response. This applies to other problems too, like computer vision, text analytics and anomaly detection, among others. To do this work, pure data science skill is not required, unless you work in a specialized AI company. What a company needs is more data analysts, data engineers, DevOps engineers and software developers to build the infrastructure that integrates with the organizational systems.
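To make the rostering example a bit more concrete, here is a minimal sketch of what that integration work could look like in Python. Everything in it is an assumption made up for illustration: the URL, the request and response fields, and the function names do not belong to any real vendor. The `fake_transport` function just stands in for the purchased system so the sketch can run without a network.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint: stands in for whatever URL a vendor would provide.
ROSTER_API_URL = "https://example.com/v1/roster/optimize"


def http_transport(payload: dict) -> dict:
    """Send the payload as a JSON POST and decode the JSON reply."""
    request = urllib.request.Request(
        ROSTER_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def build_payload(employees: list[dict], shifts: list[str]) -> dict:
    """Input side: reshape company data into the (assumed) request format."""
    return {
        "staff": [{"id": e["id"], "max_shifts": e["max_shifts"]} for e in employees],
        "shifts": shifts,
    }


def roster_by_employee(response: dict) -> dict[str, list[str]]:
    """Output side: group the (assumed) assignments by employee for internal use."""
    roster: dict[str, list[str]] = {}
    for assignment in response["assignments"]:
        roster.setdefault(assignment["employee_id"], []).append(assignment["shift"])
    return roster


def optimize_roster(
    employees: list[dict],
    shifts: list[str],
    transport: Callable[[dict], dict] = http_transport,
) -> dict[str, list[str]]:
    """Whole integration: company data in, vendor call, internal structure out."""
    return roster_by_employee(transport(build_payload(employees, shifts)))


def fake_transport(payload: dict) -> dict:
    """Stand-in for the vendor: assigns shifts round-robin, no real optimization."""
    staff = payload["staff"]
    return {
        "assignments": [
            {"employee_id": staff[i % len(staff)]["id"], "shift": shift}
            for i, shift in enumerate(payload["shifts"])
        ]
    }


if __name__ == "__main__":
    employees = [{"id": "ana", "max_shifts": 2}, {"id": "ben", "max_shifts": 2}]
    shifts = ["mon-am", "mon-pm", "tue-am"]
    print(optimize_roster(employees, shifts, transport=fake_transport))
```

Notice that there is no modelling anywhere in this code. It is all reshaping data, calling a service and handling the response, which is software and data engineering work.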
- Most data is not data science ready
As I already mentioned above, there is a lot to do before any data science can start in a company. The reason is that sometimes a company does not even have visibility of its own data. There is no reporting at all, or the systems do not record the data in a way that allows any analysis, let alone data science. A lot of work in systems design and business improvement needs to be done to get good data capture and quality before doing data science, and this is done by data engineers, system architects and software engineers. To be honest, once this is all running, the data science part can be the “simplest” part of it all.
I am very happy with this decision and the direction I am taking. This is my current drive, and it is keeping me motivated while I keep a sane balance between work and personal life.