Clearing the “Mystery” Once and for All: Data Engineer vs. Data Scientist
The other day I finished a meeting with one of my clients, whom I am helping optimize the recommendation engine for his online dating app. It’s a small office with around 20 employees, and I was discussing my strategy with the technical team. Right from the beginning of the meeting, I noticed one of the new joiners staring at us and eavesdropping on our conversation. After the meeting, I went to the coffee vending machine to grab a cup of coffee. Suddenly this guy jumped at me and asked: “Are you a Data Scientist?” I said, “Yes.”
He: “So do you know Hadoop?”
Me: “No, I don’t.”
He: “Oh. Then what kind of Data Scientist are you? Waste fellow!”
I burst out laughing and asked the boy about his role at my client’s company. He explained that he works as an Inside Sales Executive and is very interested in learning Data Science. He mentioned that he has been attending free webinars by some online education providers. I could see the passion for learning in his eyes, so I decided to explain the different roles under the Big Data & Analytics umbrella.
I am sure many of you share this confusion, so let’s clear it up once and for all. That way, you can pursue the role that suits you best.
To begin with, let us have a look at the image below:
The flow diagram above explains the process from data storage and extraction through to model deployment. Every profile has its own role to play here. Now let us look at the various profiles in detail.
A Data Engineer mainly deals with data storage, such as building and maintaining the data warehouse. A Data Engineer’s job is to make sure the data is stored correctly, using technologies such as a Hadoop cluster. He builds data pipelines and has to deal with enormous amounts of unstructured data. His role is crucial because Data Analysts and Data Scientists cannot start their work without him. That is why he sits at the top of the flow diagram.
Skills & Technologies used by a Data Engineer:
- Hadoop ecosystem
- Scala/Python
- Spark Streaming/Storm/Flink and more
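To make the idea of a “data pipeline” a little more concrete, here is a minimal extract-transform-load sketch. It is deliberately tiny and uses only Python’s standard library, with an in-memory SQLite table standing in for a real warehouse. The raw event log, the field names (`user_id`, `action`, `target_id`) and the `events` table are all hypothetical. A real Data Engineer would do the same kind of work at scale with the Hadoop/Spark tools listed above.

```python
import json
import sqlite3

# Hypothetical raw event log from a dating app: one JSON object per line, some broken.
RAW_EVENTS = [
    '{"user_id": 1, "action": "like", "target_id": 7}',
    '{"user_id": 2, "action": "view"}',   # missing target_id
    'not valid json',                     # corrupted line
    '{"user_id": 3, "action": "like", "target_id": 1}',
]

def extract(lines):
    """Parse each raw line, silently skipping anything that is not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue

def transform(records):
    """Keep only records that contain every field the (hypothetical) table expects."""
    required = ("user_id", "action", "target_id")
    for rec in records:
        if all(key in rec for key in required):
            yield (rec["user_id"], rec["action"], rec["target_id"])

def load(rows, conn):
    """Store the cleaned rows in a structured table and return the total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, action TEXT, target_id INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
count = load(transform(extract(RAW_EVENTS)), conn)
print(count)  # 2
```

The point is the shape of the work rather than the tools. Messy raw data comes in, invalid records are filtered out, and clean, structured rows land where analysts and scientists can query them.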
Once the data is stored and accessible, the next step is to perform exploratory data analysis (EDA). This is where the Data Analyst comes in. A Data Analyst’s main responsibility is to apply statistical analysis to extract insights from the data. Data analysts generally work alongside data scientists or sometimes report directly to management. Their work includes building visualizations, extracting and presenting data, and sending reports to various stakeholders.
Skills & Technologies used by a Data Analyst:
- Excel for Analytics
- Microsoft PowerBI
- Basic statistical analysis using R or Python.
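As a rough illustration of the “basic statistical analysis” step of EDA, the sketch below groups a small sample by segment and computes descriptive statistics. The session data and the “free”/“premium” segments are invented for the example. It uses Python’s standard `statistics` module in place of Excel, Power BI or R.

```python
import statistics
from collections import defaultdict

# Hypothetical sample: (user segment, minutes spent in the app per session)
sessions = [
    ("free", 12), ("free", 8), ("free", 15), ("free", 9),
    ("premium", 25), ("premium", 31), ("premium", 22),
]

def summarize(rows):
    """Group session lengths by segment and compute basic descriptive statistics."""
    groups = defaultdict(list)
    for segment, minutes in rows:
        groups[segment].append(minutes)
    return {
        segment: {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": round(statistics.stdev(values), 2),  # sample standard deviation
        }
        for segment, values in groups.items()
    }

for segment, stats in summarize(sessions).items():
    print(segment, stats)
```

A summary like this is typically the starting point for a chart or a stakeholder report. Here, for example, it shows that premium users in the toy sample spend noticeably longer per session.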
Now, let us discuss the role of a Data Scientist, which covers a wider scope and includes all the tasks of a Data Analyst. A Data Scientist brings together multiple aspects of the work:
- identifying the problem
- formulating it in terms of statistical or machine learning models
- building those models using programming languages such as Python and R

A Data Scientist also evaluates various algorithms to check their accuracy and optimizes the models to make them more accurate and efficient. However, the role doesn’t stop there. Once the models are optimized, a Data Scientist has to work closely with software developers to deploy the model into production so that the application can put it to real use.
Skills & Technologies used by a Data Scientist:
- A Data Scientist needs to have a detailed understanding of the domain for which he/she is solving problems.
- He should possess excellent problem identification and problem-solving skills.
- He should have a good understanding of statistical and mathematical modeling.
- He should be excellent at machine learning and deep learning algorithms.
- He should have expertise in programming languages such as R and Python.
- A Data Scientist should know how to extract data from SQL, NoSQL, and other databases.
- He should be proficient at data visualization and reporting tools like Excel, PowerPoint, and Tableau/Microsoft PowerBI.
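To illustrate what “evaluating various algorithms to check their accuracy” can look like, here is a minimal sketch. It compares a majority-class baseline against a 1-nearest-neighbour classifier on a held-out test set. Everything here is a hypothetical toy: the features (shared interests, age gap), the match labels and the choice of these two algorithms are assumptions for illustration. This is not the recommendation engine discussed in the story. Only the Python standard library is used (`math.dist` needs Python 3.8+).

```python
import math
from collections import Counter

# Hypothetical toy data: (shared_interests, age_gap_years) -> 1 if the pair matched, else 0
train = [((5, 1), 1), ((4, 2), 1), ((6, 3), 1),
         ((1, 8), 0), ((0, 5), 0), ((2, 10), 0), ((1, 6), 0)]
test = [((5, 2), 1), ((1, 9), 0), ((3, 4), 1), ((0, 7), 0)]

def majority_baseline(train_rows):
    """Always predict the most common label seen in training (a sanity-check model)."""
    label = Counter(y for _, y in train_rows).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor(train_rows):
    """Predict the label of the closest training point by Euclidean distance."""
    def predict(x):
        closest = min(train_rows, key=lambda row: math.dist(row[0], x))
        return closest[1]
    return predict

def accuracy(model, rows):
    """Fraction of rows where the model's prediction equals the true label."""
    correct = sum(1 for x, y in rows if model(x) == y)
    return correct / len(rows)

models = {"majority": majority_baseline(train), "1-nn": nearest_neighbor(train)}
scores = {name: accuracy(model, test) for name, model in models.items()}
print(scores)  # {'majority': 0.5, '1-nn': 1.0}
```

Comparing against a trivial baseline is a common habit. If a “real” model cannot beat “always predict the most common answer,” it is not adding value. With data this small, the scores say nothing about real-world performance. In practice you would use far more data, cross-validation, and a library such as scikit-learn.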
Now, you must be wondering which roles are growing and lucrative enough to pursue. Well, we all know how fast data is being generated from multiple sources every second. There are many problems and challenges to address for each role and across various sectors. That’s why Harvard Business Review called data scientist the “sexiest job of the 21st century.”
Well, I hope this has given you enough clarity to start deciding on your career path. If you want to know more about these career paths, get in touch with us so that we can mentor you in the right career direction.
Business Toys provides online as well as offline training programs to help you get your first job or make a career transition into Data Science and Big Data. Our programs are designed around a completely practical approach to learning, blended with industry case studies and capstone projects. Business Toys also provides complete career guidance, including resume transformation, job portal rating, interview preparation, and hand-holding until you are placed in your desired job profile.