Why a Data Scientist is not a Data Engineer


I found an interesting article on O'Reilly.com, written by Jesse Anderson, which explains the differences between data science and data engineering. The two disciplines are not interchangeable, and misperceptions of their roles can hurt teams and compromise productivity.

Source: Big Data Institute

What are Data Scientists and Data Engineers?

Data scientists’ skills

At their core, data scientists have a math and statistics background (sometimes physics). Out of this math background, they’re creating advanced analytics. On the extreme end of this applied math, they’re creating machine learning models and artificial intelligence.

Just like their software engineering counterparts, data scientists will have to interact with the business side. This includes understanding the domain well enough to draw insights. Data scientists are often tasked with analyzing data to help the business, and this requires a level of business acumen. Finally, their results need to be given to the business in an understandable fashion. This requires the ability to communicate complex results and observations verbally and visually, in a way that the business can understand and act on.

My one-sentence definition of a data scientist is: a data scientist is someone who has augmented their math and statistics background with programming to analyze data and create applied mathematical models.

Data engineers’ skills

At their core, data engineers have a programming background. This background is generally in Java, Scala, or Python. They have an emphasis or specialization in distributed systems and big data. A data engineer has advanced programming and system creation skills.

My one-sentence definition of a data engineer is: a data engineer is someone who has specialized their skills in creating software solutions around big data.

Using these engineering skills, they create data pipelines. Creating a data pipeline may sound easy or trivial, but at big data scale, this means bringing together 10-30 different big data technologies. More importantly, a data engineer is the one who understands the various technologies and frameworks in depth, chooses the right tools for the job, and knows how to combine them into data pipelines that enable a company's business processes.

Overlapping skills

There is an overlap between a data scientist and a data engineer. However, the overlap happens at the ragged edges of each one’s abilities.

Continue reading on O'Reilly.com

