Tendencies of Data Engineers and Scientists

In a previous post, I defined the key difference between data engineers and data scientists: data engineers apply engineering methodologies to data problems, while data scientists apply the scientific method to data problems. This fundamental difference in approach leads to interesting dynamics when these roles interact within organizations.

The Core Dynamic

The relationship between data engineering and data science teams is characterized by a natural tension:

- Data engineers work by restricting the domain
- Data scientists work by expanding it

This creates an interesting challenge: data scientists add value through their ability to apply a vast universe of options to data problems, while data engineers provide stability by limiting that universe to something manageable.

Customer Relationship

A key principle to understand is that data science is the customer of data engineering. The relationship flows like this:

- Data engineering provides the platform and tools
- Data science uses these to create insights
- These insights are pushed to production (dashboards, scoring systems, etc.)
- Business value is derived from the results

Organizational Challenges

The tension between these roles becomes more complex when considering organizational structure. Common approaches include:

- Housing both under engineering
- Creating an independent data group
- Embedding data scientists in business units
- Centralizing data engineering with distributed data science teams

Best Practices

Based on experience, here are some guiding principles for managing these teams:

Keep Teams Close But Distinct
- Separate planning meetings/scrums
- Common management structure
- Shared incentives

Maintain Communication Flows
- Cross-team representatives in planning meetings
- Dedicated liaison role between groups
- Regular joint sessions for knowledge sharing

Foster Innovation Exchange
- New techniques can come from either group
- Avoid silos that trap improvements
- Encourage cross-pollination of ideas

Practical Implementation

To make this work effectively:

- Have representatives from each group attend the other’s planning sessions
- Designate a person to act as a bridge between teams
- Maintain separate day-to-day management while ensuring aligned goals
- Create formal channels for sharing innovations and improvements

Conclusion

The relationship between data engineering and data science is complex but crucial to get right. Success comes from acknowledging their different methodologies while creating structures that allow them to work together effectively. The key is finding the right balance between independence and integration.