Tendencies of Data Engineers and Scientists
In a previous post, I defined the key difference between data engineers and data scientists: data engineers apply engineering methodologies to data problems, while data scientists apply the scientific method to data problems. This fundamental difference in approach shapes how the two roles interact within an organization.
The Core Dynamic
The relationship between data engineering and data science teams is characterized by a natural tension:
- Data engineers work by restricting the domain
- Data scientists work by expanding it
This creates a challenge: data scientists add value by drawing on a vast universe of options for solving data problems, while data engineers provide stability by limiting that universe to something manageable.
Customer Relationship
A key principle is that data science is the customer of data engineering. The relationship flows like this:
- Data engineering provides the platform and tools
- Data science uses these to create insights
- These insights are pushed to production (dashboards, scoring systems, etc.)
- Business value is derived from the results
Organizational Challenges
The tension between these roles becomes more complex once organizational structure is taken into account. Common approaches include:
- Housing both under engineering
- Creating an independent data group
- Embedding data scientists in business units
- Centralizing data engineering with distributed data science teams
Best Practices
Based on my experience, here are some guiding principles for managing these teams:
Keep Teams Close But Distinct
- Separate planning meetings/scrums
- Common management structure
- Shared incentives
Maintain Communication Flows
- Cross-team representatives in planning meetings
- Dedicated liaison role between groups
- Regular joint sessions for knowledge sharing
Foster Innovation Exchange
- New techniques can come from either group
- Avoid silos that trap improvements
- Encourage cross-pollination of ideas
Practical Implementation
To put these principles into practice:
- Have representatives from each group attend the other’s planning sessions
- Designate a person to act as a bridge between teams
- Maintain separate day-to-day management while ensuring aligned goals
- Create formal channels for sharing innovations and improvements
Conclusion
The relationship between data engineering and data science is complex but crucial to get right. Success comes from acknowledging their different methodologies while creating structures that allow them to work together effectively. The key is finding the right balance between independence and integration.