Tendencies of Data Engineers and Scientists

In a previous post, I defined the key difference between data engineers and data scientists: data engineers apply engineering methodologies to data problems, while data scientists apply the scientific method to data problems. This fundamental difference in approach leads to interesting dynamics when these roles interact within organizations.

The Core Dynamic

The relationship between data engineering and data science teams is characterized by a natural tension:

  • Data engineers work by restricting the domain
  • Data scientists work by expanding it

This creates an interesting challenge: data scientists add value through their ability to apply a vast universe of options to data problems, while data engineers provide stability by limiting that universe to something manageable.

Customer Relationship

A key principle to understand is that data science is data engineering's customer. The relationship flows like this:

  1. Data engineering provides the platform and tools
  2. Data science uses these to create insights
  3. These insights are pushed to production (dashboards, scoring systems, etc.)
  4. Business value is derived from the results

Organizational Challenges

The tension between these roles becomes harder to manage once organizational structure enters the picture. Common approaches include:

  • Housing both under engineering
  • Creating an independent data group
  • Embedding data scientists in business units
  • Centralizing data engineering with distributed data science teams

Best Practices

Based on my experience, here are some guiding principles for managing these teams:

  1. Keep Teams Close But Distinct
    • Separate planning meetings/scrums
    • Common management structure
    • Shared incentives
  2. Maintain Communication Flows
    • Cross-team representatives in planning meetings
    • Dedicated liaison role between groups
    • Regular joint sessions for knowledge sharing
  3. Foster Innovation Exchange
    • New techniques can come from either group
    • Avoid silos that trap improvements
    • Encourage cross-pollination of ideas

Practical Implementation

To put these principles into practice:

  • Have representatives from each group attend the other’s planning sessions
  • Designate a person to act as a bridge between teams
  • Maintain separate day-to-day management while ensuring aligned goals
  • Create formal channels for sharing innovations and improvements

Conclusion

The relationship between data engineering and data science is complex but crucial to get right. Success comes from acknowledging their different methodologies while creating structures that allow them to work together effectively. The key is finding the right balance between independence and integration.