Data Science Things Roundup #13

In this edition of Data Science Things Roundup, I’m sharing three interesting developments from the world of data science that caught my attention recently.

IBM Granite: Enterprise AI with Clear IP Rights

IBM has launched the third generation of Granite, its family of AI language models, with a strong focus on responsible development and clear licensing terms. What sets it apart is IBM’s approach to IP protection and data rights:

  1. Clean Data: Implements strict content filtering and quality checks for training data
  2. Clear IP Terms: Provides standard contractual IP indemnification
  3. Open Source: Available under the Apache 2.0 license, in sizes from sub-billion to 34B parameters

This comes at a crucial time, as the industry grapples with training data controversies such as the recent allegations that Meta trained its models on copyrighted books.

Le Chat: European Open Source AI

Mistral AI’s Le Chat emerges as Europe’s answer to US-dominated AI chatbots. Built in Paris, it demonstrates that high-performance AI can thrive outside Silicon Valley while adhering to European values:

  1. European-First: Built with EU principles around AI and privacy
  2. Open Source: Transparent development and community involvement
  3. High Performance: Competitive speed and capabilities

Open Deep Research: DIY Deep Research Tool

Open Deep Research offers an open-source alternative to proprietary deep research tools from OpenAI and Google. Instead of relying on expensive fine-tuned models, it combines:

  1. Firecrawl: For web data extraction and search
  2. Flexible Models: Support for various AI providers including Deepseek
  3. Modern Stack: Built on Next.js with extensible architecture

This project shows how the open-source community can build sophisticated research tools that rival proprietary alternatives.
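
To make the "search, extract, then reason" idea concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is not the project's actual code (which is built on Next.js) and it does not call Firecrawl's real API: the `search`, `extract`, and `model` callables, the `ResearchReport` class, and the `deep_research` function are all hypothetical names I made up for illustration. The point is only that the model provider and the web tooling are pluggable parts around a simple loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions for illustration only, not Firecrawl's
# or Open Deep Research's real API).
SearchFn = Callable[[str], List[str]]   # query -> list of URLs
ExtractFn = Callable[[str], str]        # URL -> page text
ModelFn = Callable[[str], str]          # prompt -> completion


@dataclass
class ResearchReport:
    question: str
    sources: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    answer: str = ""


def deep_research(
    question: str,
    search: SearchFn,
    extract: ExtractFn,
    model: ModelFn,
    max_sources: int = 3,
) -> ResearchReport:
    """Search the web, summarize each source, then synthesize an answer."""
    report = ResearchReport(question=question)
    for url in search(question)[:max_sources]:
        text = extract(url)
        # One model call per source keeps each prompt small.
        note = model(f"Summarize for the question '{question}':\n{text}")
        report.sources.append(url)
        report.notes.append(note)
    # A final model call combines the per-source notes into one answer.
    report.answer = model(
        f"Answer '{question}' using these notes:\n" + "\n".join(report.notes)
    )
    return report


# Stub implementations so the sketch runs without network access or API keys.
def fake_search(query: str) -> List[str]:
    return ["https://example.com/a", "https://example.com/b"]


def fake_extract(url: str) -> str:
    return f"text from {url}"


def fake_model(prompt: str) -> str:
    return f"[model output for a {len(prompt)}-character prompt]"


demo_report = deep_research(
    "What is deep research?", fake_search, fake_extract, fake_model
)
```

Swapping `fake_model` for a function that calls whichever provider you prefer is the "flexible models" idea in miniature: the loop itself never needs to know which model is behind it.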

