Artificial Intelligence & Machine Learning

This section contains my writings on artificial intelligence and machine learning, drawing from years of experience leading AI initiatives across various organizations. Topics range from practical implementation advice to theoretical discussions and industry trends.

Stargazers CLI Update: Nested Commands, Account Trends, and Plotting!

May 20, 2025

Tags:stargazers, github, cli, open source, python

Announcing the latest stargazers CLI update: all commands now under 'stargazers', plus new account-trend analysis and plotting features.

Read more →

Mutation Testing with mutmut for PyGeoHash

May 19, 2025

Tags:python

Mutation testing checks if your tests catch real bugs by making small code changes. Learn how it works and why it matters.

Read more →

Digging into Code Churn with GitPandas

May 16, 2025

Tags:code churn, data-analysis, git, gitpandas, python

Quantify code churn in your Git repositories with the gitpandas Python library. Analyze file change rates and spot areas of high activity or instability.

Read more →

Refactoring Library Interfaces

May 15, 2025

Tags:api-design, best-practices, library-development, programming, python, refactoring

Discover techniques for improving library interfaces through thoughtful refactoring, using real-world examples while maintaining backward compatibility.

Read more →

Context-Aware Library Design: Build for Your Users

May 14, 2025

Tags:api-design, best-practices, library-development, programming, python, user-experience

Learn to design Python libraries that adapt to various user needs and experience levels, ensuring simplicity and effectiveness for all users.

Read more →

Who Holds the Keys? Calculating Bus Factor with GitPandas

May 13, 2025

Tags:bus factor, data-analysis, git, gitpandas, python, risk management

Explore the concept of bus factor and how gitpandas can help you quantify knowledge distribution and risk in your software project.

Read more →

The ECF Rating System: The British Approach to Chess Ratings

May 5, 2025

Tags:algorithms, elote, open-source, python, rating-systems

Explore the English Chess Federation (ECF) rating system, its unique performance-based approach, and its distinct history from Elo.

Read more →

Writing Tools MCP: A Toolkit for Better Writing

May 5, 2025

Tags:mcp, writing, tools, open-source, python

Boost your writing workflow with custom tools for Markdown and content analysis. Discover scripts and automations for faster, better publishing.

Read more →

Glicko-2: Adding Volatility to the Rating Equation

April 30, 2025

Tags:algorithms, elote, glicko, open-source, python, rating-systems

Explore Glicko-2, an enhancement over Glicko-1, by incorporating player volatility for more precise and accurate ratings.

Read more →

The Glicko Rating System: When Confidence Matters

April 29, 2025

Tags:algorithms, elote, glicko, open-source, python, rating-systems

Explore the Glicko rating system, an Elo enhancement adding Rating Deviation (RD) to measure confidence. Learn its mechanics and implementation with Elote.

Read more →

Handling Deprecation: Gracefully Retiring Features

April 28, 2025

Tags:api-design, best-practices, development, library, maintenance, python

Learn to deprecate Python library features gracefully with warnings, clear communication, and migration paths that minimize disruption to your users.

Read more →

McCabe Complexity: The Python Metric You Should Care About

April 24, 2025

Tags:best-practices, code-quality, development, python, tools

Learn about McCabe Complexity, a key metric for code complexity. Understand, measure with tools like Ruff, and manage complexity in Python projects.

Read more →

Python Logging Best Practices for Library Developers

April 20, 2025

Tags:best-practices, development, library, python

A comprehensive guide to implementing logging in Python libraries, from basic setup to advanced patterns and common pitfalls to avoid.

Read more →

Introducing 'stargazers': A Tool to Understand Your GitHub Audience

April 16, 2025

Tags:cli, community, github, python

Announcing 'stargazers': A CLI tool to fetch, analyze & summarize GitHub stargazers/forkers for any public repo. Inspired by Cockroach Labs' analysis.

Read more →

HashingEncoder: Tackling Extreme Cardinality with the Hashing Trick

April 15, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore HashingEncoder: learn how it handles extreme cardinality, its mechanism, ideal use cases, and implementation with the category_encoders library.

Read more →

BinaryEncoder: The Space-Efficient Alternative to One-Hot Encoding

April 13, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore the BinaryEncoder from category_encoders: a space-efficient alternative to one-hot encoding. Learn how it works, when to use it, and see implementation.

Read more →

OrdinalEncoder: When Order Matters in Categorical Data

April 10, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore OrdinalEncoder for categorical data where order matters. Learn how it works, its benefits, and implementation using the category_encoders library.

Read more →

Makefiles: The Unsung Hero of Python Development

April 8, 2025

Tags:automation, development, python, tools

Explore why Makefiles, though old-school, are invaluable for Python projects. Learn how they provide consistent, simple shortcuts for common dev tasks.

Read more →

Modern Python Package Publishing: PyGeoHash's New CI/CD Pipeline

April 6, 2025

Tags:development, github-actions, pygeohash, python

PyGeoHash uses GitHub Actions, cibuildwheel, and PyPI's Trusted Publisher for automated, secure multi-platform Python package builds and publishing.

Read more →

PyGeoHash Gets Type Hints: A Journey into Modern Python

April 3, 2025

Tags:development, open-source, pygeohash, python, type-hints

Enhance PyGeoHash with comprehensive type hints and a new types module, improving developer experience and code quality for modern Python.

Read more →

Optimal Bankroll Management with Keeks: The Kelly Criterion

April 1, 2025

Tags:betting, finance, keeks, kelly-criterion, open-source, python

Dive into the Kelly Criterion, the foundation of optimal betting and bankroll management. Learn its theory, pros, cons, and implementation with Keeks.

Read more →

Documenting Your Library's API: Best Practices

March 30, 2025

Tags:autodoc, best-practices, development, docstrings, documentation, library, python, sphinx

Build a clear, comprehensive API reference with Sphinx & autodoc. Learn best practices for structure, content, cross-referencing your Python library docs.

Read more →

OneHotEncoder: The Workhorse of Categorical Encoding

March 27, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

A comprehensive guide to OneHotEncoder in category_encoders, exploring its core functionality, advantages, and practical limitations in machine learning.

Read more →

Elo Rating System: The Grandfather of Competitive Rankings

March 25, 2025

Tags:algorithms, elo, elote, open-source, python, rating-systems

Deep dive into the Elo rating system: history, the math behind it (K-factor, expected score), and practical implementation with the Elote Python library.

Read more →

Automating Docs Deployment with GitHub Actions and Pages

March 23, 2025

Tags:automation, ci-cd, development, documentation, github-actions, library, python, sphinx

Keep docs current automatically! Learn to set up GitHub Actions to build Sphinx docs & deploy to GitHub Pages, ensuring sync with your code changes.

Read more →

Crafting Code Examples: From Snippets to Real-World Scenarios

March 22, 2025

Tags:development, documentation, library, python, sphinx

Master the art of crafting clear, runnable code examples for documentation using doctest, tutorials, and scripts to enhance user understanding.

Read more →

Keeks 0.1.0 Release: Optimal Bankroll Management Made Simple

March 18, 2025

Tags:betting, finance, keeks, kelly-criterion, open-source, python

Keeks 0.1.0 is here! Python library for optimal bankroll management & betting strategies (Kelly Criterion). Includes simulation and visualization tools.

Read more →

Getting Started with Sphinx for Python Project Documentation

March 15, 2025

Tags:development, docstrings, documentation, library, python, restructuredtext, sphinx

Learn how to use Sphinx to generate professional, cross-referenced documentation for your Python projects, covering installation, configuration, and autodoc.

Read more →

Elote 1.0.0 Release: Rating Systems Made Simple

March 13, 2025

Tags:elo, elote, open-source, python, rating-systems

Announcing Elote 1.0.0, a Python library that simplifies the implementation of rating systems like Elo, Glicko, and TrueSkill for competitive ranking.

Read more →

PyGeoHash v3.0.0: Faster, Freer, and More Pythonic

March 11, 2025

Tags:geospatial, licensing, open-source, performance, pygeohash, python

Deep dive into PyGeoHash v3.0.0: a major release with a pure CPython rewrite, MIT relicensing, and dramatic performance gains. Faster & freer geohashing!

Read more →

Using Cursor for Open Source Library Maintenance

March 9, 2025

Tags:ai, cursor, elote, keeks, open-source, pygeohash, python

How AI tools like Cursor simplify open source library maintenance by assisting with code understanding, environment setup, housekeeping, and project standards.

Read more →

Effective Docstrings: Google vs. NumPy vs. reStructuredText Styles

March 6, 2025

Tags:autodoc, development, docstrings, documentation, library, python, restructuredtext, sphinx

Learn to write clear docstrings Sphinx understands. Compares Google, NumPy, and reStructuredText formats for effective Python library documentation.

Read more →

PyGeoHash 2.1.0: Modernizing a Geospatial Python Library

March 4, 2025

Tags:cursor, geohash, geospatial, open-source, python

A look at the latest updates to PyGeoHash, a lightweight Python library for working with geohashes, and how modern AI tools helped revitalize the project.

Read more →

Geohash: When Clever Isn't Always Smart

March 2, 2025

Tags:algorithms, data-engineering, geohash, python

A deep dive into the limitations of the Geohash algorithm, including boundary issues, non-uniform cell sizes, and common implementation mistakes to avoid.

Read more →

Where Did All the RAM Go? Memory Profiling with Memray

March 1, 2025

Tags:development, library, optimization, performance, profiling, python, testing

High CPU isn't the only performance issue. Learn how Memray helps track memory leaks and excessive allocation in your Python library to optimize usage.

Read more →

Claude 3.7 and new Cursor: first impressions

February 26, 2025

Tags:ai, data-science, tools

First impressions of Anthropic's Claude 3.7 model and the latest Cursor AI coding assistant update. Exploring improved reasoning, speed, and new features.

Read more →

Finding the Slowdown: Profiling Python Code with Pyinstrument

February 25, 2025

Tags:development, library, optimization, performance, profiling, python, testing

Your benchmark says a function is slow, but why? Profilers like Pyinstrument help you pinpoint exactly where your Python code is spending its time.

Read more →

How Fast Is It? Benchmarking Your Code with Pytest-Benchmark

February 22, 2025

Tags:development, library, performance, pytest, python, testing

Performance matters! Easily measure Python library speed with pytest-benchmark. Track performance, find regressions, and optimize effectively with benchmarks.

Read more →

Silos to Shared Libraries: Guide to Inner Source Adoption

February 18, 2025

Tags:best-practices, development-practices, inner-source, library-development, python, security

Guide for transitioning from team-specific code to shared libraries, covering governance models, security, and standardized development practices.

Read more →

Mastering Mocking in Python with pytest-mock

February 16, 2025

Tags:pytest, python, testing

A practical guide to mocking in Python testing, from basic concepts to advanced techniques with pytest-mock and other helpful libraries.

Read more →

Building Your Internal Library Developer Community

February 15, 2025

Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, python

Explore strategies for building a thriving community of library developers in your organization through effective incentives, recognition, and collaboration.

Read more →

Will It Blend? Testing Across Environments with Tox

February 13, 2025

Tags:ci-cd, development, library, pytest, python, testing

Works on your machine? Great, but what about Python 3.9 or 3.12? Tox ensures library compatibility across different Python versions and dependency sets easily.

Read more →

Inner Source: Bringing Open Source Culture Inside Your Organization

February 11, 2025

Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, open-source, python

Learn how to harness the power of open source development practices within your organization through inner source principles and practices.

Read more →

Data Science Things Roundup #13

February 10, 2025

Tags:data-science, machine-learning, resources, roundup

Data Science Roundup #13: Exploring IBM Granite for enterprise AI, Mistral AI's Le Chat for Europe, and Open Deep Research for DIY AI tools.

Read more →

Are Your Tests Enough? Measuring Coverage with Coverage.py

February 9, 2025

Tags:code-quality, development, library, pytest, python, testing

Writing tests is step one. Step two is knowing what parts of your library code those tests actually exercise. Enter Coverage.py.

Read more →

Designing for Developer Joy: Python Library Ergonomics

February 6, 2025

Tags:api-design, best-practices, developer-experience, library-development, programming, python

What makes Python libraries joyful? Explore API ergonomics, from naming conventions & sensible defaults to helpful error messages that guide users to solutions.

Read more →

Why Your Library Needs Pytest (And How to Get Started)

February 4, 2025

Tags:code-quality, development, library, pytest, python, testing

Testing is vital for Python libraries. Explore why it's crucial and how Pytest simplifies writing powerful tests with less boilerplate and better assertions.

Read more →

The Art of API Design: Making the Right Things Easy

February 3, 2025

Tags:api-design, best-practices, developer-experience, library-development, programming, python

Learn principles of intuitive Python API design that make common operations simple and guide users toward best practices, while preserving advanced features.

Read more →

Secure Coding Practices for Python Library Developers

February 2, 2025

Tags:best-practices, development, input validation, library, python, secure-coding, security

Beyond tools, what principles guide secure Python library development? Explore essential practices: input validation, least privilege, error handling, and more.

Read more →

Taming the Python Chaos: Linting & Formatting with Ruff

January 30, 2025

Tags:ci-cd, code-quality, development, github-actions, python, ruff

What linting and formatting actually are, why they matter (a lot!), and how the speedy tool Ruff can save your Python project (and your sanity).

Read more →

Handling Sensitive Data Securely Within Your Python Library

January 29, 2025

Tags:development, library, python, secure-coding, security

Handle sensitive data in Python libraries securely. Learn best practices for managing API keys, passwords, PII, and other secrets without exposing them in code.

Read more →

Decoding Library Updates: Understanding Semantic Versioning (SemVer)

January 28, 2025

Tags:dependencies, development, library, packaging, pip, python

Guide to Semantic Versioning (SemVer) for Python library authors. Understand MAJOR.MINOR.PATCH rules to communicate changes and manage dependencies.

Read more →

Dependency Security: Managing Vulnerabilities with pip-audit

January 27, 2025

Tags:dependencies, development, library, python, security, vulnerabilities

Your library relies on packages. Learn how to use pip-audit to scan your dependencies for known security vulnerabilities and keep your users safe.

Read more →

The Center of Your Python Project: Understanding pyproject.toml

January 26, 2025

Tags:development, library, packaging, pytest, python, ruff

From setup.py chaos to pyproject.toml clarity. Learn why it exists, how it standardizes Python packaging/tool config via PEPs (518, 517, 621), and its anatomy.

Read more →

Bandit Security Rules: Finding Common Python Security Issues

January 25, 2025

Tags:development, library, python, ruff, secure-coding, security, vulnerabilities

Learn how to use Ruff's Bandit integration to automatically scan your Python code for common security pitfalls through static analysis.

Read more →

Don't Forget the Fine Print: Licensing Your Python Library

January 24, 2025

Tags:compliance, dependencies, development, library, licensing, open-source, python

Choosing an open-source license is crucial. Understand common options (MIT, Apache, GPL), why compatibility matters, and how to comply with obligations.

Read more →

Building and Engaging a Community Around Your Open Source Library

January 22, 2025

Tags:community, development, github, library, maintenance, open-source, python

Attract users, encourage contributions, and build a welcoming environment for your open source library. Learn practical steps for community engagement.

Read more →

The Library Author's Dilemma: Managing Python Dependencies

January 21, 2025

Tags:best-practices, dependencies, development, library, packaging, pip, python

Python library dependency management balances features vs user pain. Explore best practices for choosing, versioning (~= compatible release), and maintenance.

Read more →

Data Science Things Roundup #12

January 20, 2025

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #12: Discover hidden gems in data science, featuring ModernBERT, ReaderLM v2, and Cohere's Rerank 3.5. Catch up on what's new!

Read more →

Avoiding Common Pitfalls: Injection Flaws in Python Libraries

January 18, 2025

Tags:development, input validation, library, python, secure-coding, security

Injection flaws aren't just for web apps. See how SQL & command injection affect Python libraries via input handling, and learn crucial prevention techniques.

Read more →

The Art of Saying No: Defining Your Python Library's Scope

January 17, 2025

Tags:design, development, library, python

Why keeping your Python library focused is harder than it looks, and how saying 'no' can be your most powerful design tool.

Read more →

SDLC in the Age of AI

January 12, 2025

Tags:ai, data-science, programming, software-development

Exploring how AI reshapes software development: natural language programming, documentation as prompt engineering, and the role of `.cursorrules` files.

Read more →

From Weekend Hack to Core Tool: The category_encoders Journey

December 27, 2022

Tags:data-science, machine-learning, open-source, python

Explore category_encoders' journey from a weekend Python experiment to a widely used data science library, now part of scikit-learn-contrib.

Read more →

Investment Review: Seer.ai

October 18, 2022

Tags:ai, analytics, angel-investing, deal-review, investing, machine-learning, startups

A review of my angel investment in Seer.ai, exploring how they align with my investment thesis and their unique value proposition in AI-powered analytics.

Read more →

Category Encoders v1.2.8 Release

June 4, 2018

Tags:category-encoders, data-science, open-source, python

Announcing Category Encoders v1.2.8! This release includes important bugfixes and introduces new features like optional category names in output columns.

Read more →

TechEmergence Podcast and Atlanta AI Article

June 3, 2018

Tags:ai, atlanta, data-science, podcast, predictive-maintenance

Discover how predictive maintenance is revolutionizing AI in Atlanta, highlighting the city's strengths, future prospects, and unique opportunities for growth.

Read more →

Data Engineering Podcast

February 20, 2018

Tags:data-engineering, data-science, podcast, predictive-maintenance

Hear insights on the Data Engineering Podcast discussing predictive maintenance, industrial IoT data challenges, and essential data engineering best practices.

Read more →

Category Encoders published in JOSS

January 26, 2018

Tags:category-encoders, data-science, open-source, python

Announcing the category_encoders Python package publication in JOSS. Discover the peer-review process and citation details.

Read more →

The Problem with Industrial IoT

January 16, 2018

Tags:data-science, technology

Industrial IoT promises much but faces challenges in adoption and implementation. Explore the hurdles of data quality, integration, security, and proving ROI.

Read more →

Revisiting Python support in Apache Flink

January 11, 2018

Tags:big-data, data-science, python

Early 2018 look at Apache Flink's Python support. Checking compatibility, batch vs streaming capabilities, & future developments like Streaming API support.

Read more →

Tendencies of Data Engineers and Scientists

January 9, 2018

Tags:data-engineering, data-science, engineering-management

Explore the relationship dynamics and challenges between data engineering and data science teams, including their approaches, collaboration, and best practices.

Read more →

I Made a Model, Now What?

January 4, 2018

Tags:atlanta, data-science, machine-learning

Practical insights from a PyData Atlanta talk on successfully deploying and maintaining machine learning models in production environments.

Read more →

On taking things too seriously: holiday edition

December 9, 2017

Tags:data-science, python, rating-systems, sports

Building a CFB bowl game prediction system with Python packages elote, keeks, & keeks-elote. Combines rating, betting strategies, and backtesting for analysis.

Read more →

Elote: a python package of rating systems

December 6, 2017

Tags:data-science, machine-learning, python, rating-systems

Introducing Elote, a Python package implementing various rating systems like Elo and Glicko. Learn its core concepts and see how to use it for ranking.

Read more →

Ripyr: sampled metrics on datasets using python's asyncio

November 28, 2017

Tags:data-science, python, type-hints

An introduction to ripyr, a Python library for streaming through large datasets and computing basic metrics using asyncio and type hinting.

Read more →

Category Encoders v1.2.5 Release

November 22, 2017

Tags:category-encoders, data-science, machine-learning, open-source, python

Category Encoders v1.2.5 brings community updates including stable binary/BaseN encoding, new leave-one-out encoding, and pandas compatibility fixes.

Read more →

Data Science Things Roundup #11

September 23, 2017

Tags:data-science, finance, roundup

A collection of interesting data science articles and projects, including SEC keynotes, Bayesian inference, and visualization tools.

Read more →

git-pandas Caching: Faster Analysis

July 25, 2017

Tags:data-analysis, git, pandas, performance, python

Boost git repository analysis speed! Learn how git-pandas now uses caching to dramatically improve performance for repeated queries on large codebases.

Read more →

Category Encoders v1.2.4 Release

July 12, 2017

Tags:category-encoders, data-science, machine-learning, python

Category Encoders v1.2.4 is out! Includes pandas categorical type support, improved missing value handling, better error messages, BaseN fixes, and docs.

Read more →

Data Science Things Roundup #10

April 19, 2017

Tags:data-science, machine-learning, resources, roundup

A curated collection of data science articles and tools exploring network analysis, StashPy for log processing, and Bayesian survival analysis techniques.

Read more →

Data Science Things Roundup #9

March 12, 2017

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #9: Highlighting Pedro Domingos' ML paper, Spyre (Shiny for Python), and BetaGo, an AlphaGo-inspired Go bot framework.

Read more →

Data Science Things Roundup #8

January 25, 2017

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #8: Dive into LIME for model interpretation, sklearn-expertsys for interpretable classifiers, and the value of nearest neighbors.

Read more →

BaseN Encoding Grid Search in Category Encoders

December 18, 2016

Tags:category-encoders, data-science, machine-learning, python

Explore category_encoders' BaseN encoder for representing categorical data. Learn how to use scikit-learn's grid search to find the optimal encoding base.

Read more →

Category Encoders accepted into scikit-learn-contrib

November 20, 2016

Tags:category-encoders, data-science, open-source, python

Category Encoders, a Python library for encoding categorical variables, has been accepted into the scikit-learn-contrib ecosystem. A project milestone!

Read more →

Data Science Things Roundup #7

November 10, 2016

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #7: Python-focused edition featuring Intel's Python Distribution, Go-Python for extensions, and PyFilesystem for unified access.

Read more →

Category Encoders now on conda-forge

September 17, 2016

Tags:category-encoders, data-science, open-source, python

The category_encoders Python package is now available on conda-forge, making installation easier for Conda users. Learn about the package and feedstock.

Read more →

Data Science Things Roundup #6

July 20, 2016

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #6: Focuses on calendar visualizations with D3.js Calendar Heatmap and Bostock's Calendar View, plus insights on coding interviews.

Read more →

Introducing unified glob-syntax in git-pandas

June 15, 2016

Tags:data-science, git-pandas, python

Explore the unified glob syntax (`include_globs`, `ignore_globs`) for git-pandas v2.0, offering flexible file pattern specification and usability.

Read more →

Parallelizing cumulative blame in git-pandas with joblib

June 12, 2016

Tags:data-science, git-pandas, performance, python

Boost git-pandas cumulative blame analysis performance with joblib. Parallel processing via multithreading speeds up this costly operation.

Read more →

When do I work on what?

April 30, 2016

Tags:data-science, dataviz, git-pandas, open-source, python

Use git-pandas to analyze and visualize work patterns across open source vs. closed source projects. Compare commit times with punchcard plots. Learn the code.

Read more →

Estimating the time spent on a project with git-pandas

April 16, 2016

Tags:git, git-pandas, github, open-source, python

Learn how to estimate project development time using commit history with git-pandas. Compares to git_time_extractor, git-hours, and glass.

Read more →

Data Science Things Roundup #5

March 15, 2016

Tags:data-science, machine-learning, resources, roundup

Explore Deep Q-Learning for Space Invaders, insights from Elasticsearch in production, and improved Python package management strategies.

Read more →

Automating documentation workflow with sphinx and github pages

February 29, 2016

Tags:documentation, github, open-source, python, sphinx

Explore a comprehensive guide on automating the deployment of Sphinx documentation to GitHub Pages, streamlining your workflow with efficient practices.

Read more →

Pypi-publisher: a simple cli for publishing python libraries

February 24, 2016

Tags:cli, deployment, open-source, packaging, python

Introducing pypi-publisher (ppp): a CLI tool simplifying Python library publishing. Handles .pypirc updates, linting, git tags, and PyPI sdist uploads.

Read more →

Using survival analysis and git-pandas to estimate code quality

February 21, 2016

Tags:data-analysis, data-science, dataviz, git, git-pandas, github, open-source, python

Apply survival analysis with git-pandas to measure code quality in Git repositories by analyzing code longevity and contributor patterns over time.

Read more →

Git-pandas v1.0.0, or how to check for a stable release

February 2, 2016

Tags:data-science, dataviz, git, git-pandas, github, open-source, python

Explore git-pandas v1.0.0, focusing on interface consistency, parameter naming, and API simplification for improved data analysis workflows.

Read more →

Github.com cumulative blame in 5 lines of python

January 31, 2016

Tags:data-science, dataviz, git, git-pandas, github, pandas, python

Visualize your GitHub repository growth over time using the git-pandas Python library and GitHubProfile class—all in just a few lines of code.

Read more →

Data-driven engineering team management with gitnoc and git-pandas

January 19, 2016

Tags:data visualization, dataviz, git, git-pandas, github, open-source, python

Leverage git-pandas and gitnoc for data-driven engineering management. Visualize git data for insights on bus factor, risk, project growth, and team oversight.

Read more →

Create organization-wide punchcards with git-pandas

January 17, 2016

Tags:data-analysis, git, git-pandas, github, open-source, python

Learn how git-pandas enables creating organization-wide punchcard visualizations, aggregating commit activity across multiple repositories for a unified view.

Read more →

How to Write Comprehensions and Alienate People

January 8, 2016

Tags:best-practices, data-science, programming, python

A tongue-in-cheek guide to writing Python comprehensions that will make your colleagues question their life choices and your sanity.

Read more →

Gitpandas v0.0.6: python 2.7, fileowners, file-wise blame and examples

January 7, 2016

Tags:data-analysis, data-science, git, git-pandas, github, open-source, projects, python

Overview of git-pandas v0.0.6 release, highlighting new features like Python 2.7 support, file-wise blame, file owner determination, and other improvements.

Read more →

Git-Pandas v0.0.5: coverage.py, risk, and more

December 25, 2015

Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python

Git-pandas v0.0.5 is out! Adds coverage.py support, file change rate metrics for risk analysis, API updates, and time-based filtering for commits.

Read more →

Common Data Pitfalls for Recurring Machine Learning Systems

December 20, 2015

Tags:analytics, data-engineering, data-science, machine-learning

Explore common data pitfalls in recurring machine learning systems, including new categories, data format changes, sending issues, deduplication, and updates.

Read more →

Visualize all of your git repositories with gitnoc and git-pandas

December 13, 2015

Tags:data visualization, dataviz, git, git-pandas, pandas, python

Visualize git repositories at scale using GitNOC & git-pandas. Create profiles to analyze cumulative blame & file change rates across multiple projects.

Read more →

CyberLaunch: An Accelerator for Machine Learning Companies

December 8, 2015

Tags:atlanta, data-science, machine-learning, startups

Explore CyberLaunch, Atlanta's accelerator for machine learning and info security startups, its program details, and its impact on the local startup ecosystem.

Read more →

Data Science Things Roundup #4

December 5, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #4: Featuring Scikit-learn groups for feature sets, Markov Modulated Poisson Processes for event detection, and DBoost for boosting.

Read more →

Beyond One-Hot: An Exploration of Categorical Variables

November 29, 2015

Tags:data-science, feature-engineering, machine-learning

A deep dive into different methods for encoding categorical variables in machine learning, exploring their benefits and trade-offs.

Read more →

Analyzing GitPython and Pandas With GitPandas

November 19, 2015

Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python

Analyze GitPython and Pandas repositories with git-pandas! Explore LOC, contributors, and bus factor using this Python library for git analysis.

Read more →

Create a pip-installable python package in 2 minutes

November 12, 2015

Tags:open-source, packaging, pip, python

Rapidly create and publish Python packages. Learn the steps from cookiecutter-pipproject template setup to pushing your first release to PyPI in minutes.

Read more →

Blame the world with git-pandas

November 10, 2015

Tags:dataviz, git, git-pandas, github, pandas, python

Introducing git-pandas, a Python library offering a pandas interface for git analysis. Easily aggregate git blame across projects.

Read more →

Data Science vs. Data Engineering

October 31, 2015

Tags:big-data, career, data-engineering, data-science, technology

Understanding the fundamental differences between data science and data engineering through the lens of methodology rather than tools.

Read more →

Data Science Things Roundup #3

September 10, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #3: TensorFlow Self Organizing Maps, DeBaCl for density-based clustering, and options for protecting Python codebases.

Read more →

Data Science Things Roundup #2

May 20, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #2: Featuring Lifelines for survival analysis, Patsy-learn for R-style syntax in scikit-learn, and HDBSCAN clustering library.

Read more →

Data Science Things Roundup #1

February 15, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #1: Explore Kaggle past solutions, Gooey for simple Python CLIs, and Metric-Learn for optimal distance metrics. First edition!

Read more →