Artificial Intelligence & Machine Learning

Exploring Diverse Topics: From AI to Culinary Adventures

This section contains my writings on artificial intelligence and machine learning, drawing from years of experience leading AI initiatives across various organizations. Topics range from practical implementation advice to theoretical discussions and industry trends.

Perplexity 101: How Language Models Measure Surprise

June 11, 2025

Tags:nlp, perplexity, machine-learning, language-models

A deep dive into perplexity: the metric that tells us how 'surprised' a language model is by text and why it matters for AI detection.

Testing the Waters: Explore Career Paths Without Burning Bridges

June 10, 2025

Tags:career, data-science, decision-making, professional-development, personal

A practical guide for ICs to get real exposure to management, product, consulting, and senior technical roles before making career decisions.

AI for Small Business: Real-World Ways to Get Started

June 4, 2025

Tags:AI, productivity, small business, technology

Curious how AI can actually help your small business? Here are practical, approachable ways to put artificial intelligence to work: no tech wizardry required.

From Data Scientist to Manager: Models to Mentorship

June 2, 2025

Tags:career, data-science, leadership, management, personal

A data scientist's guide to trading Jupyter notebooks for team meetings, learning to delegate and build high-performing teams.

Bandit Severity Levels: Understanding High, Medium, and Low Findings

May 29, 2025

Tags:priority, python, security, severity, triage

Master Bandit's severity and confidence classification system. Learn how to prioritize security findings and build effective remediation workflows.

Bandit's Hardcoded Password Detection: Rules B105-B107 in Practice

May 27, 2025

Tags:passwords, python, security

Learn how Bandit detects hardcoded passwords in Python code with rules B105, B106, and B107. Includes real examples and secure alternatives.

Bandit Security Rules: Complete Python Vulnerability Guide

May 26, 2025

Tags:python, security, vulnerability detection

Master Python security with this comprehensive guide to Bandit's security rules. Learn what each rule detects and how to fix common vulnerabilities.

Juggling Projects? Analyze Multiple Repos at Once with GitPandas

May 23, 2025

Tags:data-analysis, git, gitpandas, project management, python

Managing multiple repositories is easier with the right tools. This post shares tips and best practices for handling multi-repo projects efficiently.

Stargazers CLI Update: Nested Commands, Account Trends, and Plotting!

May 20, 2025

Tags:cli, github, open-source, python, stargazers

Announcing the latest stargazers CLI update: all commands now under 'stargazers', plus new account-trend analysis and plotting features.

Mutation Testing with mutmut for PyGeoHash

May 19, 2025

Tags:python

Mutation testing checks if your tests catch real bugs by making small code changes. Learn how it works and why it matters.

Digging into Code Churn with GitPandas

May 16, 2025

Tags:code churn, data-analysis, git, gitpandas, python

Quantify code churn in your Git repositories with the gitpandas Python library. Analyze file change rates and spot areas of high activity or instability.

Refactoring Library Interfaces

May 15, 2025

Tags:api-design, best-practices, library-development, programming, python, refactoring

Discover techniques for improving library interfaces through thoughtful refactoring, using real-world examples while maintaining backward compatibility.

Context-Aware Library Design: Build for Your Users

May 14, 2025

Tags:api-design, best-practices, library-development, programming, python, user-experience

Learn to design Python libraries that adapt to various user needs and experience levels, ensuring simplicity and effectiveness for all users.

Who Holds the Keys? Calculating Bus Factor with GitPandas

May 13, 2025

Tags:bus factor, data-analysis, git, gitpandas, python, risk management

Explore the concept of bus factor and how gitpandas can help you quantify knowledge distribution and risk in your software project.

The ECF Rating System: The British Approach to Chess Ratings

May 5, 2025

Tags:algorithms, elote, open-source, python, rating-systems

Explore the English Chess Federation (ECF) rating system, its unique performance-based approach, and its distinct history from Elo.

Writing Tools MCP: A Toolkit for Better Writing

May 5, 2025

Tags:mcp, open-source, python, tools

Boost your writing workflow with custom tools for Markdown and content analysis. Discover scripts and automations for faster, better publishing.

Glicko-2: Adding Volatility to the Rating Equation

April 30, 2025

Tags:algorithms, elote, glicko, open-source, python, rating-systems

Explore Glicko-2, an enhancement over Glicko-1, by incorporating player volatility for more precise and accurate ratings.

The Glicko Rating System: When Confidence Matters

April 29, 2025

Tags:algorithms, elote, glicko, open-source, python, rating-systems

Explore the Glicko rating system, an Elo enhancement adding Rating Deviation (RD) to measure confidence. Learn its mechanics and implementation with Elote.

Handling Deprecation: Gracefully Retiring Features

April 28, 2025

Tags:api-design, best-practices, development, library, maintenance, python

Learn to deprecate Python library features gracefully with warnings, clear communication, and migration paths that minimize disruption to your users.

McCabe Complexity: The Python Metric You Should Care About

April 24, 2025

Tags:best-practices, code-quality, development, python, tools

Learn about McCabe Complexity, a key metric for code complexity. Understand, measure with tools like Ruff, and manage complexity in Python projects.

Python Logging Best Practices for Library Developers

April 20, 2025

Tags:best-practices, development, library, python

A comprehensive guide to implementing logging in Python libraries, from basic setup to advanced patterns and common pitfalls to avoid.

Introducing 'stargazers': A Tool to Understand Your GitHub Audience

April 16, 2025

Tags:cli, community, github, python

Announcing 'stargazers': A CLI tool to fetch, analyze & summarize GitHub stargazers/forkers for any public repo. Inspired by Cockroach Labs' analysis.

HashingEncoder: Tackling Extreme Cardinality with the Hashing Trick

April 15, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore HashingEncoder: learn how it handles extreme cardinality, its mechanism, ideal use cases, and implementation with the category_encoders library.

BinaryEncoder: The Space-Efficient Alternative to One-Hot Encoding

April 13, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore the BinaryEncoder from category_encoders: a space-efficient alternative to one-hot encoding. Learn how it works, when to use it, and see implementation.

OrdinalEncoder: When Order Matters in Categorical Data

April 10, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

Explore OrdinalEncoder for categorical data where order matters. Learn how it works, its benefits, and implementation using the category_encoders library.

Makefiles: The Unsung Hero of Python Development

April 8, 2025

Tags:automation, development, python, tools

Explore why Makefiles, though old-school, are invaluable for Python projects. Learn how they provide consistent, simple shortcuts for common dev tasks.

Modern Python Package Publishing: PyGeoHash's New CI/CD Pipeline

April 6, 2025

Tags:development, github-actions, pygeohash, python

PyGeoHash uses GitHub Actions, cibuildwheel, and PyPI's Trusted Publisher for automated, secure multi-platform Python package builds and publishing.

PyGeoHash Gets Type Hints: A Journey into Modern Python

April 3, 2025

Tags:development, open-source, pygeohash, python, type-hints

Enhance PyGeoHash with comprehensive type hints and a new types module, improving developer experience and code quality for modern Python.

Optimal Bankroll Management with Keeks: The Kelly Criterion

April 1, 2025

Tags:betting, finance, keeks, kelly-criterion, open-source, python

Dive into the Kelly Criterion, the foundation of optimal betting and bankroll management. Learn its theory, pros, cons, and implementation with Keeks.

Documenting Your Library's API: Best Practices

March 30, 2025

Tags:autodoc, best-practices, development, docstrings, documentation, library, python, sphinx

Build a clear, comprehensive API reference with Sphinx & autodoc. Learn best practices for structuring, writing, and cross-referencing your Python library docs.

OneHotEncoder: The Workhorse of Categorical Encoding

March 27, 2025

Tags:category-encoders, feature-engineering, machine-learning, python

A comprehensive guide to OneHotEncoder in category_encoders, exploring its core functionality, advantages, and practical limitations in machine learning.

Elo Rating System: The Grandfather of Competitive Rankings

March 25, 2025

Tags:algorithms, elo, elote, open-source, python, rating-systems

Deep dive into the Elo rating system: history, the math behind it (K-factor, expected score), and practical implementation with the Elote Python library.

Automating Docs Deployment with GitHub Actions and Pages

March 23, 2025

Tags:automation, ci-cd, development, documentation, github-actions, library, python, sphinx

Keep docs current automatically! Learn to set up GitHub Actions to build Sphinx docs & deploy to GitHub Pages, ensuring sync with your code changes.

Crafting Code Examples: From Snippets to Real-World Scenarios

March 22, 2025

Tags:development, documentation, library, python, sphinx

Master the art of crafting clear, runnable code examples for documentation using doctest, tutorials, and scripts to enhance user understanding.

Keeks 0.1.0 Release: Optimal Bankroll Management Made Simple

March 18, 2025

Tags:betting, finance, keeks, kelly-criterion, open-source, python

Keeks 0.1.0 is here! Python library for optimal bankroll management & betting strategies (Kelly Criterion). Includes simulation and visualization tools.

Getting Started with Sphinx for Python Project Documentation

March 15, 2025

Tags:development, docstrings, documentation, library, python, restructuredtext, sphinx

Learn how to use Sphinx to generate professional, cross-referenced documentation for your Python projects, covering installation, configuration, and autodoc.

Elote 1.0.0 Release: Rating Systems Made Simple

March 13, 2025

Tags:elo, elote, open-source, python, rating-systems

Announcing Elote 1.0.0, a Python library that simplifies the implementation of rating systems like Elo, Glicko, and TrueSkill for competitive ranking.

PyGeoHash v3.0.0: Faster, Freer, and More Pythonic

March 11, 2025

Tags:geospatial, licensing, open-source, performance, pygeohash, python

Deep dive into PyGeoHash v3.0.0: a major release with a pure CPython rewrite, MIT relicensing, and dramatic performance gains. Faster & freer geohashing!

Using Cursor for Open Source Library Maintenance

March 9, 2025

Tags:ai, cursor, elote, keeks, open-source, pygeohash, python

How AI tools like Cursor simplify open source library maintenance by assisting with code understanding, environment setup, housekeeping, and project standards.

Effective Docstrings: Google vs. NumPy vs. reStructuredText Styles

March 6, 2025

Tags:autodoc, development, docstrings, documentation, library, python, restructuredtext, sphinx

Learn to write clear docstrings Sphinx understands. Compares Google, NumPy, and reStructuredText formats for effective Python library documentation.

PyGeoHash 2.1.0: Modernizing a Geospatial Python Library

March 4, 2025

Tags:cursor, geohash, geospatial, open-source, python

A look at the latest updates to PyGeoHash, a lightweight Python library for working with geohashes, and how modern AI tools helped revitalize the project.

Geohash: When Clever Isn't Always Smart

March 2, 2025

Tags:algorithms, data-engineering, geohash, python

A deep dive into the limitations of the Geohash algorithm, including boundary issues, non-uniform cell sizes, and common implementation mistakes to avoid.

Where Did All the RAM Go? Memory Profiling with Memray

March 1, 2025

Tags:development, library, optimization, performance, profiling, python, testing

High CPU isn't the only performance issue. Learn how Memray helps track memory leaks and excessive allocation in your Python library to optimize usage.

Claude 3.7 and new Cursor: first impressions

February 26, 2025

Tags:ai, data-science, tools

First impressions of Anthropic's Claude 3.7 model and the latest Cursor AI coding assistant update. Exploring improved reasoning, speed, and new features.

Finding the Slowdown: Profiling Python Code with Pyinstrument

February 25, 2025

Tags:development, library, optimization, performance, profiling, python, testing

Your benchmark says a function is slow, but why? Profilers like Pyinstrument help you pinpoint exactly where your Python code is spending its time.

How Fast Is It? Benchmarking Your Code with Pytest-Benchmark

February 22, 2025

Tags:development, library, performance, pytest, python, testing

Performance matters! Easily measure Python library speed with pytest-benchmark. Track performance, find regressions, and optimize effectively with benchmarks.

Silos to Shared Libraries: Guide to Inner Source Adoption

February 18, 2025

Tags:best-practices, development-practices, inner-source, library-development, python, security

Guide for transitioning from team-specific code to shared libraries, covering governance models, security, and standardized development practices.

Mastering Mocking in Python with pytest-mock

February 16, 2025

Tags:pytest, python, testing

A practical guide to mocking in Python testing, from basic concepts to advanced techniques with pytest-mock and other helpful libraries.

Building Your Internal Library Developer Community

February 15, 2025

Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, python

Explore strategies for building a thriving community of library developers in your organization through effective incentives, recognition, and collaboration.

Will It Blend? Testing Across Environments with Tox

February 13, 2025

Tags:ci-cd, development, library, pytest, python, testing

Works on your machine? Great, but what about Python 3.9 or 3.12? Tox ensures library compatibility across different Python versions and dependency sets easily.

Inner Source: Bringing Open Source Culture Inside Your Organization

February 11, 2025

Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, open-source, python

Learn how to harness the power of open source development practices within your organization through inner source principles and practices.

Data Science Things Roundup #13

February 10, 2025

Tags:data-science, machine-learning, resources, roundup

Data Science Roundup #13: Exploring IBM Granite for enterprise AI, Mistral AI's Le Chat for Europe, and Open Deep Research for DIY AI tools.

Are Your Tests Enough? Measuring Coverage with Coverage.py

February 9, 2025

Tags:code-quality, development, library, pytest, python, testing

Writing tests is step one. Step two is knowing what parts of your library code those tests actually exercise. Enter Coverage.py.

Designing for Developer Joy: Python Library Ergonomics

February 6, 2025

Tags:api-design, best-practices, developer-experience, library-development, programming, python

What makes Python libraries joyful? Explore API ergonomics, from naming conventions & sensible defaults to helpful error messages that guide users to solutions.

Why Your Library Needs Pytest (And How to Get Started)

February 4, 2025

Tags:code-quality, development, library, pytest, python, testing

Testing is vital for Python libraries. Explore why it's crucial and how Pytest simplifies writing powerful tests with less boilerplate and better assertions.

The Art of API Design: Making the Right Things Easy

February 3, 2025

Tags:api-design, best-practices, developer-experience, library-development, programming, python

Learn principles of intuitive Python API design that make common operations simple and guide users toward best practices, while preserving advanced features.

Secure Coding Practices for Python Library Developers

February 2, 2025

Tags:best-practices, development, input validation, library, python, secure-coding, security

Beyond tools, what principles guide secure Python library development? Explore essential practices: input validation, least privilege, error handling, and more.

Taming the Python Chaos: Linting & Formatting with Ruff

January 30, 2025

Tags:ci-cd, code-quality, development, github-actions, python, ruff

What linting and formatting actually are, why they matter (a lot!), and how the speedy tool Ruff can save your Python project (and your sanity).

Handling Sensitive Data Securely Within Your Python Library

January 29, 2025

Tags:development, library, python, secure-coding, security

Handle sensitive data in Python libraries securely. Learn best practices for managing API keys, passwords, PII, and other secrets without exposing them in code.

Decoding Library Updates: Understanding Semantic Versioning (SemVer)

January 28, 2025

Tags:dependencies, development, library, packaging, pip, python

Guide to Semantic Versioning (SemVer) for Python library authors. Understand MAJOR.MINOR.PATCH rules to communicate changes and manage dependencies.

Dependency Security: Managing Vulnerabilities with pip-audit

January 27, 2025

Tags:dependencies, development, library, python, security, vulnerabilities

Your library relies on packages. Learn how to use pip-audit to scan your dependencies for known security vulnerabilities and keep your users safe.

The Center of Your Python Project: Understanding pyproject.toml

January 26, 2025

Tags:development, library, packaging, pytest, python, ruff

From setup.py chaos to pyproject.toml clarity. Learn why it exists, how it standardizes Python packaging/tool config via PEPs (518, 517, 621), and its anatomy.

Bandit Security Rules: Finding Common Python Security Issues

January 25, 2025

Tags:development, library, python, ruff, secure-coding, security, vulnerabilities

Learn how to use Ruff's Bandit integration to automatically scan your Python code for common security pitfalls through static analysis.

Don't Forget the Fine Print: Licensing Your Python Library

January 24, 2025

Tags:compliance, dependencies, development, library, licensing, open-source, python

Choosing an open-source license is crucial. Understand common options (MIT, Apache, GPL), why compatibility matters, and how to comply with obligations.

Building and Engaging a Community Around Your Open Source Library

January 22, 2025

Tags:community, development, github, library, maintenance, open-source, python

Attract users, encourage contributions, and build a welcoming environment for your open source library. Learn practical steps for community engagement.

The Library Author's Dilemma: Managing Python Dependencies

January 21, 2025

Tags:best-practices, dependencies, development, library, packaging, pip, python

Python library dependency management balances features vs user pain. Explore best practices for choosing, versioning (~= compatible release), and maintenance.

Data Science Things Roundup #12

January 20, 2025

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #12: Discover hidden gems in data science, featuring ModernBERT, ReaderLM v2, and Cohere's Rerank 3.5. Catch up on what's new!

Avoiding Common Pitfalls: Injection Flaws in Python Libraries

January 18, 2025

Tags:development, input validation, library, python, secure-coding, security

Injection flaws aren't just for web apps. See how SQL & command injection affect Python libraries via input handling, and learn crucial prevention techniques.

The Art of Saying No: Defining Your Python Library's Scope

January 17, 2025

Tags:design, development, library, python

Why keeping your Python library focused is harder than it looks, and how saying 'no' can be your most powerful design tool.

SDLC in the Age of AI

January 12, 2025

Tags:ai, data-science, programming, software-development

Exploring how AI reshapes software development: natural language programming, documentation as prompt engineering, and the role of `.cursorrules` files.

From Weekend Hack to Core Tool: The category_encoders Journey

December 27, 2022

Tags:data-science, machine-learning, open-source, python

Explore category_encoders' journey from a weekend Python experiment to a widely used data science library, now part of scikit-learn-contrib.

Investment Review: Seer.ai

October 18, 2022

Tags:ai, analytics, angel-investing, deal-review, investing, machine-learning, startups

A review of my angel investment in Seer.ai, exploring how they align with my investment thesis and their unique value proposition in AI-powered analytics.

Category Encoders v1.2.8 Release

June 4, 2018

Tags:category-encoders, data-science, open-source, python

Announcing Category Encoders v1.2.8! This release includes important bugfixes and introduces new features like optional category names in output columns.

TechEmergence Podcast and Atlanta AI Article

June 3, 2018

Tags:ai, atlanta, data-science, podcast, predictive-maintenance

Discover how predictive maintenance is revolutionizing AI in Atlanta, highlighting the city's strengths, future prospects, and unique opportunities for growth.

Data Engineering Podcast

February 20, 2018

Tags:data-engineering, data-science, podcast, predictive-maintenance

Hear insights on the Data Engineering Podcast discussing predictive maintenance, industrial IoT data challenges, and essential data engineering best practices.

Category Encoders published in JOSS

January 26, 2018

Tags:category-encoders, data-science, open-source, python

Announcing the category_encoders Python package publication in JOSS. Discover the peer-review process and citation details.

The Problem with Industrial IoT

January 16, 2018

Tags:data-science, technology

Industrial IoT promises much but faces challenges in adoption and implementation. Explore the hurdles of data quality, integration, security, and proving ROI.

Revisiting Python support in Apache Flink

January 11, 2018

Tags:big-data, data-science, python

Early 2018 look at Apache Flink's Python support. Checking compatibility, batch vs streaming capabilities, & future developments like Streaming API support.

Tendencies of Data Engineers and Scientists

January 9, 2018

Tags:data-engineering, data-science, engineering-management

Explore the relationship dynamics and challenges between data engineering and data science teams, including their approaches, collaboration, and best practices.

I Made a Model, Now What?

January 4, 2018

Tags:atlanta, data-science, machine-learning

Practical insights from a PyData Atlanta talk on successfully deploying and maintaining machine learning models in production environments.

On taking things too seriously: holiday edition

December 9, 2017

Tags:data-science, python, rating-systems, sports

Building a college football (CFB) bowl game prediction system with the Python packages elote, keeks, & keeks-elote. Combines rating, betting strategies, and backtesting for analysis.

Elote: a python package of rating systems

December 6, 2017

Tags:data-science, machine-learning, python, rating-systems

Introducing Elote, a Python package implementing various rating systems like Elo and Glicko. Learn its core concepts and see how to use it for ranking.

Ripyr: sampled metrics on datasets using python's asyncio

November 28, 2017

Tags:data-science, python, type-hints

An introduction to ripyr, a Python library for streaming through large datasets and parsing basic metrics using asyncio and type hinting.

Category Encoders v1.2.5 Release

November 22, 2017

Tags:category-encoders, data-science, machine-learning, open-source, python

Category Encoders v1.2.5 brings community updates including stable binary/BaseN encoding, new leave-one-out encoding, and pandas compatibility fixes.

Data Science Things Roundup #11

September 23, 2017

Tags:data-science, finance, roundup

A collection of interesting data science articles and projects, including SEC keynotes, Bayesian inference, and visualization tools.

git-pandas Caching: Faster Analysis

July 25, 2017

Tags:data-analysis, git, pandas, performance, python

Boost git repository analysis speed! Learn how git-pandas now uses caching to dramatically improve performance for repeated queries on large codebases.

Category Encoders v1.2.4 Release

July 12, 2017

Tags:category-encoders, data-science, machine-learning, python

Category Encoders v1.2.4 is out! Includes pandas categorical type support, improved missing value handling, better error messages, BaseN fixes, and docs.

Data Science Things Roundup #10

April 19, 2017

Tags:data-science, machine-learning, resources, roundup

A curated collection of data science articles and tools exploring network analysis, StashPy for log processing, and Bayesian survival analysis techniques.

Data Science Things Roundup #9

March 12, 2017

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #9: Highlighting Pedro Domingos' ML paper, Spyre (Shiny for Python), and BetaGo, an AlphaGo-inspired Go bot framework.

Data Science Things Roundup #8

January 25, 2017

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #8: Dive into LIME for model interpretation, sklearn-expertsys for interpretable classifiers, and the value of nearest neighbors.

BaseN Encoding Grid Search in Category Encoders

December 18, 2016

Tags:category-encoders, data-science, machine-learning, python

Explore category_encoders' BaseN encoder for representing categorical data. Learn how to use scikit-learn's grid search to find the optimal encoding base.

Category Encoders accepted into scikit-learn-contrib

November 20, 2016

Tags:category-encoders, data-science, open-source, python

Category Encoders, a Python library for encoding categorical variables, has been accepted into the scikit-learn-contrib ecosystem. A project milestone!

Data Science Things Roundup #7

November 10, 2016

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #7: Python-focused edition featuring Intel's Python Distribution, Go-Python for extensions, and PyFilesystem for unified access.

Category Encoders now on conda-forge

September 17, 2016

Tags:category-encoders, data-science, open-source, python

The category_encoders Python package is now available on conda-forge, making installation easier for Conda users. Learn about the package and feedstock.

Data Science Things Roundup #6

July 20, 2016

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #6: Focuses on calendar visualizations with D3.js Calendar Heatmap and Bostock's Calendar View, plus insights on coding interviews.

Introducing unified glob-syntax in git-pandas

June 15, 2016

Tags:data-science, git-pandas, python

Explore the unified glob syntax (`include_globs`, `ignore_globs`) for git-pandas v2.0, offering flexible file pattern specification and usability.

Parallelizing cumulative blame in git-pandas with joblib

June 12, 2016

Tags:data-science, git-pandas, performance, python

Boost git-pandas cumulative blame analysis performance with joblib. Parallel processing via multithreading speeds up this costly operation.

When do I work on what?

April 30, 2016

Tags:data-science, dataviz, git-pandas, open-source, python

Use git-pandas to analyze and visualize work patterns across open source vs. closed source projects. Compare commit times with punchcard plots. Learn the code.

Estimating the time spent on a project with git-pandas

April 16, 2016

Tags:git, git-pandas, github, open-source, python

Learn how to estimate project development time using commit history with git-pandas. Compares to git_time_extractor, git-hours, and glass.

Data Science Things Roundup #5

March 15, 2016

Tags:data-science, machine-learning, resources, roundup

Explore Deep Q-Learning for Space Invaders, insights from Elasticsearch in production, and improved Python package management strategies.

Automating documentation workflow with sphinx and github pages

February 29, 2016

Tags:documentation, github, open-source, python, sphinx

Explore a comprehensive guide on automating the deployment of Sphinx documentation to GitHub Pages, streamlining your workflow with efficient practices.

Pypi-publisher: a simple cli for publishing python libraries

February 24, 2016

Tags:cli, deployment, open-source, packaging, python

Introducing pypi-publisher (ppp): a CLI tool simplifying Python library publishing. Handles .pypirc updates, linting, git tags, and PyPI sdist uploads.

Using survival analysis and git-pandas to estimate code quality

February 21, 2016

Tags:data-analysis, data-science, dataviz, git, git-pandas, github, open-source, python

Apply survival analysis with git-pandas to measure code quality in Git repositories by analyzing code longevity and contributor patterns over time.

Git-pandas v1.0.0, or how to check for a stable release

February 2, 2016

Tags:data-science, dataviz, git, git-pandas, github, open-source, python

Explore git-pandas v1.0.0, focusing on interface consistency, parameter naming, and API simplification for improved data analysis workflows.

Github.com cumulative blame in 5 lines of python

January 31, 2016

Tags:data-science, dataviz, git, git-pandas, github, pandas, python

Visualize your GitHub repository growth over time using the git-pandas Python library and GitHubProfile class, all in just a few lines of code.

Data-driven engineering team management with gitnoc and git-pandas

January 19, 2016

Tags:data visualization, dataviz, git, git-pandas, github, open-source, python

Leverage git-pandas and gitnoc for data-driven engineering management. Visualize git data for insights on bus factor, risk, project growth, and team oversight.

Create organization-wide punchcards with git-pandas

January 17, 2016

Tags:data-analysis, git, git-pandas, github, open-source, python

Learn how git-pandas enables creating organization-wide punchcard visualizations, aggregating commit activity across multiple repositories for a unified view.

How to Write Comprehensions and Alienate People

January 8, 2016

Tags:best-practices, data-science, programming, python

A tongue-in-cheek guide to writing Python comprehensions that will make your colleagues question their life choices and your sanity.

Gitpandas v0.0.6: python 2.7, fileowners, file-wise blame and examples

January 7, 2016

Tags:data-analysis, data-science, git, git-pandas, github, open-source, projects, python

Overview of git-pandas v0.0.6 release, highlighting new features like Python 2.7 support, file-wise blame, file owner determination, and other improvements.

Git-Pandas v0.0.5: coverage.py, risk, and more

December 25, 2015

Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python

Git-pandas v0.0.5 is out! Adds coverage.py support, file change rate metrics for risk analysis, API updates, and time-based filtering for commits.

Common Data Pitfalls for Recurring Machine Learning Systems

December 20, 2015

Tags:analytics, data-engineering, data-science, machine-learning

Explore common data pitfalls in recurring machine learning systems, including new categories, data format changes, sending issues, deduplication, and updates.

Visualize all of your git repositories with gitnoc and git-pandas

December 13, 2015

Tags:data visualization, dataviz, git, git-pandas, pandas, python

Visualize git repositories at scale using GitNOC & git-pandas. Create profiles to analyze cumulative blame & file change rates across multiple projects.

CyberLaunch: An Accelerator for Machine Learning Companies

December 8, 2015

Tags:atlanta, data-science, machine-learning, startups

Explore CyberLaunch, Atlanta's accelerator for machine learning and info security startups, its program details, and its impact on the local startup ecosystem.

Data Science Things Roundup #4

December 5, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #4: Featuring Scikit-learn groups for feature sets, Markov Modulated Poisson Processes for event detection, and DBoost for boosting.

Beyond One-Hot: An Exploration of Categorical Variables

November 29, 2015

Tags:data-science, feature-engineering, machine-learning

A deep dive into different methods for encoding categorical variables in machine learning, exploring their benefits and trade-offs.

Analyzing GitPython and Pandas With GitPandas

November 19, 2015

Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python

Analyze GitPython and Pandas repositories with git-pandas! Explore LOC, contributors, and bus factor using this Python library for git analysis.

Create a pip-installable python package in 2 minutes

November 12, 2015

Tags:open-source, packaging, pip, python

Rapidly create and publish Python packages. Learn the steps from cookiecutter-pipproject template setup to pushing your first release to PyPI in minutes.

Blame the world with git-pandas

November 10, 2015

Tags:dataviz, git, git-pandas, github, pandas, python

Introducing git-pandas, a Python library offering a pandas interface for git analysis. Easily aggregate git blame across projects.

Data Science vs. Data Engineering

October 31, 2015

Tags:big-data, career, data-engineering, data-science, technology

Understanding the fundamental differences between data science and data engineering through the lens of methodology rather than tools.

Data Science Things Roundup #3

September 10, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #3: TensorFlow Self Organizing Maps, DeBaCl for density-based clustering, and options for protecting Python codebases.

Data Science Things Roundup #2

May 20, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #2: Featuring Lifelines for survival analysis, Patsy-learn for R-style syntax in scikit-learn, and HDBSCAN clustering library.

Data Science Things Roundup #1

February 15, 2015

Tags:data-science, machine-learning, resources, roundup

Data Science Things Roundup #1: Explore Kaggle past solutions, Gooey for simple Python CLIs, and Metric-Learn for optimal distance metrics. First edition!