Mastering Python Programming: Insights and Best Practices
This section contains my writings on Python programming, sharing insights from years of professional software development. Topics include best practices, advanced techniques, performance optimization, and practical applications across various domains.
Stargazers CLI Update: Nested Commands, Account Trends, and Plotting!
May 20, 2025
Tags:stargazers, github, cli, open source, python
Announcing the latest stargazers CLI update: all commands now under 'stargazers', plus new account-trend analysis and plotting features.
Mutation Testing with mutmut for Pygeohash
May 19, 2025
Tags:python
Mutation testing checks whether your tests catch real bugs by deliberately making small changes to your code. Learn how it works and why it matters.
Digging into Code Churn with GitPandas
May 16, 2025
Tags:code churn, data-analysis, git, gitpandas, python
Quantify code churn in your Git repositories with the gitpandas Python library. Analyze file change rates and spot areas of high activity or instability.
Refactoring Library Interfaces
May 15, 2025
Tags:api-design, best-practices, library-development, programming, python, refactoring
Discover techniques for improving library interfaces through thoughtful refactoring while maintaining backward compatibility, illustrated with real-world examples.
Context-Aware Library Design: Build for Your Users
May 14, 2025
Tags:api-design, best-practices, library-development, programming, python, user-experience
Learn to design Python libraries that adapt to various user needs and experience levels, ensuring simplicity and effectiveness for all users.
Who Holds the Keys? Calculating Bus Factor with GitPandas
May 13, 2025
Tags:bus factor, data-analysis, git, gitpandas, python, risk management
Explore the concept of bus factor and how gitpandas can help you quantify knowledge distribution and risk in your software project.
The ECF Rating System: The British Approach to Chess Ratings
May 5, 2025
Tags:algorithms, elote, open-source, python, rating-systems
Explore the English Chess Federation (ECF) rating system, its unique performance-based approach, and how its history differs from Elo's.
Writing Tools MCP: A Toolkit for Better Writing
May 5, 2025
Tags:mcp, writing, tools, open-source, python
Boost your writing workflow with custom tools for Markdown and content analysis. Discover scripts and automations for faster, better publishing.
Glicko-2: Adding Volatility to the Rating Equation
April 30, 2025
Tags:algorithms, elote, glicko, open-source, python, rating-systems
Explore Glicko-2, an enhancement to Glicko-1 that incorporates player volatility for more accurate ratings.
The Glicko Rating System: When Confidence Matters
April 29, 2025
Tags:algorithms, elote, glicko, open-source, python, rating-systems
Explore the Glicko rating system, an Elo enhancement adding Rating Deviation (RD) to measure confidence. Learn its mechanics and implementation with Elote.
Handling Deprecation: Gracefully Retiring Features
April 28, 2025
Tags:api-design, best-practices, development, library, maintenance, python
Learn to deprecate Python library features gracefully with warnings, clear communication, and migration paths that minimize disruption to your users.
McCabe Complexity: The Python Metric You Should Care About
April 24, 2025
Tags:best-practices, code-quality, development, python, tools
Learn about McCabe Complexity, a key metric for code complexity: what it means, how to measure it with tools like Ruff, and how to manage it in Python projects.
Python Logging Best Practices for Library Developers
April 20, 2025
Tags:best-practices, development, library, python
A comprehensive guide to implementing logging in Python libraries, from basic setup to advanced patterns and common pitfalls to avoid.
Introducing 'stargazers': A Tool to Understand Your GitHub Audience
April 16, 2025
Tags:cli, community, github, python
Announcing 'stargazers': A CLI tool to fetch, analyze & summarize GitHub stargazers/forkers for any public repo. Inspired by Cockroach Labs' analysis.
HashingEncoder: Tackling Extreme Cardinality with the Hashing Trick
April 15, 2025
Tags:category-encoders, feature-engineering, machine-learning, python
Explore HashingEncoder: learn how it handles extreme cardinality, its mechanism, ideal use cases, and implementation with the category_encoders library.
BinaryEncoder: The Space-Efficient Alternative to One-Hot Encoding
April 13, 2025
Tags:category-encoders, feature-engineering, machine-learning, python
Explore the BinaryEncoder from category_encoders: a space-efficient alternative to one-hot encoding. Learn how it works, when to use it, and how to implement it.
OrdinalEncoder: When Order Matters in Categorical Data
April 10, 2025
Tags:category-encoders, feature-engineering, machine-learning, python
Explore OrdinalEncoder for categorical data where order matters. Learn how it works, its benefits, and implementation using the category_encoders library.
Makefiles: The Unsung Hero of Python Development
April 8, 2025
Tags:automation, development, python, tools
Explore why Makefiles, though old-school, are invaluable for Python projects. Learn how they provide consistent, simple shortcuts for common dev tasks.
Modern Python Package Publishing: PyGeoHash's New CI/CD Pipeline
April 6, 2025
Tags:development, github-actions, pygeohash, python
PyGeoHash uses GitHub Actions, cibuildwheel, and PyPI's Trusted Publisher for automated, secure multi-platform Python package builds and publishing.
PyGeoHash Gets Type Hints: A Journey into Modern Python
April 3, 2025
Tags:development, open-source, pygeohash, python, type-hints
Enhance PyGeoHash with comprehensive type hints and a new types module, improving developer experience and code quality for modern Python.
Optimal Bankroll Management with Keeks: The Kelly Criterion
April 1, 2025
Tags:betting, finance, keeks, kelly-criterion, open-source, python
Dive into the Kelly Criterion, the foundation of optimal betting and bankroll management. Learn its theory, pros, cons, and implementation with Keeks.
Documenting Your Library's API: Best Practices
March 30, 2025
Tags:autodoc, best-practices, development, docstrings, documentation, library, python, sphinx
Build a clear, comprehensive API reference with Sphinx & autodoc. Learn best practices for structuring, writing, and cross-referencing your Python library docs.
OneHotEncoder: The Workhorse of Categorical Encoding
March 27, 2025
Tags:category-encoders, feature-engineering, machine-learning, python
A comprehensive guide to OneHotEncoder in category_encoders, exploring its core functionality, advantages, and practical limitations in machine learning.
Elo Rating System: The Grandfather of Competitive Rankings
March 25, 2025
Tags:algorithms, elo, elote, open-source, python, rating-systems
Deep dive into the Elo rating system: history, the math behind it (K-factor, expected score), and practical implementation with the Elote Python library.
Automating Docs Deployment with GitHub Actions and Pages
March 23, 2025
Tags:automation, ci-cd, development, documentation, github-actions, library, python, sphinx
Keep docs current automatically! Learn to set up GitHub Actions to build Sphinx docs & deploy to GitHub Pages, ensuring sync with your code changes.
Crafting Code Examples: From Snippets to Real-World Scenarios
March 22, 2025
Tags:development, documentation, library, python, sphinx
Master the art of crafting clear, runnable code examples for documentation using doctest, tutorials, and scripts to enhance user understanding.
Keeks 0.1.0 Release: Optimal Bankroll Management Made Simple
March 18, 2025
Tags:betting, finance, keeks, kelly-criterion, open-source, python
Keeks 0.1.0 is here! Python library for optimal bankroll management & betting strategies (Kelly Criterion). Includes simulation and visualization tools.
Getting Started with Sphinx for Python Project Documentation
March 15, 2025
Tags:development, docstrings, documentation, library, python, restructuredtext, sphinx
Learn how to use Sphinx to generate professional, cross-referenced documentation for your Python projects, covering installation, configuration, and autodoc.
Elote 1.0.0 Release: Rating Systems Made Simple
March 13, 2025
Tags:elo, elote, open-source, python, rating-systems
Announcing Elote 1.0.0, a Python library that simplifies the implementation of rating systems like Elo, Glicko, and TrueSkill for competitive ranking.
PyGeoHash v3.0.0: Faster, Freer, and More Pythonic
March 11, 2025
Tags:geospatial, licensing, open-source, performance, pygeohash, python
Deep dive into PyGeoHash v3.0.0: a major release with a pure CPython rewrite, MIT relicensing, and dramatic performance gains. Faster & freer geohashing!
Using Cursor for Open Source Library Maintenance
March 9, 2025
Tags:ai, cursor, elote, keeks, open-source, pygeohash, python
How AI tools like Cursor simplify open source library maintenance by assisting with code understanding, environment setup, housekeeping, and project standards.
Effective Docstrings: Google vs. NumPy vs. reStructuredText Styles
March 6, 2025
Tags:autodoc, development, docstrings, documentation, library, python, restructuredtext, sphinx
Learn to write clear docstrings Sphinx understands. Compares Google, NumPy, and reStructuredText formats for effective Python library documentation.
PyGeoHash 2.1.0: Modernizing a Geospatial Python Library
March 4, 2025
Tags:cursor, geohash, geospatial, open-source, python
A look at the latest updates to PyGeoHash, a lightweight Python library for working with geohashes, and how modern AI tools helped revitalize the project.
Geohash: When Clever Isn't Always Smart
March 2, 2025
Tags:algorithms, data-engineering, geohash, python
A deep dive into the limitations of the Geohash algorithm, including boundary issues, non-uniform cell sizes, and common implementation mistakes to avoid.
Where Did All the RAM Go? Memory Profiling with Memray
March 1, 2025
Tags:development, library, optimization, performance, profiling, python, testing
High CPU isn't the only performance issue. Learn how Memray helps track memory leaks and excessive allocation in your Python library to optimize usage.
Finding the Slowdown: Profiling Python Code with Pyinstrument
February 25, 2025
Tags:development, library, optimization, performance, profiling, python, testing
Your benchmark says a function is slow, but why? Profilers like Pyinstrument help you pinpoint exactly where your Python code is spending its time.
How Fast Is It? Benchmarking Your Code with Pytest-Benchmark
February 22, 2025
Tags:development, library, performance, pytest, python, testing
Performance matters! Easily measure Python library speed with pytest-benchmark. Track performance, find regressions, and optimize effectively with benchmarks.
Silos to Shared Libraries: Guide to Inner Source Adoption
February 18, 2025
Tags:best-practices, development-practices, inner-source, library-development, python, security
A guide to transitioning from team-specific code to shared libraries, covering governance models, security, and standardized development practices.
Mastering Mocking in Python with pytest-mock
February 16, 2025
A practical guide to mocking in Python testing, from basic concepts to advanced techniques with pytest-mock and other helpful libraries.
Building Your Internal Library Developer Community
February 15, 2025
Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, python
Explore strategies for building a thriving community of library developers in your organization through effective incentives, recognition, and collaboration.
Will It Blend? Testing Across Environments with Tox
February 13, 2025
Tags:ci-cd, development, library, pytest, python, testing
Works on your machine? Great, but what about Python 3.9 or 3.12? Tox makes it easy to verify library compatibility across different Python versions and dependency sets.
Inner Source: Bringing Open Source Culture Inside Your Organization
February 11, 2025
Tags:best-practices, corporate-culture, development-practices, inner-source, library-development, open-source, python
Learn how to harness the power of open source development practices within your organization through inner source principles and practices.
Are Your Tests Enough? Measuring Coverage with Coverage.py
February 9, 2025
Tags:code-quality, development, library, pytest, python, testing
Writing tests is step one. Step two is knowing what parts of your library code those tests actually exercise. Enter Coverage.py.
Designing for Developer Joy: Python Library Ergonomics
February 6, 2025
Tags:api-design, best-practices, developer-experience, library-development, programming, python
What makes Python libraries joyful? Explore API ergonomics, from naming conventions & sensible defaults to helpful error messages that guide users to solutions.
Why Your Library Needs Pytest (And How to Get Started)
February 4, 2025
Tags:code-quality, development, library, pytest, python, testing
Testing is vital for Python libraries. Explore why it's crucial and how Pytest simplifies writing powerful tests with less boilerplate and better assertions.
The Art of API Design: Making the Right Things Easy
February 3, 2025
Tags:api-design, best-practices, developer-experience, library-development, programming, python
Learn principles of intuitive Python API design that make common operations simple and guide users toward best practices, while preserving advanced features.
Secure Coding Practices for Python Library Developers
February 2, 2025
Tags:best-practices, development, input validation, library, python, secure-coding, security
Beyond tools, what principles guide secure Python library development? Explore essential practices: input validation, least privilege, error handling, and more.
Taming the Python Chaos: Linting & Formatting with Ruff
January 30, 2025
Tags:ci-cd, code-quality, development, github-actions, python, ruff
What linting and formatting actually are, why they matter (a lot!), and how the speedy tool Ruff can save your Python project (and your sanity).
Handling Sensitive Data Securely Within Your Python Library
January 29, 2025
Tags:development, library, python, secure-coding, security
Handle sensitive data in Python libraries securely. Learn best practices for managing API keys, passwords, PII, and other secrets without exposing them in code.
Decoding Library Updates: Understanding Semantic Versioning (SemVer)
January 28, 2025
Tags:dependencies, development, library, packaging, pip, python
Guide to Semantic Versioning (SemVer) for Python library authors. Understand MAJOR.MINOR.PATCH rules to communicate changes and manage dependencies.
Dependency Security: Managing Vulnerabilities with pip-audit
January 27, 2025
Tags:dependencies, development, library, python, security, vulnerabilities
Your library relies on packages. Learn how to use pip-audit to scan your dependencies for known security vulnerabilities and keep your users safe.
The Center of Your Python Project: Understanding pyproject.toml
January 26, 2025
Tags:development, library, packaging, pytest, python, ruff
From setup.py chaos to pyproject.toml clarity. Learn why it exists, how it standardizes Python packaging/tool config via PEPs (518, 517, 621), and its anatomy.
Bandit Security Rules: Finding Common Python Security Issues
January 25, 2025
Tags:development, library, python, ruff, secure-coding, security, vulnerabilities
Learn how to use Ruff's Bandit integration to automatically scan your Python code for common security pitfalls through static analysis.
Don't Forget the Fine Print: Licensing Your Python Library
January 24, 2025
Tags:compliance, dependencies, development, library, licensing, open-source, python
Choosing an open-source license is crucial. Understand common options (MIT, Apache, GPL), why compatibility matters, and how to comply with obligations.
Building and Engaging a Community Around Your Open Source Library
January 22, 2025
Tags:community, development, github, library, maintenance, open-source, python
Attract users, encourage contributions, and build a welcoming environment for your open source library. Learn practical steps for community engagement.
The Library Author's Dilemma: Managing Python Dependencies
January 21, 2025
Tags:best-practices, dependencies, development, library, packaging, pip, python
Managing a Python library's dependencies means balancing features against user pain. Explore best practices for choosing dependencies, specifying versions (e.g., ~= compatible release), and maintaining them.
Avoiding Common Pitfalls: Injection Flaws in Python Libraries
January 18, 2025
Tags:development, input validation, library, python, secure-coding, security
Injection flaws aren't just for web apps. See how SQL & command injection affect Python libraries via input handling, and learn crucial prevention techniques.
The Art of Saying No: Defining Your Python Library's Scope
January 17, 2025
Tags:design, development, library, python
Why keeping your Python library focused is harder than it looks, and how saying 'no' can be your most powerful design tool.
SDLC in the Age of AI
January 12, 2025
Tags:ai, data-science, programming, software-development
Exploring how AI reshapes software development: natural language programming, documentation as prompt engineering, and the role of `.cursorrules` files.
From Weekend Hack to Core Tool: The category_encoders Journey
December 27, 2022
Tags:data-science, machine-learning, open-source, python
Explore category_encoders' journey from a weekend Python experiment to a widely used data science library, now part of scikit-learn-contrib.
Category Encoders v1.2.8 Release
June 4, 2018
Tags:category-encoders, data-science, open-source, python
Announcing Category Encoders v1.2.8! This release includes important bugfixes and introduces new features like optional category names in output columns.
Category Encoders published in JOSS
January 26, 2018
Tags:category-encoders, data-science, open-source, python
Announcing the category_encoders Python package publication in JOSS. Discover the peer-review process and citation details.
Revisiting Python support in Apache Flink
January 11, 2018
Tags:big-data, data-science, python
An early-2018 look at Apache Flink's Python support: compatibility, batch vs. streaming capabilities, and future developments such as Streaming API support.
On taking things too seriously: holiday edition
December 9, 2017
Tags:data-science, python, rating-systems, sports
Building a CFB bowl game prediction system with Python packages elote, keeks, & keeks-elote. Combines rating, betting strategies, and backtesting for analysis.
Elote: a python package of rating systems
December 6, 2017
Tags:data-science, machine-learning, python, rating-systems
Introducing Elote, a Python package implementing various rating systems like Elo and Glicko. Learn its core concepts and see how to use it for ranking.
Ripyr: sampled metrics on datasets using python's asyncio
November 28, 2017
Tags:data-science, python, type-hints
An introduction to ripyr, a Python library for streaming through large datasets and computing basic metrics using asyncio and type hinting.
Category Encoders v1.2.5 Release
November 22, 2017
Tags:category-encoders, data-science, machine-learning, open-source, python
Category Encoders v1.2.5 brings community updates including stable binary/BaseN encoding, new leave-one-out encoding, and pandas compatibility fixes.
git-pandas Caching: Faster Analysis
July 25, 2017
Tags:data-analysis, git, pandas, performance, python
Boost git repository analysis speed! Learn how git-pandas now uses caching to dramatically improve performance for repeated queries on large codebases.
Category Encoders v1.2.4 Release
July 12, 2017
Tags:category-encoders, data-science, machine-learning, python
Category Encoders v1.2.4 is out! Includes pandas categorical type support, improved missing value handling, better error messages, BaseN fixes, and docs.
BaseN Encoding Grid Search in Category Encoders
December 18, 2016
Tags:category-encoders, data-science, machine-learning, python
Explore category_encoders' BaseN encoder for representing categorical data. Learn how to use scikit-learn's grid search to find the optimal encoding base.
Category Encoders accepted into scikit-learn-contrib
November 20, 2016
Tags:category-encoders, data-science, open-source, python
Category Encoders, a Python library for encoding categorical variables, has been accepted into the scikit-learn-contrib ecosystem. A project milestone!
Category Encoders now on conda-forge
September 17, 2016
Tags:category-encoders, data-science, open-source, python
The category_encoders Python package is now available on conda-forge, making installation easier for Conda users. Learn about the package and feedstock.
Introducing unified glob-syntax in git-pandas
June 15, 2016
Tags:data-science, git-pandas, python
Explore the unified glob syntax (`include_globs`, `ignore_globs`) in git-pandas v2.0, which offers flexible file-pattern specification and improved usability.
Parallelizing cumulative blame in git-pandas with joblib
June 12, 2016
Tags:data-science, git-pandas, performance, python
Boost git-pandas cumulative blame analysis performance with joblib. Parallel processing via multithreading speeds up this costly operation.
When do I work on what?
April 30, 2016
Tags:data-science, dataviz, git-pandas, open-source, python
Use git-pandas to analyze and visualize work patterns across open source vs. closed source projects. Compare commit times with punchcard plots, with the code included.
Estimating the time spent on a project with git-pandas
April 16, 2016
Tags:git, git-pandas, github, open-source, python
Learn how to estimate project development time using commit history with git-pandas. Compares to git_time_extractor, git-hours, and glass.
Automating documentation workflow with sphinx and github pages
February 29, 2016
Tags:documentation, github, open-source, python, sphinx
Explore a comprehensive guide on automating the deployment of Sphinx documentation to GitHub Pages, streamlining your workflow with efficient practices.
Pypi-publisher: a simple cli for publishing python libraries
February 24, 2016
Tags:cli, deployment, open-source, packaging, python
Introducing pypi-publisher (ppp): a CLI tool simplifying Python library publishing. Handles .pypirc updates, linting, git tags, and PyPI sdist uploads.
Using survival analysis and git-pandas to estimate code quality
February 21, 2016
Tags:data-analysis, data-science, dataviz, git, git-pandas, github, open-source, python
Apply survival analysis with git-pandas to measure code quality in Git repositories by analyzing code longevity and contributor patterns over time.
Git-pandas v1.0.0, or how to check for a stable release
February 2, 2016
Tags:data-science, dataviz, git, git-pandas, github, open-source, python
Explore git-pandas v1.0.0, focusing on interface consistency, parameter naming, and API simplification for improved data analysis workflows.
Github.com cumulative blame in 5 lines of python
January 31, 2016
Tags:data-science, dataviz, git, git-pandas, github, pandas, python
Visualize your GitHub repository growth over time using the git-pandas Python library and its GitHubProfile class, all in just a few lines of code.
Data-driven engineering team management with gitnoc and git-pandas
January 19, 2016
Tags:data visualization, dataviz, git, git-pandas, github, open-source, python
Leverage git-pandas and gitnoc for data-driven engineering management. Visualize git data for insights on bus factor, risk, project growth, and team oversight.
Create organization-wide punchcards with git-pandas
January 17, 2016
Tags:data-analysis, git, git-pandas, github, open-source, python
Learn how git-pandas enables creating organization-wide punchcard visualizations, aggregating commit activity across multiple repositories for a unified view.
How to Write Comprehensions and Alienate People
January 8, 2016
Tags:best-practices, data-science, programming, python
A tongue-in-cheek guide to writing Python comprehensions that will make your colleagues question their life choices and your sanity.
Gitpandas v0.0.6: python 2.7, fileowners, file-wise blame and examples
January 7, 2016
Tags:data-analysis, data-science, git, git-pandas, github, open-source, projects, python
Overview of git-pandas v0.0.6 release, highlighting new features like Python 2.7 support, file-wise blame, file owner determination, and other improvements.
Git-Pandas v0.0.5: coverage.py, risk, and more
December 25, 2015
Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python
Git-pandas v0.0.5 is out! It adds coverage.py support, file change rate metrics for risk analysis, API updates, and time-based filtering for commits.
Visualize all of your git repositories with gitnoc and git-pandas
December 13, 2015
Tags:data visualization, dataviz, git, git-pandas, pandas, python
Visualize git repositories at scale using GitNOC & git-pandas. Create profiles to analyze cumulative blame & file change rates across multiple projects.
Analyzing GitPython and Pandas With GitPandas
November 19, 2015
Tags:data-analysis, git, git-pandas, github, open-source, pandas, projects, python
Analyze GitPython and Pandas repositories with git-pandas! Explore LOC, contributors, and bus factor using this Python library for git analysis.
Create a pip-installable python package in 2 minutes
November 12, 2015
Tags:open-source, packaging, pip, python
Rapidly create and publish Python packages. Learn the steps from cookiecutter-pipproject template setup to pushing your first release to PyPI in minutes.
Blame the world with git-pandas
November 10, 2015
Tags:dataviz, git, git-pandas, github, pandas, python
Introducing git-pandas, a Python library offering a pandas interface for git analysis. Easily aggregate git blame across projects.
Miscellaneous MATLAB
June 10, 2012
Tags:engineering, programming
Useful MATLAB tips learned the hard way: logical indexing, read-only parameters, pre-allocation, OOP, user input, and publishing for cleaner, faster code.