Journal
Posts and long-form series on AI, startups, venture capital, and more.
All Posts
Git-Pandas v0.0.5: coverage.py, risk, and more
Git-pandas v0.0.5 is out! It adds coverage.py support, file change rate metrics for risk analysis, API updates, and time-based filtering for commits.
Common Data Pitfalls for Recurring Machine Learning Systems
Explore common data pitfalls in recurring machine learning systems, including new categories, data format changes, sending issues, deduplication, and updates.
Visualize all of your git repositories with gitnoc and git-pandas
Visualize git repositories at scale using GitNOC & git-pandas. Create profiles to analyze cumulative blame & file change rates across multiple projects.
CyberLaunch: An Accelerator for Machine Learning Companies
Explore CyberLaunch, Atlanta's accelerator for machine learning and info security startups, its program details, and its impact on the local startup ecosystem.
Data Science Things Roundup #4
Data Science Things Roundup #4: Featuring Scikit-learn groups for feature sets, Markov Modulated Poisson Processes for event detection, and DBoost for boosting.
Beyond One-Hot: An Exploration of Categorical Variables
A deep dive into different methods for encoding categorical variables in machine learning, exploring their benefits and trade-offs.
Analyzing GitPython and Pandas With GitPandas
Analyze GitPython and Pandas repositories with git-pandas! Explore LOC, contributors, and bus factor using this Python library for git analysis.
Create a pip-installable Python package in 2 minutes
Rapidly create and publish Python packages. Learn the steps from cookiecutter-pipproject template setup to pushing your first release to PyPI in minutes.
Blame the world with git-pandas
Introducing git-pandas, a Python library offering a pandas interface for git analysis. Easily aggregate git blame across projects.
Data Science vs. Data Engineering
Understanding the fundamental differences between data science and data engineering through the lens of methodology rather than tools.