Data-Science
52 posts
Gitpandas v0.0.6: python 2.7, fileowners, file-wise blame and examples
Overview of git-pandas v0.0.6 release, highlighting new features like Python 2.7 support, file-wise blame, file owner determination, and other improvements.
Common Data Pitfalls for Recurring Machine Learning Systems
Explore common data pitfalls in recurring machine learning systems, including new categories, data format changes, sending issues, deduplication, and updates.
CyberLaunch: An Accelerator for Machine Learning Companies
Explore CyberLaunch, Atlanta's accelerator for machine learning and information security startups, its program details, and its impact on the local startup ecosystem.
Data Science Things Roundup #4
Data Science Things Roundup #4: Featuring Scikit-learn groups for feature sets, Markov Modulated Poisson Processes for event detection, and DBoost for boosting.
Beyond One-Hot: An Exploration of Categorical Variables
A deep dive into different methods for encoding categorical variables in machine learning, exploring their benefits and trade-offs.
Data Science vs. Data Engineering
Understanding the fundamental differences between data science and data engineering through the lens of methodology rather than tools.
Sklearn-Extensions: Collecting Useful Scikit-Learn Add-ons
From time to time I come across an interesting blog post, gist, or other snippet of code related to scikit-learn that is pretty cool. Historically, th
Grapht: graph connectedness and dimensionality
I've been toying around with [graph stuff](https://en.wikipedia.org/wiki/Graph_theory) from time to time for a while now. I still have pretty much no
Data Science Things Roundup #3
Data Science Things Roundup #3: TensorFlow Self Organizing Maps, DeBaCl for density-based clustering, and options for protecting Python codebases.
Data Science Things Roundup #2
Data Science Things Roundup #2: Featuring Lifelines for survival analysis, Patsy-learn for R-style syntax in scikit-learn, and HDBSCAN clustering library.