Data-Science
56 posts
Data Science Things Roundup #9
Data Science Things Roundup #9: Highlighting Pedro Domingos' ML paper, Spyre (Shiny for Python), and BetaGo, an AlphaGo-inspired Go bot framework.
Data Science Things Roundup #8
Data Science Things Roundup #8: Dive into LIME for model interpretation, sklearn-expertsys for interpretable classifiers, and the value of nearest neighbors.
BaseN Encoding Grid Search in Category Encoders
Explore category_encoders' BaseN encoder for representing categorical data. Learn how to use scikit-learn's grid search to find the optimal encoding base.
Category Encoders accepted into scikit-learn-contrib
Category Encoders, a Python library for encoding categorical variables, has been accepted into the scikit-learn-contrib ecosystem. A project milestone!
Data Science Things Roundup #7
Data Science Things Roundup #7: Python-focused edition featuring Intel's Python Distribution, Go-Python for extensions, and PyFilesystem for unified access.
Category Encoders now on conda-forge
The category_encoders Python package is now available on conda-forge, making installation easier for Conda users. Learn about the package and feedstock.
Data Science Things Roundup #6
Data Science Things Roundup #6: Focuses on calendar visualizations with D3.js Calendar Heatmap and Bostock's Calendar View, plus insights on coding interviews.
Introducing unified glob-syntax in git-pandas
Explore the unified glob syntax (`include_globs`, `ignore_globs`) in git-pandas v2.0, which makes file pattern specification more flexible and easier to use.
Parallelizing cumulative blame in git-pandas with joblib
Boost git-pandas cumulative blame analysis performance with joblib. Parallel processing via multithreading speeds up this costly operation.
When do I work on what?
Use git-pandas to analyze and visualize work patterns across open source vs. closed source projects. Compare commit times with punchcard plots and walk through the code.