Exploring Ideas: A Blog on Technology, Startups, Food, and More

Welcome to my blog, where I share thoughts and insights on technology, startups, and life in Atlanta. Browse the articles below or explore by topic.

BaseN Encoding Grid Search in Category Encoders

December 18, 2016

One of the more interesting encoders in the category_encoders library is the BaseN encoder. The idea behind it is to take a categorical variable and convert it into a series of binary variables, similar to one-hot encoding, but with a different base. For example, if we have a categorical variable with 8 unique values, we could encode it in base 2 with 3 binary variables (2³ = 8). The advantage of ...
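
As a rough sketch of the arithmetic behind that example (plain Python, not the category_encoders implementation; the function names `base_n_digits` and `base_n_encode` are made up for illustration), each category gets an integer index, and that index is written out as fixed-width digits in the chosen base:

```python
def base_n_digits(index, base, width):
    """Write `index` in `base` as `width` digits, most significant first."""
    digits = []
    for _ in range(width):
        index, remainder = divmod(index, base)
        digits.append(remainder)
    return digits[::-1]


def base_n_encode(values, base=2):
    """Map each distinct value to a fixed-width list of base-`base` digits.

    Illustrative only: categories are indexed from 0 in sorted order,
    which is an assumption of this sketch, not the library's behavior.
    """
    if base < 2:
        raise ValueError("this sketch needs base >= 2")
    categories = sorted(set(values))
    # Smallest width such that base ** width covers every category.
    width = 1
    while base ** width < len(categories):
        width += 1
    return {
        category: base_n_digits(i, base, width)
        for i, category in enumerate(categories)
    }


# Eight categories fit in three base-2 columns, or two base-3 columns.
codes_base2 = base_n_encode(list("abcdefgh"), base=2)
codes_base3 = base_n_encode(list("abcdefgh"), base=3)
```

With 8 categories, base 2 needs three columns and base 3 needs two, which is the kind of trade-off a grid search over the base would explore.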

Read more →

Category Encoders accepted into scikit-learn-contrib

November 20, 2016

In the past I’ve posted a few times about a library I’m working on called category encoders. The idea is to provide a complete toolbox of scikit-learn compatible transformers for encoding categorical variables in different ways. Scikit-learn is an extremely popular Python package that builds on NumPy and SciPy to provide rich machine learning functionality. It’s one of the most active p...

Read more →

Data Science Things Roundup #7

November 10, 2016

This week’s edition of the Data Science Things Roundup is pretty Python-heavy, as opposed to previous editions that were a bit more machine learning and dataviz heavy. At the end of the day, some kind of software is backing most of data science, so getting a bit lower level can be useful sometimes. This week we look at a couple of ways to increase performance in Python codebases and one way to gene...

Read more →

Category Encoders now on conda-forge

September 17, 2016

My scikit-learn compatible library of categorical data encoders (category_encoders) is now published on conda-forge! Conda, if you didn’t know, is an open source package manager for Python (and other things) developed primarily by Continuum Analytics. Thanks to Continuum developer @bollwyvl for doing pretty much all of the work to get it working. Check out the category_encoders feedstock here: htt...

Read more →

Data Science Things Roundup #6

July 20, 2016

Time again for the weekly data science things roundup. If you haven’t seen this before, check out some of the previous ones to get a feel for it. Each Tuesday I run through 3 things I’ve found interesting and bookmarked recently, generally related to Python and data science (with some admitted diversions). This week is pretty calendar heavy. Dates are weird for a lot of reasons (shoutout leap-seco...

Read more →

Introducing unified glob-syntax in git-pandas

June 15, 2016

In an effort to improve the user interface to git-pandas, I’m introducing a new way of specifying which files in a repository you care about, which will become the sole way of specifying this kind of thing in version 2.0.0. Currently, for any given function, you can specify a list of extensions you’d like to include, and a list of directories you’d like to exclude. A toy example would be: df = rep...

Read more →

Parallelizing cumulative blame in git-pandas with joblib

June 12, 2016

It’s been a little while since I’ve posted anything about git-pandas, as I’ve been working on getting a sister project, twitter-pandas, up and running. Work has continued though, and today I’d like to show a currently experimental feature: parallelized cumulative blame. Cumulative blame is one of the more popular features in git-pandas, as it can be used to easily create a pretty interesting p...

Read more →

Exit Interviews in Startups

May 5, 2016

Exit interviews are a critical but often overlooked practice in startups. While larger companies have formalized processes for gathering feedback when employees leave, startups often skip this valuable opportunity for insight, either due to the emotional nature of departures or the pressure to focus on immediate operational needs. Why Exit Interviews Matter: In startups, every departure represents ...

Read more →

When do I work on what?

April 30, 2016

In past posts, I’ve shown that it’s pretty easy to create organization-wide punchcards with git-pandas. Today, I put together a little twist on that particular visualization, splitting my projects into two cohorts: open and closed source. My work at Predikto is, as work tends to be, mostly closed source (though we try to contribute to projects when we can). The work I do outside of Predikto is, in ...

Read more →

Building an Engineering Team Around Ownership

April 28, 2016

Small, talented teams have an inherent challenge: the individuals that make them up are talented. In a small, talented engineering team, the engineers understand architecture, the architects understand engineering, and the product managers understand the technical side of things. While this cross-functional knowledge seems beneficial on the surface - enabling empathy and better questions - it come...

Read more →
