Exploring Ideas: A Blog on Technology, Startups, Food, and More
Welcome to my blog where I share thoughts and insights on technology, startups, and life in Atlanta. Browse through the articles below or explore by topic.
Introducing unified glob-syntax in git-pandas
June 15, 2016
In an effort to improve the user interface to git-pandas, I’m introducing a new way of specifying which files in a repository you care about, which will become the sole way of specifying this kind of thing in version 2.0.0. Currently, for any given function, you can specify a list of extensions you’d like to include, and a list of directories you’d like to exclude. A toy example would be: df = rep...
Parallelizing cumulative blame in git-pandas with joblib
June 12, 2016
It’s been a little while since I’ve posted anything about git-pandas, as I’ve been working on getting a sister project, twitter-pandas, up and running. Work has continued, though, and today I’d like to show a currently experimental feature: parallelized cumulative blame. Cumulative blame is one of the more popular features in git-pandas, as it can be used to easily create a pretty interesting p...
Exit Interviews in Startups
May 5, 2016
Exit interviews are a critical but often overlooked practice in startups. While larger companies have formalized processes for gathering feedback when employees leave, startups often skip this valuable opportunity for insight, either due to the emotional nature of departures or the pressure to focus on immediate operational needs. Why Exit Interviews Matter In startups, every departure represents ...
When do I work on what?
April 30, 2016
In past posts, I’ve shown that it’s pretty easy to create organization-wide punchcards with git-pandas. Today, I put together a little twist on that particular visualization, splitting my projects into two cohorts: open and closed source. My work at Predikto is, as work tends to be, mostly closed source (though we try to contribute to projects when we can). The work I do outside of Predikto is, in ...
Building an Engineering Team Around Ownership
April 28, 2016
Small, talented teams have an inherent challenge: the individuals that make them up are talented. In a small, talented engineering team, the engineers understand architecture, the architects understand engineering, and the product managers understand the technical side of things. While this cross-functional knowledge seems beneficial on the surface - enabling empathy and better questions - it come...
Estimating the time spent on a project with git-pandas
April 16, 2016
I stumbled across a conversation recently on the Tech404 Slack channel (a pretty good public Slack group for Atlanta-area software folks) that was mostly about taxes, but nestled in the middle was this project: git_time_extractor. In the past I’ve noticed a kind of weird concentration of git-related open source projects among Atlanta developers; I’m not sure if that says more about Atlanta or git’s abstrus...
Data Science Things Roundup #5
March 15, 2016
Time again for the 5th edition of the data science things roundup, named suspiciously similarly to the much more established Data Science Roundup by RJ Metrics (but we won’t worry about that this week). In previous weeks we’ve seen some pretty cool ML and data science libraries, mostly in Python. This week we branch out a little into more engineering-level projects (databases and deployment). Sp...
Automating documentation workflow with sphinx and github pages
February 29, 2016
I’ve got a large and growing list of open source projects I try to keep up to date. Some are better documented than others, but I’m working on making them all very simple to get started with. In light of that, I’ve done some hacking around on how to get the workflow of github pages and sphinx nailed down. For a while I was manually building the docs in the master branch, copying the html out to a ...
Pypi-publisher: a simple cli for publishing python libraries
February 24, 2016
I’ve just finished the first release of pypi-publisher (ppp), a library for simple command line publishing of python libraries. You can grab the source or an install here: https://github.com/wdm0006/pypi-publisher In previous posts, I’ve shown how with a cookiecutter framework and some fast typing, you can release a package to pypi in pretty much no time at all. In its current state, ppp can cut ...
Using survival analysis and git-pandas to estimate code quality
February 21, 2016
Survival analysis is a statistical technique for estimating how likely events are to occur over time. It originated largely in the medical and actuarial professions, where it would answer questions like: given this set of conditions, how likely is a person to survive X years? In previous posts, we’ve seen that we can tap into a huge amount of data in git repositories with git-panda...