Exploring Ideas: A Blog on Technology, Startups, Food, and More
Welcome to my blog where I share thoughts and insights on technology, startups, and life in Atlanta. Browse through the articles below or explore by topic.
Estimating the time spent on a project with git-pandas
April 16, 2016
I recently stumbled across a conversation on the Tech404 Slack channel (a pretty good public Slack group for Atlanta-area software folks) that was mostly about taxes, but nestled in the middle was this project: git_time_extractor. In the past I’ve noticed a kind of weird concentration of git-related open source projects among Atlanta developers; I’m not sure if that says more about Atlanta or git’s abstrus...
Data Science Things Roundup #5
March 15, 2016
Time again for the 5th edition of the data science things roundup, named suspiciously similarly to the much more established Data Science Roundup by RJ Metrics (but we won’t worry about that this week). In previous weeks we’ve seen some pretty cool ML and data science libraries, mostly in Python. This week we branch out a little bit into more engineering-level projects (databases and deployment). Sp...
Automating documentation workflow with sphinx and github pages
February 29, 2016
I’ve got a large and growing list of open source projects I try to keep up to date. Some are better documented than others, but I’m working on making them all very simple to get started with. In light of that, I’ve done some hacking around on how to get the workflow of GitHub Pages and Sphinx nailed down. For a while I was manually building the docs in the master branch, copying the HTML out to a ...
Pypi-publisher: a simple cli for publishing python libraries
February 24, 2016
I’ve just finished the first release of pypi-publisher (ppp), a library for simple command line publishing of Python libraries. You can grab the source or an install here: https://github.com/wdm0006/pypi-publisher In previous posts, I’ve shown how with a cookiecutter framework and some fast typing, you can release a package to PyPI in pretty much no time at all. In its current state, ppp can cut ...
Using survival analysis and git-pandas to estimate code quality
February 21, 2016
Survival analysis is a statistical technique for estimating how likely an event is to occur over time. It originated largely in the medical and actuarial professions, where it would answer questions like: given this set of conditions, how likely is a person to survive X years? In previous posts, we’ve seen that we can tap into a huge amount of data in git repositories with git-panda...
Journalism and the Perfect Pitch Deck
February 15, 2016
Having been pretty immersed in the VC-funded startup experience at Predikto for a couple of years now, I have (counter to what I would ever have thought) developed an intellectual curiosity about pitch decks, including studying great examples like LinkedIn’s fantastically annotated deck from their Series B. Despite having pretty much always been in some sort of engineering, I don’t come from a family of...
Git-pandas v1.0.0, or how to check for a stable release
February 2, 2016
In the process of making the v1.0.0 release of git-pandas, I had one primary goal: to simplify and solidify the interface to git-pandas objects (the ProjectDirectory and the Repository). At the end of the day, the usefulness of a project like git-pandas, versus one-off analysis or rolling your own interface, lies in consistent and predictable interfaces to commonly used functions. So with that in mind, I...
Github.com cumulative blame in 5 lines of python
January 31, 2016
Git-pandas has gotten to be pretty capable. Currently in the master branch and soon to be in the v1.0.0 release, we’ve included a github.com interface to git-pandas via the GitHubProfile class. With this, in just a few lines of code, you can see how your profile has grown over time: from gitpandas.utilities.plotting import plot_cumulative_blame from gitpandas import GitHubProfile g = GitHubProfile...
Decision Strategies: Beyond Expected Value
January 28, 2016
Oftentimes when making some kind of uncertain decision, the decision maker will use a measure such as expected value to make that decision. Imagine the case of a single coin flip where the bettor pays 5 dollars to play, and gets 2 dollars for heads and 10 dollars for tails. The player pays 5 dollars to enter and expects to win 6 dollars (0.5 × 2 + 0.5 × 10), giving an expected profit of one dollar. But th...
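The arithmetic in the coin-flip example above can be checked with a few lines of Python. This is only a minimal sketch: it assumes a fair coin (equal probability of heads and tails), and the names `ENTRY_FEE`, `OUTCOMES`, `expected_payout`, and `expected_profit` are hypothetical, chosen here for illustration rather than taken from the post.

```python
# Coin-flip game from the excerpt: pay 5 to play, heads pays 2, tails pays 10.
# Assumption: the coin is fair, so each outcome has probability 0.5.
ENTRY_FEE = 5.0
OUTCOMES = [
    (0.5, 2.0),   # (probability, payout) for heads
    (0.5, 10.0),  # (probability, payout) for tails
]


def expected_payout(outcomes):
    """Probability-weighted average payout, before the entry fee."""
    return sum(probability * payout for probability, payout in outcomes)


def expected_profit(outcomes, entry_fee):
    """Expected payout minus the cost of playing."""
    return expected_payout(outcomes) - entry_fee


if __name__ == "__main__":
    print("expected payout:", expected_payout(OUTCOMES))
    print("expected profit:", expected_profit(OUTCOMES, ENTRY_FEE))
```

Swapping in different probabilities or payouts in `OUTCOMES` shows how quickly the expected profit changes with the terms of the game.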
Data-driven engineering team management with gitnoc and git-pandas
January 19, 2016
The management of engineering projects is very different when it scales beyond one developer, and even more so when it scales beyond one repository. DevOps and source control aside, as the number of contributors and repos increases, the difficulty for a manager of keeping track of each developer and repository relative to each other increases in kind. Data, especially incomplete data, canno...