Juggling Projects? Analyze Multiple Repos at Once with GitPandas

So, you’ve got your Git analysis chops honed with gitpandas on a single repository. Nice! But what happens when your project isn’t just one repo? Maybe you’re wrangling microservices that each live in their own repository, or you just have a collection of related tools sitting in separate folders. Analyzing them one by one is a drag.

Fear not! gitpandas has a nifty class called ProjectDirectory designed for exactly this scenario. It lets you treat a directory full of Git repositories as a single entity for analysis. Let’s see how it works.

Setting Up Your Project Directory

Imagine you have a main project folder, and inside it are several sub-folders, each containing a Git repository:

my-awesome-project/
├── service-a/
│   └── .git/
├── service-b/
│   └── .git/
├── shared-library/
│   └── .git/
└── ... other files ...

To analyze these together, you first import and instantiate ProjectDirectory, pointing it to the parent directory (my-awesome-project in this case):

from gitpandas import ProjectDirectory

# Point this to the directory containing your repositories
project_path = '/path/to/my-awesome-project'
proj = ProjectDirectory(project_path)

# gitpandas will automatically discover the git repos inside
print(proj.repo_name())  # DataFrame with one row per discovered repository

ProjectDirectory scans the given path and finds all the directories that look like Git repositories. You can also explicitly tell it which repositories to include if you don’t want it to auto-discover everything.
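To make the discovery step less magical, here is a rough sketch of the idea in plain Python. It is not gitpandas's actual implementation: find_git_repos and select_repos are hypothetical helpers, and the sketch only looks at immediate sub-folders, treating anything that contains a .git entry as a repository.

```python
import tempfile
from pathlib import Path


def find_git_repos(parent):
    """Return the immediate sub-folders of `parent` that contain a .git entry."""
    parent = Path(parent)
    return sorted(
        child for child in parent.iterdir()
        if child.is_dir() and (child / ".git").exists()
    )


def select_repos(parent, wanted):
    """Keep only the discovered repositories whose folder name is in `wanted`."""
    return [repo for repo in find_git_repos(parent) if repo.name in wanted]


# Build a throwaway layout that mirrors the tree shown earlier.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("service-a", "service-b", "shared-library"):
        (Path(tmp) / name / ".git").mkdir(parents=True)
    (Path(tmp) / "docs").mkdir()  # plain folder, not a repository

    found = [p.name for p in find_git_repos(tmp)]
    chosen = [p.name for p in select_repos(tmp, {"service-a", "shared-library"})]
    print(found)
    print(chosen)
```

If you want to hand-pick repositories, a filtered list like chosen is the kind of thing you would pass along instead of the parent path. Check the ProjectDirectory docs for your installed version to see exactly which forms it accepts.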

What Can You Do With It?

Once you have your proj object, you can perform analyses that span across all the included repositories.

Combined Commit History

Want to see the commit activity across all services? Easy peasy:

# Get a DataFrame of commits from all repos in the project
all_commits_df = proj.commit_history(limit=100) # Limit for brevity

print(all_commits_df.head())

This DataFrame will look similar to the commit history you get from a single Repository, but it will include commits from service-a, service-b, and shared-library, with an added repository column telling you which repo each commit came from.
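Conceptually, the combined history is just each repository's commit table tagged with the repo name and stacked together. The sketch below imitates that with hand-made DataFrames so it runs without any real repositories. The author and message columns, the sample rows, the repository column name, and the combine_histories helper are all assumptions made up for illustration, not gitpandas internals.

```python
import pandas as pd

# Made-up per-repository commit tables standing in for real gitpandas output.
service_a = pd.DataFrame(
    {"author": ["ana", "ben"], "message": ["init", "fix login"]},
    index=pd.to_datetime(["2024-01-02", "2024-01-05"]),
)
service_b = pd.DataFrame(
    {"author": ["ana"], "message": ["add queue"]},
    index=pd.to_datetime(["2024-01-03"]),
)


def combine_histories(frames):
    """Tag each frame with its repository name, then stack and sort by date."""
    tagged = [df.assign(repository=name) for name, df in frames.items()]
    return pd.concat(tagged).sort_index()


combined = combine_histories({"service-a": service_a, "service-b": service_b})

# With the repository column in place, per-repo comparisons are one groupby away.
commits_per_repo = combined.groupby("repository").size()
print(commits_per_repo)
```

The same groupby should work on the DataFrame returned by proj.commit_history(), provided it carries a column naming the repository.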

Project-Wide File Changes

Similarly, you can look at file changes across the board:

# Get file change history across all repositories
all_changes_df = proj.file_change_history(limit=500) # Larger limit might be needed

# You could then, for example, find the most changed files regardless of repo
agg_changes = all_changes_df.groupby('filename')[['insertions', 'deletions']].sum()
agg_changes['total_churn'] = agg_changes['insertions'] + agg_changes['deletions']
print(agg_changes.sort_values('total_churn', ascending=False).head(10))
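One thing to watch for: grouping on filename alone lumps together files that share a name in different repositories (every repo's README.md, for instance). Here is a small sketch that keeps them apart by grouping on repository and filename together. The data is invented, the repository column name is an assumption about the combined DataFrame, and churn_by_file is a hypothetical helper.

```python
import pandas as pd

# Invented file-change rows; two repositories both have a README.md.
changes = pd.DataFrame(
    {
        "repository": ["service-a", "service-a", "service-b", "shared-library"],
        "filename": ["README.md", "app.py", "README.md", "utils.py"],
        "insertions": [10, 40, 5, 7],
        "deletions": [2, 15, 1, 3],
    }
)


def churn_by_file(df):
    """Sum insertions and deletions per (repository, filename) pair."""
    totals = df.groupby(["repository", "filename"])[["insertions", "deletions"]].sum()
    totals["total_churn"] = totals["insertions"] + totals["deletions"]
    return totals.sort_values("total_churn", ascending=False)


churn = churn_by_file(changes)
print(churn.head(10))
```

Whether you want the merged or the per-repo view depends on the question you are asking. The per-repo version is usually the safer default when file names are likely to collide.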

General Information

You can get a quick overview of all the repositories included:

# Get general info like author counts, file counts, etc., per repo
info_df = proj.general_information()
print(info_df)

This gives you a handy summary table comparing the basic stats of each repository in your project directory.

Why Bother?

Using ProjectDirectory is super helpful when:

You want a holistic view of development activity across related components.
You need to track contributions or changes that span multiple repositories.
You want to compare metrics (like churn, commit frequency) between different parts of your system.

It turns gitpandas from a single-repo analysis tool into something that can handle more complex, multi-repository setups, without you having to write custom scripts that loop through directories yourself.

Give it a try next time you’re faced with more than one .git folder!
