Juggling Projects? Analyze Multiple Repos at Once with GitPandas
So, you’ve got your Git analysis chops honed with gitpandas on a single repository. Nice! But what happens when your project isn’t just one repo? Maybe you’re wrangling microservices, managing a monorepo with distinct sub-projects, or just have a collection of related tools living in separate folders. Analyzing them one by one is a drag.
Fear not! gitpandas has a nifty tool called ProjectDirectory designed exactly for this scenario. It lets you treat a directory full of Git repositories as a single entity for analysis. Let’s see how it works.
Setting Up Your Project Directory
Imagine you have a main project folder, and inside it are several sub-folders, each containing a Git repository:
my-awesome-project/
├── service-a/
│ └── .git/
├── service-b/
│ └── .git/
├── shared-library/
│ └── .git/
└── ... other files ...
To analyze these together, you first import and instantiate ProjectDirectory, pointing it to the parent directory (my-awesome-project in this case):
from gitpandas import ProjectDirectory
# Point this to the directory containing your repositories
project_path = '/path/to/my-awesome-project'
proj = ProjectDirectory(project_path)
# gitpandas will automatically discover the git repos inside
# repo_name() returns a DataFrame with one row per discovered repository
print(proj.repo_name())
ProjectDirectory scans the given path and finds all the directories that look like Git repositories. You can also explicitly tell it which repositories to include if you don’t want it to auto-discover everything.
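If you want to see (or control) what "looks like a Git repository" before handing things over, here is a minimal sketch of that kind of scan using only the standard library. It is an illustration, not gitpandas's actual discovery code, and find_git_repos is a hypothetical helper name. It only checks immediate sub-folders for a .git entry.

```python
from pathlib import Path


def find_git_repos(parent):
    """Return sorted paths of immediate sub-directories that contain a .git entry.

    Hypothetical helper for illustration; gitpandas's own discovery may differ.
    """
    parent = Path(parent)
    return sorted(
        str(child)
        for child in parent.iterdir()
        if child.is_dir() and (child / ".git").exists()
    )


if __name__ == "__main__":
    # Prints the repositories found directly under the current directory
    for repo_path in find_git_repos("."):
        print(repo_path)
```

You could filter that list and pass it in as your explicit selection. As far as I know, ProjectDirectory's working_dir argument accepts a list of repository paths as well as a single parent directory, but check that against the version you have installed.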
What Can You Do With It?
Once you have your proj object, you can perform analyses that span across all the included repositories.
Combined Commit History
Want to see the commit activity across all services? Easy peasy:
# Get a DataFrame of commits from all repos in the project
all_commits_df = proj.commit_history(limit=100) # Limit for brevity
print(all_commits_df.head())
This DataFrame will look similar to the single Repository commit history, but it will include commits from service-a, service-b, and shared-library, with an added repository column indicating which repository each commit belongs to.
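That per-repo column is what makes cross-repo questions easy. Here is a small sketch using a hand-built stand-in for all_commits_df, so it runs without any repositories on disk. The column names (repository, author, insertions, deletions) are assumptions, so check all_commits_df.columns on your real data first.

```python
import pandas as pd

# Hand-built stand-in for proj.commit_history(); the column names
# below are assumptions, not guaranteed gitpandas output.
all_commits_df = pd.DataFrame(
    {
        "repository": ["service-a", "service-a", "service-b", "shared-library"],
        "author": ["alice", "bob", "alice", "alice"],
        "insertions": [10, 5, 20, 1],
        "deletions": [2, 0, 7, 1],
    }
)

# Number of commits per repository
commits_per_repo = all_commits_df.groupby("repository").size()

# Number of distinct repositories each author has committed to
repos_per_author = all_commits_df.groupby("author")["repository"].nunique()

print(commits_per_repo)
print(repos_per_author)
```

The second groupby is a quick way to spot contributors whose work spans several services.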
Project-Wide File Changes
Similarly, you can look at file changes across the board:
# Get file change history across all repositories
all_changes_df = proj.file_change_history(limit=500) # Larger limit might be needed
# You could then, for example, find the most changed files regardless of repo
agg_changes = all_changes_df.groupby('filename')[['insertions', 'deletions']].sum()
agg_changes['total_churn'] = agg_changes['insertions'] + agg_changes['deletions']
print(agg_changes.sort_values('total_churn', ascending=False).head(10))
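One thing to watch: grouping on filename alone lumps together files that share a name in different repos (every service's README.md becomes one row). If that's not what you want, a sketch like the following keys on both columns instead. Again, the frame is a hand-built stand-in, and the repository column name is an assumption.

```python
import pandas as pd

# Stand-in for proj.file_change_history(); column names are assumptions.
all_changes_df = pd.DataFrame(
    {
        "repository": ["service-a", "service-b", "service-a", "service-b"],
        "filename": ["README.md", "README.md", "app.py", "README.md"],
        "insertions": [3, 40, 12, 5],
        "deletions": [1, 10, 4, 5],
    }
)

# Key on (repository, filename) so the two README.md files stay separate
per_file = all_changes_df.groupby(["repository", "filename"])[
    ["insertions", "deletions"]
].sum()
per_file["total_churn"] = per_file["insertions"] + per_file["deletions"]

top_files = per_file.sort_values("total_churn", ascending=False)
print(top_files.head(10))
```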
General Information
You can get a quick overview of all the repositories included:
# Get general info like author counts, file counts, etc., per repo
info_df = proj.general_information()
print(info_df)
This gives you a handy summary table comparing the basic stats of each repository in your project directory.
Why Bother?
Using ProjectDirectory is super helpful when:
- You want a holistic view of development activity across related components.
- You need to track contributions or changes that span multiple repositories.
- You want to compare metrics (like churn, commit frequency) between different parts of your system.
It turns gitpandas from a single-repo analysis tool into something that can handle more complex, multi-repository setups, without you having to write a bunch of custom scripts to loop through directories yourself.
Give it a try next time you’re faced with more than one .git folder!