How to extract a subdirectory from a Git repository while keeping its history

This is mostly to document a procedure I needed a few days ago: extracting a subdirectory from a Git repository into a separate (new) repository. The nice thing is that the whole history of that directory is kept.

First I cloned the existing repository and removed the link to its origin.

git clone SOURCE.git
mv SOURCE NEW-REPO
cd NEW-REPO
git remote rm origin
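
As a small sketch of the same step: git clone accepts the target directory as a second argument, which folds the clone and the mv into one command. The throwaway SOURCE repository and the Demo identity below are made up for illustration only.

```shell
set -eu
# Hypothetical identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the existing repository
git init -q SOURCE
git -C SOURCE commit -q --allow-empty -m "initial"

# Clone straight into the new name, then cut the link to the origin
git clone -q SOURCE NEW-REPO
cd NEW-REPO
git remote rm origin

# Lists the configured remotes; there should be none left
git remote -v
```

If git remote -v prints nothing, the clone no longer knows where it came from and cannot push back to the source by accident.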

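The next step is the actual extraction. We remove everything except the desired directory from the top level of every commit in the history. This also works with more than one directory, e.g. here: DIR1 and DIR2. Note that ls does not list hidden entries, so top-level dotfiles such as .gitignore survive the filter.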

git filter-branch --tree-filter 'ls -1 | grep -v -x -e DIR1 -e DIR2 | xargs -I {} rm -rf {}' --prune-empty -f HEAD
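
To see what the filter does before running it on real data, here is a minimal sketch on a throwaway repository. The directory OTHER, the file names and the Demo identity are invented for the demo. The filter uses grep -x for whole-name matches and xargs -I {} in place of the deprecated -i. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print for filter-branch.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so the demo commits work on an unconfigured machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .

# First commit touches all three directories
mkdir DIR1 DIR2 OTHER
echo one > DIR1/a.txt
echo two > DIR2/b.txt
echo three > OTHER/c.txt
git add .
git commit -q -m "add three directories"

# Second commit touches only OTHER, so it becomes empty after filtering
echo more > OTHER/d.txt
git add .
git commit -q -m "touch only OTHER"

# Keep DIR1 and DIR2, delete every other top-level entry in each commit
git filter-branch --tree-filter \
  'ls -1 | grep -v -x -e DIR1 -e DIR2 | xargs -I {} rm -rf {}' \
  --prune-empty -f HEAD >/dev/null

# Top-level entries of the rewritten HEAD, and the remaining commits
git ls-tree --name-only HEAD
git log --oneline
```

Because of --prune-empty, the commit that only touched OTHER should vanish from the log, leaving a single commit that contains DIR1 and DIR2.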

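Now the repository should only contain the (here) two directories. However, the removed files are still stored in the repository's object database, kept alive by the reflog and by the backup refs that filter-branch leaves under refs/original/. We have to expire these references manually and then prune the objects.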

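# delete the backup refs that filter-branch saved under refs/original/
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --aggressive --prune=now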
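
One way to check that the cleanup really dropped the old data is to remember the ID of a removed blob and ask Git whether it still exists afterwards. The sketch below does this on a throwaway repository. The directory OTHER and the Demo identity are invented, and the filter is shortened to a plain rm -rf to keep the demo small. It also deletes the backup refs that filter-branch keeps under refs/original/, since those would otherwise keep the old objects reachable.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir DIR1 OTHER
echo keep > DIR1/a.txt
echo drop > OTHER/c.txt
git add .
git commit -q -m "initial"

# Object ID of the file that should disappear from the repository
dropped=$(git rev-parse HEAD:OTHER/c.txt)

git filter-branch --tree-filter 'rm -rf OTHER' --prune-empty -f HEAD >/dev/null

# Drop the backup refs, expire the reflog, then prune unreachable objects
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

# cat-file -e succeeds only if the object still exists
if git cat-file -e "$dropped" 2>/dev/null; then
  state="still present"
else
  state="pruned"
fi
echo "blob from OTHER/: $state"

# Summary of loose and packed objects after the cleanup
git count-objects -v
```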

Afterwards I did some cleaning, but this is optional:

git repack -a -d -l
git clean -f -d

Finally, we can for example create a new (bare) repository from NEW-REPO and use that from now on. In the source repository, I recommend simply deleting the extracted directory in a regular commit and keeping its history there, just in case.
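
Creating the bare repository could look like the following sketch. The name NEW-REPO.git is my choice, and the empty commit merely stands in for the extracted history.

```shell
set -eu
# Hypothetical identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the filtered repository from the previous steps
git init -q NEW-REPO
git -C NEW-REPO commit -q --allow-empty -m "extracted history"

# A bare clone has no working tree and is suitable as a shared remote
git clone -q --bare NEW-REPO NEW-REPO.git

# Confirm that the result is bare and carries the history
git -C NEW-REPO.git rev-parse --is-bare-repository
git -C NEW-REPO.git log --oneline
```

The bare repository can then be moved to a server and used as the origin for fresh clones.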

The steps are similar if you instead need to remove a directory including its history. The only difference is the actual removal command:

git filter-branch --tree-filter 'ls -1 | grep -x -e DIRECTORY | xargs -I {} rm -rf {}' --prune-empty -f HEAD
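
To confirm that the directory is gone from the whole history and not just from the latest commit, you can count the commits that still touch its path. A minimal sketch on a throwaway repository follows. The directory KEEP, the file names and the Demo identity are invented for the demo, and the filter matches the whole name with grep -x.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir DIRECTORY KEEP
echo secret > DIRECTORY/s.txt
echo public > KEEP/p.txt
git add .
git commit -q -m "initial"

# Delete only the top-level entry named exactly DIRECTORY in every commit
git filter-branch --tree-filter \
  'ls -1 | grep -x -e DIRECTORY | xargs -I {} rm -rf {}' \
  --prune-empty -f HEAD >/dev/null

# Number of commits reachable from HEAD that still touch the path
touching=$(git rev-list --count HEAD -- DIRECTORY)
echo "commits still touching DIRECTORY: $touching"
```

A count of zero means no commit reachable from HEAD references the directory any more. The objects themselves only disappear after the reflog and garbage-collection cleanup described earlier.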
