Tuesday, February 1, 2022

Purging a directory or a list of files from a Git repository

Sometimes we want to remove a directory or a file at a given path completely from a Git repository's history. BFG Repo-Cleaner is a commonly recommended tool for this. The typical examples remove files matching a name or a pattern anywhere in the repository. That is not always what we want: for instance, we may want to remove only the Readme.md in one particular directory, or one particular doc directory, from a repository that contains several files or directories with those names.

Below we show an example where we remove a file or a directory at a particular path. This relies on the following option of BFG Repo-Cleaner,

-bi, --strip-blobs-with-ids <blob-ids-file>  
           strip blobs with the specified Git object ids

Note that the argument to this option is a file. That file lists the Git object hashes (blob ids) of the files we wish to purge.

To purge a directory, we purge all the files in it; once all of its files are gone, the directory is gone too. We must list only the hashes of the files, however: BFG Repo-Cleaner yields an error if we pass it the Git object hash of a directory.

The question then becomes how to find the Git object hashes of the files we wish to purge. For this, we can use the git rev-list command, e.g.,

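# Each output line is "<hash> <path>"; anchor the pattern and escape the dot
# so that only the exact path src/Readme.md matches.
git rev-list --all --objects | \
      grep -E ' src/Readme\.md$' | \
      cut -d' ' -f1 > file_hashes.txt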
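
For a directory, the same listing also contains the directory's own tree objects, which must be filtered out because only file hashes may be passed to BFG Repo-Cleaner. Below is a minimal sketch of one way to do this, assuming a POSIX shell with awk. The helper name blob_ids_under and the doc/ prefix in the usage comment are hypothetical, chosen for illustration only.

```shell
# Hypothetical helper: print the ids of all blobs (files) that were ever stored
# under the given path prefix, skipping commits and trees (directories).
blob_ids_under() {
    prefix="$1"
    git rev-list --all --objects |
        # Ask Git for each object's type; %(rest) carries the path through.
        git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
        awk -v p="$prefix" '{
            # The path is everything after "<type> <id> " (it may contain spaces).
            path = substr($0, length($1) + length($2) + 3)
            if ($1 == "blob" && index(path, p) == 1) print $2
        }'
}

# Usage (run inside the repository):
#   blob_ids_under doc/ > file_hashes.txt
```

Keep the trailing slash in the prefix so that a sibling such as docs/ is not matched. Because blobs are identified by their content, a file with identical content elsewhere in the repository shares the same hash and would presumably be stripped as well.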

Following this, we run BFG Repo-Cleaner with the list of hashes and force-push the rewritten history,

bfg -bi file_hashes.txt 

git push --force
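
To confirm that the purge worked, we can check that the path no longer appears in any reachable commit, ideally before the force push. Below is a minimal sketch of such a check, again assuming a POSIX shell with awk. The name path_in_history is a hypothetical helper, not part of Git or BFG Repo-Cleaner.

```shell
# Hypothetical helper: exit 0 if some reachable commit still references an
# object at exactly this path, and exit 1 otherwise.
path_in_history() {
    git rev-list --all --objects |
        awk -v p="$1" '{
            # Lines for files and directories look like "<hash> <path>";
            # commit lines have no path and therefore no space.
            i = index($0, " ")
            if (i && substr($0, i + 1) == p) found = 1
        }
        END { exit !found }'
}

# Usage (run inside the repository):
#   if path_in_history src/Readme.md; then echo "still present"; fi
```

If the file is reported as still present, one likely cause is that it still exists in the latest commit, which BFG Repo-Cleaner protects by default; deleting it in a regular commit and running BFG again should help.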