CI and Git

Continuous Integration (CI) has become vital in modern development. It facilitates continuous development within a team, as well as continuous deployment to different environments.

As for source control systems, more and more people are using Git. Compared with another popular source control system, SVN (Subversion), Git offers a major benefit: easy, lightweight branching and merging.

Benefiting from Git’s easy branching and merging, a Git-based methodology called GitFlow has been widely discussed and adopted in the Git community, including by big Git players such as GitHub and Atlassian.

For more about learning Git, please see my other post, Learning Git.

Tips to Make Git CI-friendly

Running CI against a Git repo brings a huge benefit to development, especially team-based development. With GitFlow, each developer on a team works in isolation on their own branch and uses CI to make sure their code does not break the rest of the repo.

Atlassian published a blog post, “5 tips to make your Git repos CI-friendly”, which lists:

  • Avoid tracking large files in your repo
  • Use shallow clones for CI
  • Cache the repo on build agents
  • Choose your triggers wisely
  • Stop polling, start hooking

In all, these tips cover three major aspects of using CI:

  • How to transfer the code from the repo to CI.
    • Avoid tracking large files in your repo
    • Use shallow clones for CI
    • Cache the repo on build agents
  • When to run the CI job.
    • Choose your triggers wisely
  • How to start the CI job.
    • Stop polling, start hooking
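
To make the "use shallow clones" tip concrete, here is a minimal sketch that builds a throwaway repository and compares a full clone with a `--depth 1` clone. The scratch directory layout and the `demo` identity are assumptions made for illustration; a real CI job would clone its own remote instead.

```shell
#!/bin/sh
# Sketch: compare a full clone with a shallow clone of a throwaway repo.
set -e

work=$(mktemp -d)                 # scratch directory (hypothetical layout)
git init -q "$work/origin"

# Create three small commits so there is some history to leave behind.
for i in 1 2 3; do
    echo "revision $i" > "$work/origin/file.txt"
    git -C "$work/origin" add file.txt
    git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q "file://$work/origin" "$work/full"
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

# Count the commits each clone can see from HEAD.
full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)

echo "full clone commits:    $full_commits"
echo "shallow clone commits: $shallow_commits"
```

The shallow clone only fetches the tip commit, which is usually all a build needs. The saving grows with the length of the history being skipped.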

Tips – Remove Large File from Git Repo

As discussed above, a large file in a Git repo is not CI-friendly. So how do you fix this if you already have such a file and now want it removed from the repo?

Here is how.

The steps below follow the “Removing Objects” section of Pro Git.

Identify Big Files

  • Run git gc (garbage collection) to put all the objects in a packfile.
  • Run git verify-pack to list every object with its size. Be sure to sort the output (with sort) and limit it (with tail or head) if the repo is big. The third column is the object size in bytes.
    $ git verify-pack -v .git/objects/pack/pack-29…69.idx \
        | sort -k 3 -n \
        | tail -3
    dadf7258d699da2c8d89b09ef6670edb7d5f91b4 commit 229 159 12
    033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   22044 5792 4977696
    82c99a3e86bb1267b236a4b6eff7868d97489af1 blob   4975916 4976258 1438
  • Pick the file (object) you are going to remove, and use git rev-list --objects to list all the commit SHAs and the blob SHAs with their associated file paths, then grep for the blob's SHA:
    $ git rev-list --objects --all | grep 82c99a3
    82c99a3e86bb1267b236a4b6eff7868d97489af1 git.tgz
  • Once you have the path to that big file, find the earliest commit that includes it. git log lists the newest commit first, so the earliest one is on the last line.
    $ git log --oneline --branches -- git.tgz
    dadf725 oops - removed large tarball
    7b30847 add git tarball
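
The lookup above takes two passes: sizes from git verify-pack, then paths from git rev-list. As an alternative sketch (not the Pro Git method), the two can be combined by piping git rev-list into git cat-file --batch-check, which prints the size and the path on one line. The helper name `largest_blobs` and the demo file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: list the N largest blobs in a repo's history together with their paths.
# largest_blobs is a hypothetical helper name, not a Git command.
largest_blobs() {
    repo=$1
    count=${2:-3}
    # rev-list prints "<sha> <path>"; cat-file echoes the path back as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file \
            --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k 3 -n |
        tail -n "$count"
}

# Demo on a throwaway repo with one small and one large file.
demo=$(mktemp -d)
git init -q "$demo"
echo "small" > "$demo/small.txt"
head -c 100000 /dev/zero > "$demo/big.bin"   # stand-in for a large tarball
git -C "$demo" add .
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add files"

biggest=$(largest_blobs "$demo" 1)
echo "$biggest"
```

Each output line has the form `blob <sha> <size in bytes> <path>`. The size is the uncompressed object size, so a highly compressible file can rank higher here than its share of the packfile would suggest.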

Remove File from Commit History

  • So, now we are going to rewrite all the commits downstream from 7b30847 to fully remove this file from the Git history. Use git filter-branch.
    $ git filter-branch --index-filter 'git rm --cached --ignore-unmatch git.tgz' -- 7b30847^..
    Rewrite 7b30847d080183a1ab7d18fb202473b3096e9f34 (1/2)rm 'git.tgz'
    Rewrite dadf7258d699da2c8d89b09ef6670edb7d5f91b4 (2/2)
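
To try the rewrite safely before touching a real repo, the following sketch reproduces the scenario in a throwaway repository and then checks that no branch history mentions the file any more. The `demo` identity and the file contents are assumptions for illustration. `FILTER_BRANCH_SQUELCH_WARNING` only silences the notice that newer Git versions print before running git filter-branch.

```shell
#!/bin/sh
# Sketch: rehearse the history rewrite in a scratch repo and verify the result.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo
git config user.email demo@example.com

# An initial commit first, so the commit that adds the tarball has a parent.
echo "hello" > README
git add README
git commit -q -m "initial commit"

head -c 100000 /dev/zero > git.tgz          # stand-in for the large tarball
git add git.tgz
git commit -q -m "add git tarball"

git rm -q git.tgz
git commit -q -m "oops - removed large tarball"

# Earliest commit that added the file (log is newest first, so take the last).
first=$(git log --format=%H --diff-filter=A -- git.tgz | tail -n 1)

# Rewrite every commit from that one onward, dropping the file from the index.
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch git.tgz' -- "$first"^.. > /dev/null

# After the rewrite, no commit on any branch should touch git.tgz.
remaining=$(git log --oneline --branches -- git.tgz | wc -l | tr -d ' ')
echo "commits still touching git.tgz: $remaining"
```

Note that `"$first"^` fails if the file was added in the very first commit of the repo, because a root commit has no parent. The sketch avoids that case on purpose.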

Cleanup the Backup

  • Now the file has been removed from all commit history. However, git filter-branch keeps a backup of the original refs under .git/refs/original, and the reflog under .git/logs still points at the old commits. Delete both and run git gc again.
    $ rm -Rf .git/refs/original
    $ rm -Rf .git/logs/
    $ git gc
    Counting objects: 15, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (11/11), done.
    Writing objects: 100% (15/15), done.
    Total 15 (delta 1), reused 12 (delta 0)
  • Run git prune with --expire option to completely remove the object.
    $ git prune --expire now
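
To confirm that the cleanup actually freed space, one option is to read the packed size from git count-objects -v before and after. The sketch below wraps that in a small helper; the name `pack_size_kib` and the demo repository are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: report the packed size of a repo, in KiB, from `git count-objects -v`.
# pack_size_kib is a hypothetical helper name, not a Git command.
pack_size_kib() {
    git -C "$1" count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Demo on a throwaway repo holding one large, incompressible file.
demo_repo=$(mktemp -d)
git init -q "$demo_repo"
head -c 200000 /dev/urandom > "$demo_repo/big.bin"
git -C "$demo_repo" add big.bin
git -C "$demo_repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add big file"
git -C "$demo_repo" gc -q                  # pack the loose objects first

packed=$(pack_size_kib "$demo_repo")
echo "size-pack: ${packed} KiB"
```

Running `pack_size_kib` on the real repo once before the rewrite and once after git prune gives a quick before-and-after comparison.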

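Be warned: this technique is destructive to your commit history! Every commit from the earliest affected one onward gets a new SHA, so anyone who has already cloned the repo must rebase their work onto the rewritten commits.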

