I've been asked to mirror our Azure DevOps Git repository to GitHub. The problem is that, unlike Azure DevOps, GitHub has a 100 MB file size limit. I would really like to avoid Git LFS because I don't want all the devs to have to install and learn it, and I don't know whether it will work across both repositories. I don't want to modify thousands of historical check-ins either.
I have been provided a sample yml, and I'm wondering whether I can modify it to exclude specific files from being cloned or pushed to GitHub. They aren't needed there, but because the clone copies everything from all history, simply deleting them or adding them to .gitignore won't solve the problem.
Starting yml:

task: PowerShell@2
inputs:
  targetType: 'inline'
  script: |
    Write-Host ' - - - - - - - - - - - - - - - - - - - - - - - - -'
    Write-Host ' mirror Azure DevOps repo changes to GitHub repo'
    Write-Host ' - - - - - - - - - - - - - - - - - - - - - - - - - '
    $stageDir = '$(Build.SourcesDirectory)' | Split-Path
    $githubDir = $stageDir +"\"+"gitHub"
    $destination = $githubDir +"\"+"$(devOpsRepoName).git"
    $sourceURL = 'https://$(devops.pat)@dev.azure/$(devOpsProjectPath)/_git/$(devOpsRepoName)'
    $destURL = 'https://$(github.pat)@github/anization/$(gitHubRepoName).git'
    #Check if the parent directory exists and delete
    if((Test-Path -path $githubDir))
    {
      Remove-Item -Path $githubDir -Recurse -force
    }
    if(!(Test-Path -path $githubDir))
    {
      New-Item -ItemType directory -Path $githubDir
      Set-Location $githubDir
      Write-Output '*****Clone****'
      git clone --mirror $sourceURL
    }
    else
    {
      Write-Host "The given folder path $githubDir already exists";
    }
    Set-Location $destination
    Write-Output '*****Git removing remote secondary****'
    git remote rm secondary
    Write-Output '*****Git remote add****'
    git remote add --mirror=fetch secondary $destURL
    Write-Output '*****Git fetch origin****'
    git fetch $sourceURL
    Write-Output '*****Git push secondary****'
    git push secondary --all
    Write-Output '**Azure Devops repo synced with Github repo**'
    Set-Location $stageDir
    if((Test-Path -path $githubDir))
    {
      Remove-Item -Path $githubDir -Recurse -force
    }

Currently fails with a handful of messages like this:
remote: error: File my-file.exe is 157.51 MB; this exceeds GitHub's file size limit of 100.00 MB
- Beyond your question, you might also consider leveraging the migration CLI tools instead of a custom script. docs.github/en/migrations/using-github-enterprise-importer/… – Matt Commented 17 hours ago
2 Answers
Once large files have been committed to your history, they remain part of the repository even if they have since been deleted.
Tools like BFG can purge blobs from your history and significantly reduce the repository's size.
This article has a pretty good walk-through on how to purge these large files from the repository. At a high level:
1. Install the tool.
2. Clone the repo to your local machine and make a backup.
3. Use bfg to strip blobs:
   bfg --strip-blobs-bigger-than 10M your-repo.git
4. Expire and prune:
   cd your-repo.git
   git reflog expire --expire=now --all && git gc --prune=now --aggressive
5. Push your changes back:
   git push
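
Before stripping anything, it can help to see which blobs in the history actually exceed the limit. The following is a minimal sketch, not part of the original answer: the function names `filter_large_blobs` and `list_large_blobs` are hypothetical, and the default threshold of 104857600 bytes (100 MiB) is an assumption about how GitHub measures its limit. It relies only on `git rev-list --objects --all` and `git cat-file --batch-check`.

```shell
#!/usr/bin/env bash
# Sketch: list blobs anywhere in history that are larger than a threshold.
# Function names and the default threshold are illustrative assumptions.

# Pure filter: reads "<type> <sha> <size> <path>" lines on stdin and prints
# "<size> <path>" for every blob whose size exceeds $1 bytes.
filter_large_blobs() {
  local limit="$1"
  awk -v limit="$limit" '$1 == "blob" && $3 + 0 > limit + 0 {
    # Rebuild the path, which may contain spaces.
    path = $4
    for (i = 5; i <= NF; i++) path = path " " $i
    print $3, path
  }'
}

# Walk every object reachable from any ref and report the large blobs,
# biggest first. Run this inside the (mirror) clone.
list_large_blobs() {
  local limit="${1:-104857600}"   # assumed default: 100 MiB in bytes
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    filter_large_blobs "$limit" |
    sort -rn
}
```

Running `list_large_blobs` inside `your-repo.git` should print one line per oversized blob (size in bytes, then path), which gives you the list of paths or a sensible size threshold to hand to the purge tool.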
You can use git filter-repo, a Python script that allows fast and comprehensive rewriting of repository history. It scans the entire history of a repository and applies modifications such as removing files, replacing text in files, or changing old commit or email details. It's often used to remove sensitive data, change old commit messages, reduce repository size by excluding unwanted files, or restructure the repository layout.
git filter-repo isn't installed on Microsoft-hosted agents by default. Run python -m pip install git-filter-repo to install it.
Here is an example for your reference:
- script: |
    python -m pip install git-filter-repo
    git clone --mirror https://$(System.AccessToken)@dev.azure/{ADOName}/{ProjectName}/_git/AzureRepo
    cd AzureRepo.git
    git filter-repo --path TestFolder --path largerfile.zip --invert-paths
    git remote add github https://$(github.pat)@github/anization/$(gitHubRepoName).git
    git push --mirror github
  displayName: 'Mirror repository to GitHub'
Use --path to specify a path or file to select in the repo; you can pass --path multiple times. --invert-paths inverts the selection, so only files matching none of the --path options are kept. In the example above, that removes TestFolder and largerfile.zip from the rewritten history.
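
If listing every offending path is impractical, git filter-repo also has a `--strip-blobs-bigger-than` option that drops blobs by size instead. The sketch below is an assumption-laden variant of the example above, not the answer's own method: the function name `mirror_without_large_blobs` and its arguments (source URL, destination URL, size threshold) are hypothetical, and the 100M default is only a guess at a safe value for GitHub's limit.

```shell
#!/usr/bin/env bash
# Sketch: mirror a repository while dropping every blob above a size threshold.
# The function name, its arguments, and the 100M default are illustrative.
mirror_without_large_blobs() {
  local source_url="$1"
  local dest_url="$2"
  local max_size="${3:-100M}"
  local workdir rc

  # Work in a throwaway directory so the rewrite never touches a real checkout.
  workdir="$(mktemp -d)" || return 1
  git clone --mirror "$source_url" "$workdir/repo.git" || { rm -rf "$workdir"; return 1; }

  (
    cd "$workdir/repo.git" || exit 1
    # Rewrite all refs, removing blobs larger than the threshold.
    git filter-repo --strip-blobs-bigger-than "$max_size" || exit 1
    # Push every rewritten ref to the destination.
    git push --mirror "$dest_url"
  )
  rc=$?

  rm -rf "$workdir"
  return "$rc"
}
```

A call might look like `mirror_without_large_blobs "$sourceURL" "$destURL" 100M`. Keep in mind that any history rewrite changes commit IDs, so the commits on GitHub will not have the same hashes as the ones in Azure DevOps.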
References:
Official doc: git-filter-repo.
Blog: Using the git filter-repo tool. (Note: there is no --all option; the command is automatically applied to all branches.)