
git - How to omit large files when migrating from Azure DevOps to GitHub? - Stack Overflow


I've been asked to mirror our Azure DevOps Git repository to GitHub. The problem is that, unlike Azure DevOps, GitHub has a 100 MB file size limit. I would really like to avoid using Git LFS because I don't want all the devs to have to install and learn it, and I don't know whether it would work across both repositories. I don't want to have to modify thousands of historical check-ins either.

I have been given a sample YAML pipeline, and I'm wondering whether I can modify it to exclude specific files from being cloned or pushed to GitHub. They aren't needed there, but because the clone copies the entire history, simply deleting them or adding them to .gitignore won't solve the problem.

Starting YAML:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      Write-Host ' - - - - - - - - - - - - - - - - - - - - - - - - -'
      Write-Host ' mirror Azure DevOps repo changes to GitHub repo'
      Write-Host ' - - - - - - - - - - - - - - - - - - - - - - - - - '
   
      $stageDir = '$(Build.SourcesDirectory)' | Split-Path
      $githubDir = $stageDir +"\"+"gitHub"
      $destination = $githubDir +"\"+"$(devOpsRepoName).git"
   
      $sourceURL = 'https://$(devops.pat)@dev.azure.com/$(devOpsProjectPath)/_git/$(devOpsRepoName)'
      $destURL = 'https://$(github.pat)@github.com/organization/$(gitHubRepoName).git'
      #Check if the parent directory exists and delete
      if((Test-Path -path $githubDir))
      {
        Remove-Item -Path $githubDir -Recurse -force
      }
      if(!(Test-Path -path $githubDir))
      {
        New-Item -ItemType directory -Path $githubDir
        Set-Location $githubDir
        Write-Output '*****Clone****'
        git clone --mirror $sourceURL
      }
      else
      {
        Write-Host "The given folder path $githubDir already exists";
      }
      Set-Location $destination
      Write-Output '*****Git removing remote secondary****'
      git remote rm secondary
      Write-Output '*****Git remote add****'
      git remote add --mirror=fetch secondary $destURL
      Write-Output '*****Git fetch origin****'
      git fetch $sourceURL
      Write-Output '*****Git push secondary****'
      git push secondary --all
      Write-Output '**Azure Devops repo synced with Github repo**'
      Set-Location $stageDir
      if((Test-Path -path $githubDir))
      {
       Remove-Item -Path $githubDir -Recurse -force
      }

It currently fails with a handful of messages like this:

remote: error: File my-file.exe is 157.51 MB; this exceeds GitHub's file size limit of 100.00 MB
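Because the rejected files live somewhere in history, it helps to list exactly which blobs across all branches and tags exceed GitHub's limit before deciding what to exclude. The Python sketch below (assuming python and git are available on the agent) pipes git rev-list --objects --all into git cat-file --batch-check and keeps the blobs larger than 100 MiB, which is how GitHub documents the limit that the error above reports as 100.00 MB. The names GITHUB_LIMIT_BYTES, parse_batch_check and list_large_blobs are illustrative, not part of any tool.

```python
import subprocess

# GitHub documents its hard limit as 100 MiB (the push error shows "100.00 MB").
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024

# Output format requested from `git cat-file --batch-check`; %(rest) echoes the
# path that `git rev-list --objects` printed after each object id.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(lines, limit_bytes):
    """Return (size, object_id, path) for blobs larger than limit_bytes, largest first."""
    large = []
    for line in lines:
        # maxsplit=3 keeps paths that contain spaces intact
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size > limit_bytes:
            large.append((size, parts[1], path))
    return sorted(large, reverse=True)


def list_large_blobs(repo_dir=".", limit_bytes=GITHUB_LIMIT_BYTES):
    """Scan every object reachable from any ref (all branches and tags)."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    report = subprocess.run(
        ["git", "cat-file", "--batch-check=" + BATCH_FORMAT],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(report.splitlines(), limit_bytes)
```

Call list_large_blobs() with the directory of the mirror clone; the reported paths are the candidates to exclude. Note that rev-list prints each blob only once, with the first path at which it met it, so identical content committed under another name shows up as a single entry.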

Asked 19 hours ago by Alex
  • Beyond your question, you might also consider using GitHub's migration CLI tools instead of a custom script: docs.github.com/en/migrations/using-github-enterprise-importer/… – Matt, commented 17 hours ago

2 Answers


Once large files have been committed, they remain part of the repository's history even if they are later deleted.

Tools like BFG Repo-Cleaner can purge large blobs from the history and significantly reduce the size of your repository.

This article has a good walk-through on how to purge these large files from the repository. At a high level:

  1. Install the tool

  2. Clone a bare mirror of the repo (git clone --mirror) to your local machine and make a backup.

  3. Use bfg to strip blobs

    bfg --strip-blobs-bigger-than 10M your-repo.git
    
  4. Expire and prune

    cd your-repo.git
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    
  5. Push your changes back

    git push
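If you want to run these steps unattended (for example in the mirroring pipeline), a small Python driver can execute them in order and stop at the first failure, like the && in step 4. This is only a sketch: it assumes bfg is on PATH exactly as the walkthrough invokes it (it is often run as java -jar bfg.jar instead), that the mirror clone already exists and has been backed up, and the function names are hypothetical.

```python
import subprocess


def bfg_cleanup_commands(mirror_dir, threshold):
    """Return (working_dir, argv) pairs for steps 3-5 of the walkthrough.

    mirror_dir is the bare mirror clone (e.g. "your-repo.git"); threshold uses
    BFG's size syntax, e.g. "10M" as in the walkthrough.
    """
    return [
        # Step 3: run from the parent directory, pointing BFG at the mirror clone.
        (None, ["bfg", "--strip-blobs-bigger-than", threshold, mirror_dir]),
        # Step 4: drop reflog entries, then garbage-collect the unreferenced blobs.
        (mirror_dir, ["git", "reflog", "expire", "--expire=now", "--all"]),
        (mirror_dir, ["git", "gc", "--prune=now", "--aggressive"]),
        # Step 5: a mirror clone pushes all rewritten refs back to its origin.
        (mirror_dir, ["git", "push"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each command; also execute it unless dry_run is set."""
    for cwd, argv in steps:
        print(("[dry-run] " if dry_run else "") + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)  # raises on the first failure
```

run_steps(bfg_cleanup_commands("your-repo.git", "10M")) only prints the commands; pass dry_run=False to execute them. Keep in mind that by default BFG does not modify the contents of the latest commit on HEAD, so a large file that still exists there must first be deleted in a normal commit (or the run needs --no-blob-protection).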
    

You can use git filter-repo, a Python script for fast and comprehensive rewriting of repository history. It scans the entire history of a repository and can remove files, replace text in files, or rewrite old commit messages and author/email details. It is often used to remove sensitive data, reduce repository size by excluding unwanted files, or restructure the repository layout.

git filter-repo isn't installed on Microsoft-hosted agents by default. Run python -m pip install git-filter-repo to install it.

Here is an example for your reference.

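- script: |
    python -m pip install git-filter-repo

    # Bare mirror clone of the Azure DevOps repo (all branches and tags)
    git clone --mirror https://$(System.AccessToken)@dev.azure.com/{ADOName}/{ProjectName}/_git/AzureRepo
    cd AzureRepo.git
    # Remove TestFolder and largerfile.zip from every commit in history
    git filter-repo --path TestFolder --path largerfile.zip --invert-paths
    # filter-repo removes the origin remote, so add GitHub as a new remote and push all refs
    git remote add github https://$(github.pat)@github.com/organization/$(gitHubRepoName).git
    git push --mirror github
  displayName: 'Mirror repository to GitHub'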

  • --path selects a file or directory in the repository (a directory also matches everything below it). You can pass --path multiple times.

  • --invert-paths inverts that selection: only files matching none of the --path options are kept, so the listed paths are removed from every commit.
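To make the --path and --invert-paths semantics concrete, here is a small Python model of which files survive, following the git-filter-repo documentation: a --path names a file exactly or a directory together with everything below it, relative to the repository root. It is an illustration only, not filter-repo's own code; the function names are hypothetical, and it ignores the tool's other selectors such as --path-glob and --path-regex.

```python
def matches_any(path, selectors):
    """True if path equals a selector or lies inside a selected directory."""
    for selector in selectors:
        # "TestFolder" must match "TestFolder/x" but not "TestFolderOld/x"
        prefix = selector if selector.endswith("/") else selector + "/"
        if path == selector or path.startswith(prefix):
            return True
    return False


def surviving_files(files, selectors, invert_paths=False):
    """Files kept by a --path selection, optionally inverted like --invert-paths."""
    return [f for f in files if matches_any(f, selectors) != invert_paths]


files = ["TestFolder/a.bin", "TestFolder/sub/b.bin", "largerfile.zip",
         "src/app.cs", "TestFolderOld/c.txt"]
selectors = ["TestFolder", "largerfile.zip"]
print(surviving_files(files, selectors, invert_paths=True))
```

With --invert-paths, only src/app.cs and TestFolderOld/c.txt are kept: the similarly named TestFolderOld directory is not caught by the TestFolder selector, because matching works on whole path components rather than on raw string prefixes.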


References:

  • Official doc: git-filter-repo.

  • Blog: Using the git filter-repo tool. (Note: there is no --all option; the command is automatically applied to all branches.)
