SVN to Git Conversion

C. Piker edited this page Apr 20, 2021 · 7 revisions

Das2 java sources were converted from two different SVN repositories into a single git repository while retaining history back to 2003. This was possible thanks to the svn-all-fast-export tool provided by the KDE project, and due to the fact that git identifies commits by universally unique hashes.

Here's the recipe that was followed for the conversion process.

Setting up the svn-all-fast-export Tool

Get the svn-all-fast-export tool and read about it:

apt install svn-all-fast-export
man svn-all-fast-export

If the package is not available from your distribution, clone the svn-all-fast-export git repository and build it from source.

Downloading the Raw SVN Repositories

Get a copy of the server files using rsync. Give the output directory a name that none of the new git repositories will use. The --delete option is used below in case you've run rsync before and want to make sure the local copy only contains files that are still available at the remote source. It does not affect the remote source location.

rsync -avP --delete USER@saturn:/PATH/TO/REPO/das2/ ./das2_svn_repo/

ssh -t [email protected] create
rsync -avP --delete [email protected]:/home/svn/p/autoplot/code/ ./ap_svn_repo/

Creating the Author Map

Make an authors file for all commits to a given branch. NOTE: All paths given after /das2_svn_repo/ below are repository paths and have nothing to do with the file layout underneath the repository directory. So:

file://$(pwd)/das2_svn_repo/           <-- Path to local copy of the repository
dasCore/community/autoplot2011/trunk   <-- Virtual path within the repository

Here's an example. You'll probably have to run this multiple times for various branches in your repository:

svn log --quiet file://$(pwd)/das2_svn_repo/dasCore/community/autoplot2011/trunk | \
   grep -E "r[0-9]+ \| .+ \|" | cut -d'|' -f2 | sed 's/ //g' | sort | uniq > \
   authors.txt
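Since the pipeline usually has to be run for several branches, it can help to wrap the filter in a function and feed it the concatenated log output. The following is a minimal sketch, not part of the original recipe: the function name extract_authors and the sample log lines are made up for illustration, and in real use the input would come from one svn log --quiet call per branch.

```shell
# Same filter as the pipeline above, wrapped in a function that reads stdin.
# `sort -u` is equivalent to `sort | uniq`.
extract_authors() {
  grep -E "^r[0-9]+ \| .+ \|" | cut -d'|' -f2 | sed 's/ //g' | sort -u
}

# Made-up stand-in for the combined `svn log --quiet` output of two branches.
sample_log=$(cat <<'EOF'
------------------------------------------------------------------------
r3 | jbfaden | 2003-02-11 09:15:02 -0600 (Tue, 11 Feb 2003)
------------------------------------------------------------------------
r2 | eewest | 2003-02-10 16:40:31 -0600 (Mon, 10 Feb 2003)
------------------------------------------------------------------------
r1 | jbfaden | 2003-02-10 11:02:47 -0600 (Mon, 10 Feb 2003)
------------------------------------------------------------------------
EOF
)

# One username per line, duplicates across branches removed.
authors=$(printf '%s\n' "$sample_log" | extract_authors)
printf '%s\n' "$authors"
```

The result is only a list of SVN usernames; names and email addresses still have to be added by hand.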

Note that if a branch no longer exists in the HEAD revision of your SVN repository, the path can be pinned to a revision where it did exist by appending the @ symbol and a revision number to the path. For example:

file://$(pwd)/das2_svn_repo/dasCore/netbeans_trunk@9324

All git commits require an email address. Each line of the authors.txt file used by the conversion rules below looks similar to:

eewest  Edward West <[email protected]>
jbfaden  Jeremy Faden <[email protected]>
cwp  Chris Piker <[email protected]>

and so on.
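Before running the conversion it may be worth checking that every SVN username has a complete entry in the map. Here is a rough sketch under stated assumptions: the function name missing_authors, the file names, and the alice/bob/carol entries with example.invalid addresses are all hypothetical, and the check only looks for a trailing <...> email on each line.

```shell
# Print usernames from a plain list (arg 1) that have no entry with an
# <email> in the identity map (arg 2).
missing_authors() {
  awk '
    NR == FNR { if ($0 ~ /<[^<>]+>[ \t]*$/) known[$1] = 1; next }
    NF > 0 && !($1 in known) { print $1 }
  ' "$2" "$1"
}

# Hypothetical inputs, written to temporary files for the demonstration.
names_file=$(mktemp)
map_file=$(mktemp)
printf 'alice\nbob\ncarol\n' > "$names_file"
cat > "$map_file" <<'EOF'
alice  Alice Example <alice@example.invalid>
bob  Bob Example
EOF

# bob has no email and carol has no entry at all, so both are reported.
missing=$(missing_authors "$names_file" "$map_file")
printf '%s\n' "$missing"
rm -f "$names_file" "$map_file"
```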

Writing the Path Map Rules

Follow the source paths as they move around the repository back to the original location at revision 8. This involves many commands like:

svn log -v --stop-on-copy \
  file://$(pwd)/das2_svn_repo/core/stable/dasCore/src/main/java/org/das2/graph/DasAxis.java

It's probably easiest to pick an old representative file that was committed very early in the project, such as DasAxis.java or QDataSet.java. Each time the log output stops, amend the path, pin it to the next earlier revision number, and start again:

svn log -v --stop-on-copy \
  file://$(pwd)/das2_svn_repo/dasCore/netbeans_trunk/src/org/das2/graph/DasAxis.java@9324

Keep playing out the logs, jumping to earlier and earlier locations going back to the original commit. The resulting rules files manually created via this process are provided at the end of this page.
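When the log stops at a copy, the verbose output normally names the copy source on a "(from PATH:REV)" line, which gives the next location to inspect. The sketch below pulls that out as PATH@REV. The function name copy_source is made up, and the sample log entry (user, date, message) is invented purely for illustration.

```shell
# Print the copy source of a logged copy as PATH@REV, ready to append to the
# repository URL for the next `svn log -v --stop-on-copy` call.
copy_source() {
  sed -n 's/^ *[AR] .* (from \(.*\):\([0-9][0-9]*\))$/\1@\2/p'
}

# Invented sample of `svn log -v` output for a revision that copies a directory.
sample=$(cat <<'EOF'
------------------------------------------------------------------------
r9325 | someuser | 2016-01-01 12:00:00 -0600 (Fri, 01 Jan 2016) | 1 line
Changed paths:
   A /core/stable/dasCore (from /dasCore/netbeans_trunk:9324)

Hypothetical log message
------------------------------------------------------------------------
EOF
)

next=$(printf '%s\n' "$sample" | copy_source)
echo "$next"
```

In real use the input would be the svn log output itself, piped into copy_source.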

Running the Rules

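Run the rules to create a bare git repository, i.e. one with no working files. Note that --dry-run, as shown in the svn-all-fast-export commands on this page, only rehearses the conversion without writing anything; drop it for the real run: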

svn-all-fast-export --identity-map=authors.txt --rules=dasCore.rules \
  --stats --add-metadata --dry-run das2_svn_repo

Based on the rules file below, this creates a bare das2java git repository. To convert this to a standard git repository, and to repack the objects:

cd das2java
mkdir .git
mv * .git
git config --local --bool core.bare false
git reset --hard
git repack -a -d -f --window=250 --depth=250

Repacking git repositories created by svn-all-fast-export is essential for better performance down the road. The size of the repository can shrink by half or more.
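The bare-to-standard conversion can be rehearsed on a throwaway repository before touching the real one. This is a minimal sketch that assumes git is installed; the sandbox directory, the seed repository, and the demo identity are all made up, and a bare clone stands in for the output of svn-all-fast-export.

```shell
# Throwaway sandbox; every name below is made up for this rehearsal.
sandbox=$(mktemp -d)

# 1. Build a tiny ordinary repository with one committed file.
git init -q "$sandbox/seed"
printf 'hello\n' > "$sandbox/seed/README.txt"
git -C "$sandbox/seed" add README.txt
git -C "$sandbox/seed" -c user.name=Demo -c user.email=demo@example.invalid \
    commit -q -m "seed commit"

# 2. Clone it bare, as a stand-in for a freshly converted repository.
git clone -q --bare "$sandbox/seed" "$sandbox/demo"

# 3. Apply the conversion steps from this page inside the bare clone.
(
  cd "$sandbox/demo" || exit 1
  mkdir .git
  mv * .git                                  # '*' does not match .git itself
  git config --local --bool core.bare false
  git reset -q --hard                        # populate the working tree
)

# The repository should no longer be bare and README.txt should be checked out.
is_bare=$(git -C "$sandbox/demo" rev-parse --is-bare-repository)
echo "bare: $is_bare"
ls "$sandbox/demo"
```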

Next, run the rules file on the Autoplot SVN repository. Again, this is the actual repository as rsync'ed from SourceForge, not a working copy. The rules file referenced in this command is provided at the bottom of this page.

svn-all-fast-export --identity-map=authors.txt --rules=QDataSet.rules \
  --stats --add-metadata --dry-run ap_svn_repo

Now turn it into a standard git repository and repack the deltas:

cd qdataset
mkdir .git
mv * .git
git config --local --bool core.bare false
git reset --hard
git repack -a -d -f --window=250 --depth=250

Combining the Repositories

Git commits are identified by globally unique hash values. This means that two different git repositories may be merged while maintaining history, but there is a catch: the paths within each repository must be different. We are able to merge the das2java and qdataset repositories because the rules files have been carefully crafted to avoid using the same top-level paths at any revision.
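A quick way to spot colliding top-level paths is to compare the first path component of every file in each repository. The following is only a sketch: the function name shared_top_levels and the sample path lists are hypothetical, and it checks one snapshot per repository rather than every revision. In real use the lists could come from something like git -C das2java ls-tree -r --name-only main.

```shell
# Print top-level path components that occur in both path lists.
shared_top_levels() {
  a=$(mktemp); b=$(mktemp)
  cut -d/ -f1 "$1" | sort -u > "$a"
  cut -d/ -f1 "$2" | sort -u > "$b"
  comm -12 "$a" "$b"        # lines common to both sorted lists
  rm -f "$a" "$b"
}

# Hypothetical file listings standing in for the two converted repositories.
das2_paths=$(mktemp)
qds_paths=$(mktemp)
clash_paths=$(mktemp)
printf 'dasCore/src/org/das2/graph/DasAxis.java\n' > "$das2_paths"
printf 'QDataSet/src/org/das2/qds/QDataSet.java\nQStream/build.xml\n' > "$qds_paths"
printf 'dasCore/README.txt\nQStream/build.xml\n' > "$clash_paths"

# No output expected here: the top-level directories are disjoint.
overlap=$(shared_top_levels "$das2_paths" "$qds_paths")

# A listing that reuses dasCore/ is reported as a collision.
clash=$(shared_top_levels "$das2_paths" "$clash_paths")
echo "collisions: ${clash:-none}"
rm -f "$das2_paths" "$qds_paths" "$clash_paths"
```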

To combine git repositories, start with the oldest one (by date) and add the newer one:

cd das2java
git checkout main  
git remote add -f qdataset /path/to/new/repo/qdataset
git merge --allow-unrelated-histories qdataset/main     # other repo/branch

Now check that the history, including the SVN revision numbers, can be found for files that came from both repositories:

git log --follow QDataSet/src/org/das2/qds/QDataSet.java
git log --follow dasCore/src/org/das2/graph/DasAxis.java
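To pull the SVN revision number out of a converted commit message, a small filter can help. This sketch assumes that --add-metadata appends a line of the form "svn path=PATH; revision=N" to each commit message; check a real commit to confirm the exact format before relying on it. The function name svn_rev_of and the sample message are made up.

```shell
# Read a commit message on stdin and print the SVN revision recorded in it.
# ASSUMPTION: the metadata line looks like "svn path=/some/path/; revision=N".
svn_rev_of() {
  sed -n 's/^svn path=.*; revision=\([0-9][0-9]*\)$/\1/p'
}

# Invented commit message in the assumed format.
sample_msg=$(cat <<'EOF'
Fix axis tick labels

svn path=/dasCore/trunk/; revision=1234
EOF
)

rev=$(printf '%s\n' "$sample_msg" | svn_rev_of)
echo "SVN revision: $rev"
```

In a converted repository the input could come from git log -1 --format=%B followed by a commit hash.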

Remove the remote-tracking branch that points to the local qdataset repository:

git branch -r -d qdataset/main

Pushing All Branches to GitHub

Create an empty repository on github.com. Don't add LICENSE or README.md! Since the rules files only define two branches, the following commands push everything to GitHub.

git remote add origin [email protected]:das-developers/das2java.git
git branch -M main
git push -u origin main
git checkout original
git branch -M original
git push -u origin original

Files

The following files provided the input and output rules for svn-all-fast-export.

dasCore.rules

# Rules to convert the dasCore SVN repository to git, using the 
# svn-all-fast-export tool from the KDE project.
#
#
# dasCore dependencies tree
#
#  dasCore  (main)          dasCore  (original)
#    |                        |     
#    |- QDataSet              |- (none)
#    |   |
#    |   |- dasCoreDatum
#    |   |- dasCoreUtil
#    |
#    |- dasCoreDatum
#    |- dasCoreUtil
#    
#
# The overall dasCore project went through three major changes
# 
# 1. Renaming the source packages: edu.uiowa.physics.pw.das -> org.das2
#
# 2. Branching to add a dependency on the QDataSet project
#
# 3. Breaking off the util and datum components.
#
# The sources that are today (r12058) part of dasCore have moved around
# the larger SVN repository roughly as follows:
#
#  Revision Range    Repo Path                                      Branch
#  ---------------   --------------------------------------------   ------
#  r1     -  r4347   dasCore/trunk                                  main
#                    (big reorg to src/org/das2 ~r4187)
# 
#  r4348  -  r6504   dasCore/trunk                                  main
#  r6505  -  r9324   dasCore/netbeans_trunk                         main
#
#    (Maybe add maven main here?)
#
#  r9325  - r12056   core/stable/dasCore                            main
#
#  r4348  -  r4387   dasCore/branches/community/autoplot/trunk      autoplot
#  r4388  -  r5032   dasCore/community/autoplot/trunk               autoplot
#  r5033  -  r5215   dasCore/community/autoplot2010/trunk/dasCore   autoplot
#  r5216  - r12058   dasCore/community/autoplot2011/trunk/dasCore   autoplot
#
#
# Since the autoplot branch has the most support, it will be displayed as the
# default branch at github.com
#
# The rule-set below thus tracks different paths for different revision 
# ranges.

create repository das2java
end repository

# The main branch (which maybe should be called the classic branch?)
match /dasCore/trunk/
  min revision 1
  max revision 4348
  branch main
  prefix dasCore/
  repository das2java
end match

# The autoplot branch (which receives more testing than the main branch)

match /dasCore/branches/community/autoplot/trunk/
  min revision 4348
  max revision 4387
  branch main
  prefix dasCore/
  repository das2java
end match

match /dasCore/community/autoplot/trunk/
  min revision 4388  
  max revision 5032
  branch main
  prefix dasCore/
  repository das2java
end match

# Starting with the 2010 branches we no longer need a sub directory
# for dasCore since it's separated out in the repository
match /dasCore/community/autoplot2010/trunk/
  min revision 5033
  max revision 5215
  branch main
  repository das2java
end match

match /dasCore/community/autoplot2011/trunk/
  min revision 5216
  branch main
  repository das2java
end match

# And the continuation of the original branch

match /dasCore/trunk/
  min revision 4348
  max revision 6504
  branch original
  prefix dasCore/
  repository das2java
end match

match /dasCore/netbeans_trunk/
  min revision 6505 
  max revision 9324
  branch original
  prefix dasCore/
  repository das2java
end match

match /core/stable/dasCore/
  min revision 9325  
  branch original
  prefix dasCore/
  repository das2java
end match

# Ignore all the rest of the repo
match /
end match

QDataSet.rules

# Rules to extract the qdataset project from the autoplot repository and convert
# it to a git repository using the svn-all-fast-export tool from the KDE project

create repository qdataset
end repository

# Only one branch retained in this ruleset, since the focus is on 
# retaining SVN revision numbers for later tagging

match /autoplot/trunk/QDataSet/
	repository qdataset
	branch main
	prefix QDataSet/
	min revision 5779
end match

match /autoplot/trunk/QStream/
	repository qdataset
	branch main
	prefix QStream/
	min revision 5779
end match



match /autoplot/branches/autoplot2010/QDataSet/
	repository qdataset
	branch main
	prefix QDataSet/	
	max revision 5778
	min revision 5287
end match

match /autoplot/branches/autoplot2010/QStream/
	repository qdataset
	branch main
	prefix QStream/	
	max revision 5778
	min revision 5287
end match



match /autoplot/branches/agu2009/QDataSet/
	repository qdataset
	branch main
	prefix QDataSet/
	max revision 5286
	min revision 3784
end match

match /autoplot/branches/agu2009/QStream/
	repository qdataset
	branch main
	prefix QStream/
	max revision 5286
	min revision 3784
end match



match /autoplot/trunk/QDataSet/
	repository qdataset
	branch main
	prefix QDataSet/
	max revision 3783
	min revision 1
end match

match /autoplot/trunk/QStream/
	repository qdataset
	branch main
	prefix QStream/
	max revision 3783
	min revision 1
end match

# Ignore all the rest of the repo
match /
end match
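
As a rough cross-check of rules files like the two above, each match block can be condensed to one line showing branch, revision range, and SVN path, which makes overlapping or missing ranges easier to see. This is a sketch only: the function name summarize_rules is made up, it understands just the keywords used on this page, and the sample file is a shortened excerpt.

```shell
# Print "branch<TAB>min<TAB>max<TAB>svn-path" for every match block in a
# rules file. A dash means the bound is not set in the block.
summarize_rules() {
  awk '
    $1 == "match"                   { path = $2; min = "-"; max = "-"; branch = "(ignored)" }
    $1 == "min" && $2 == "revision" { min = $3 }
    $1 == "max" && $2 == "revision" { max = $3 }
    $1 == "branch"                  { branch = $2 }
    $1 == "end" && $2 == "match"    { printf "%s\t%s\t%s\t%s\n", branch, min, max, path }
  ' "$1"
}

# Shortened excerpt of a rules file, written to a temporary file.
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
match /dasCore/trunk/
  min revision 1
  max revision 4348
  branch main
  prefix dasCore/
  repository das2java
end match

match /
end match
EOF

summary=$(summarize_rules "$rules_file")
printf '%s\n' "$summary"
rm -f "$rules_file"
```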