9.2 Git and Other Systems - Migrating to Git

Migrating to Git

If you have an existing codebase in another VCS but have decided to start using Git, you must migrate your project one way or another. This section goes over some importers for common systems and then demonstrates how to develop your own custom importer. You'll learn how to import data from several of the bigger professionally used SCM systems, because they make up the majority of users who are switching, and because high-quality tools for them are easy to come by.

Subversion

If you read the previous section about using git svn, you can easily use those instructions to git svn clone a repository; then stop using the Subversion server, push to a new Git server, and start using that. If you want the history, you can accomplish that as quickly as you can pull the data out of the Subversion server (which may take a while).

However, the import isn't perfect, and because it will take so long, you may as well do it right. The first problem is the author information. In Subversion, each person committing has a user on the system who is recorded in the commit information. The examples in the previous section show schacon in some places, such as the blame output and the git svn log. If you want to map this to better Git author data, you need a mapping from the Subversion users to the Git authors. Create a file called users.txt that has this mapping in the following format:

schacon = Scott Chacon <schacon@geemail.com>
selse = Someo Nelse <selse@geemail.com>

To get a list of the author names that SVN uses, you can run this:

$ svn log --xml | grep author | sort -u | \
  perl -pe 's/.*>(.*?)<.*/$1 = /'

That generates the log output in XML format, keeps only the lines with author information, discards duplicates, and strips out the XML tags. (Obviously this only works on a machine with grep, sort, and perl installed.) Then redirect that output into your users.txt file so you can add the equivalent Git user data next to each entry.
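
For readers without grep, sort, and perl at hand, the same extraction can be sketched in Ruby. This is only an illustration: the helper name svn_authors and the sample XML are made up for the example, and it assumes every author appears as an <author>...</author> element in the output of svn log --xml.

```ruby
# Hypothetical helper: pull the unique author names out of `svn log --xml`
# output and format them as the left-hand side of a users.txt entry.
def svn_authors(xml)
  xml.scan(%r{<author>(.*?)</author>}m)  # every <author> element
     .flatten
     .map(&:strip)
     .uniq
     .sort
     .map { |name| "#{name} = " }
end

# Made-up sample input, standing in for real `svn log --xml` output.
sample = <<~XML
  <log>
  <logentry revision="2"><author>selse</author></logentry>
  <logentry revision="1"><author>schacon</author></logentry>
  <logentry revision="3"><author>schacon</author></logentry>
  </log>
XML

puts svn_authors(sample)
```

With real data you would pass in the captured output of svn log --xml and write the resulting lines to users.txt before filling in the right-hand sides by hand.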

You can provide this file to git svn to help it map the author data more accurately. You can also tell git svn not to include the metadata that Subversion normally imports, by passing --no-metadata to the clone or init command. This makes your import command look like this:

$ git svn clone http://my-project.googlecode.com/svn/ \
      --authors-file=users.txt --no-metadata -s my_project

Now you should have a nicer Subversion import in your my_project directory. Instead of commits that look like this

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-be05-5f7a86268029

they look like this:

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <schacon@geemail.com>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

Not only does the Author field look a lot better, but the git-svn-id is no longer there, either.

You should also do a bit of post-import cleanup. For one thing, you should clean up the odd references that git svn set up. First you'll move the tags so they're actual tags rather than strange remote branches, and then you'll move the rest of the branches so they're local.

To move the tags to be proper Git tags, run

$ cp -Rf .git/refs/remotes/origin/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/origin/tags

This takes the references that were remote branches starting with remotes/origin/tags/ and makes them real (lightweight) tags.

Next, move the rest of the references under refs/remotes to be local branches:

$ cp -Rf .git/refs/remotes/origin/* .git/refs/heads/
$ rm -Rf .git/refs/remotes/origin

Now all the old branches are real Git branches and all the old tags are real Git tags. The last thing to do is add your new Git server as a remote and push to it. Here is an example of adding your server as a remote:

$ git remote add origin git@my-git-server:myrepository.git

Because you want all your branches and tags to go up, you can now run the following; --all pushes the branches and --tags pushes the tags:

$ git push origin --all
$ git push origin --tags

All your branches and tags should now be on your new Git server in a nice, clean import.

Mercurial

Since Mercurial and Git have fairly similar models for representing versions, and since Git is a bit more flexible, converting a repository from Mercurial to Git is fairly straightforward using a tool called "hg-fast-export", which you'll need a copy of:

$ git clone http://repo.or.cz/r/fast-export.git /tmp/fast-export

The first step in the conversion is to get a full clone of the Mercurial repository you want to convert:

$ hg clone <remote repo URL> /tmp/hg-repo

The next step is to create an author mapping file. Mercurial is a bit more forgiving than Git about what it will put in the author field for changesets, so this is a good time to clean house. Generating this file is a one-line command in a bash shell:

$ cd /tmp/hg-repo
$ hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors

This will take a few seconds, depending on how long your project's history is, and afterwards the /tmp/authors file will look something like this:

bob
bob@localhost
bob <bob@company.com>
bob jones <bob <AT> company <DOT> com>
Bob Jones <bob@company.com>
Joe Smith <joe@company.com>

In this example, the same person (Bob) has created changesets under four different names, one of which actually looks correct, and one of which would be completely invalid for a Git commit. Hg-fast-export lets us fix this by adding ={new name and email address} at the end of every line we want to change, and removing the lines for any usernames that we want to leave alone. If all the usernames look fine, we won't need this file at all. In this example, we want our file to look like this:

bob=Bob Jones <bob@company.com>
bob@localhost=Bob Jones <bob@company.com>
bob jones <bob <AT> company <DOT> com>=Bob Jones <bob@company.com>
bob <bob@company.com>=Bob Jones <bob@company.com>
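
To make the effect of this file concrete, here is a small Ruby sketch that applies such a mapping to raw Mercurial user strings. It is an illustration only, not hg-fast-export's actual code: the names parse_author_map and map_author are invented here, and the sketch assumes the old name never contains an equals sign, so each line is split at the first one.

```ruby
# Hypothetical helpers illustrating the "old=new" author mapping format.
def parse_author_map(text)
  text.each_line.with_object({}) do |line, map|
    line = line.chomp
    next if line.strip.empty?
    old, new = line.split('=', 2)   # split at the first "=" only
    map[old.strip] = new.strip if new
  end
end

# Names without an entry are left untouched.
def map_author(map, hg_user)
  map.fetch(hg_user, hg_user)
end

mapping = parse_author_map(<<~MAP)
  bob=Bob Jones <bob@company.com>
  bob@localhost=Bob Jones <bob@company.com>
  bob jones <bob <AT> company <DOT> com>=Bob Jones <bob@company.com>
  bob <bob@company.com>=Bob Jones <bob@company.com>
MAP

puts map_author(mapping, 'bob@localhost')                # mapped entry
puts map_author(mapping, 'Joe Smith <joe@company.com>')  # no entry, unchanged
```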

The next step is to create our new Git repository and run the export script:

$ git init /tmp/converted
$ cd /tmp/converted
$ /tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors

The -r flag tells hg-fast-export where to find the Mercurial repository we want to convert, and the -A flag tells it where to find the author-mapping file. The script parses Mercurial changesets and converts them into a script for Git's "fast-import" feature (which we'll discuss in detail a bit later on). This takes a while (though it's much faster than it would be over the network), and the output is fairly verbose:

$ /tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors
Loaded 4 authors
master: Exporting full revision 1/22208 with 13/0/0 added/changed/removed files
master: Exporting simple delta revision 2/22208 with 1/1/0 added/changed/removed files
master: Exporting simple delta revision 3/22208 with 0/1/0 added/changed/removed files
[…]
master: Exporting simple delta revision 22206/22208 with 0/4/0 added/changed/removed files
master: Exporting simple delta revision 22207/22208 with 0/2/0 added/changed/removed files
master: Exporting thorough delta revision 22208/22208 with 3/213/0 added/changed/removed files
Exporting tag [0.4c] at [hg r9] [git :10]
Exporting tag [0.4d] at [hg r16] [git :17]
[…]
Exporting tag [3.1-rc] at [hg r21926] [git :21927]
Exporting tag [3.1] at [hg r21973] [git :21974]
Issued 22315 commands
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     120000
Total objects:       115032 (    208171 duplicates                  )
      blobs  :        40504 (    205320 duplicates      26117 deltas of      39602 attempts)
      trees  :        52320 (      2851 duplicates      47467 deltas of      47599 attempts)
      commits:        22208 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:         109 (         2 loads     )
      marks:        1048576 (     22208 unique    )
      atoms:           1952
Memory total:          7860 KiB
       pools:          2235 KiB
     objects:          5625 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =      90430
pack_report: pack_mmap_calls          =      46771
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  340852700 /  340852700
---------------------------------------------------------------------

$ git shortlog -sn
   369  Bob Jones
   365  Joe Smith

That's pretty much all there is to it. All of the Mercurial tags have been converted to Git tags, and Mercurial branches and bookmarks have been converted to Git branches. Now you're ready to push the repository up to its new server-side home:

$ git remote add origin git@my-git-server:myrepository.git
$ git push origin --all

Perforce

The next system you'll look at importing from is Perforce. As we discussed above, there are two ways to let Git and Perforce talk to each other: git-p4 and Perforce Git Fusion.

Perforce Git Fusion

Git Fusion makes this process fairly painless. Just configure your project settings, user mappings, and branches using a configuration file (as discussed in Git Fusion), and clone the repository. Git Fusion leaves you with what looks like a native Git repository, which is then ready to push to a native Git host if you desire. You could even use Perforce as your Git host if you like.

Git-p4

Git-p4 can also act as an import tool. As an example, we'll import the Jam project from the Perforce Public Depot. To set up your client, you must export the P4PORT environment variable to point to the Perforce depot:

$ export P4PORT=public.perforce.com:1666

Note

In order to follow along, you'll need a Perforce depot to connect with. We'll be using the public depot at public.perforce.com for our examples, but you can use any depot you have access to.

Run the git p4 clone command to import the Jam project from the Perforce server, supplying the depot and project path and the path into which you want to import the project:

$ git p4 clone //guest/perforce_software/jam@all p4import
Importing from //guest/perforce_software/jam@all into p4import
Initialized empty Git repository in /private/tmp/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 9957 (100%)

This particular project has only one branch, but if you have branches that are configured with branch views (or just a set of directories), you can use the --detect-branches flag to git p4 clone to import all the project's branches as well. See Branching for a bit more detail on this.

At this point you're almost done. If you go to the p4import directory and run git log, you can see your imported work:

$ git log -2
commit e5da1c909e5db3036475419f6379f2c73710c4e6
Author: giles <giles@giles@perforce.com>
Date:   Wed Feb 8 03:13:27 2012 -0800

    Correction to line 355; change </UL> to </OL>.

    [git-p4: depot-paths = "//public/jam/src/": change = 8068]

commit aa21359a0a135dda85c50a7f7cf249e4f7b8fd98
Author: kwirth <kwirth@perforce.com>
Date:   Tue Jul 7 01:35:51 2009 -0800

    Fix spelling error on Jam doc page (cummulative -> cumulative).

    [git-p4: depot-paths = "//public/jam/src/": change = 7304]

You can see that git-p4 has left an identifier in each commit message. It's fine to keep that identifier there, in case you need to reference the Perforce change number later. However, if you'd like to remove the identifier, now is the time to do so, before you start doing work on the new repository. You can use git filter-branch to remove the identifier strings en masse:

$ git filter-branch --msg-filter 'sed -e "/^\[git-p4:/d"'
Rewrite e5da1c909e5db3036475419f6379f2c73710c4e6 (125/125)
Ref 'refs/heads/master' was rewritten
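
The message filter receives each commit message on standard input and prints the rewritten message. As a rough Ruby equivalent of the sed expression used here (an illustration only, with the made-up function name strip_p4_ids), the transformation looks like this:

```ruby
# Hypothetical stand-in for: sed -e "/^\[git-p4:/d"
# Drops every line that starts with "[git-p4:" and keeps the rest.
def strip_p4_ids(message)
  message.each_line.reject { |line| line.start_with?('[git-p4:') }.join
end

# Sample commit message taken from the log output shown earlier.
message = <<~MSG
  Correction to line 355; change </UL> to </OL>.

  [git-p4: depot-paths = "//public/jam/src/": change = 8068]
MSG

print strip_p4_ids(message)
```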

If you run git log, you can see that all the SHA-1 checksums for the commits have changed, but the git-p4 strings are no longer in the commit messages:

$ git log -2
commit b17341801ed838d97f7800a54a6f9b95750839b7
Author: giles <giles@giles@perforce.com>
Date:   Wed Feb 8 03:13:27 2012 -0800

    Correction to line 355; change </UL> to </OL>.

commit 3e68c2e26cd89cb983eb52c024ecdfba1d6b3fff
Author: kwirth <kwirth@perforce.com>
Date:   Tue Jul 7 01:35:51 2009 -0800

    Fix spelling error on Jam doc page (cummulative -> cumulative).

Your import is now ready to push up to your new Git server.

TFS

If your team is converting their source control from TFVC to Git, you'll want the highest-fidelity conversion you can get. This means that, while we covered both git-tfs and git-tf in the interoperability section, we'll cover only git-tfs here, because git-tfs supports branches, and this is prohibitively difficult using git-tf.

Note

This is a one-way conversion. The resulting Git repository won't be able to connect with the original TFVC project.

The first thing to do is map usernames. TFVC is fairly liberal with what goes into the author field for changesets, but Git wants a human-readable name and email address. You can get this information from the tf command-line client, like so:

PS> tf history $/myproject -recursive > AUTHORS_TMP

This grabs all of the changesets in the history of the project and puts them in the AUTHORS_TMP file, which we will process to extract the data of the User column (the second one). Open the file, find the character positions at which that column starts and ends, and replace the parameters 11-20 of the cut command in the following command line with the positions you found:

PS> cat AUTHORS_TMP | cut -b 11-20 | tail -n+3 | sort | uniq > AUTHORS

The cut command keeps only the characters between 11 and 20 from each line. The tail command skips the first two lines, which are field headers and ASCII-art underlines. The result is piped to sort and then uniq to eliminate duplicates (uniq only removes adjacent duplicate lines, so the input must be sorted first), and saved to a file named AUTHORS. The next step is manual; in order for git-tfs to make effective use of this file, each line must be in this format:

DOMAIN\username = User Name <email@address.com>

The portion on the left is the “User” field from TFVC, and the portion on the right side of the equals sign is the user name that will be used for Git commits.
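
The column extraction can also be expressed as a short Ruby sketch, which may be handy where cut, tail, sort, and uniq are not available. Everything here is illustrative: the function name tf_authors, the default column positions, and the sample history text are assumptions, and real tf history output will place the User column at different positions. Unlike cut, the sketch also trims the padding around each name.

```ruby
# Hypothetical helper roughly mirroring: cut -b 11-20 | tail -n+3 | sort | uniq
# first_col and last_col are 1-based byte positions, as with cut -b.
def tf_authors(history, first_col = 11, last_col = 20)
  width = last_col - first_col + 1
  history.lines
         .drop(2)                                   # skip header and underline rows
         .map { |l| l.chomp.byteslice(first_col - 1, width).to_s.strip }
         .reject(&:empty?)
         .uniq
         .sort
end

# Made-up rows laid out in fixed-width columns, standing in for real output.
rows = [
  %w[Changeset User Date Comment],
  ['---------', '----------', '----------', '-------'],
  ['3', 'DOM\alice', '1/2/2014', 'Add docs'],
  ['2', 'DOM\bob',   '1/1/2014', 'Fix build'],
  ['1', 'DOM\alice', '1/1/2014', 'Initial import']
]
sample = rows.map { |r| format("%-9s %-10s %-10s %s\n", *r) }.join

puts tf_authors(sample)
```

Each resulting name still needs its right-hand side (name and email address) added by hand.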

Once you have this file, the next thing to do is make a full clone of the TFVC project you're interested in:

PS> git tfs clone --with-branches --authors=AUTHORS https://username.visualstudio.com/DefaultCollection $/project/Trunk project_git

Next you'll want to clean the git-tfs-id sections from the bottom of the commit messages. The following command will do that:

PS> git filter-branch -f --msg-filter 'sed "s/^git-tfs-id:.*$//g"' -- --all

That uses the sed command from the Git-bash environment to replace any line starting with “git-tfs-id:” with emptiness, which Git will then ignore.

Once that's all done, you're ready to add a new remote, push all your branches up, and have your team start working from Git.

A Custom Importer

If your system isn’t one of the above, you should look for an importer online; quality importers are available for many other systems, including CVS, Clear Case, Visual Source Safe, even a directory of archives. If none of these tools works for you, you have a more obscure tool, or you otherwise need a more custom importing process, you should use git fast-import. This command reads simple instructions from stdin to write specific Git data. It’s much easier to create Git objects this way than to run the raw Git commands or try to write the raw objects (see Git Internals for more information). This way, you can write an import script that reads the necessary information out of the system you’re importing from and prints straightforward instructions to stdout. You can then run this program and pipe its output through git fast-import.

To quickly demonstrate, you’ll write a simple importer. Suppose you work in current, you back up your project by occasionally copying the directory into a time-stamped back_YYYY_MM_DD backup directory, and you want to import this into Git. Your directory structure looks like this:

$ ls /opt/import_from
back_2014_01_02
back_2014_01_04
back_2014_01_14
back_2014_02_03
current

In order to import a Git directory, you need to review how Git stores its data. As you may remember, Git is fundamentally a linked list of commit objects that point to a snapshot of content. All you have to do is tell fast-import what the content snapshots are, what commit data points to them, and the order they go in. Your strategy will be to go through the snapshots one at a time and create commits with the contents of each directory, linking each commit back to the previous one.

As we did in An Example Git-Enforced Policy, we’ll write this in Ruby, because it’s what we generally work with and it tends to be easy to read. You can write this example pretty easily in anything you’re familiar with; it just needs to print the appropriate information to stdout. And, if you are running on Windows, this means you’ll need to take special care not to introduce carriage returns at the end of your lines: git fast-import is very particular about wanting just line feeds (LF), not the carriage return line feeds (CRLF) that Windows uses.

To begin, you’ll change into the target directory and identify every subdirectory, each of which is a snapshot that you want to import as a commit. You’ll change into each subdirectory and print the commands necessary to export it. Your basic main loop looks like this:

last_mark = nil

# loop through the directories
Dir.chdir(ARGV[0]) do
  # sort so snapshots are committed in date order ("current" sorts last)
  Dir.glob("*").sort.each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do
      last_mark = print_export(dir, last_mark)
    end
  end
end

You run print_export inside each directory, which takes the directory name and the mark of the previous snapshot and returns the mark of this one; that way, you can link them properly. “Mark” is the fast-import term for an identifier you give to a commit; as you create commits, you give each one a mark that you can use to link to it from other commits. So, the first thing to do in your print_export method is generate a mark from the directory name:

mark = convert_dir_to_mark(dir)

You’ll do this by creating an array of directories and using the index value as the mark, because a mark must be an integer. Your method looks like this:

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end

Now that you have an integer representation of your commit, you need a date for the commit metadata. Because the date is expressed in the name of the directory, you’ll parse it out. The next line in your print_export method is

date = convert_dir_to_date(dir)

where convert_dir_to_date is defined as

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end

That returns an integer value for the date of each directory. The last piece of meta-information you need for each commit is the committer data, which you hardcode in a global variable:

$author = 'John Doe <john@example.com>'

Now you’re ready to begin printing out the commit data for your importer. The initial information states that you’re defining a commit object and what branch it’s on, followed by the mark you’ve generated, the committer information and commit message, and then the previous commit, if any. The code looks like this:

# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark

You hardcode the time zone (-0700) because doing so is easy. If you’re importing from another system, you must specify the time zone as an offset. The commit message must be expressed in a special format:

data (size)\n(contents)

The format consists of the word data, the size of the data to be read, a newline, and finally the data. Because you need to use the same format to specify the file contents later, you create a helper method, export_data:

def export_data(string)
  # fast-import expects the size in bytes, not characters
  print "data #{string.bytesize}\n#{string}"
end
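
One detail is worth illustrating: fast-import reads the size as an exact byte count. The sketch below uses a hypothetical data_command helper, which returns the command as a string instead of printing it, to show how byte length and character length diverge once a message contains non-ASCII text. That is the reason to count bytes (bytesize) rather than characters.

```ruby
# Hypothetical variant of export_data that returns the command text.
# fast-import expects <size> to be the exact number of bytes that follow.
def data_command(string)
  "data #{string.bytesize}\n#{string}"
end

ascii = 'imported from current'
utf8  = "imported from caf\u00e9"    # "café", written with an escape

puts ascii.size == ascii.bytesize   # true: one byte per character
puts utf8.size                      # 18 characters
puts utf8.bytesize                  # 19 bytes, "é" takes two bytes in UTF-8
print data_command(ascii), "\n"
```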

All that’s left is to specify the file contents for each snapshot. This is easy, because you have each one in a directory – you can print out the deleteall command followed by the contents of each file in the directory. Git will then record each snapshot appropriately:

puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end

Note: Because many systems think of their revisions as changes from one commit to another, fast-import can also take commands with each commit to specify which files have been added, removed, or modified and what the new contents are. You could calculate the differences between snapshots and provide only this data, but doing so is more complex – you may as well give Git all the data and let it figure it out. If this is better suited to your data, check the fast-import man page for details about how to provide your data in this manner.

The format for listing the new file contents or specifying a modified file with the new contents is as follows:

M 644 inline path/to/file
data (size)
(file contents)

Here, 644 is the mode (if you have executable files, you need to detect and specify 755 instead), and inline says you’ll list the contents immediately after this line. Your inline_data method looks like this:

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

You reuse the export_data method you defined earlier, because it’s the same as the way you specified your commit message data.
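
The remark about executable files can be turned into a small helper. This is a sketch under stated assumptions: mode_for is a name invented here, and it relies on File.executable?, which reflects the permissions of the user running the script and does not carry the same meaning on Windows.

```ruby
require 'tmpdir'

# Hypothetical helper: choose the fast-import file mode for a path.
def mode_for(file)
  File.executable?(file) ? '755' : '644'
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  plain  = File.join(dir, 'README.md')
  script = File.join(dir, 'main.rb')
  File.write(plain, "# Hello\n")
  File.write(script, "puts 'Hey there'\n")
  File.chmod(0o644, plain)
  File.chmod(0o755, script)

  puts "#{File.basename(plain)}: #{mode_for(plain)}"
  puts "#{File.basename(script)}: #{mode_for(script)}"
end
```

In the importer, you could then call inline_data(file, 'M', mode_for(file)) instead of relying on the default mode.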

The last thing you need to do is to return the current mark so it can be passed to the next iteration:

return mark

Note

If you are running on Windows, you’ll need to add one extra step. As mentioned before, Windows uses CRLF for new line characters while git fast-import expects only LF. To get around this problem and make git fast-import happy, you need to tell Ruby to use LF instead of CRLF:

$stdout.binmode

That’s it. Here’s the script in its entirety:

#!/usr/bin/env ruby

$stdout.binmode
$author = "John Doe <john@example.com>"

$marks = []
def convert_dir_to_mark(dir)
    if !$marks.include?(dir)
        $marks << dir
    end
    ($marks.index(dir)+1).to_s
end


def convert_dir_to_date(dir)
    if dir == 'current'
        return Time.now().to_i
    else
        dir = dir.gsub('back_', '')
        (year, month, day) = dir.split('_')
        return Time.local(year, month, day).to_i
    end
end

def export_data(string)
    # fast-import expects the size in bytes, not characters
    print "data #{string.bytesize}\n#{string}"
end

def inline_data(file, code='M', mode='644')
    content = File.read(file)
    puts "#{code} #{mode} inline #{file}"
    export_data(content)
end

def print_export(dir, last_mark)
    date = convert_dir_to_date(dir)
    mark = convert_dir_to_mark(dir)

    puts 'commit refs/heads/master'
    puts "mark :#{mark}"
    puts "committer #{$author} #{date} -0700"
    export_data("imported from #{dir}")
    puts "from :#{last_mark}" if last_mark

    puts 'deleteall'
    Dir.glob("**/*").each do |file|
        next if !File.file?(file)
        inline_data(file)
    end
    mark
end


# Loop through the directories
last_mark = nil
Dir.chdir(ARGV[0]) do
    # sort so snapshots are committed in date order ("current" sorts last)
    Dir.glob("*").sort.each do |dir|
        next if File.file?(dir)

        # move into the target directory
        Dir.chdir(dir) do
            last_mark = print_export(dir, last_mark)
        end
    end
end

If you run this script, you’ll get content that looks something like this:

$ ruby import.rb /opt/import_from
commit refs/heads/master
mark :1
committer John Doe <john@example.com> 1388649600 -0700
data 29
imported from back_2014_01_02deleteall
M 644 inline README.md
data 28
# Hello

This is my readme.
commit refs/heads/master
mark :2
committer John Doe <john@example.com> 1388822400 -0700
data 29
imported from back_2014_01_04from :1
deleteall
M 644 inline main.rb
data 34
#!/bin/env ruby

puts "Hey there"
M 644 inline README.md
(...)

To run the importer, pipe this output through git fast-import while in the Git directory you want to import into. You can create a new directory and then run git init in it for a starting point, and then run your script:

$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:           13 (         6 duplicates                  )
      blobs  :            5 (         4 duplicates          3 deltas of          5 attempts)
      trees  :            4 (         1 duplicates          0 deltas of          4 attempts)
      commits:            4 (         1 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:           1024 (         5 unique    )
      atoms:              2
Memory total:          2344 KiB
       pools:          2110 KiB
     objects:           234 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =         10
pack_report: pack_mmap_calls          =          5
pack_report: pack_open_windows        =          2 /          2
pack_report: pack_mapped              =       1457 /       1457
---------------------------------------------------------------------

As you can see, when it completes successfully, it gives you a bunch of statistics about what it accomplished. In this case, you imported 13 objects total for 4 commits into 1 branch. Now, you can run git log to see your new history:

$ git log -2
commit 3caa046d4aac682a55867132ccdfbe0d3fdee498
Author: John Doe <john@example.com>
Date:   Tue Jul 29 19:39:04 2014 -0700

    imported from current

commit 4afc2b945d0d3c8cd00556fbe2e8224569dc9def
Author: John Doe <john@example.com>
Date:   Mon Feb 3 01:00:00 2014 -0700

    imported from back_2014_02_03

There you go – a nice, clean Git repository. It’s important to note that nothing is checked out – you don’t have any files in your working directory at first. To get them, you must reset your branch to where master is now:

$ ls
$ git reset --hard master
HEAD is now at 3caa046 imported from current
$ ls
README.md main.rb

You can do a lot more with the fast-import tool – handle different modes, binary data, multiple branches and merging, tags, progress indicators, and more. A number of examples of more complex scenarios are available in the contrib/fast-import directory of the Git source code.
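
As one small example of going further, a lightweight tag can be created from a mark with fast-import's reset command. The sketch below is an illustration under assumptions: tag_command is a made-up helper, and the tag name is hypothetical.

```ruby
# Hypothetical helper: build a fast-import "reset" command that points a
# lightweight tag at a previously defined mark.
def tag_command(name, mark)
  "reset refs/tags/#{name}\nfrom :#{mark}\n\n"
end

print tag_command('backup-2014-01-02', 1)
```

Printing such a command after the corresponding commit in print_export would tag that snapshot in the imported repository.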