Rebase edit and commit splitting - Part I

January 18, 2018
Andrea Richiardi

It is generally good practice to organize your patches in self-contained, well-motivated commits so that your colleagues or your future self (akin to a brand new colleague if, like me, you tend to be forgetful) don't have to reverse engineer and investigate things you have already, supposedly, dealt with.

Nevertheless, hacking is mostly chaotic, so you usually end up with commits that include many unrelated changes. Note that I am trying not to use the word feature here, mostly because squashing topic branches can mitigate this problem.

I believe, though, that even within topic branches you want to maintain some sort of order so that you can better express the "story of the feature". If done right, you could actually rebase your feature branch without squashing it.

Case Study

Let's say you are working on an open source project with one file in it: file.txt. The first and second commits are just simple additions (git output compacted for clarity):

commit 4b9899b6dd36082f1877bbe6b42a598a8708050f

    Write second line

diff --git a/file.txt b/file.txt
index 9748558..cb184b5 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1,3 @@
 This is a file and I am writing it.
+
+This is the second line and I so like it.

commit 2cfc94884d31c388f9204f44beef0deaefd25414

    Start writing a file

diff --git a/file.txt b/file.txt
new file mode 100644
index 0000000..9748558
--- /dev/null
+++ b/file.txt
@@ -0,0 +1 @@
+This is a file and I am writing it.

Now you want to write a third line and revise your thoughts on the first two:

commit ea8c0d3f9c4da5de35970a7a6c00507a0a54cae3

    Write third line

diff --git a/file.txt b/file.txt
index cb184b5..846d404 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,5 @@
-This is a file and I am writing it.
+This is the first line of the file and I am writing it.

-This is the second line and I so like it.
+This is the second line and I now I don't like it that much.
+
+This is the best and third line.
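If you want to follow along, the three commits above can be recreated with a small shell script. This is a minimal sketch under assumptions of my own: a throwaway directory from mktemp, a made-up "Example Author" identity passed through Git's author/committer environment variables, and file contents copied from the diffs. Your SHA-1s will not match the ones in this post, since they depend on author, date and so on.

```shell
#!/bin/sh
# Recreate the case-study repository in a throwaway directory.
# Assumptions: git is installed; the directory and the identity are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# printf reuses its format for every argument, so "" becomes an empty line.
printf '%s\n' "This is a file and I am writing it." > file.txt
git add file.txt
git commit -q -m "Start writing a file"

printf '%s\n' "This is a file and I am writing it." "" \
  "This is the second line and I so like it." > file.txt
git commit -q -a -m "Write second line"

printf '%s\n' "This is the first line of the file and I am writing it." "" \
  "This is the second line and I now I don't like it that much." "" \
  "This is the best and third line." > file.txt
git commit -q -a -m "Write third line"

git log --oneline
```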

While you are restlessly waiting for your branch to be merged, your best friend (the project maintainer) asks you to split the third commit so that the change to the first line is more visible in the history.

Don't panic, we have this covered, and in not just one but two ways.

Rebase and edit

Every time you rebase, you rewrite history. You have probably read a lot about how bad this is, and there is even a fully-fledged programming paradigm that fosters immutability. Let's face it though, nothing is inherently bad if you know what you are doing. Commits themselves are immutable: a rebase creates new commits, and what actually changes are Git references (branches, HEAD), which are just mutable pointers to commits. Just make sure that nobody is relying on the commits you are rewriting; if someone is, you do not want to git push --force. Never ever force push to master.

The first thing to understand is that to modify history we need a "stable" point in time to start from. "Doc" Emmett Brown chooses a very specific moment in time when he decides to change the McFly family history in Back to the Future Part II. That's the point from which you start modifying. In git rebase terms that moment is called the base, and here it is given as a SHA-1 (any commit reference would do). You will modify only the commits AFTER it:

$ git log --oneline
ea8c0d3 Write third line      # will rewrite this
4b9899b Write second line     # back in time to this
2cfc948 Start writing a file
$ git rebase --interactive 4b9899b
edit ea8c0d3 Write third line # from "pick" to "edit"
# Rebase ...
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
...

Save and close the todo list, and your DeLorean accelerates to 88 miles per hour (141.6 km/h): the flux capacitor is on and you are now back in time, at the ea8c0d3 commit right after 4b9899b. The list you just edited is called the rebase todo list, and it lets you specify what to do at each and every "stop" in time while reworking history.

Stopped at ea8c0d3f9c4da5de35970a7a6c00507a0a54cae3... Write third line
You can amend the commit now, with

	git commit --amend

Once you are satisfied with your changes, run

	git rebase --continue

$ git log --oneline
ea8c0d3 Write third line
4b9899b Write second line
2cfc948 Start writing a file

Remember that at any given moment you can run git rebase --abort to discard everything and get back to safety.
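The todo list is normally edited by hand, but Git takes the program used to edit it from the GIT_SEQUENCE_EDITOR environment variable. The sketch below (my own addition, not part of the workflow described here) uses that to flip the first pick into edit with sed, records where the rebase stopped, and then aborts. The repository is a simplified stand-in with placeholder file contents, and sed -i.bak is used because that form should work with both GNU and BSD sed.

```shell
#!/bin/sh
# Sketch: reach the "edit" stop without opening an editor, then abort.
# Assumptions: throwaway repo, made-up identity, placeholder file contents.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
for msg in "Start writing a file" "Write second line" "Write third line"; do
  echo "$msg" >> file.txt          # placeholder content, one line per commit
  git add file.txt
  git commit -q -m "$msg"
done

tip=$(git rev-parse HEAD)          # "Write third line", the commit to edit
base=$(git rev-parse HEAD~1)       # "Write second line", the rebase base

# Git runs this command on the todo list file instead of opening an editor;
# it rewrites the first line from "pick <sha> ..." to "edit <sha> ...".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i "$base"

stopped_at=$(git rev-parse HEAD)   # where the rebase stopped (detached HEAD)
git status                         # reports an interactive rebase in progress

git rebase --abort                 # discard everything, back on the branch
git log --oneline
```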

Split using git reset - the do-it-again way

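This approach is, in my opinion, the less elegant of the two, but it is very practical and easy to understand. It undoes the commit by moving HEAD back and resetting the index, without touching the files in your working tree. Here 4b9899b is the parent of the commit we stopped at, so git reset HEAD~1 would do the same. Note that your Write third line commit will not appear in the history anymore: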

$ git reset 4b9899b
Unstaged changes after reset:
M	file.txt

$ git log --oneline
4b9899b Write second line
2cfc948 Start writing a file

After this you can use another great git trick, git add --patch:

$ git add --patch file.txt
diff --git a/file.txt b/file.txt
index cb184b5..846d404 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,5 @@
-This is a file and I am writing it.
+This is the first line of the file and I am writing it.

-This is the second line and I so like it.
+This is the second line and I now I don't like it that much.
+
+This is the best and third line.
Stage this hunk [y,n,q,a,d,/,s,e,?]? s

Split into 2 hunks.
@@ -1,2 +1,2 @@
-This is a file and I am writing it.
+This is the first line of the file and I am writing it.

Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? n
@@ -2,2 +2,4 @@

-This is the second line and I so like it.
+This is the second line and I now I don't like it that much.
+
+This is the best and third line.
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? y

See the s answer? That is the split we need: it breaks the hunk in two, so we can skip the first-line change (n) and stage the rest (y).

$ git diff --staged
diff --git a/file.txt b/file.txt
index cb184b5..643fe55 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,5 @@
 This is a file and I am writing it.

-This is the second line and I so like it.
+This is the second line and I now I don't like it that much.
+
+This is the best and third line.

Now only the bottom lines are staged. Exactly what we wanted, so we can commit:

$ git commit --message ...

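Hrm, what was the commit message again? I cannot recall it, and the original commit is no longer in the history. Git has this covered as well, but you need to know the SHA-1 of the original commit. Note that --reuse-message copies the authorship information (including the original date) as well as the message: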

$ git commit --reuse-message=ea8c0d3
[detached HEAD c3e3c4d] Write third line
 Date: Mon Jan 15 17:32:42 2018 -0800
 1 file changed, 3 insertions(+), 1 deletion(-)

$ git log --oneline
c3e3c4d Write third line
4b9899b Write second line
2cfc948 Start writing a file

It looks good: everything is in there except the change to the first line:

$ git diff 4b9899b c3e3c4d
diff --git a/file.txt b/file.txt
index cb184b5..643fe55 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,5 @@
 This is a file and I am writing it.

-This is the second line and I so like it.
+This is the second line and I now I don't like it that much.
+
+This is the best and third line.

Let's finish this up:

$ git add --patch file.txt
diff --git a/file.txt b/file.txt
index 643fe55..846d404 100644
--- a/file.txt
+++ b/file.txt
@@ -1,4 +1,4 @@
-This is a file and I am writing it.
+This is the first line of the file and I am writing it.

 This is the second line and I now I don't like it that much.

Stage this hunk [y,n,q,a,d,/,e,?]? y

Note that a plain git add would have achieved the same here, but it is always good to give things a last read.

$ git commit --message "Modify the first line with new thoughts"
[detached HEAD b7bdd01] Modify the first line with new thoughts
 1 file changed, 1 insertion(+), 1 deletion(-)

$ git rebase --continue          # concludes the editing session
Successfully rebased and updated refs/heads/master.

$ git log --oneline
b7bdd01 Modify the first line with new thoughts
c3e3c4d Write third line
4b9899b Write second line
2cfc948 Start writing a file
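For reference, the whole git reset split fits in one unattended script. It is a sketch, not the steps above verbatim, and several parts are my assumptions: the throwaway repository and identity, piping answers into git add --patch instead of typing them, rewriting the todo list through GIT_SEQUENCE_EDITOR, and reusing the message from ORIG_HEAD (which git reset sets to the previous HEAD) rather than from a remembered SHA-1. The last git diff checks that splitting the commit did not change the final content.

```shell
#!/bin/sh
# Unattended sketch of the "git reset" split.
# Assumptions (not from the post): throwaway repo and identity, sed as the
# todo-list editor, answers piped into `git add --patch`, ORIG_HEAD.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_EDITOR=true               # never wait for an interactive editor

# commit_file MESSAGE LINE...: overwrite file.txt with the LINEs, commit it.
commit_file() {
  msg=$1
  shift
  printf '%s\n' "$@" > file.txt
  git add file.txt
  git commit -q -m "$msg"
}

commit_file "Start writing a file" "This is a file and I am writing it."
commit_file "Write second line" "This is a file and I am writing it." "" \
  "This is the second line and I so like it."
commit_file "Write third line" \
  "This is the first line of the file and I am writing it." "" \
  "This is the second line and I now I don't like it that much." "" \
  "This is the best and third line."
original=$(git rev-parse HEAD)

# 1. Rebase and stop at "Write third line" (pick -> edit).
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i HEAD~1

# 2. Drop the commit, keep the file: HEAD and index move back one commit.
#    git reset also saves the previous HEAD in ORIG_HEAD.
git reset -q HEAD~1

# 3. Split the hunk (s), skip the first-line change (n), stage the rest (y).
printf 's\nn\ny\n' | git add --patch file.txt > /dev/null

# 4. Reuse message and authorship of the original commit via ORIG_HEAD.
git commit -q --reuse-message=ORIG_HEAD

# 5. Commit what is left (the first-line change) and finish the rebase.
git commit -q -a -m "Modify the first line with new thoughts"
git rebase --continue

git log --oneline
# Same final content as before the split: the diff should be empty.
git diff --quiet "$original" HEAD && echo "final content unchanged"
```

Dropping the pipe and the GIT_SEQUENCE_EDITOR prefix gives you back the interactive version.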

Done and done. The reason I maintain that this is not the most elegant way of doing it can be explained with the Back to the Future analogy again: "Doc" goes back in time knowing what the future holds and which part of the past to change, but cannot isolate those changes. He cannot pick and choose, and ends up modifying things he did not mean to.

This is exactly what happens when you git reset: you indiscriminately throw everything away and need to rebuild from scratch, with all the problems that come with it. The one we have already noticed is that you need to remember things from the past, like the SHA-1 of the original commit for the message. If the past you are rewriting is very tangled, you can end up confused and easily make mistakes.

In the next post I will explain the second and more elegant way.

Thank you for reading!
