Stacked Diffs on GitHub with SPR!
This is a transcription of a podcast.
Sven: In your local repository, you do development with a Stacked Diff workflow, like you’re used to, if you did that before. And if you use SPR, people will be happy to review your code on GitHub, you can get it merged, and you can basically use GitHub as your code review tool and central repository, but keep your old workflow.
Jackson: Hello everyone. The voice you’re hearing right now is Jackson Gabbard, the CTO of Cord.com. But I’m also an ex-Facebook engineer. And in fact, I’m an old fogey engineer. I’ve been in this hustle for a good long time. And the reason that’s important is that I have seen many, many development workflows, going all the way back to early Git, SVN, CVS, syncing files over FTP, and also working at companies that used no source control whatsoever.
Something I’ve learned across all of that time is that an efficient engineering workflow and good source control matter.
It matters so much that I wrote a blog post about Stacked Diffs versus Pull Requests, trying to show the industry, “Hey, there’s a better way that almost everyone is missing out on.” I feel very lucky that that blog post has been widely distributed.
I’ve been on the front page of Hacker News two or three times now, I think. And I’ve gotten a lot of feedback from people, both positive and negative, and a lot of it out of confusion. I think a lot of people still just really don’t know this workflow. Hopefully this discussion will change some of that.
I’m joined today by Sven Over, a senior engineer here at Cord, who also came from Facebook. Before that he was a Last.fm engineer. And before that he was an astrophysicist, as I recall…
Sven: An astroparticle physicist.
Jackson: An astroparticle physicist! My apologies, my apologies. Sven and I share a passion for efficient workflows.
And, in fact, at Cord, we all share this passion for an efficient engineering workflow. But it’s inconsistent across the industry. GitHub is the clear winner, and some people are happy with Pull Requests. Those of us who come from a different background, Facebook in particular, know a workflow called the Stacked Diff workflow, based around a tool called Phabricator, though there’s also another tool called Gerrit.
And in fact, the Linux kernel project also works this way. It’s just completely different from Pull Requests. I argue strongly that Stacked Diffs are significantly more efficient than Pull Requests. They are also fundamentally incompatible with how GitHub does Pull Requests, which has created a tension in the world.
If you are an engineer who wants this workflow, and you are forced to work on GitHub because GitHub is the de facto standard, you’re kind of in a bad place. That is, until just recently.
Sven, I would love for you to jump in now and talk a little bit about what you’ve built.
Sven: Ah, sure. I built a little command line tool that basically bridges those two worlds.
You have, like you were saying, the Stacked Diff workflow. From my point of view, it’s just what I was used to. It’s a workflow that has very low friction and makes things very simple, and that’s why I like it. Here at Cord, we used Phabricator, which makes it very easy to work with that workflow.
But Phabricator announced that it would no longer be maintained. It’s basically reaching end of life.
Jackson: Evan Priestley, why?! Why have you abandoned us?!
Sven: So we were looking for a replacement for Phabricator, and there are reasons not to pick the next obscure tool somewhere on the market, but to go with a tool that everyone uses.
Which is GitHub, and many things about GitHub are actually very nice. But the workflow that we were used to just doesn’t work very well with GitHub, because GitHub was designed with lots of branches and merging in mind. You might remember the Git Flow workflow, which was also a blog post, some 10 years ago.
It got a lot of traction, and it sort of formalized how you name your various branches and merges. That is so different from what we are used to, which is a much simpler model. And I definitely wanted to keep this much simpler model. In my head, I was thinking that what we are doing kind of maps onto GitHub, if only we can present our code reviews to GitHub in the right way.
Jackson: I might jump in here for the layperson, for the person who’s not so deeply steeped in this. Here’s the super, super abbreviated version of Pull Request style code review, that is, standard GitHub-style code review. You have a local checkout and you’re about to do some work, so you make a feature branch and you add some commits to that branch.
And then you push that branch to GitHub. And then in GitHub, you create a Pull Request. You get some Pull Request feedback, and then you make some more commits and you push those to GitHub again. But at some point you have satisfied the draconian demands of your code reviewers, and you are fine to make this thing part of the authoritative history of the code base. In GitHub, by default, you usually do this as a merge, specifically with a merge commit. What that means is that you now have a nonlinear history in your code base. You’ve got some commits that were part of the main branch, and then you had some number of commits that were diverging from the main branch.
And now those commits come back into the main branch all together, with a magically created, auto-generated commit called a merge commit. That merge commit creates a lot of problems for most of the rest of the Git tool chain, beyond just this workflow. Maybe we’ll save advanced Git topics like that for a later conversation.
But compare this with the standard Stacked Diff workflow: you don’t ever even need to make an additional branch. Many people do, but the important part is that you don’t make a bunch of commits and merge them into the main branch. Rather, you make one commit and you keep refining that one commit until it is perfect.
And then you just gently set that commit on top of main. So there’s never any merging. There are never any divergent branches. There are no merge commits. This unlocks things like bisect, rebasing, and many other very, very nice things to have. But again, I’m going to do my best not to go down the rabbit hole of talking about all of the advanced Git topics.
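A toy Python model can make the contrast concrete. It is a deliberate simplification (real Git commits also record trees, authors, and timestamps), and the commit ids and messages are made up: in the Pull Request flow the final merge commit has two parents, while landing one amended commit on top of main leaves every commit with a single parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy commit: an id, a message, and the ids of its parent commits."""
    id: str
    message: str
    parents: tuple[str, ...] = ()


def is_linear(history: dict[str, Commit], head: str) -> bool:
    """Walk back from head; the history is linear if no commit has more than one parent."""
    current = history.get(head)
    while current is not None:
        if len(current.parents) > 1:
            return False  # a merge commit joins two lines of history
        current = history.get(current.parents[0]) if current.parents else None
    return True


def build(*commits: Commit) -> dict[str, Commit]:
    """Index commits by id."""
    return {c.id: c for c in commits}


# Pull Request flow: a feature branch diverges from "a", main moves on to "b",
# and the branch comes back via an auto-generated merge commit "m".
merge_flow = build(
    Commit("a", "initial"),
    Commit("b", "teammate's change", ("a",)),
    Commit("f1", "feature: first attempt", ("a",)),
    Commit("f2", "feature: address review", ("f1",)),
    Commit("m", "Merge branch 'feature'", ("b", "f2")),
)

# Stacked Diff flow: the feature is one commit, amended until it is ready,
# then set directly on top of main.
stacked_flow = build(
    Commit("a", "initial"),
    Commit("b", "teammate's change", ("a",)),
    Commit("f", "feature (amended after review)", ("b",)),
)

print(is_linear(merge_flow, "m"))    # False
print(is_linear(stacked_flow, "f"))  # True
```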
Sven: Yeah. I have equally strong opinions here, but one thing you can maybe imagine easily is the history of the repository for your project or your workplace, with lots of branching points and places where those branches get united again. It’s just like the London Tube map, but it’s your history.
And what we are talking about is producing a history where you only ever add the next step to a strictly linear, sequential history of previous commits. It’s much easier to think about your history when, say, we deployed changes to our server or shipped releases of our product, and at any one time there was only one place in this history.
You don’t have to worry that at this time we were here on this branch and then on that branch, and this got merged in later and that got merged in sooner. Instead, everything hits the official version of your software at a defined point in time, purely sequentially. It’s just so much easier to deal with that situation later on, right?
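That “one place at any time” property is also what makes tools like git bisect pleasant: a linear history is just an ordered list, so finding the commit that introduced a problem is a binary search. The sketch below is a simplification (real git bisect also copes with merge commits), and the is_bad callback is a stand-in for whatever build or test you would run by hand.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(history: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the first commit in an oldest-to-newest linear history for which
    is_bad() is true, assuming every later commit is also bad.
    Needs only about log2(len(history)) calls to is_bad."""
    lo, hi = 0, len(history)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # everything up to and including mid is good
    return history[lo] if lo < len(history) else None


# Ten commits in order; pretend a bug landed in c6.
history = [f"c{i}" for i in range(10)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))  # c6
```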
Jackson: A hundred percent. So for the listeners at home who are wondering about this: I was at Facebook early enough that it wasn’t even a consistently adopted tool yet. There were still people doing SVN. The important detail, though, was the linear history concept. For the first time, I got to watch a team of hundreds of engineers working extremely efficiently.
The code reviews were faster. The time that new code spent away from the main branch, in danger of causing conflicts, was tiny. It was minutes in many cases. I’m going to resist the temptation to try to convert the unconverted. What I do think is really important for us to do, though, is to speak to the folks out there who are Stacked Pull Request curious, or Stacked Pull Request converted but frustrated that there is no good tool for this. This is really for you.
Sven, you’ve built a thing here that we’re pretty sure actually solves this problem better than anything else that has been offered so far. What is it called and what does it do?
Sven: It’s called SPR, but we decided that it’s pronounced “super.” SPR obviously stands for Stacked Pull Requests, so it’s not a very creative name.
But it’s a command line tool that helps you submit Pull Requests to GitHub that look good on GitHub but don’t force you into the GitHub workflow.
So, in your local repository, you do development with a Stacked Diff workflow like you’re used to, if you did that before. You keep doing that. And if you use SPR, people will be happy to review your code on GitHub, you can get it merged, and you can basically use GitHub as your code review tool and central repository, but keep your old workflow.
Jackson: This sounds impossibly magical. You’re telling me, as an ex-Facebook engineer, or as any engineer who’s used Phabricator before, that you’ve just created a command line tool that makes it so that I can work like I did with arc? For those of you who know what arc is: you’re saying there’s a tool that does that, but for the folks who are using only GitHub, it feels like the right thing to them, too?
Sven: That’s exactly the idea. It’s heavily inspired by Phabricator and arc, so much so that the commands have the same names. What used to be arc diff is now spr diff, “S” “P” “R” diff. What used to be arc land is now spr land. And basically that’s almost all you need to know if you’re familiar with the arc command and that workflow.
Jackson: I want to believe in this, Sven. You’re playing with my heartstrings here. How is this possible, though? Because arc and the Phabricator-style workflow were patch based, and GitHub is just fundamentally not compatible with that, because all the changes there are represented as commits. So how does SPR do this?
Sven: So yes, you have your local repository, in which you create a commit when you start working on a unit of work, and you amend the commit when you have changes, maybe because there were already comments on code review. Occasionally, you might have to rebase your commit on the newest master to pick up the changes there. That doesn’t play well with GitHub, right?
If you were to submit that branch with your one commit on it and just keep resubmitting it to GitHub, you would have to force push it. That’s something GitHub doesn’t handle very well. If there were comments on your code review and then you do a force push to update it, it’s kind of hard to find those comments again.
You don’t have a proper conversation about your code in the code review, and so on. So we can’t do that. We have to give GitHub what GitHub wants, which is a branch per Pull Request, and on that branch we only ever add commits. Those added commits are the updated versions of the Pull Request. So what the command line tool does is basically create the commits that make up the branch that we send to GitHub.
And we tell GitHub to use that branch for the Pull Request. But we never really show you that branch in your local repository. So from your point of view, it’s like a remote branch on GitHub that you’re not really interested in.
Basically, it’s just a translator. It looks at your local branch, where the commit keeps changing, and it produces a different branch where it instead adds the new versions on top.
And that way GitHub is pretty happy showing you the Pull Request and giving you the normal code review experience of an ongoing conversation while you incorporate changes, et cetera.
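Sven’s “translator” can be pictured with a toy model. This is a hypothetical sketch of the idea as he describes it, not SPR’s actual code (SPR is written in Rust and, as Sven explains later, uses libgit2); the names update_review_branch and is_fast_forward are made up, trees are stand-in strings, and rebases and stacks are ignored. Amending locally replaces a commit, but each upload appends a new commit to the review branch, so the branch GitHub sees only ever moves forward and needs no force push.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)  # compare commits by identity, like object ids
class Commit:
    tree: str                        # stand-in for the snapshot of files
    message: str
    parent: Optional["Commit"] = None


def update_review_branch(review_head: Optional[Commit], local: Commit, note: str) -> Commit:
    """Return the new head of the review branch for the current local commit."""
    if review_head is None:
        # First upload: start from the same base as the local commit,
        # so the Pull Request diff shows only this change.
        return Commit(local.tree, local.message, local.parent)
    if review_head.tree == local.tree:
        return review_head           # nothing changed since the last upload
    # Later uploads: same files as the amended local commit, but stacked on
    # the previous review commit instead of replacing it.
    return Commit(local.tree, note, review_head)


def is_fast_forward(old: Optional[Commit], new: Commit) -> bool:
    """True if old is new itself or one of its ancestors (or there was no old head)."""
    node: Optional[Commit] = new
    while node is not None:
        if node is old:
            return True
        node = node.parent
    return old is None


base = Commit("t0", "main")
v1 = Commit("t1", "Add feature", base)   # first local version
v2 = Commit("t2", "Add feature", base)   # after amending: replaces v1 locally

r1 = update_review_branch(None, v1, "Add feature")
r2 = update_review_branch(r1, v2, "Address review comments")

print(is_fast_forward(v1, v2))  # False: the local branch was rewritten
print(is_fast_forward(r1, r2))  # True: the review branch only grew
```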
Jackson: This is the point where the senior engineer in me, the grumpy, I’ve-seen-every-type-of-workflow-there-is, you-can’t-do-magic engineer, wants to kick in and say, “Nah, man, I call BS.” But I can’t, actually… because when Phabricator announced that it was shutting down, we moved to GitHub and we looked for a tool to solve this. We spoke with the folks at Graphite early on. I was in chats with them right at the beginning of their operations. Back before they’d even done the pivot. Way before the funding round.
And they looked like they were going to build exactly the tool that we needed. So we adopted it. We actually moved to GitHub and to Graphite. But it wasn’t really the full-blown solution. They hadn’t quite gone as far as you’ve gone, Sven. As far as I know, the last I checked, they still rely on force push to do what they need to do.
And what’s interesting is that you worked on super, SPR, for a while, and it has gone through some internal evolutions. The fascinating thing I’m watching inside my own company is that some people on the team are just using the GitHub UI. Some people on the team are using Graphite.dev for their code review and their command line tooling.
And some people are using SPR. And… nothing has burst into flames. No one has flipped the table over. There’s nothing being lost. All of these tools are shockingly interoperable. I can only assume that you have sold your soul to Linus Torvalds. That’s the only way this could have been possible.
Sven: Uh, not that I know of. Although I am also the only Linux user in the company, so maybe I have…
Um, yeah, the idea with SPR, super, is that it doesn’t lock you in as a team. It’s not a thing that everyone has to use in order to work together. It’s really just your personal choice. As a team, you use GitHub, and that’s where you do code reviews. But as a single developer, you can choose to use super to prepare and create your Pull Requests.
Jackson: A burning question in my mind is: if I’m the one engineer at my company who really wants a more efficient workflow, but everyone else is doing vanilla GitHub Pull Requests, does super serve me?
Sven: At the end of the day, it’s a GitHub Pull Request, but the way you produce it is not by following GitHub’s workflow, but by doing your own private workflow and using the command line tool, which basically does the translation.
If you use the advanced features, like when you start actually stacking your Pull Requests because you have related work that you want to put into separate code reviews, then there’s a small caveat about the branches that exist on GitHub. Basically, it boils down to this: you shouldn’t use the GitHub UI to merge your Pull Requests, or you should be careful about it.
If that’s something you absolutely cannot live with, then that’s probably the one deal breaker for you. Other than that, I’m not really aware of anything that would be an obstacle.
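For the stacking case, one common way to present a stack on GitHub (an assumption here; SPR’s real branch layout may differ) is to base each Pull Request on the branch of the Pull Request below it, so each review shows only its own commit. The review/ branch names below are hypothetical. In this layout, merging an upper Pull Request with the GitHub merge button would merge it into the lower Pull Request’s branch rather than into main, which gives a rough feel for why Sven suggests being careful there.

```python
import re


def slug(title: str) -> str:
    """Turn a commit title into a branch-name-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def plan_stack(stack: list[str], trunk: str = "main") -> list[tuple[str, str]]:
    """Given commit titles oldest-first, return (head_branch, base_branch) per
    Pull Request. Each one is based on the branch of the one below it; the
    bottom of the stack is based on the trunk."""
    plan = []
    base = trunk
    for i, title in enumerate(stack, start=1):
        head = f"review/{i}-{slug(title)}"   # hypothetical naming scheme
        plan.append((head, base))
        base = head
    return plan


for head, base in plan_stack(["Add parser", "Use parser in CLI", "Document CLI flags"]):
    print(f"{head}  ->  {base}")
```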
Jackson: I feel like there are people listening to this right now going… well, they probably just want to know where to get this. Right now they’re probably like, “Yes, thank goodness someone has done this. This is the thing that I wanted.” Where can they find it?
Sven: So that’s super easy. It’s on GitHub, obviously: GitHub.com/getcord/spr. It’s open source, so you can download it and use it. It’s written in Rust. You can build it from source if you have a Rust toolchain, but you can also do a much quicker install.
If you’re on a Mac, you can just use Homebrew: brew install getcord/tap/spr. And you’re good; you have the SPR tool on your computer then.
Jackson: Another question in my mind: Graphite, for instance, has good command line tools. But they also push you to use their code review interface, which, fair enough to them.
I mean, GitHub does not reach the pinnacle of code review by any stretch, so they’ve done a better job. They deserve to be paid for their time, their energy, and the service they offer. Does super require running any sort of server, or subscriptions, or anything?
Sven: No, it’s purely a command line tool. It doesn’t talk to us at all. There’s no server side at Cord that keeps track of your Pull Requests or how you stack your diffs or anything like that. It’s purely a locally executing tool.
Jackson: Okay. Okay. So let me put on my skeptic hat a little bit more. You’re telling me that, on my local machine, I’m going to run some set of commands that do Git things to my files, and it’s going to be really good. What if I don’t trust you? What if I’m like, no, man, don’t mangle my code base? I don’t want you to make weird commits. I don’t want you to mess up my indexes and make magic branches. What do you say to that person?
Sven: So if you don’t trust me at all, you will have to read the source code, which you can. It’s on GitHub, right? But what I can say is that the tool never touches your local checkout files, right? Your source code files. It’s not interested in them.
It actually does very low-level operations in the Git repository, which would be difficult to do with the Git command line tools. We’re actually using libgit2, which is a low-level library implementation of Git. What we can do with that is surgically construct commits. And as you know, with Git, me constructing a new commit with my tool doesn’t destroy your existing commits. It doesn’t change anything, right? So I’m just constructing commits that have exactly the right contents for what we have to ship to GitHub, with parent commits chosen to make the connections to the branches, et cetera, just as they have to be.
But those are all completely separate from your normal history. I don’t change that. Those are the commits that we then push to GitHub. On GitHub’s side, they have some branch name. But like I say, you don’t really have to worry about that, because it doesn’t mutate the stuff that you work with.
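To see why “constructing a new commit doesn’t destroy your existing commits,” it helps to know that Git objects are content-addressed: an object’s id is a hash of its contents, so a new commit is simply a new object next to the old ones. The sketch below uses only Python’s hashlib to compute ids the way git hash-object does for SHA-1 repositories; it is not libgit2 and not SPR’s code, and the author line is a made-up placeholder.

```python
import hashlib

# Placeholder identity and timestamp, for illustration only.
AUTHOR = "A U Thor <author@example.com> 1700000000 +0000"


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object id as Git computes it: hash of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], message: str) -> bytes:
    """Serialize a commit object: tree, parent lines, author, committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {AUTHOR}", f"committer {AUTHOR}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known empty tree

first = git_object_id("commit", commit_body(EMPTY_TREE, [], "Add feature"))
# Same tree, but with `first` as parent: a brand-new id; `first` itself is untouched.
second = git_object_id("commit", commit_body(EMPTY_TREE, [first], "Address review"))
print(first != second)  # True
```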
The only thing, to be completely honest, is that when you use spr diff, we change the commit message of your commit very slightly, in the same way Phabricator did: we just add a line with a URL to the Pull Request. That way, the next time you run the tool, it knows that this commit has already been submitted as a Pull Request.
So we update the existing Pull Request rather than creating a new one. Other than adding that line and formatting your commit message a little to make it neat, it doesn’t change anything in your repository.
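The “line with a URL” trick can be sketched like this. The exact wording SPR writes is an assumption here (a “Pull Request: <url>” line), and find_pull_request and record_pull_request are hypothetical helpers, not SPR functions. On the next run, finding a number means “update that Pull Request” rather than “open a new one.”

```python
import re
from typing import Optional

# Hypothetical line format; the exact label SPR writes may differ.
PR_LINE = re.compile(
    r"^Pull Request: (https://github\.com/[^/\s]+/[^/\s]+/pull/(\d+))\s*$",
    re.MULTILINE,
)


def find_pull_request(message: str) -> Optional[int]:
    """Return the PR number recorded in a commit message, or None if this
    commit has never been submitted."""
    match = PR_LINE.search(message)
    return int(match.group(2)) if match else None


def record_pull_request(message: str, url: str) -> str:
    """Append a 'Pull Request: <url>' line unless one is already present."""
    if PR_LINE.search(message):
        return message
    return message.rstrip("\n") + f"\n\nPull Request: {url}\n"


msg = "Add parser\n\nSummary: parse config files\n"
print(find_pull_request(msg))  # None: first submission, open a new PR
msg = record_pull_request(msg, "https://github.com/example/repo/pull/42")
print(find_pull_request(msg))  # 42: later runs update this PR
```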
Jackson: That makes me feel super, super relieved.
So, I guess the question is: if someone wants to give you a Pull Request, is that a thing they can do? Say they want to expand the API or fix a bug? You know, maybe they want to support GitLab instead. Is that something they can do?
Sven: Absolutely. It’s open source. It’s on GitHub. You can open issues if you come across problems. If you even have a fix for them, I would be delighted to see a Pull Request that adds some improvements. Absolutely.
Jackson: You speak of improvements. Now I must ask you: what else do you want to add to SPR that’s not there right now?
Sven: There were a couple of nice things that arc did other than submitting your code for… what did they call it… a diff in Phabricator? Or revision? Revision was the correct term. Everyone called it a diff. Anyway, arc could also be configured to run the linter or unit tests and that kind of thing. That was actually quite neat: you could easily make it part of your workflow that, at the point where you show your work to other people for review, the linter would definitely run once. It’s probably not a big deal for us, because by now we all have IDEs that run the linter all the time anyway, but I could imagine it being quite useful for some people.
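Until something like that exists in SPR, a small wrapper can approximate arc’s behavior. This is a hypothetical sketch, not an SPR feature: it runs whatever check commands you give it (your project’s own linter and test commands) and only calls spr diff if they all pass.

```python
import subprocess
import sys
from typing import Sequence


def run_checks(checks: Sequence[Sequence[str]]) -> bool:
    """Run each check command in order and stop at the first failure."""
    for cmd in checks:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"check failed ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


def submit_with_checks(checks: Sequence[Sequence[str]],
                       submit_cmd: Sequence[str] = ("spr", "diff")) -> int:
    """Run the checks; if all pass, run the submit command and return its exit code."""
    if not run_checks(checks):
        return 1
    return subprocess.run(list(submit_cmd)).returncode
```

For example, submit_with_checks([["make", "lint"], ["make", "test"]]) would run two hypothetical make targets before spr diff, and skip the submission if either one fails.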
Jackson: So at this point, I predict that we have three types of listeners. The first type of listener has no idea what we’re talking about, and this is way too involved. We’ve gone down a rabbit hole of productivity and engineering workflow that they’re just not interested in.
Then there’s the second type of person who has heard this and thinks, “Yeah, I want to give that a try.” If you are that type of listener, I hope that this satisfies you and that you go and try SPR and let us know what you think.
And then I predict that there is a third type. This is the type of engineer who knows enough about Git and Git internals to say, “But how did you do that? What does it actually do?” If you are listening and thinking that, I think we should probably just do a follow-up conversation for you. Sven, would you be up for that?
Sven: Absolutely. Absolutely. I mean, the big picture is what I said earlier: we construct commits so that they look right to GitHub. If we want to go into detail, it’s actually quite a lengthy thing, because there are so many edge cases and different sequences of operations. Like, do you rebase or amend first, and in what order? You reorder the commits on your branch, and all these things. So there are many different cases that have to be covered, and if we want to discuss all of that and give you the technical documentation in spoken word, that probably has to go into a separate episode altogether.
Jackson: If you’re interested in SPR and you want to give it a try, check us out. It’s github.com/getcord/spr.
Sven: I really want to recommend reading Jackson’s blog post that he mentioned at the beginning as well. I stumbled upon it when I started a new job, one that I kept for about four weeks because I ran away screaming.
But part of the frustration was the workflow that we had there, which was classic GitHub and Git Flow, which is very complicated. I was wondering… does the whole world work like that? And I just typed a few words into Google, and the first hit I got was your blog post, which is funny because that was after we had worked together at Facebook, so quite a random thing.
So I really recommend people read that if they’re interested in this kind of discussion of different workflows, and especially if you feel that what you’re doing is not 100% efficient. Like, if you feel that there’s still a bit of friction in dealing with code review and organizing your own work.
Like you’re working on three things at once and you want them all reviewed, but you know that as the review goes on, you will have to make changes, and the three things all kind of depend on each other, and so on. If in those situations you feel like it’s all too complicated and takes too much of your time, then, yeah, read that blog post, because I think it will be enlightening. And then maybe you’ll also be interested in our command line tool, because that is now the accompanying bit of software that makes this workflow really easy to use on GitHub.
Jackson: Sven, thank you so much for your time. Thank you so much for building SPR and bringing it to the world. If you have questions or want to reach out, I’m Jackson, the CTO of Cord; you can find me at email@example.com or fb.me/jg.
Sven, how can they reach you?
Sven: Just an email is fine. I’m firstname.lastname@example.org. Sven, that’s S V E N. Not a very common name here, but yeah, drop me an email.
Jackson: And it’s with a D. You’ve got a very hard German D-T sound: “cort!”
Sven: Cord. Yeah, exactly. C O R D. One thing that we might link to very soon: we had the idea to produce a couple of tutorial videos. That should make it super easy. I guess it might all become a bit clearer when you just see it being done on the command line. We haven’t made those yet, but we totally promise that we will produce them, and then there’ll be links.
Jackson: Pinky swear. We’ll do it. Thank you all for listening and look for a follow-up conversation about the deep dark internals of SPR in the near future.