Majd's blog

# The History of Version Control

Version Control Systems (VCS) give software developers a way to manage source code and keep track of versions during the development of a project. VCS often come into play when multiple developers work on the same project, because they provide a collaborative framework that helps with managing big projects. Maintaining a large codebase among thousands of contributors without a VCS would be very difficult, if not nearly impossible.

### History of VCS

It started with engineering

It may seem like version control was invented by the computer industry, but in fact the computer industry only adopted practices that were already established. Manufacturers of complex machines such as cars or aircraft needed a way to version each iteration of a design. You don’t sketch a design for a sports car and start producing the car as soon as the sketch is done. Those manufacturers developed an engineering process in which they define the scope and context of the work to be done and have multiple departments and teams deliver independent pieces of work. Throughout this independent work, the teams go through multiple revisions of the product and give each revision a version name.

Software followed after

When programs started being created in digital form, the processes that had been carried out by the different offices started getting automated. Thanks to the Source Code Control System (SCCS), a record of the whole program could be kept for every completed change cycle. In the 1980s, a mechanism called delta storage became mainstream with the introduction of the Revision Control System (RCS).

Formally, we define delta compression over two strings (files): $$f_{tar} \in \Sigma^*$$ (the target file) and $$f_{ref} \in \Sigma^*$$ (the reference file). The goal for an encoder $$E$$ with access to both $$f_{tar}$$ and $$f_{ref}$$ is to construct a file $$f_\delta \in \Sigma^*$$ of minimum size, such that a decoder $$D$$ can reconstruct $$f_{tar}$$ from $$f_{ref}$$ and $$f_\delta$$. We also refer to $$f_\delta$$ as the delta of $$f_{tar}$$ and $$f_{ref}$$.

During this phase, the whole process still depended on the idea of having just one file as the source of the work, just like a drawing. The idea of splitting the work across multiple files while also having multiple programmers work on the same thing started taking off with the introduction of the Concurrent Versions System (CVS). However, CVS had reliability problems that remained until Subversion was introduced to the mix.
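The delta definition above can be illustrated with a small sketch. It is not the algorithm used by RCS or any other tool, and it makes no attempt to find a delta of minimum size; it only shows the encoder/decoder contract. The names `encode` and `decode` and the "copy"/"add" instruction format are assumptions made for this example, built on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def encode(f_ref: str, f_tar: str) -> list:
    """Encoder E: build a delta that turns f_ref into f_tar.

    The delta is a list of instructions (hypothetical format):
      ("copy", start, end)  reuse f_ref[start:end]
      ("add", text)         insert text that f_ref does not contain
    """
    delta = []
    matcher = SequenceMatcher(None, f_ref, f_tar, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", f_tar[j1:j2]))
        # "delete" needs no instruction: the text is simply not copied
    return delta


def decode(f_ref: str, delta: list) -> str:
    """Decoder D: reconstruct f_tar from f_ref and the delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(f_ref[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


f_ref = "the quick brown fox jumps over the lazy dog"
f_tar = "the quick red fox jumps over the sleepy dog"
f_delta = encode(f_ref, f_tar)
restored = decode(f_ref, f_delta)
```

Only the "add" instructions carry new text, so the more the two files have in common, the less the delta has to store.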

### Generations of VCS

We can divide the history of version control into three generations.

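| Generation | Networking  | Operations  | Concurrency         | Tools          |
|------------|-------------|-------------|---------------------|----------------|
| I          | None        | Single-file | Locks               | RCS, SCCS      |
| II         | Centralized | Multi-file  | Merge before commit | CVS, SVN       |
| III        | Distributed | Changesets  | Commit before merge | Git, Mercurial |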

#### Generation I

In this generation, concurrent development was handled solely with locks. Only one person could work on a file at a time, and networking was not supported. Operations could only apply to one file at a time. Examples of tools from this generation are RCS and SCCS.

Source Code Control System

SCCS was initially designed for the IBM 370 and the PDP-11. These were time-sharing systems, which meant that all users ran their programs on the same processor. Because of that, there was no need for networking in the design of SCCS. The system stood out for its ability to create and manage multiple versions of a history automatically; however, versioning was limited to individual artifacts and did not cover the complete repository. SCCS also used the interleaved deltas technique, which stores all revisions of a file together so that every revision can be retrieved with the same effort while keeping storage compact. A big downside of SCCS is that it did not feature automatic merging and conflict resolution.
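The interleaved deltas idea can be sketched as follows. This is a simplified illustration and not the real SCCS file format: it assumes a linear history, and the class name `Weave` and its methods are hypothetical. Every line is stored once, tagged with the revision that added it and the revision that removed it, so any revision is recovered in a single pass over the same data.

```python
from difflib import SequenceMatcher


class Weave:
    """Toy interleaved-delta store for one file with a linear history."""

    def __init__(self):
        # Each entry is [text, added_in_rev, deleted_in_rev or None].
        self.entries = []
        self.latest = 0

    def checkout(self, rev):
        """Return the lines of revision `rev` with one pass over the weave."""
        return [text for text, added, deleted in self.entries
                if added <= rev and (deleted is None or deleted > rev)]

    def commit(self, new_lines):
        """Record `new_lines` as the next revision and return its number."""
        rev = self.latest + 1
        # Weave positions of the lines that are alive in the latest revision.
        live = [i for i, entry in enumerate(self.entries) if entry[2] is None]
        old_lines = [self.entries[i][0] for i in live]

        inserts, deletes = {}, set()
        matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                deletes.update(live[i1:i2])
            if tag in ("replace", "insert"):
                pos = live[i1] if i1 < len(live) else len(self.entries)
                inserts.setdefault(pos, []).extend(new_lines[j1:j2])

        rebuilt = []
        for idx, entry in enumerate(self.entries):
            rebuilt.extend([text, rev, None] for text in inserts.get(idx, []))
            if idx in deletes:
                entry[2] = rev  # the line stays in the weave, marked as removed
            rebuilt.append(entry)
        rebuilt.extend([text, rev, None]
                       for text in inserts.get(len(self.entries), []))

        self.entries = rebuilt
        self.latest = rev
        return rev


weave = Weave()
weave.commit(["a", "b", "c"])
weave.commit(["a", "x", "c", "d"])
weave.commit(["x", "c"])
```

Because removed lines are only marked and never dropped, `checkout` does the same amount of work for the oldest revision as for the newest one.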

Revision Control System

The first version of RCS was released in 1982 by Walter F. Tichy at Purdue University. It is a text-based system that is very similar in its characteristics to SCCS and is easy to understand and use. RCS introduced genuine branches, which allowed users to work on multiple development lines in parallel. Those development lines could later be merged in an automated process. RCS used delta encoding for version storage, which made it faster than SCCS when retrieving revisions close to the latest one but slower when retrieving older ones. RCS also lacked networking and concurrency support. The system had no concept of projects or trees either: everything is simply a file, and relationships between files could not be tracked.
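The retrieval trade-off described above follows from keeping the newest revision in full and storing older revisions as deltas that lead backwards from it. The sketch below illustrates that idea only; it is not RCS's file format, and the names `ReverseDeltaFile`, `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Instructions that rebuild `target` from `source` (lists of lines)."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("add", target[j1:j2]))
    return ops


def apply_delta(source, ops):
    """Apply the instructions produced by make_delta to `source`."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaFile:
    """Toy single-file store: newest revision in full, older ones as deltas."""

    def __init__(self, lines):
        self.head = list(lines)
        # reverse_deltas[k - 1] rebuilds revision k from revision k + 1.
        self.reverse_deltas = []

    def commit(self, lines):
        self.reverse_deltas.append(make_delta(lines, self.head))
        self.head = list(lines)

    def checkout(self, rev):
        """Revision 1 is the oldest; the newest needs no delta at all."""
        latest = len(self.reverse_deltas) + 1
        lines = self.head
        for k in range(latest - 1, rev - 1, -1):
            lines = apply_delta(lines, self.reverse_deltas[k - 1])
        return list(lines)


f = ReverseDeltaFile(["a", "b"])
f.commit(["a", "b", "c"])
f.commit(["a", "x", "c"])
```

Checking out the newest revision applies no deltas, while each step further back in history applies one more.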

#### Generation II

This generation introduced simultaneous modifications, with one restriction: users must merge the current revisions into their work before they are allowed to commit. The systems in this generation were strongly centralized, and they incorporated networking and multi-file operations. A big change from the previous generation is that versioning now applies to the project as a whole instead of to individual artifacts. The most popular tools of this generation are CVS and SVN.

Concurrent Versions System

CVS was originally built on top of RCS. The goal of the system was to extend older systems such as RCS and SCCS to allow concurrent modification of artifacts. CVS automatically merged changes to artifacts that contained no conflicts, while artifacts that contained conflicts were marked for manual review and resolution. With CVS adding network support, developers would check out a version of the source code over the network and commit their changes back after making modifications. This hugely encouraged parallel development. The versioning system introduced by CVS also supported tags. When a user attempted to commit changes to a repository that already contained changes made by a peer developer, the user would be asked to merge the latest changes into the working copy before the new changes could be applied. The same automatic merging applied here, except that conflicting artifacts had to be reviewed manually. CVS also needed manual input to recognize which files were binary and which were text, and it was not capable of tracking file renames. CVS still did not track the repository as a whole, which ignored the dependencies between certain artifacts.
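The merge behaviour described above can be sketched with a small three-way merge. This is a simplified illustration, not CVS's actual merge code: it works on lists of lines, treats any overlapping or touching edits as a conflict, and raises an exception where a real tool would mark the conflict in the file for manual review. The names `merge`, `changed_regions` and `MergeConflict` are hypothetical.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when both sides changed the same region of the base."""


def changed_regions(base, other):
    """Regions of `base` that `other` changed, with their replacement lines."""
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge(base, mine, theirs):
    """Combine two independent edits of `base`, or raise MergeConflict."""
    my_changes = changed_regions(base, mine)
    their_changes = changed_regions(base, theirs)

    # Conservative check: regions that overlap or touch need manual review.
    for a1, a2, _ in my_changes:
        for b1, b2, _ in their_changes:
            if a1 <= b2 and b1 <= a2:
                raise MergeConflict(f"base lines {min(a1, b1)}..{max(a2, b2)}")

    # Apply from the end of the file so earlier indices stay valid.
    merged = list(base)
    ordered = sorted(my_changes + their_changes,
                     key=lambda change: change[0], reverse=True)
    for i1, i2, replacement in ordered:
        merged[i1:i2] = replacement
    return merged


base = ["a", "b", "c", "d", "e"]
mine = ["a", "B", "c", "d", "e"]      # edited the second line
theirs = ["a", "b", "c", "d", "E"]    # edited the last line
merged = merge(base, mine, theirs)
```

Edits to different parts of the file combine without user input; two edits to the same line raise `MergeConflict`, which stands in for the manual review step.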

Subversion

SVN was born as an attempt to fix the shortcomings of CVS. It shared the client-server architecture of CVS, with some noticeable differences. Commits applied at the directory level rather than the file level, which kept commits atomic, in contrast to CVS. The system also tracked file copies and renames. Branching in SVN was also unique: a shallow copy of the target version would be created in the root directory. Rolling back commits was not allowed in SVN; a user would need to copy the targeted repository state to the end of the trunk to override a bad commit, and the bad commit would still exist in the history of the repository.

#### Generation III

In previous generations, the merge and commit actions were coupled together. With generation III, those two actions are finally separated into different steps. This generation spawned systems with distributed networking, meaning that a copy of the entire codebase and its complete history is mirrored on each developer’s computer. This provided multiple advantages over centralized systems. Performing actions was very fast, since a copy of the entire codebase existed on the local machine. Changes could be committed locally without updating the source repository, which allowed related changes to be grouped together. No network connection is required apart from the initial "pull" from the source repository and the final "push" back to it. Because of these additions, changes could be shared with only select developers to improve effectiveness and consistency before the changes were introduced to the main codebase. The most popular tools created in this generation are Git and Mercurial.

Git

Git is one of the most widely used decentralized version control systems. Big projects such as the Linux kernel use Git for version control. Git took another approach to optimizing the branching workflow: instead of copying the entire history when a new branch is created, a branch works by referencing a point in its parent branch as its start. Git is also very flexible in managing the relationships between commits, which can be modified at will using the "rebase" operation. The "amend" operation can be used to modify the most recent commit. Git introduced a concept called the "staging area", which provides more flexibility when creating commits from a pool of changes. Other concepts such as "stashing" allow uncommitted work to be set aside temporarily. The "cherry-picking" feature allows selected commits to be taken from one branch and applied to another. The "bisect" operation can be used to search a series of commits for the one that introduced a change in behavior.
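The idea behind "bisect" is a binary search over the commit history. The sketch below shows that idea only and is not Git's implementation: it assumes a linear list of commits ordered from oldest to newest, with the first one known to be good, the last one known to be bad, and every commit after the first bad one also bad. The function name `find_first_bad` and the commit labels are hypothetical.

```python
def find_first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes commits[0] is good, commits[-1] is bad, and the history
    switches from good to bad exactly once.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the change happened at mid or earlier
        else:
            good = mid     # the change happened after mid
    return commits[bad]


# Hypothetical history of eight commits where "c5" introduced the change.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []


def is_bad(commit):
    tested.append(commit)  # record which commits had to be tested
    return int(commit[1:]) >= 5


culprit = find_first_bad(history, is_bad)
```

Each test halves the remaining range, so the number of commits that need to be checked grows with the logarithm of the history length rather than with its full size.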

Mercurial

This version control system is similar to Git in nature but is based on an operation-based history model. While Git represents commits as snapshots, Mercurial represents them as diffs. In essence, Mercurial follows the philosophy that "history is permanent and sacred". The most extreme thing you can do to history with Mercurial alone is to go back and undo the last commit or pull. Git, on the other hand, allows you to rewrite history freely. Mercurial relies heavily on extensions for additional functionality. For instance, even though developers may not be able to rewrite history using core Mercurial alone, extensions such as collapse, histedit and MQ can be used to perform those tasks. Extensions such as ACL allow administrators to set up access control on specific parts of the repository.

### Conclusion

Looking back at the evolution of VCS, it is apparent that the first generation met the needs of developers at the time but was limited in features and functionality. The second generation tackled the biggest shortcomings of the first: the lack of networking and concurrency. With the repository stored in a central location, developers could check out a version and check in changes when needed. The drawback was that network connectivity became a requirement. With the third generation of VCS, the systems moved from a centralized networking model to a distributed one. The actions of making changes and merging them into the main codebase were separated. Developers gained more flexibility, as keeping a copy of the entire codebase locally eliminated the need for a constant network connection.

Despite the introduction of distributed version control systems (DVCS), some developers and system administrators may still decide to use a centralized version control system (CVCS). With a CVCS, access control is easier to set up because everything is controlled from a single source (the server). Besides, if the project has a very long and large history, it may take a long time to make a copy of the repository with a DVCS. A CVCS is more suitable in this case because developers only need to fetch the parts of the code they are working with, instead of making a full copy of the history.

It may be tempting to conclude that with the introduction of DVCS, developers cannot have a central place for their codebase. That is not the case, because DVCS simply give developers more freedom in how they organize their work. It could be as simple as having a setup where the code is always pulled from repository A and pushed back to the same repository A when the changes are to be merged.

Whether a developer or an organization chooses a DVCS or a CVCS boils down to the nature of the project and the goals of its administrators. In conclusion, version control has been an essential part of software development for a long time. It has revolutionized teamwork, and the next generation will likely improve on the current one as new DevOps methodologies and techniques are created.

###### References
• Suel, Torsten. "Delta Compression Techniques", Overview. Department of Computer Science and Engineering, Tandon School of Engineering, New York.
• Raymond, Eric. "Understanding Version-Control Systems", A Brief History of Version Control.
• Rochkind, Marc J. "The Source Code Control System". IEEE Trans. Software Eng.
• Tichy, Walter F. "Design, Implementation, and Evaluation of a Revision Control System". Proceedings of the 6th International Conference on Software Engineering.
• Tichy, Walter F. "RCS: A System for Version Control". Software: Practice & Experience, 1985.
• Grune, Dick. "Concurrent Versions System, a Method for Independent Cooperation". Technical report, IR 113, Vrije Universiteit, 1986.
• Pilato, C. Michael, Ben Collins-Sussman, and Brian W. Fitzpatrick. "Version Control with Subversion". O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2nd edition, 2008.
• Swicegood, T. "Pragmatic Version Control Using Git". Pragmatic Bookshelf, 2008.
• O’Sullivan, B. "Distributed Revision Control with Mercurial". Mercurial project, 2007.
• Torvalds, Linus. Linux Kernel Repository.