M Montpas on Jun 25, 2020

Thinking Code Reviews Differently

Reviewing changes between two versions of a same file is something every software engineer is brought to do on a daily basis. diff has been a tool-of-choice for this task for decades now, it actually does its job so well we even built entire version control systems around it Let’s take a look at this simple commit from the Rails project:

diff --git a/activerecord/lib/active_record/connection_adapters/mysql/schema_statements.rb b/activerecord/lib/active_record/connection_adapters/mysql/schema_statements.rb
index af5683d23f02..0f2e206eddba 100644
--- a/activerecord/lib/active_record/connection_adapters/mysql/schema_statements.rb
+++ b/activerecord/lib/active_record/connection_adapters/mysql/schema_statements.rb
@@ -203,10 +203,14 @@ def add_options_for_index_columns(quoted_columns, **options)
           def data_source_sql(name = nil, type: nil)
             scope = quoted_scope(name, type: type)
 
-            sql = +"SELECT table_name FROM information_schema.tables"
-            sql << " WHERE table_schema = #{scope[:schema]}"
-            sql << " AND table_name = #{scope[:name]}" if scope[:name]
-            sql << " AND table_type = #{scope[:type]}" if scope[:type]
+            sql = +"SELECT table_name FROM (SELECT * FROM information_schema.tables "
+            sql << " WHERE table_schema = #{scope[:schema]}) _subquery"
+            if scope[:type] || scope[:name]
+              conditions = []
+              conditions << "_subquery.table_type = #{scope[:type]}" if scope[:type]
+              conditions << "_subquery.table_name = #{scope[:name]}" if scope[:name]
+              sql << " WHERE #{conditions.join(" AND ")}"
+            end
             sql
           end

Understanding what is going on here is relatively easy:

  • We know the files affected by this change.
  • We know which lines are added and removed to produce the updated file.
  • We have some basic information about the function/method affected by this change.

There are also several unknowns, which we could figure out by further inspecting the project’s source code at this specific revision:

  • Are these functions and methods contained in a class? A module? Multiple imbricated modules?
  • Is this functionality called often?
  • Do other projects use this code?
  • How important was this change, all things considered?

These are some of the fundamental questions reviewers need answers to when evaluating the importance of a given commit for all stakeholders.

Context Is Everything

It quickly becomes obvious that to get a sense of the impact a specific change brings to a code base, context is what we need. Unfortunately, this is also something diff wasn’t designed to help us with: its job is to find the most efficient way to go from file A to file B.

At Sasca, we are working to build a platform that enable teams to answer all of these questions, and more!

Are These Methods Contained In A Class? A Module? Multiple Imbricated Modules?

This kind of structural detail is easy to find manually. Simply opening the file that was patched and checking for yourself would work. There is, however, at least one issue with this approach: it doesn’t scale. Manually investigating a lot of commits is time consuming.

Not only that, but the format in which popular platforms like GitHub provide the information isn’t optimized for large code reviews. A good case in point: Often, the first thing a reviewer will get to see is the list of files that were changed, added and removed. While not useless at all, it can and does become noisy when, for example, dealing with minor syntax changes in configuration files.

We believe that the order in which information is presented to the user should be the other way around. The first thing a reviewer should see is what component was changed, identified by its Symbolic Fully Qualified Name (SFQN), instead of the file where it’s contained.

Is this functionality called often? Do other projects use this code?

Once we have a SFQN to identify something that changed, finding how it is used across this project and others is fairly straightforward using our in-house static code analysis tools. Yet, this piece of data also has a lot of value since it enables us to start thinking about measuring less intuitive concepts like the impact of a given change on an ecosystem.

e.g. Measuring how many projects are affected by a bug in <insert trendy JavaScript package name> because they use that functionality.

Having this data in hand allows for better prioritization of what changes should require more attention than others.

How important was this change, all things considered?

Having a component used by millions of other projects doesn’t necessarily indicate it’s always critical for the applications using it - even less that organizations rely on them on business critical systems! Gathering all the historical information we have on a project and sprinkling some machine learning on top of it will enable us to provide even more context on the moving parts of your projects.

Stay Updated

We aim to change the way people think about code reviews by making the whole process more intuitive.

Subscribe to our newsletter to receive the latest updates on this project and eventually get early-access invites.