Parsing VS Solution files with Sprache
Anyone who has worked with Visual Studio and more than one branch of active development has been bitten by the friction of merging solution files.
The format of solution files is a bit esoteric, with lots of key=value pairs and GUIDs representing projects, project types, and build configurations. Merging individual lines often requires so much effort that most devs I’ve spoken with recommend picking one branch or the other and manually reconstructing the other branch’s changes. Needless to say, this work is fiddly and error-prone.
Somewhere I got it into my head that building a tool to help with this would be a good idea, and apparently I’m not alone in that thought. When sitting down to write this post, I discovered the SLNTools project on CodePlex; it looks interesting and I’ll have to try it out.
In the meantime, however, I wanted to share with you a bit about my experience building a parser for solution files using the Sprache library developed by Nicholas Blumhardt.
Some background
The idea behind Sprache is to fill the void between regular expressions, which suit parsing simple things, and full-blown DSL or parsing toolkits, which describe complete grammars.
Sprache takes a functional, or parser combinator, approach: the library provides a set of .NET classes and extension methods that handle low-level parsing jobs like consuming input one character at a time and defining sequence and look-ahead expectations. These tools are exposed as a collection of Parser<T> constructs (where T is the type returned by a successful parse) that can be composed in a declarative style using LINQ query comprehension syntax.
The resulting parser can be very expressive, readable, and testable.
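To make the combinator idea concrete, here is a minimal sketch in plain C# that does not use Sprache at all. The names MiniParser, Result, and Mini are hypothetical and exist only for this illustration; Sprache's real types are richer (error messages, position tracking, and so on). The point is simply that once a parser type has Select and SelectMany extension methods, the C# compiler lets you sequence parsers with query syntax.

```csharp
using System;

// A parser takes the remaining input and returns a value plus the
// unconsumed rest of the input, or null when it does not match.
public delegate Result<T> MiniParser<T>(string input);

public sealed class Result<T>
{
    public Result(T value, string rest) { Value = value; Rest = rest; }
    public T Value { get; }
    public string Rest { get; }
}

public static class Mini
{
    // Consume exactly one expected character.
    public static MiniParser<char> Char(char expected) =>
        input => input.Length > 0 && input[0] == expected
            ? new Result<char>(expected, input.Substring(1))
            : null;

    // Consume one or more decimal digits.
    public static MiniParser<string> Digits =>
        input =>
        {
            var n = 0;
            while (n < input.Length && char.IsDigit(input[n])) n++;
            return n == 0
                ? null
                : new Result<string>(input.Substring(0, n), input.Substring(n));
        };

    // Select and SelectMany are the methods the C# compiler looks for when
    // it translates "from x in p ... select ..." query expressions.
    public static MiniParser<U> Select<T, U>(this MiniParser<T> parser, Func<T, U> map) =>
        input =>
        {
            var r = parser(input);
            return r == null ? null : new Result<U>(map(r.Value), r.Rest);
        };

    public static MiniParser<V> SelectMany<T, U, V>(
        this MiniParser<T> first, Func<T, MiniParser<U>> next, Func<T, U, V> project) =>
        input =>
        {
            var r1 = first(input);
            if (r1 == null) return null;           // first parser failed
            var r2 = next(r1.Value)(r1.Rest);      // run the next one on what is left
            return r2 == null ? null : new Result<V>(project(r1.Value, r2.Value), r2.Rest);
        };

    // Three parsers sequenced declaratively: digits, a period, digits.
    public static readonly MiniParser<Tuple<int, int>> Version =
        from major in Digits
        from dot in Char('.')
        from minor in Digits
        select Tuple.Create(int.Parse(major), int.Parse(minor));
}
```

Calling Mini.Version("12.00") yields a result holding the pair (12, 0) with no input left over, while Mini.Version("12") returns null because the period is missing.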
I’ll share some examples as I build up my SolutionFileGrammar below, but here’s a taste to whet your appetite:
public static readonly Parser<SolutionFile> Solution =
    from header in Header
    from projects in Project.Many().Optional()
    from globals in Global
    select new SolutionFile(header, projects, globals);
About Solution Files
In order to define a parser for the Solution File format we have to understand how it is structured.
The best description of solution file syntax I have found is this excerpt: Hack the Project and Solution Files.
Let’s take a look at an empty VS2012 solution file:
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
Global
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
And now one with a single project and its default build configurations:
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "HttpWebAdapters", "HttpWebAdapters\HttpWebAdapters.csproj", "{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.Build.0 = Debug|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.ActiveCfg = Release|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
And finally, one with NuGet Package Restore enabled, which adds a solution folder and some freestanding solution items:
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "HttpWebAdapters", "HttpWebAdapters\HttpWebAdapters.csproj", "{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = ".nuget", ".nuget", "{8374A24A-6031-48CB-8B66-A2B510FA251F}"
ProjectSection(SolutionItems) = preProject
.nuget\NuGet.Config = .nuget\NuGet.Config
.nuget\NuGet.exe = .nuget\NuGet.exe
.nuget\NuGet.targets = .nuget\NuGet.targets
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.Build.0 = Debug|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.ActiveCfg = Release|Any CPU
{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
And now, the parser
In order to parse this with Sprache we need to decompose it into smaller and smaller pieces, build parsers for each of those pieces, then assemble those pieces into a grammar.
We express that grammar with a series of static members that can be built up test-first by starting with one of the smallest or inner-most pieces of nested content.
For example, let’s take the header first:
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
The relevant pieces of information are:
- a version declaration; and
- a product description
Taking that first part, the version declaration, we can focus smaller still on just the version number itself:
[Test]
public void SolutionVersionNumber_is_Number_period_Number()
{
    var input = @"12.00";

    var result = SolutionFileGrammar.SolutionVersionNumber.Parse(input);

    Assert.AreEqual(12, result.Major);
    Assert.AreEqual(00, result.Minor);
}
And write a parser for it:
public static readonly Parser<SolutionVersionNumber> SolutionVersionNumber =
    from rawMajor in Parse.Number.Token()
    from period in Parse.Char('.')
    from rawMinor in Parse.Number.Token()
    let major = int.Parse(rawMajor)
    let minor = int.Parse(rawMinor)
    select new SolutionVersionNumber(major, minor);
There are a number of things going on here, so let’s break it down.
First off we declare a Parser<SolutionVersionNumber> that will return our parsed result and then declare that a SolutionVersionNumber is composed of:
- A number…
- …followed by a period…
- …followed by a number
- then I use the let keyword to transform that parsed text into integers
- and finally the select keyword to create a new instance of my result.
Here SolutionVersionNumber is part of our data model:
public class SolutionVersionNumber
{
    public SolutionVersionNumber(int major, int minor)
    {
        Major = major;
        Minor = minor;
    }

    public int Major { get; private set; }
    public int Minor { get; private set; }
}
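The test above only exercises the happy path, where Parse returns a value; on input that does not match, Parse throws a ParseException. A minimal sketch of inspecting failures without exceptions, assuming Sprache's TryParse extension method and its IResult<T> members (WasSuccessful, Value, Message). The VersionParsingDemo class and its Describe method are hypothetical names for this illustration, and the parser returns a tuple only so that the sketch stands alone.

```csharp
using System;
using Sprache;

public static class VersionParsingDemo
{
    // Same shape as the version number parser above, but returning a tuple
    // so that this sketch does not depend on the article's model class.
    public static readonly Parser<Tuple<int, int>> Version =
        from rawMajor in Parse.Number.Token()
        from period in Parse.Char('.')
        from rawMinor in Parse.Number.Token()
        select Tuple.Create(int.Parse(rawMajor), int.Parse(rawMinor));

    // Describes the outcome of a parse instead of throwing on bad input.
    public static string Describe(string input)
    {
        IResult<Tuple<int, int>> result = Version.TryParse(input);
        return result.WasSuccessful
            ? string.Format("major={0}, minor={1}", result.Value.Item1, result.Value.Item2)
            : "failed: " + result.Message;
    }
}
```

Describe("12.00") returns "major=12, minor=0". Because Token() skips whitespace on either side of the number, Describe("12 . 00") returns the same thing, which is worth knowing if you need the grammar to be strict about spacing.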
then came the product name
Now let’s focus on the second piece that we want to parse, the product name:
# Visual Studio 2012
with a test:
[Test]
public void ProductName_is_pound_followed_by_text()
{
    var input = @"# Visual Studio 2012";

    var result = SolutionFileGrammar.ProductName.Parse(input);

    Assert.AreEqual("Visual Studio 2012", result);
}
and a parser:
public static readonly Parser<string> ProductName =
    from pound in Parse.Char('#').Token()
    from name in Parse.AnyChar.Until(NewLine.Or(Eof)).Text()
    select name;
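NewLine and Eof are helper parsers that this post does not show. Here is a minimal sketch of one way they could be defined, assuming Sprache's Parse.String, Parse.Return, Or, Text, and End combinators; the class name LineHelpersSketch is hypothetical, and the definitions in the full grammar may well differ.

```csharp
using System;
using Sprache;

public static class LineHelpersSketch
{
    // Static field initializers run in textual order, so the helpers are
    // declared before the parser that uses them.

    // Assumption: accept either Windows ("\r\n") or Unix ("\n") line endings.
    public static readonly Parser<string> NewLine =
        Parse.String("\r\n").Or(Parse.String("\n")).Text();

    // Assumption: succeed only when no input remains. It yields an empty
    // string so that it has the same result type as NewLine and the two
    // can be combined with Or.
    public static readonly Parser<string> Eof =
        Parse.Return(string.Empty).End();

    // The product name parser from the article, wired to the helpers above.
    public static readonly Parser<string> ProductName =
        from pound in Parse.Char('#').Token()
        from name in Parse.AnyChar.Until(NewLine.Or(Eof)).Text()
        select name;
}
```

With these definitions the product name is read up to, but not including, the line break, and a header that ends the input without a trailing newline still parses.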
and now the fun begins
Now that we have a simple, tested parser for the version number and another one for the product name, let’s write a test and parser for the whole header:
Given:
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
here’s the test:
[Test]
public void Header_contains_version_information()
{
    var input =
@"Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012";

    var result = SolutionFileGrammar.Header.Parse(input);

    Assert.AreEqual("Visual Studio 2012", result.ProductName);
    Assert.AreEqual(12, result.MajorVersion);
    Assert.AreEqual(00, result.MinorVersion);
}
the parser:
public static readonly Parser<SolutionFileHeader> Header =
    from ignore1 in Parse.String("Microsoft Visual Studio Solution File, Format Version").Token()
    from version in SolutionVersionNumber
    from name in ProductName
    select new SolutionFileHeader(version, name);
and the object model:
public class SolutionFileHeader
{
    public SolutionFileHeader(SolutionVersionNumber version, string productName)
    {
        MajorVersion = version.Major;
        MinorVersion = version.Minor;
        ProductName = productName;
    }

    public int MajorVersion { get; private set; }
    public int MinorVersion { get; private set; }
    public string ProductName { get; private set; }
}
in conclusion
With that I hope you can see how expressive a Sprache-based grammar is. It takes a bit of discipline to start with an individual token and grow your grammar test-first, but as your library of tokens grows, so does the size of the parsers you can compose from them, and before long the declarative and combinatorial nature of Sprache has you moving along at a fast clip.
Full source code for this article, and a more comprehensive (though not yet complete) grammar, is available on GitHub: https://github.com/davidalpert/viper