More Sprache goodness
Following my experiment to write a parser for Visual Studio solution files using the Sprache library, I’d like to share a few Sprache techniques that I found useful.
Parse a token into an enum value
The Visual Studio solution file format includes a set of Project definitions, each with one or more ProjectSection definitions, as well as a collection of GlobalSection definitions.
GlobalSection(SolutionConfigurationPlatforms) = preSolution
    Debug|Any CPU = Debug|Any CPU
    Release|Any CPU = Release|Any CPU
EndGlobalSection
Each ProjectSection and GlobalSection contains a token (preSolution in this case) that tells Visual Studio when its contents are required during the process of opening a solution file.
To represent this in my data model I created the following enum to express these loading sequence tokens:
public enum SectionLoadSequence
{
    Unrecognized,
    PreSolution,
    PostSolution,
    PreProject,
    PostProject
}
The Pre- and Post- values should be self-explanatory, but I want to briefly call attention to the Unrecognized value.
I have found it useful to introduce in my grammars the concept of an unrecognized section. While it may not survive into the final draft of a particular grammar, it has allowed my parsers to handle structured content for which I have not yet written a detailed parser. In short, it has helped me during development; whether it survives into the final draft depends on how you want your parser to respond to input that is either not well-formed or whose format has changed.
But I digress.
Two of the fun constructs that ship with the Sprache library are Or, which lets you link alternative elements together, and Return, which lets you substitute anything you want in place of the parsed input.
Using these constructs, it becomes straightforward to write a parser for a range of enum values:
public static readonly Parser<SectionLoadSequence> LoadSequence =
    from sequence in Parse.String("preSolution").Token().Return(SectionLoadSequence.PreSolution)
        .Or(Parse.String("postSolution").Token().Return(SectionLoadSequence.PostSolution))
        .Or(Parse.String("preProject").Token().Return(SectionLoadSequence.PreProject))
        .Or(Parse.String("postProject").Token().Return(SectionLoadSequence.PostProject))
        // Fallback: matches nothing and yields Unrecognized.
        .Or(Parse.Return(SectionLoadSequence.Unrecognized))
    select sequence;
Very expressive.
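To see the Or and Return chain in action, here is a minimal, self-contained sketch that exercises the parser. The LoadSequenceSketch class and its Main method are my own scaffolding for illustration, and the parser is written without the query wrapper purely for brevity; it assumes the Sprache NuGet package is referenced.

```csharp
using System;
using Sprache;

public enum SectionLoadSequence
{
    Unrecognized,
    PreSolution,
    PostSolution,
    PreProject,
    PostProject
}

// Hypothetical container class, used only for this demonstration.
public static class LoadSequenceSketch
{
    public static readonly Parser<SectionLoadSequence> LoadSequence =
        Parse.String("preSolution").Token().Return(SectionLoadSequence.PreSolution)
            .Or(Parse.String("postSolution").Token().Return(SectionLoadSequence.PostSolution))
            .Or(Parse.String("preProject").Token().Return(SectionLoadSequence.PreProject))
            .Or(Parse.String("postProject").Token().Return(SectionLoadSequence.PostProject))
            // Fallback: succeeds without consuming any input.
            .Or(Parse.Return(SectionLoadSequence.Unrecognized));

    public static void Main()
    {
        // Token() skips the surrounding whitespace.
        Console.WriteLine(LoadSequence.Parse(" postSolution ")); // PostSolution

        // None of the known tokens match, so the fallback is used.
        Console.WriteLine(LoadSequence.Parse("somethingElse"));  // Unrecognized
    }
}
```

One consequence of the fallback is worth keeping in mind: because Parse.Return always succeeds, this parser never fails on its own, and an unknown token is left in the input for whatever parser comes next.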
Parse unique inner content based on an opening token
Another useful construct that ships with Sprache is Then, which lets you determine which expectation follows based on the contents matched in the previous expression.
This is useful in the case of solution files because there are different types of GlobalSection, each with its own inner format.
Take, for example, the difference between a SolutionProperties global section:
GlobalSection(SolutionProperties) = preSolution
    HideSolutionNode = FALSE
EndGlobalSection
and a ProjectConfigurationPlatforms global section:
GlobalSection(ProjectConfigurationPlatforms) = postSolution
    {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
    {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.Build.0 = Debug|Any CPU
    {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.ActiveCfg = Release|Any CPU
    {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
Parsing (and eventually visualizing or manipulating) this last section’s inner content was my whole motivation for parsing solution files in the first place, but looking at the two together it is clear that the parser required to extract relevant details from the ProjectConfigurationPlatforms section would fail to parse a SolutionProperties section.
Luckily, the Then construct takes a lambda accepting the parsed content as an argument, so you can do something funky like this:
public static readonly Parser<SolutionFileGlobalSection> GlobalSection =
    from start in Parse.String("GlobalSection").Token()
    from section in RoundBracketedString.Then(s =>
        s == "SolutionProperties" ? SolutionPropertiesGlobalSection
        : s == "SolutionConfigurationPlatforms" ? SolutionConfigurationPlatformsGlobalSection
        : s == "ProjectConfigurationPlatforms" ? ProjectConfigurationPlatformsGlobalSection
        : UnrecognizedGlobalSection(s))
    from end in Parse.String("EndGlobalSection").Token()
    select section;
In between the GlobalSection and EndGlobalSection tags, we first parse the RoundBracketedString that identifies what kind of global section we’re dealing with. Then we take the inner content of that round bracketed string and use it to pick a parser customized for the expected format of that type of section.
Notice again that I have created an UnrecognizedGlobalSection that accepts the section type as an argument. This parser simply swallows everything until the next EndGlobalSection tag, saving it for later use in a diagnostic message while allowing the parsing to continue without exception.
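The two helpers used above, RoundBracketedString and UnrecognizedGlobalSection, are not shown in this post, so here is a rough sketch of how they could look. Everything in it is an assumption made for illustration: the Name and RawContent properties on SolutionFileGlobalSection, the SketchGrammar class, and the MadeUpSection input are all hypothetical, and the real implementations may well differ.

```csharp
using System;
using Sprache;

// Hypothetical shape; the real SolutionFileGlobalSection is not shown in this post.
public class SolutionFileGlobalSection
{
    public string Name { get; set; } = "";
    public string RawContent { get; set; } = "";
}

public static class SketchGrammar
{
    // Assumed implementation: the text between '(' and ')'.
    public static readonly Parser<string> RoundBracketedString =
        Parse.CharExcept(')').Many().Text()
            .Contained(Parse.Char('('), Parse.Char(')'));

    // Assumed implementation: swallow every character up to, but not
    // including, the closing tag, and keep the raw text for diagnostics.
    public static Parser<SolutionFileGlobalSection> UnrecognizedGlobalSection(string name) =>
        Parse.AnyChar
            .Except(Parse.String("EndGlobalSection"))
            .Many().Text()
            .Select(raw => new SolutionFileGlobalSection
            {
                Name = name,
                RawContent = raw.Trim()
            });

    // Cut-down GlobalSection parser that always takes the fallback branch.
    public static readonly Parser<SolutionFileGlobalSection> GlobalSection =
        from start in Parse.String("GlobalSection").Token()
        from section in RoundBracketedString.Then(s => UnrecognizedGlobalSection(s))
        from end in Parse.String("EndGlobalSection").Token()
        select section;

    public static void Main()
    {
        var input =
            "GlobalSection(MadeUpSection) = postSolution\n" +
            "    SomeKey = SomeValue\n" +
            "EndGlobalSection\n";

        var section = GlobalSection.Parse(input);
        Console.WriteLine(section.Name); // MadeUpSection
        Console.WriteLine(section.RawContent);
    }
}
```

Because the fallback stops just before EndGlobalSection rather than consuming it, the outer parser can still match the closing tag exactly as it does for the recognized section types.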
Tip of the iceberg
These two use cases represent just the tip of the iceberg in terms of the possibilities offered by Sprache.
I’m very excited by this library and look forward to using it to explore even more complex grammars.
What content have you had to parse that could benefit from this level of expressiveness?