Following my experiment to write a parser for Visual Studio solution files using the Sprache library, I’d like to share a few Sprache techniques that I found useful.

Parse a token into an enum value

The Visual Studio solution file format includes a set of Project definitions, each with one or more ProjectSection definitions, as well as a collection of GlobalSection definitions.

    GlobalSection(SolutionConfigurationPlatforms) = preSolution
        Debug|Any CPU = Debug|Any CPU
        Release|Any CPU = Release|Any CPU
    EndGlobalSection

Each ProjectSection and GlobalSection contains a token (preSolution in this case) that tells Visual Studio when its contents are required during the process of opening a solution file.

To represent this in my data model I created the following enum to express these loading sequence tokens:

    public enum SectionLoadSequence
    {
        Unrecognized,
        PreSolution,
        PostSolution,
        PreProject,
        PostProject
    }

The Pre- and Post- values should be self-explanatory, but I want to briefly call attention to the Unrecognized value.

I have found it useful to introduce the concept of an unrecognized section into my grammars. While it may not survive into the final draft of a particular grammar, it has allowed my parsers to handle structured content for which I have not yet written a detailed parser. In short, it has helped me during development. Whether it survives into the final draft comes down to how you want your parser to respond to input that is either malformed or written in a format that has since changed.

But I digress.

Two of the fun constructs that ship with the Sprache library are Or, which lets you chain alternative parsers together, and Return, which lets you substitute any value you want in place of the parsed input.
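
To see the two constructs in isolation, here is a minimal sketch that maps the TRUE/FALSE literals used in solution files (as in HideSolutionNode = FALSE) onto a C# bool. It assumes the Sprache NuGet package is referenced. The SolutionLiterals class and its Boolean parser are hypothetical names for illustration.

```csharp
using Sprache;

public static class SolutionLiterals
{
    // Hypothetical parser. Return discards the matched characters and yields a bool;
    // Or falls back to the second alternative when the first one fails.
    public static readonly Parser<bool> Boolean =
        Parse.String("TRUE").Return(true)
            .Or(Parse.String("FALSE").Return(false))
            .Token(); // Token skips any surrounding whitespace
}
```

With this in place, SolutionLiterals.Boolean.Parse(" FALSE ") yields false. Any other word makes Parse throw a ParseException, while TryParse reports the failure instead.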

Using these constructs, it becomes straightforward to write a parser for a range of enum values:

    public static readonly Parser<SectionLoadSequence> LoadSequence =
        from sequence in Parse.String("preSolution").Token().Return(SectionLoadSequence.PreSolution)
                     .Or(Parse.String("postSolution").Token().Return(SectionLoadSequence.PostSolution))
                     .Or(Parse.String("preProject").Token().Return(SectionLoadSequence.PreProject))
                     .Or(Parse.String("postProject").Token().Return(SectionLoadSequence.PostProject))
                     .Or(Parse.Return(SectionLoadSequence.Unrecognized))
        select sequence;

Very expressive.
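
A quick way to exercise this parser is sketched below. It repeats the enum and the parser inside a hypothetical SolutionGrammar class so that the snippet compiles on its own, assuming the Sprache package is referenced. It also drops the from ... select wrapper, which only passes the value through. Because Parse.Return succeeds without consuming any input, an unknown token (here the made-up duringSolution) produces Unrecognized and stays in the input for whatever parser runs next. Because Or retries each alternative from the same starting position, the shared pre/post prefixes do not get in the way.

```csharp
using Sprache;

public enum SectionLoadSequence
{
    Unrecognized,
    PreSolution,
    PostSolution,
    PreProject,
    PostProject
}

public static class SolutionGrammar
{
    // Same alternatives as above; each Or restarts from the original input position.
    public static readonly Parser<SectionLoadSequence> LoadSequence =
        Parse.String("preSolution").Token().Return(SectionLoadSequence.PreSolution)
            .Or(Parse.String("postSolution").Token().Return(SectionLoadSequence.PostSolution))
            .Or(Parse.String("preProject").Token().Return(SectionLoadSequence.PreProject))
            .Or(Parse.String("postProject").Token().Return(SectionLoadSequence.PostProject))
            // Fallback: succeeds with Unrecognized and consumes nothing.
            .Or(Parse.Return(SectionLoadSequence.Unrecognized));
}
```

Calling SolutionGrammar.LoadSequence.TryParse("duringSolution") succeeds with Unrecognized, and its Remainder is still at position 0.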

Parse unique inner content based on an opening token

Another useful construct that ships with Sprache is Then, which lets you choose which parser comes next based on the value matched by the previous expression.

This is useful for solution files because there are different types of GlobalSection, each with its own inner format.

Take, for example, the difference between a SolutionProperties global section:

    GlobalSection(SolutionProperties) = preSolution
        HideSolutionNode = FALSE
    EndGlobalSection

and a ProjectConfigurationPlatforms global section:

    GlobalSection(ProjectConfigurationPlatforms) = postSolution
        {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
        {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Debug|Any CPU.Build.0 = Debug|Any CPU
        {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.ActiveCfg = Release|Any CPU
        {AE7D2A46-3F67-4986-B04B-7DCE79A549A5}.Release|Any CPU.Build.0 = Release|Any CPU
    EndGlobalSection

Parsing (and eventually visualizing or manipulating) this last section’s inner content was my whole motivation for parsing solution files in the first place. Looking at the two side by side, though, it is clear that a parser written to extract the project GUIDs and configuration mappings from a ProjectConfigurationPlatforms section would fail on a SolutionProperties section.
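
To make that incompatibility concrete, here is a minimal sketch of a parser for a single ProjectConfigurationPlatforms entry. It assumes the Sprache package and C# 9 or later (for records). The ProjectConfigurationEntry record and the ProjectConfigurationGrammar class are hypothetical. The grammar also simplifies by assuming the solution configuration name contains no period. Because it insists on a braced project GUID up front, it rejects a line such as HideSolutionNode = FALSE immediately.

```csharp
using System;
using Sprache;

// Hypothetical model for one ProjectConfigurationPlatforms line.
public record ProjectConfigurationEntry(
    Guid ProjectGuid, string SolutionConfiguration, string Property, string ProjectConfiguration);

public static class ProjectConfigurationGrammar
{
    // "{AE7D2A46-3F67-4986-B04B-7DCE79A549A5}" -> Guid
    // Note: Guid.Parse throws on malformed text instead of producing a parse failure.
    static readonly Parser<Guid> ProjectGuid =
        from open in Parse.Char('{')
        from text in Parse.CharExcept('}').Many().Text()
        from close in Parse.Char('}')
        select Guid.Parse(text);

    // "{guid}.Debug|Any CPU.Build.0 = Debug|Any CPU"
    // Simplifying assumption: the solution configuration contains no '.'.
    public static readonly Parser<ProjectConfigurationEntry> Entry =
        from leading in Parse.WhiteSpace.Many()
        from guid in ProjectGuid
        from dot in Parse.Char('.')
        from solutionConfig in Parse.CharExcept('.').Many().Text()
        from dot2 in Parse.Char('.')
        from property in Parse.CharExcept('=').Many().Text()  // "ActiveCfg" or "Build.0"
        from eq in Parse.Char('=')
        from projectConfig in Parse.CharExcept("\r\n").Many().Text()
        select new ProjectConfigurationEntry(
            guid, solutionConfig, property.Trim(), projectConfig.Trim());
}
```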

Luckily, the Then construct takes a lambda accepting the parsed content as an argument, so you can do something funky like this:

    public static readonly Parser<SolutionFileGlobalSection> GlobalSection =
        from start in Parse.String("GlobalSection").Token()
        from section in RoundBracketedString.Then(s =>
            s == "SolutionProperties" ? SolutionPropertiesGlobalSection
            : s == "SolutionConfigurationPlatforms" ? SolutionConfigurationPlatformsGlobalSection
            : s == "ProjectConfigurationPlatforms" ? ProjectConfigurationPlatformsGlobalSection
            : UnrecognizedGlobalSection(s))
        from end in Parse.String("EndGlobalSection").Token()
        select section;

Between the GlobalSection and EndGlobalSection tags, we first parse the RoundBracketedString that identifies what kind of global section we’re dealing with. Then we take the text inside those round brackets and use it to select a parser tailored to the expected format of that type of section.

Notice again that I have created an UnrecognizedGlobalSection that accepts the section type as an argument. This parser simply swallows everything up to (but not including) the next EndGlobalSection tag. It saves that content for later use in a diagnostic message while allowing parsing to continue without an exception.
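
To tie the dispatch and the fallback together, here is a self-contained sketch that assumes the Sprache package and C# 9 or later. Everything in it is hypothetical. GlobalSectionSketch and its subtypes stand in for the real SolutionFileGlobalSection model, and only SolutionProperties gets a dedicated parser. Every other section type falls through to an UnrecognizedGlobalSection that keeps the raw text. The fallback uses Except with Many rather than Sprache's Until, because Until also consumes the terminator. That would leave nothing for the outer EndGlobalSection clause to match.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sprache;

// Hypothetical, simplified stand-ins for the real SolutionFileGlobalSection model.
public abstract record GlobalSectionSketch(string Kind);
public record PropertiesSection(IReadOnlyDictionary<string, string> Properties)
    : GlobalSectionSketch("SolutionProperties");
public record UnrecognizedSection(string SectionKind, string RawContent)
    : GlobalSectionSketch(SectionKind);

public static class GlobalSectionGrammar
{
    // "(SolutionProperties)" -> "SolutionProperties"
    static readonly Parser<string> RoundBracketedString =
        Parse.CharExcept(')').Many().Text()
            .Contained(Parse.Char('('), Parse.Char(')'))
            .Token();

    // "= preSolution" (simplified: any word is accepted as the load sequence)
    static readonly Parser<string> LoadSequenceHeader =
        from eq in Parse.Char('=').Token()
        from word in Parse.Letter.AtLeastOnce().Text().Token()
        select word;

    // One "Key = Value" line, e.g. "HideSolutionNode = FALSE"
    static readonly Parser<KeyValuePair<string, string>> Property =
        from key in Parse.LetterOrDigit.AtLeastOnce().Text().Token()
        from eq in Parse.Char('=').Token()
        from value in Parse.CharExcept("\r\n").Many().Text().Token()
        select new KeyValuePair<string, string>(key, value.Trim());

    // Many stops (and backtracks) at the first line that is not "Key = Value",
    // which here is the EndGlobalSection tag.
    static readonly Parser<GlobalSectionSketch> SolutionPropertiesGlobalSection =
        from header in LoadSequenceHeader
        from properties in Property.Many()
        select (GlobalSectionSketch)new PropertiesSection(
            properties.ToDictionary(p => p.Key, p => p.Value));

    // Swallows everything up to, but not including, "EndGlobalSection".
    static Parser<GlobalSectionSketch> UnrecognizedGlobalSection(string kind) =>
        from raw in Parse.AnyChar.Except(Parse.String("EndGlobalSection")).Many().Text()
        select (GlobalSectionSketch)new UnrecognizedSection(kind, raw.Trim());

    // Then hands the bracketed name to a lambda that picks the parser for the body.
    public static readonly Parser<GlobalSectionSketch> GlobalSection =
        from start in Parse.String("GlobalSection").Token()
        from section in RoundBracketedString.Then(kind =>
            kind == "SolutionProperties" ? SolutionPropertiesGlobalSection
            : UnrecognizedGlobalSection(kind))
        from end in Parse.String("EndGlobalSection").Token()
        select section;
}
```

Adding a parser for another section type is then a matter of writing it and adding one more branch to the lambda. Anything not yet handled still parses, just less precisely.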

Tip of the iceberg

These two use cases represent just the tip of the iceberg in terms of the possibilities offered by Sprache.

I’m very excited by this library and look forward to using it to explore even more complex grammars.

What content have you had to parse that could benefit from this level of expressiveness?