Spreadsheets to YAML – Zachary Yarnot Art Portfolio

One of the most important ways to learn why mistakes are made is by documenting them. Often, recurring treads show why an issue or mistake arises. However, large volumes of data aren’t considered human readable, at least for most people. While in the process of documenting numbers associated with the units on the campus that I work, I found issues with many of our badges and the software that creates them that needed to be documented. Reporting such findings though could be done with software like Microsoft Excel, but the person in charge of overseeing the software does not have a history in data or computer science. As my experience with The Wind Waker Randomizer dv_im reminded me, YAML allows for the construction, maintenance, and readability of data sets, at the small cost of physical space.

YAML is a data language built on many JSON and Python sensibilities. White space denotes relations between objects and their properties. Objects can be built within dictionaries or lists. These two properties are perfect for reporting types of issues as well as keeping things easy to understand.

A basic VBA YAML constructor wouldn’t be too difficult, but a more flexible solution would prove better in the long run. I had originally written this article just after building the more flexible solution, which actually became quite useful. I was only documenting issues that were presenting themselves on visitor badges and patient profiles, but it became clear we also needed to record when software misbehaved or crashed. Having the solution already accommodate new object types was helpful.

Here’s how it was constructed. I knew I wanted two object types: patients and visitors. Each object type would have its own properties we needed to keep track of, as what issues we cared about were different between these two in our visitor management software. Having two different interpreters would be a lot of work for very little benefit. Instead, we know our data table exists in the UsedRange property of our Worksheet object. The top of each column, with the exception of the first, represents the properties of our YAML objects, such as Creator. Each row, with the exception of the first, represents an object. Each cell in a row, with the exception of the first, then is the value or quality of that property, such as zyarnot. The first column represents the dictionary we are building and the keys to the objects within the dictionary.

Knowing how the data is structured in Excel will greatly assist in building a good and efficient system for reading then writing it in the destination format. First, we build a template for the dictionary entries. These will always start with "{1}:" so that we can find where the key must go later. Each cell in this first row represents the title of each property as well as the data type of each property. Unlike VBA, YAML is not explicit with its data types, we are more concerned with data types like lists or non-data types like comments. Each property is added to our template as "{t#}: {v#}" with v being replaced with another character so later the interpreter can recognize what it needs to with the line. Despite the fact comments will be full width without title, I found titles must be declared and inserted before replacing the line, otherwise VBA would error out. Repeat this for each column until the template is complete.

Filling the templates with Replace() is quick work given that every type is implicit except lists and non-data comments, though lists are just an extra step of replacing semicolon-spaces with a newline, an extra tab and a dash with the data then following. Comments must find and replace the title as well as the empty data. Unlike with most YAML interpreters, I elected to have lines with undeclared properties delete the line. If we needed to convert from the YAML back to other methods, I would have the object declaration have optional variables that have default values of empty strings.

This will complete a whole dictionary without much effort. Each dictionary then is just reading a sheet in excel. Again, instead of explicitly declaring “read sheets in this list”, instead it goes through sheets between the how-to and output sheet. Every time the user clicks on the output sheet, data is read and written to it, dividing at about every 37 objects to prevent potential data loss due to cell rendering limits.

Here’s an example of what this simple approach can do:

Input

Visitors	Badge	Creator	In-op	Created	Issue	Comments
McTest, John	99995	zyarnot	zyarnot	2023.02.05.17.08.25	Misspelled first name (no “h”); Misspelled last name (extra “t” at end)	How many S’s are in “misspelled”?
Doe, John	99994	zyarnot	zyarnot	2023.02.04.21.02.12	Wrong category (Should be “Visitor”)

Output

Visitors: McTest, John: Badge: 99995 Creator: zyarnot In-op: zyarnot Created: 2023.02.05.17.08.25 Issue: - Misspelled first name (no "h") - Misspelled last name (extra "t" at end) # How many S's are in "misspelled"? Doe, John: Badge: 99994 Creator: zyarnot In-op: zyarnot Created: 2023.02.04.21.02.12 Issue: - Wrong category (Should be "Visitor")

Legal

Zachary Yarnot and DualVission do not hold any rights to these owners’ contents.
Microsoft Excel and Visual Basic for Applications are products of Microsoft Corporation.
Microsoft, Microsoft Office, and Microsoft Excel are registered trademarks of Microsoft Corporation.