Parsing XML with PowerShell

I'm addicted to PowerShell. This cool scripting environment is simple to use, and with very few lines of script it is possible to accomplish tasks that would otherwise be a lot of tedious work. (If we didn't have PowerShell, I would probably whip up a C# program to do the same, but PowerShell is lightweight, interactive, and generally very forgiving for small tasks where you just "want the job done".)

As an example, today I needed to look at a log file generated by Visual Studio to figure out why the environment wouldn't start on my home PC. As it turns out, these log files are actually XML files. Of course I could have just started reading through the XML, but all the angle brackets confuse my brain when I'm mostly interested in the text content of the log file.

So, five minutes later, this 3-line script, parse-vslog.ps1, was born:

1: param( [string]$file = $(throw "required parameter" ) )
2: $log = [xml](get-content $file)
3: $log.activity.entry | select record,type,description | format-table -wrap -auto

This is what happens in the script:

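On line 1, we declare a $file parameter (variables and parameters are prefixed with $ in PowerShell). Its default value is an expression that throws, which effectively makes the parameter required: if the caller omits it, the script stops with an error.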
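As a side note to line 1 of the script, PowerShell 2.0 and later can also mark a parameter as required with the Parameter attribute instead of a throwing default value. The sketch below is only an illustration of that alternative: the function name Get-VsLogEntry is hypothetical and not part of the original script, and the body simply reuses the script's other two lines.

```powershell
# Hypothetical function wrapper; the name Get-VsLogEntry is an assumption for this sketch.
function Get-VsLogEntry {
    param(
        # Mandatory lets PowerShell itself handle a missing value
        # (in an interactive session it prompts for one).
        [Parameter(Mandatory = $true)]
        [string]$File
    )

    # Same steps as lines 2 and 3 of the script, minus the table formatting.
    $log = [xml](Get-Content $File)
    $log.activity.entry | Select-Object record, type, description
}
```

Returning the selected objects rather than formatted text leaves the caller free to pipe the result into format-table or anything else.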

On line 2, we use the get-content cmdlet to read the contents of the file. PowerShell has a lot of XML helper features, one of which is the ability to "cast" the content to XML using the [xml] construct. What really happens behind the scenes is that PowerShell instantiates an XmlDocument and loads the text content of the file into it.
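To make the cast on line 2 of the script a bit more concrete, here is a minimal sketch that compares it with creating the XmlDocument by hand. The inline XML string is a made-up one-entry sample in the same shape as the log, not real log content.

```powershell
# Hypothetical one-entry sample; element names mirror the Visual Studio log.
$text = '<activity><entry><record>1</record><type>Information</type></entry></activity>'

# The [xml] cast creates a System.Xml.XmlDocument and loads the string into it.
$doc = [xml]$text

# Roughly the same thing spelled out with the .NET class directly.
$manual = New-Object System.Xml.XmlDocument
$manual.LoadXml($text)

$doc.GetType().FullName              # System.Xml.XmlDocument
$doc.OuterXml -eq $manual.OuterXml   # compares the two documents as text
```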

Last, on line 3, we take advantage of the fact that PowerShell lets us select XML nodes using simple dotted notation. Here we are interested in all the /activity/entry nodes. We pass the result along the pipeline and select the 3 most important values using select (an alias for the Select-Object cmdlet). Finally, we format the output nicely with format-table, specifying that the cmdlet should auto-size the column widths (-auto, short for -AutoSize) and that long text should be wrapped onto multiple lines (-wrap).
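Because the dotted notation hands back ordinary objects, the rest of the pipeline toolbox applies too. The following sketch assumes a trimmed two-entry sample (the here-string stands in for a real log file) and adds a Where-Object filter, which is not part of the original script, to show only the error entries.

```powershell
# Trimmed two-entry sample in the same shape as the log; a stand-in for a real file.
$log = [xml]@'
<activity>
  <entry><record>17</record><type>Information</type><description>Entering function LoadDTETypeLib</description></entry>
  <entry><record>18</record><type>Error</type><description>Leaving function LoadDTETypeLib</description></entry>
</activity>
'@

# Dotted notation walks the element tree; repeated <entry> elements come back as a collection.
$entries = $log.activity.entry

# Keep the three interesting properties, then filter on type like any other object.
$errors = $entries |
    Select-Object record, type, description |
    Where-Object { $_.type -eq 'Error' }

$errors | Format-Table -Wrap -AutoSize
```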

So instead of having to look at XML that goes on like this:

<?xml-stylesheet type="text/xsl" href="ActivityLog.xsl"?>
<activity>
  <entry>
    <record>1</record>
    <time>2008/06/15 15:44:18.220</time>
    <type>Information</type>
    <source>Microsoft Visual Studio</source>
    <description>Visual Studio Version: 9.0.21022.8</description>
  </entry>
  <entry>
    <record>2</record>
    <time>2008/06/15 15:44:18.221</time>
    <type>Information</type>
    <source>Microsoft Visual Studio</source>
    <description>Running in User Groups: Administrators Users</description>
  </entry>
  <entry>
    <record>3</record>
    <time>2008/06/15 15:44:18.221</time>
    <type>Information</type>
    <source>Microsoft Visual Studio</source>
    <description>ProductID: 91904-270-0003722-60402</description>
  </entry>
  <entry>
    <record>19</record>
    <time>2008/06/15 15:44:19.094</time>
    <type></type>
    <source>Microsoft Visual Studio</source>
    <description>Destroying Main Window</description>
  </entry>
</activity>

Now, I can get this much nicer output in the console (note that the XML above has been shortened for the blog; the original was around 150 lines):

record type        description
------ ----        -----------
1      Information Visual Studio Version: 9.0.21022.8
2      Information Running in User Groups: Administrators Users
3      Information ProductID: 91904-270-0003722-60402
4      Information Available Drive Space: C:\ drive has 42128211968 bytes; D:\ drive has 38531145728 bytes; E:\ drive h
                   as 127050969088 bytes; F:\ drive has 117087354880 bytes
5      Information Internet Explorer Version: 7.0.6001.18063
6      Information Microsoft Data Access Version: 6.0.6001.18000
7      Information .NET Framework Version: 2.0.50727.1434
8      Information MSXML Version: 6.20.1076.0
9      Information Loading UI library
10     Information Entering function CVsPackageInfo::HrInstantiatePackage
11     Information Begin package load [Visual Studio Source Control Integration Package]
12     Information Entering function CVsPackageInfo::HrInstantiatePackage
13     Information Begin package load [team foundation server provider stub package]
14     Information End package load [team foundation server provider stub package]
15     Information End package load [Visual Studio Source Control Integration Package]
16     Information Entering function VBDispatch::GetTypeLib
17     Information Entering function LoadDTETypeLib
18     Error       Leaving function LoadDTETypeLib
19                 Destroying Main Window

I think this is a good example of the strength of PowerShell. Using only a few lines of script and a minimum of time, I created a reusable script that will probably save a lot of time in the future.