Sunday, August 02, 2009

Creating Thesaurus files

I had a look at the excel file that Mauro Cardarelli wrote as a tool to generate thesaurus plans. While the tool is nice, I was dissapointed it did not include a facility to load an existing XML file and analyze it to see if it contains faults such as duplicates.

So what I did was write my own macro that loads a thesaurus file into Mauro's excel spreadsheet. The macro only loads expansions (as his spreadsheet does not deal with replacements) - but you can easily change it to load replacements if you want.

To use this macro, open Mauro's file, and open the visual basic window (alt-f11) and paste the procedure at the bottom of the "sheet1" module. To run it, either click the "run macro" button in excel, or from the VBA environment, hit F5 when the cursor is in the procedure. The macro will ask you for the path of the file, and will load it and populate the excel file.

Sub LoadThesaurusFile()
    Dim total As Integer
    total = 0
    Dim thesaurusFile As New DOMDocument
    Dim path As String
    path = InputBox("Path to the Thesaurus xml file", , ActiveWorkbook.path + "\thesaurus.xml")
    If (path = "") Then Exit Sub
    thesaurusFile.async = False
    thesaurusFile.validateOnParse = False
    'to avoid errors because of a thesaurus schema file, we will load the file as a text file, and then remove the schema reference
    Dim fs As TextStream
    Dim fso As New FileSystemObject
    Set fs = fso.OpenTextFile(path, ForReading, False, TristateUseDefault)
    s = fs.ReadAll
    s = Replace(s, "xmlns=""x-schema:tsSchema.xml""", "")

    Dim sht As Worksheet
    Set sht = ActiveSheet
    If (thesaurusFile.LoadXML(s)) Then
        Dim expansions As IXMLDOMNodeList
        Dim expansion As IXMLDOMNode
        Set expansions = thesaurusFile.FirstChild.SelectNodes("//expansion")
        'start from row 4
        Dim iCurrentRow As Integer
        iCurrentRow = 4
        'loop over the expansion tags, and populate the spreadsheet
        For i = 0 To expansions.Length - 1
            Dim iCurrentColumn As Integer
            Dim subs As IXMLDOMNodeList
            'load current exapnsion
            Set expansion = expansions.Item(i)
            'load the expansion subs
            Set subs = expansion.SelectNodes("sub")
            For iCurrentColumn = 0 To subs.Length - 1
                sht.Cells(iCurrentRow, iCurrentColumn + 1) = subs.Item(iCurrentColumn).Text
                total = total + 1
            Next iCurrentColumn
            iCurrentRow = iCurrentRow + 1
        Next i
        MsgBox ("Total entries: " & total)
    MsgBox ("Error loading the XML file: " + x.parseError.reason + vbCrLf + "Source:" + x.parseError.srcText + vbCrLf + "Link:" & x.parseError.Line)
   End If
   Set thesaurusFile = Nothing
End Sub

1 comment:

Mauro Cardarelli said...

Thanks for the mention and the extension of the code. I hadn't thought about importing files! Nice code.