最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Debugging a string parser in PowerShell - Stack Overflow

programmeradmin0浏览0评论

The issue is that in my Excel output it does not stop after the subject line has been proceeded. I have a utm9 smtp proxy log and try to output date sender receiver and subject only. So it should stop as soon as the subject ends.

Here is some example of the log file:

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [104\126] F From: xxxS harma <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [105\126] T To: "[email protected]" <[email protected]>,

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [106\126]    "[email protected]" <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [107\126] C CC: "xx, xsss [ADD IT]" <[email protected]>, "xxx, xx[ADD

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [108\126]  IT]" <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [109\126]   Subject: xxArcserve-Wartungsvertrahhhg  N. xxx& Co. GmbH hh wird

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [110\126]  am 1.3.2025 ablaufen

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [111\126]   Thread-Topic: Ihr xx-Wartungsvertrag  xx. xx& xx. GmbH xx

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [112\126]  wird am 1.3.2025 ablaufen

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [113\126]   Thread-Index: AQHbZEVvoZQu1Fou30O6c9m84wfi5rMX+aUw

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [114\126]   Date: Wed, 15 Jan 2025 15:21:07 +0000

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [115\126] I Message-ID: <PH0PR10MB4711CA54F8FD03D520F60105E9192@PH0PR10MB4711.namprd10.prod.outlook>

So what my script does in the subject line it copy everything after Thread-Index, but it should not do that but go to the next line that starts by F From: and process the next and so on.

The Excel app says as well the file is broken and need repair, but it still seems to have all I need. How can I fix my code?

# Define paths
$logFilePath = "C:\1.log"
$outputExcelPath = "C:\path\to\your\output.xlsx"

# Ensure ImportExcel module is installed
if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
    Write-Host "Installing ImportExcel module..."
    Install-Module ImportExcel -Force -Scope CurrentUser
}

# Initialize arrays and variables
$emailData = @()
$mailaddress_regex = "([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
$addtonextline = $false
$lastline = ""

$date = $null
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""

# Read the log file into an array (reverse order for multi-line handling)
$text = Get-Content -Path $logFilePath

for ($i = $text.Count - 1; $i -ge 0; $i--) {
    $line = $text[$i]

    # Append last line if necessary (handling multi-line fields)
    if ($addtonextline) {
        $line = $line + " " + $lastline
    }

    # Capture date/time from log entry
    if ($line -match '(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})') {
        if ($date -and $senders -and $receivers -and $subject) {
            # Store the parsed email data
            $emailData += [PSCustomObject]@{
                Date     = $date
                Sender   = ($senders -join ", ")
                Receiver = ($receivers -join ", ")
                CC       = ($ccs -join ", ")
                Subject  = $subject
            }

            # Reset values for next email entry
            $senders = @()
            $receivers = @()
            $ccs = @()
            $subject = ""
        }
        $date = $matches[1]
    }

    # Extract sender email using Select-String
    if ($line -match "F From") {
        $senders += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract recipient email(s) using Select-String
    elseif ($line -match "T To") {
        $receivers += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract CC email(s) using Select-String
    elseif ($line -match "C CC") {
        $ccs += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract subject line (may be multi-line)
    elseif ($line -match "Subject: (.+)") {
        $subject = $matches[1]
        $addtonextline = $false
    }
    # Handle multi-line continuation
    else {
        if ($line -match "\[\d+\\\d+\]\s(.*)") {
            $lastline = $matches[1]
            $addtonextline = $true
        }
    }
}

# Add the last email entry if any data exists
if ($date -and $senders -and $receivers -and $subject) {
    $emailData += [PSCustomObject]@{
        Date     = $date
        Sender   = ($senders -join ", ")
        Receiver = ($receivers -join ", ")
        CC       = ($ccs -join ", ")
        Subject  = $subject
    }
}

# Export the data to an Excel file
$emailData | Export-Excel -Path $outputExcelPath -AutoSize

Write-Host "Data has been successfully exported to $outputExcelPath"

The issue is that in my Excel output it does not stop after the subject line has been proceeded. I have a utm9 smtp proxy log and try to output date sender receiver and subject only. So it should stop as soon as the subject ends.

Here is some example of the log file:

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [104\126] F From: xxxS harma <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [105\126] T To: "[email protected]" <[email protected]>,

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [106\126]    "[email protected]" <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [107\126] C CC: "xx, xsss [ADD IT]" <[email protected]>, "xxx, xx[ADD

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [108\126]  IT]" <[email protected]>

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [109\126]   Subject: xxArcserve-Wartungsvertrahhhg  N. xxx& Co. GmbH hh wird

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [110\126]  am 1.3.2025 ablaufen

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [111\126]   Thread-Topic: Ihr xx-Wartungsvertrag  xx. xx& xx. GmbH xx

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [112\126]  wird am 1.3.2025 ablaufen

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [113\126]   Thread-Index: AQHbZEVvoZQu1Fou30O6c9m84wfi5rMX+aUw

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [114\126]   Date: Wed, 15 Jan 2025 15:21:07 +0000

2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [115\126] I Message-ID: <PH0PR10MB4711CA54F8FD03D520F60105E9192@PH0PR10MB4711.namprd10.prod.outlook>

So what my script does in the subject line it copy everything after Thread-Index, but it should not do that but go to the next line that starts by F From: and process the next and so on.

The Excel app says as well the file is broken and need repair, but it still seems to have all I need. How can I fix my code?

# Define paths
$logFilePath = "C:\1.log"
$outputExcelPath = "C:\path\to\your\output.xlsx"

# Ensure ImportExcel module is installed
if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
    Write-Host "Installing ImportExcel module..."
    Install-Module ImportExcel -Force -Scope CurrentUser
}

# Initialize arrays and variables
$emailData = @()
$mailaddress_regex = "([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
$addtonextline = $false
$lastline = ""

$date = $null
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""

# Read the log file into an array (reverse order for multi-line handling)
$text = Get-Content -Path $logFilePath

for ($i = $text.Count - 1; $i -ge 0; $i--) {
    $line = $text[$i]

    # Append last line if necessary (handling multi-line fields)
    if ($addtonextline) {
        $line = $line + " " + $lastline
    }

    # Capture date/time from log entry
    if ($line -match '(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})') {
        if ($date -and $senders -and $receivers -and $subject) {
            # Store the parsed email data
            $emailData += [PSCustomObject]@{
                Date     = $date
                Sender   = ($senders -join ", ")
                Receiver = ($receivers -join ", ")
                CC       = ($ccs -join ", ")
                Subject  = $subject
            }

            # Reset values for next email entry
            $senders = @()
            $receivers = @()
            $ccs = @()
            $subject = ""
        }
        $date = $matches[1]
    }

    # Extract sender email using Select-String
    if ($line -match "F From") {
        $senders += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract recipient email(s) using Select-String
    elseif ($line -match "T To") {
        $receivers += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract CC email(s) using Select-String
    elseif ($line -match "C CC") {
        $ccs += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
        $addtonextline = $false
    }
    # Extract subject line (may be multi-line)
    elseif ($line -match "Subject: (.+)") {
        $subject = $matches[1]
        $addtonextline = $false
    }
    # Handle multi-line continuation
    else {
        if ($line -match "\[\d+\\\d+\]\s(.*)") {
            $lastline = $matches[1]
            $addtonextline = $true
        }
    }
}

# Add the last email entry if any data exists
if ($date -and $senders -and $receivers -and $subject) {
    $emailData += [PSCustomObject]@{
        Date     = $date
        Sender   = ($senders -join ", ")
        Receiver = ($receivers -join ", ")
        CC       = ($ccs -join ", ")
        Subject  = $subject
    }
}

# Export the data to an Excel file
$emailData | Export-Excel -Path $outputExcelPath -AutoSize

Write-Host "Data has been successfully exported to $outputExcelPath"
Share Improve this question edited Feb 2 at 22:20 halfer 20.3k19 gold badges109 silver badges202 bronze badges asked Feb 2 at 8:59 Markus MeyerMarkus Meyer 11 silver badge2 bronze badges 3
  • A good way to fix code is to commit it to a version control system like Git, so that it is easy to see what has been changed. Then add some debug statements, so you can test intermediate values. In any program that does not work, there will be a difference between your expectation and reality. Your debug commands will help you find that bug. Once the bug has been identified, your version control system will make it easy to remove the debug commands, and then you can add the fix and re-commit. – halfer Commented Feb 2 at 22:22
  • Again, your question about parsing this log file has been closed and this time I personally think that this should not have been closed.. (I voted to reopen, but that takes three of us) The thing is that your file is just about the worst formatted log file ever and the explanation of what you need to filter out of that data is not all that clear. – Theo Commented Feb 4 at 12:33
  • Anyway, since you have to deal with a file so badly constructed, I do believe you should give it another try and post the question again, but this time make sure you specify what you need as output more clearly. Things like The Excel app says as well the file is broken and need repair means nothing and should be avoided in your post. Show a better example with 2 blocks of data so we can see how the text continues after the Message-ID line. – Theo Commented Feb 4 at 12:33
Add a comment  | 

2 Answers 2

Reset to default 1

As commented, this is one of the worst formatted log file I have come across, but nevertheless it is still possible to get the wanted data out of it using the code below.

Try:

# start by clearing all used variables
$sender = $receiver = $cc = $subject = $null
$skip = $date = $what = $value = $readon = $null

# loop over the file line-by-line and produce ar array of objects in variable $result
# using switch is fast and very memory friendly, so also perfect for huge files
$result = switch -Regex -File 'C:\path\to\1.log' {
    '^\s*$' {
        # skip empty or whitespace-only lines
        continue    # continue onto the next line or exit the switch if there are no more items
    }
    '(Thread-Topic|Thread-Index|Date|Message-ID):' {
        # skip lines for Thread-Topic, Thread-Index, Date, Message-ID
        $readon = $false
        continue
    }
    '^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}).+(F From|T To|C CC|Subject):\s(.+)$' {
        # the lines of interest. Determine if we need to append the next line's data
        # or if we start a new property for the output
        if ($readon) {
            $value = $value -replace '\s{2,}', ' '  # normalize erraneous extra spaces
            if ($what -eq 'To') { $receiver = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'From') { $sender = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'CC') { $cc = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'Subject') { $subject = $value }
        }
        # get the string index for $matches[2] so we can collate the next part (if needed)
        $skip   = $_.IndexOf($matches[2]) - 1     # or use $skip = $_.LastIndexOf(']') + 1
        $date   = [DateTime]::ParseExact($matches[1], 'yyyy:MM:dd-HH:mm:ss', [CultureInfo]::InvariantCulture)
        $what   = ($matches[2] -split '\s')[-1]   # --> From, To, CC or Subject
        $value  = $matches[3]
        $readon = $true
        continue
    }
    default {
        # here's where we either append the next line's data to the $value 
        # or if we can start outputting the finished object
        if ($readon) {
            $value += (' {0}' -f $_.Substring($skip))  # append next data with a space in between
        }
        else {
            $value = $value -replace '\s{2,}', ' '  # normalize erraneous extra spaces
            if ($what -eq 'To') { $receiver = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'From') { $sender = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'CC') { $cc = ($value -split ',').Trim() -join '; ' }
            elseif ($what -eq 'Subject') { $subject = $value }
            # this is assuming CC will always be used..
            # if the CC field can be empty (unused) simply remove '-and $cc' in the line below
            if ($sender -and $receiver -and $cc -and $subject) {
                # output an object
                [PsCustomObject]@{
                    Date     = $date
                    Sender   = $sender
                    Receiver = $receiver
                    CC       = $cc
                    Subject  = $subject
                }
                # and clear the variables for next time
                $sender = $receiver = $cc = $subject = $null
                $skip = $date = $what = $value = $readon = $null
            }
        }
    }
}

# finally, export the resulting data. You can use `Export-Excel`, but easier is to export to a CSV file
# you can simply double-click to open in Excel. That way you don't need to use module ImportExcel

$result | Export-Csv -Path 'C:\path\to\ParsedLog.csv' -Encoding UTF8 -UseCulture

If it is possible for you to somehow get better logfiles out of your system I would certainly recommend you to make that happen.

I tested following code and it works very well

$logFilePath = "C:\temp\test.txt"
$outputExcelPath = "C:\temp\test.xlsx"

$reader = [System.IO.StreamReader]::new($logFilePath)
$table = [System.Collections.ArrayList]::new()

$pattern = '^(?<prefix>.{2})(?<header>[A-Z\-a-z]+):(?<message>.*)|^(?<message>.*)'
While(($line = $reader.ReadLine()) -ne $null)
{
   if ($line -match '^\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}')
   {
      $date = $line.Substring(0,19)
      $text = $line.Substring(57)
      
      $match = $text | Select-String -Pattern $pattern
      if($match.Matches.groups[2].length -eq 0) #continuation line
      {
          $message = $text.Trim()
          $emailData.$oldHeader += $message
      }
      else `
      {
         $prefix = $match.Matches.groups[1].value
         $header = $match.Matches.groups[2].value
         $message = $match.Matches.groups[3].value
         
         switch($header)
         {
             'From' {
                      $emailData = New-Object -TypeName psobject
                      $emailData | Add-Member -NotePropertyName Date -NotePropertyValue $date
                      $emailData | Add-Member -NotePropertyName Sender -NotePropertyValue $message.Trim()
                      $table.Add($emailData) | out-null
                      $oldHeader = 'Sender'
             }
             'To' { $emailData | Add-Member -NotePropertyName Receiver -NotePropertyValue $message.Trim(); $oldHeader = 'Receiver' }
             'CC' { $emailData | Add-Member -NotePropertyName CC -NotePropertyValue $message.Trim(); $oldHeader = 'CC' }
             'Subject' { $emailData | Add-Member -NotePropertyName Subject -NotePropertyValue $message.Trim(); $oldHeader = 'Subject' }
         }
      }
   }
}
$table | Format-List
发布评论

评论列表(0)

  1. 暂无评论