The issue is that in my Excel output it does not stop after the subject line has been proceeded. I have a utm9 smtp proxy log and try to output date sender receiver and subject only. So it should stop as soon as the subject ends.
Here is some example of the log file:
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [104\126] F From: xxxS harma <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [105\126] T To: "[email protected]" <[email protected]>,
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [106\126] "[email protected]" <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [107\126] C CC: "xx, xsss [ADD IT]" <[email protected]>, "xxx, xx[ADD
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [108\126] IT]" <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [109\126] Subject: xxArcserve-Wartungsvertrahhhg N. xxx& Co. GmbH hh wird
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [110\126] am 1.3.2025 ablaufen
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [111\126] Thread-Topic: Ihr xx-Wartungsvertrag xx. xx& xx. GmbH xx
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [112\126] wird am 1.3.2025 ablaufen
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [113\126] Thread-Index: AQHbZEVvoZQu1Fou30O6c9m84wfi5rMX+aUw
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [114\126] Date: Wed, 15 Jan 2025 15:21:07 +0000
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [115\126] I Message-ID: <PH0PR10MB4711CA54F8FD03D520F60105E9192@PH0PR10MB4711.namprd10.prod.outlook>
So what my script does in the subject line it copy everything after Thread-Index, but it should not do that but go to the next line that starts by F From: and process the next and so on.
The Excel app says as well the file is broken and need repair, but it still seems to have all I need. How can I fix my code?
# Define paths
$logFilePath = "C:\1.log"
$outputExcelPath = "C:\path\to\your\output.xlsx"
# Ensure ImportExcel module is installed
if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
Write-Host "Installing ImportExcel module..."
Install-Module ImportExcel -Force -Scope CurrentUser
}
# Initialize arrays and variables
$emailData = @()
$mailaddress_regex = "([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
$addtonextline = $false
$lastline = ""
$date = $null
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""
# Read the log file into an array (reverse order for multi-line handling)
$text = Get-Content -Path $logFilePath
for ($i = $text.Count - 1; $i -ge 0; $i--) {
$line = $text[$i]
# Append last line if necessary (handling multi-line fields)
if ($addtonextline) {
$line = $line + " " + $lastline
}
# Capture date/time from log entry
if ($line -match '(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})') {
if ($date -and $senders -and $receivers -and $subject) {
# Store the parsed email data
$emailData += [PSCustomObject]@{
Date = $date
Sender = ($senders -join ", ")
Receiver = ($receivers -join ", ")
CC = ($ccs -join ", ")
Subject = $subject
}
# Reset values for next email entry
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""
}
$date = $matches[1]
}
# Extract sender email using Select-String
if ($line -match "F From") {
$senders += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract recipient email(s) using Select-String
elseif ($line -match "T To") {
$receivers += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract CC email(s) using Select-String
elseif ($line -match "C CC") {
$ccs += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract subject line (may be multi-line)
elseif ($line -match "Subject: (.+)") {
$subject = $matches[1]
$addtonextline = $false
}
# Handle multi-line continuation
else {
if ($line -match "\[\d+\\\d+\]\s(.*)") {
$lastline = $matches[1]
$addtonextline = $true
}
}
}
# Add the last email entry if any data exists
if ($date -and $senders -and $receivers -and $subject) {
$emailData += [PSCustomObject]@{
Date = $date
Sender = ($senders -join ", ")
Receiver = ($receivers -join ", ")
CC = ($ccs -join ", ")
Subject = $subject
}
}
# Export the data to an Excel file
$emailData | Export-Excel -Path $outputExcelPath -AutoSize
Write-Host "Data has been successfully exported to $outputExcelPath"
The issue is that in my Excel output it does not stop after the subject line has been proceeded. I have a utm9 smtp proxy log and try to output date sender receiver and subject only. So it should stop as soon as the subject ends.
Here is some example of the log file:
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [104\126] F From: xxxS harma <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [105\126] T To: "[email protected]" <[email protected]>,
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [106\126] "[email protected]" <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [107\126] C CC: "xx, xsss [ADD IT]" <[email protected]>, "xxx, xx[ADD
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [108\126] IT]" <[email protected]>
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [109\126] Subject: xxArcserve-Wartungsvertrahhhg N. xxx& Co. GmbH hh wird
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [110\126] am 1.3.2025 ablaufen
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [111\126] Thread-Topic: Ihr xx-Wartungsvertrag xx. xx& xx. GmbH xx
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [112\126] wird am 1.3.2025 ablaufen
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [113\126] Thread-Index: AQHbZEVvoZQu1Fou30O6c9m84wfi5rMX+aUw
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [114\126] Date: Wed, 15 Jan 2025 15:21:07 +0000
2025:01:15-16:22:31 sutm1kor-2 exim-in[11594]: [115\126] I Message-ID: <PH0PR10MB4711CA54F8FD03D520F60105E9192@PH0PR10MB4711.namprd10.prod.outlook>
So what my script does in the subject line it copy everything after Thread-Index, but it should not do that but go to the next line that starts by F From: and process the next and so on.
The Excel app says as well the file is broken and need repair, but it still seems to have all I need. How can I fix my code?
# Define paths
$logFilePath = "C:\1.log"
$outputExcelPath = "C:\path\to\your\output.xlsx"
# Ensure ImportExcel module is installed
if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
Write-Host "Installing ImportExcel module..."
Install-Module ImportExcel -Force -Scope CurrentUser
}
# Initialize arrays and variables
$emailData = @()
$mailaddress_regex = "([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
$addtonextline = $false
$lastline = ""
$date = $null
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""
# Read the log file into an array (reverse order for multi-line handling)
$text = Get-Content -Path $logFilePath
for ($i = $text.Count - 1; $i -ge 0; $i--) {
$line = $text[$i]
# Append last line if necessary (handling multi-line fields)
if ($addtonextline) {
$line = $line + " " + $lastline
}
# Capture date/time from log entry
if ($line -match '(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})') {
if ($date -and $senders -and $receivers -and $subject) {
# Store the parsed email data
$emailData += [PSCustomObject]@{
Date = $date
Sender = ($senders -join ", ")
Receiver = ($receivers -join ", ")
CC = ($ccs -join ", ")
Subject = $subject
}
# Reset values for next email entry
$senders = @()
$receivers = @()
$ccs = @()
$subject = ""
}
$date = $matches[1]
}
# Extract sender email using Select-String
if ($line -match "F From") {
$senders += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract recipient email(s) using Select-String
elseif ($line -match "T To") {
$receivers += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract CC email(s) using Select-String
elseif ($line -match "C CC") {
$ccs += Select-String -Pattern $mailaddress_regex -InputObject $line -AllMatches | ForEach-Object { $_.Matches.Value }
$addtonextline = $false
}
# Extract subject line (may be multi-line)
elseif ($line -match "Subject: (.+)") {
$subject = $matches[1]
$addtonextline = $false
}
# Handle multi-line continuation
else {
if ($line -match "\[\d+\\\d+\]\s(.*)") {
$lastline = $matches[1]
$addtonextline = $true
}
}
}
# Add the last email entry if any data exists
if ($date -and $senders -and $receivers -and $subject) {
$emailData += [PSCustomObject]@{
Date = $date
Sender = ($senders -join ", ")
Receiver = ($receivers -join ", ")
CC = ($ccs -join ", ")
Subject = $subject
}
}
# Export the data to an Excel file
$emailData | Export-Excel -Path $outputExcelPath -AutoSize
Write-Host "Data has been successfully exported to $outputExcelPath"
Share
Improve this question
edited Feb 2 at 22:20
halfer
20.3k19 gold badges109 silver badges202 bronze badges
asked Feb 2 at 8:59
Markus MeyerMarkus Meyer
11 silver badge2 bronze badges
3
|
2 Answers
Reset to default 1As commented, this is one of the worst formatted log file I have come across, but nevertheless it is still possible to get the wanted data out of it using the code below.
Try:
# start by clearing all used variables
$sender = $receiver = $cc = $subject = $null
$skip = $date = $what = $value = $readon = $null
# loop over the file line-by-line and produce ar array of objects in variable $result
# using switch is fast and very memory friendly, so also perfect for huge files
$result = switch -Regex -File 'C:\path\to\1.log' {
'^\s*$' {
# skip empty or whitespace-only lines
continue # continue onto the next line or exit the switch if there are no more items
}
'(Thread-Topic|Thread-Index|Date|Message-ID):' {
# skip lines for Thread-Topic, Thread-Index, Date, Message-ID
$readon = $false
continue
}
'^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}).+(F From|T To|C CC|Subject):\s(.+)$' {
# the lines of interest. Determine if we need to append the next line's data
# or if we start a new property for the output
if ($readon) {
$value = $value -replace '\s{2,}', ' ' # normalize erraneous extra spaces
if ($what -eq 'To') { $receiver = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'From') { $sender = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'CC') { $cc = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'Subject') { $subject = $value }
}
# get the string index for $matches[2] so we can collate the next part (if needed)
$skip = $_.IndexOf($matches[2]) - 1 # or use $skip = $_.LastIndexOf(']') + 1
$date = [DateTime]::ParseExact($matches[1], 'yyyy:MM:dd-HH:mm:ss', [CultureInfo]::InvariantCulture)
$what = ($matches[2] -split '\s')[-1] # --> From, To, CC or Subject
$value = $matches[3]
$readon = $true
continue
}
default {
# here's where we either append the next line's data to the $value
# or if we can start outputting the finished object
if ($readon) {
$value += (' {0}' -f $_.Substring($skip)) # append next data with a space in between
}
else {
$value = $value -replace '\s{2,}', ' ' # normalize erraneous extra spaces
if ($what -eq 'To') { $receiver = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'From') { $sender = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'CC') { $cc = ($value -split ',').Trim() -join '; ' }
elseif ($what -eq 'Subject') { $subject = $value }
# this is assuming CC will always be used..
# if the CC field can be empty (unused) simply remove '-and $cc' in the line below
if ($sender -and $receiver -and $cc -and $subject) {
# output an object
[PsCustomObject]@{
Date = $date
Sender = $sender
Receiver = $receiver
CC = $cc
Subject = $subject
}
# and clear the variables for next time
$sender = $receiver = $cc = $subject = $null
$skip = $date = $what = $value = $readon = $null
}
}
}
}
# finally, export the resulting data. You can use `Export-Excel`, but easier is to export to a CSV file
# you can simply double-click to open in Excel. That way you don't need to use module ImportExcel
$result | Export-Csv -Path 'C:\path\to\ParsedLog.csv' -Encoding UTF8 -UseCulture
If it is possible for you to somehow get better logfiles out of your system I would certainly recommend you to make that happen.
I tested following code and it works very well
$logFilePath = "C:\temp\test.txt"
$outputExcelPath = "C:\temp\test.xlsx"
$reader = [System.IO.StreamReader]::new($logFilePath)
$table = [System.Collections.ArrayList]::new()
$pattern = '^(?<prefix>.{2})(?<header>[A-Z\-a-z]+):(?<message>.*)|^(?<message>.*)'
While(($line = $reader.ReadLine()) -ne $null)
{
if ($line -match '^\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}')
{
$date = $line.Substring(0,19)
$text = $line.Substring(57)
$match = $text | Select-String -Pattern $pattern
if($match.Matches.groups[2].length -eq 0) #continuation line
{
$message = $text.Trim()
$emailData.$oldHeader += $message
}
else `
{
$prefix = $match.Matches.groups[1].value
$header = $match.Matches.groups[2].value
$message = $match.Matches.groups[3].value
switch($header)
{
'From' {
$emailData = New-Object -TypeName psobject
$emailData | Add-Member -NotePropertyName Date -NotePropertyValue $date
$emailData | Add-Member -NotePropertyName Sender -NotePropertyValue $message.Trim()
$table.Add($emailData) | out-null
$oldHeader = 'Sender'
}
'To' { $emailData | Add-Member -NotePropertyName Receiver -NotePropertyValue $message.Trim(); $oldHeader = 'Receiver' }
'CC' { $emailData | Add-Member -NotePropertyName CC -NotePropertyValue $message.Trim(); $oldHeader = 'CC' }
'Subject' { $emailData | Add-Member -NotePropertyName Subject -NotePropertyValue $message.Trim(); $oldHeader = 'Subject' }
}
}
}
}
$table | Format-List
Message-ID
line. – Theo Commented Feb 4 at 12:33