I am attempting to summarize data in an event log. I have two events I want to track: Event A and Event B. I would like to count how many times event B occurs between occurrences of event A. For example:
          Date      Time  Event
0   2025-02-01  03:51:40      A
1   2025-02-01  05:53:31      B
2   2025-02-01  07:55:05      B
3   2025-02-01  10:14:52      B
4   2025-02-01  12:17:01      A
5   2025-02-01  14:20:15      B
6   2025-02-01  20:26:04      A
7   2025-02-01  22:31:27      A
8   2025-02-02  03:50:48      B
9   2025-02-02  05:52:28      B
10  2025-02-02  14:00:45      A
I would like to return:
         Date      Time  Event  B Count
0  2025-02-01  03:51:40      A        0
1  2025-02-01  12:17:01      A        3
2  2025-02-01  20:26:04      A        1
3  2025-02-01  22:31:27      A        0
4  2025-02-02  14:00:45      A        2
I have no idea how to accomplish this. Any help is appreciated. Also, this is my first Stack Overflow question, so I apologize if I have done or formatted anything wrong.
asked Mar 26 at 19:07 by RMAC52
- What you are looking for is a simple Python-based state machine. – JonSG Commented Mar 26 at 19:24
- Is the 1st file ongoing, and do you need the 2nd file to be generated automatically on each change? Is this supposed to be run manually? Is the 1st file a CSV? – Uberhumus Commented Mar 26 at 19:58
- If you show us where you are code-wise, we can offer suggestions. – JonSG Commented Mar 26 at 20:28
- Is the data a Pandas dataframe or .csv file or what? – user19077881 Commented Mar 26 at 23:36
2 Answers
It looks as though you're working with a pandas DataFrame. If that's the case, let's assume that the data originates from a CSV file that looks like this:
Date,Time,Event
2025-02-01,03:51:40,A
2025-02-01,05:53:31,B
2025-02-01,07:55:05,B
2025-02-01,10:14:52,B
2025-02-01,12:17:01,A
2025-02-01,14:20:15,B
2025-02-01,20:26:04,A
2025-02-01,22:31:27,A
2025-02-02,03:50:48,B
2025-02-02,05:52:28,B
2025-02-02,14:00:45,A
Construct a DataFrame from the CSV file contents. Iterate over its rows, keeping a running count of B events. Each time an A event is encountered, record that row in a dictionary together with the number of B events seen since the previous A event, then reset the counter. Finally, create a new DataFrame from the dictionary.
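import pandas as pd
from collections import defaultdict

FILENAME = "foo.csv"

b_count = 0            # B events seen since the previous A event
d = defaultdict(list)  # column name -> list of values for the result

# Each row unpacks into its three columns: Date, Time, Event
for _, (_date, _time, _event) in pd.read_csv(FILENAME).iterrows():
    if _event == "A":
        d["Date"].append(_date)
        d["Time"].append(_time)
        d["Event"].append(_event)
        d["B Count"].append(b_count)
        b_count = 0  # start counting afresh for the next A event
    else:
        b_count += 1  # any non-A event is counted as a B event

print(pd.DataFrame.from_dict(d))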
Output:
         Date      Time  Event  B Count
0  2025-02-01  03:51:40      A        0
1  2025-02-01  12:17:01      A        3
2  2025-02-01  20:26:04      A        1
3  2025-02-01  22:31:27      A        0
4  2025-02-02  14:00:45      A        2
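As a possible alternative to the explicit loop, the same counts can be obtained with vectorized pandas operations. This is only a sketch and not part of the original answer: the function name `count_b_between_a` and the inline `CSV_TEXT` are made up for illustration. It assumes the rows are already in chronological order, and it counts only rows whose event is exactly "B", whereas the loop above counts every non-A row.

```python
import io

import pandas as pd

# Illustrative copy of the sample data; in practice this would come from a file.
CSV_TEXT = """Date,Time,Event
2025-02-01,03:51:40,A
2025-02-01,05:53:31,B
2025-02-01,07:55:05,B
2025-02-01,10:14:52,B
2025-02-01,12:17:01,A
2025-02-01,14:20:15,B
2025-02-01,20:26:04,A
2025-02-01,22:31:27,A
2025-02-02,03:50:48,B
2025-02-02,05:52:28,B
2025-02-02,14:00:45,A
"""


def count_b_between_a(df: pd.DataFrame) -> pd.DataFrame:
    """Return the A rows with a 'B Count' column (hypothetical helper)."""
    is_a = df["Event"].eq("A")
    # Count the A rows seen strictly before each row. Every run of B rows
    # then shares a group id with the A row that closes the run.
    group_id = is_a.shift(fill_value=False).cumsum()
    # Sum the B flags within each group and broadcast back to every row.
    b_per_group = df["Event"].eq("B").groupby(group_id).transform("sum")
    out = df.loc[is_a].copy()
    out["B Count"] = b_per_group[is_a].astype(int)
    return out.reset_index(drop=True)


result = count_b_between_a(pd.read_csv(io.StringIO(CSV_TEXT)))
print(result)
```

The shift by one row is what makes each A row belong to the group of the B rows that precede it rather than the ones that follow it.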
The question mentions an event log and shows what appears to be a whitespace-separated file rather than a traditional CSV. Here is a simple example that loads and processes such a file as required.
cat bilvit.py
import sys
import re

if len(sys.argv) != 2:
    print(f"usage:{sys.argv[0]} FILENAME")
    sys.exit(1)

bCount = 0  # B events seen since the previous A event
i = 0       # index of the next output row

with open(sys.argv[1], 'r') as file:
    records = file.readlines()

for record in records:
    record = record.strip()
    if not record:
        continue  # skip empty lines
    fields = re.split(r'\s+', record)
    if len(fields) != 4:
        continue  # skip a header row or any malformed line
    _junk, _date, _time, _event = fields
    if _event == "A":
        if i == 0:
            print(" Date Time Event B Count")  # header
        print(f"{i:-3d} {_date} {_time} {_event} {bCount}")
        bCount = 0
        i += 1
    else:
        bCount += 1
#
# run it
python bilvit.py event.log
Date Time Event B Count
0 2025-02-01 03:51:40 A 0
1 2025-02-01 12:17:01 A 3
2 2025-02-01 20:26:04 A 1
3 2025-02-01 22:31:27 A 0
4 2025-02-02 14:00:45 A 2
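If the counts are needed by other code rather than printed, the same loop could be wrapped in a generator. The sketch below is an assumption-laden variant, not the answer's script: `count_b_before_each_a` and `SAMPLE_LOG` are hypothetical names, the log is assumed to have four whitespace-separated fields per data line (index, date, time, event), and any line with a different field count, such as a header, is skipped.

```python
from typing import Iterable, Iterator, Tuple

# Illustrative sample in the layout shown in the question (header included).
SAMPLE_LOG = """\
Date Time Event
0 2025-02-01 03:51:40 A
1 2025-02-01 05:53:31 B
2 2025-02-01 07:55:05 B
3 2025-02-01 10:14:52 B
4 2025-02-01 12:17:01 A
5 2025-02-01 14:20:15 B
6 2025-02-01 20:26:04 A
7 2025-02-01 22:31:27 A
8 2025-02-02 03:50:48 B
9 2025-02-02 05:52:28 B
10 2025-02-02 14:00:45 A
"""


def count_b_before_each_a(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (date, time, b_count) for every A event, in input order."""
    b_count = 0
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) != 4:
            continue  # header, blank, or malformed line
        _index, date, time, event = fields
        if event == "A":
            yield date, time, b_count
            b_count = 0  # reset for the next A event
        elif event == "B":
            b_count += 1  # events other than A and B are ignored


rows = list(count_b_before_each_a(SAMPLE_LOG.splitlines()))
for i, (date, time, b_count) in enumerate(rows):
    print(f"{i:3d} {date} {time} A {b_count}")
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `SAMPLE_LOG.splitlines()`.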