I have a CSV file in the following format:
12;0;5/15/2008;1:01:09;1;0;0;None;97;39;0.279;;;0;;;;;0;0;0;;;;;;;;;;;;;;;;
Then I read the file into a pandas dataframe:
df = pd.read_csv(config.get("OS", "output_file"), header=None, delimiter=config.get("OS", "delimiter"), decimal=".")
df=df.fillna(value='')
print(df)
and in the output I only have 13 columns ending with the last non-null column.
How can I preserve the number of columns in the dataframe and have the trailing columns in the dataframe set to null?
asked Mar 14 at 1:21 by Shmyg
- Welcome to SO. What datatype do you want those columns to be? They default to float64 (for me at least), with NaN as the default value. I'm also seeing 36 columns. When you say you only have 13 columns, do you mean the rest are missing, or that you only want 13? Is this on Windows? What does config.get("OS", "delimiter") resolve to? – ticktalk Commented Mar 14 at 1:23
- Delimiter is set to semicolon. I have only 13 or so. After the last 0, there is nothing. I ran this on both Windows and Linux. I haven't thought about the data type; at the moment, my concern is getting the columns back. – Shmyg Commented Mar 14 at 1:30
- Show the output of the operations (copy/paste, no screenshots), thanks. – ticktalk Commented Mar 14 at 6:13
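To answer the questions raised in the comments, it helps to compare the column count pandas reports with the number of fields in the raw line, and to print the frame with column truncation turned off. The following is a minimal sketch, not the asker's code: it assumes the delimiter is a semicolon (as stated in the comments) and uses io.StringIO as a stand-in for the real file, since config.get("OS", "output_file") is not shown.

```python
import io

import pandas as pd

# Sample row copied from the question; StringIO stands in for the real file.
row = "12;0;5/15/2008;1:01:09;1;0;0;None;97;39;0.279;;;0;;;;;0;0;0;;;;;;;;;;;;;;;;\n"

df = pd.read_csv(io.StringIO(row), sep=";", header=None)

# df.shape reports the real number of columns, independent of how print() renders it.
print("columns in DataFrame:", df.shape[1])
print("fields in raw line:  ", row.count(";") + 1)

# Print the frame without eliding columns on narrow terminals.
with pd.option_context("display.max_columns", None, "display.width", None):
    print(df)
```

If the two counts match, the trailing columns are present in the frame and only the printed view differs. If they do not match, the delimiter value or the file contents are worth checking first.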
1 Answer
Here's what I get (Linux Mint 20.x, Python 3.12.2, pandas 2.2.2):
cat shmyg.py
import pandas as pd
import sys
if len(sys.argv) < 3:
    print(f'usage: {sys.argv[0]} separator filename')
    sys.exit(1)
df = pd.read_csv(sys.argv[2], sep=sys.argv[1], header=None)
df.info()
print(df)
####
cat shmyg.txt
12;0;5/15/2008;1:01:09;1;0;0;None;97;39;0.279;;;0;;;;;0;0;0;;;;;;;;;;;;;;;;
####
python shmyg.py ';' shmyg.txt
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 37 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       1 non-null      int64
 1   1       1 non-null      int64
 2   2       1 non-null      object
 3   3       1 non-null      object
 4   4       1 non-null      int64
 5   5       1 non-null      int64
 6   6       1 non-null      int64
 7   7       0 non-null      float64
 8   8       1 non-null      int64
 9   9       1 non-null      int64
 10  10      1 non-null      float64
 11  11      0 non-null      float64
 12  12      0 non-null      float64
 13  13      1 non-null      int64
 14  14      0 non-null      float64
 15  15      0 non-null      float64
 16  16      0 non-null      float64
 17  17      0 non-null      float64
 18  18      1 non-null      int64
 19  19      1 non-null      int64
 20  20      1 non-null      int64
 21  21      0 non-null      float64
 22  22      0 non-null      float64
 23  23      0 non-null      float64
 24  24      0 non-null      float64
 25  25      0 non-null      float64
 26  26      0 non-null      float64
 27  27      0 non-null      float64
 28  28      0 non-null      float64
 29  29      0 non-null      float64
 30  30      0 non-null      float64
 31  31      0 non-null      float64
 32  32      0 non-null      float64
 33  33      0 non-null      float64
 34  34      0 non-null      float64
 35  35      0 non-null      float64
 36  36      0 non-null      float64
dtypes: float64(24), int64(11), object(2)
memory usage: 428.0+ bytes
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
0 12 0 5/15/2008 1:01:09 1 0 0 NaN 97 39 0.279 NaN NaN 0 NaN NaN NaN NaN 0 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
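So with header=None the trailing empty fields are already kept as all-NaN columns. If the goal is a fixed column count regardless of what a given file contains, or empty strings instead of NaN, the sketch below shows two options that read_csv supports. It is an illustration under assumptions, not a diagnosis of the asker's setup: EXPECTED_COLUMNS is a hypothetical value chosen for the example, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Sample row copied from the question; StringIO stands in for the real file.
row = "12;0;5/15/2008;1:01:09;1;0;0;None;97;39;0.279;;;0;;;;;0;0;0;;;;;;;;;;;;;;;;\n"
n_fields = row.count(";") + 1

# Option 1: pass explicit column names to pin the width.
# EXPECTED_COLUMNS is hypothetical; columns beyond the data are filled with NaN.
EXPECTED_COLUMNS = n_fields + 3
wide = pd.read_csv(
    io.StringIO(row),
    sep=";",
    header=None,
    names=list(range(EXPECTED_COLUMNS)),
)
print(wide.shape)

# Option 2: read everything as text and keep empty fields as empty strings,
# which avoids the separate fillna(value='') step from the question.
text = pd.read_csv(
    io.StringIO(row),
    sep=";",
    header=None,
    dtype=str,
    keep_default_na=False,
)
print(text.shape)
print(repr(text.iloc[0, 11]))
```

Note that option 2 also leaves the literal text None in column 7 as a string rather than converting it to NaN, so it only fits if string columns are acceptable downstream.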