I have 2 dataframes that are the results of 2 queries, returned in JSON. One of the dataframes sap_scrapped
looks like this and was generated by:
sap_scrapped = pd.DataFrame(response.json()["d"]["results"]))
Material Scrapped
0 08-008099-00 2
1 10-000001-00 5
The other dataframe sap_warehouse
looks like this and was generated by:
sap_warehouse = pd.DataFrame(response.json()["d"]["results"]))
Material SSP Scrapped
0 10-000001-00 0 0
1 10-789001-00 10 7
I want to compare each value in the sap_scrapped
'Material' column to the sap_warehouse
'Material' column. If the scrapped material is not found anywhere in sap_warehouse
, then I want to insert that missing material to the sap_warehouse
dataframe. Using the data above, I'd want to insert 08-008099-00
into the sap_warehouse
dataframe into the 'Material' column, zero in the SSP column and copy the 'Scrapped' value from the sap_scrapped
dataframe (2) into 'Scrapped' in sap_warehouse
.
How can I do that?
I have 2 dataframes that are the results of 2 queries, returned in JSON. One of the dataframes sap_scrapped
looks like this and was generated by:
sap_scrapped = pd.DataFrame(response.json()["d"]["results"]))
Material Scrapped
0 08-008099-00 2
1 10-000001-00 5
The other dataframe sap_warehouse
looks like this and was generated by:
sap_warehouse = pd.DataFrame(response.json()["d"]["results"]))
Material SSP Scrapped
0 10-000001-00 0 0
1 10-789001-00 10 7
I want to compare each value in the sap_scrapped
'Material' column to the sap_warehouse
'Material' column. If the scrapped material is not found anywhere in sap_warehouse
, then I want to insert that missing material to the sap_warehouse
dataframe. Using the data above, I'd want to insert 08-008099-00
into the sap_warehouse
dataframe into the 'Material' column, zero in the SSP column and copy the 'Scrapped' value from the sap_scrapped
dataframe (2) into 'Scrapped' in sap_warehouse
.
How can I do that?
Share Improve this question edited Mar 23 at 18:20 wjandrea 33.2k10 gold badges69 silver badges98 bronze badges asked Mar 23 at 17:25 JoeJoe 12 bronze badges 3 |2 Answers
Reset to default 1You can use data-fingerprint
tool: https://pypi./project/data-fingerprint/#description to check differencess between those two dataframes, and than based on the difference insert the ones that are missing, here is an example:
import pandas as pd
import polars as pl
from data_fingerprint.srcparator import get_data_report
from data_fingerprint.src.models import DataReport
from data_fingerprint.src.utils import get_dataframe
if __name__ == "__main__":
sap_scrapped: pl.DataFrame = pl.DataFrame(
{"Material": ["08-008099-0", "10-000001-00"], "Scrapped": [2, 5]}
)
sap_warehouse: pl.DataFrame = pl.DataFrame(
{
"Material": ["10-000001-00", "10-789001-00"],
"SSP": [0, 10],
"Scrapped": [0, 7],
}
)
# get difference information
report: DataReport = get_data_report(
sap_scrapped,
sap_warehouse.drop("Scrapped"),
"sap_scrapped",
"sap_warehouse",
grouping_columns=["Material"],
)
# transform difference to dataframe
difference: pl.DataFrame = get_dataframe(report)
# filter out difference from "sap_scrapped"
to_insert_materials: pl.DataFrame = difference.filter(
pl.col("source") == "sap_scrapped"
)
# gather information
to_insert: pl.DataFrame = sap_scrapped.filter(
pl.col("Material").is_in(to_insert_materials["Material"])
).with_columns(pl.lit(0).alias("SSP"))
print(to_insert)
You can also use pandas.DataFrame instead of polars.DataFrame
Here's one approach:
out = (
sap_warehouse.set_index('Material')
bine_first(
sap_scrapped.set_index('Material').assign(SSP=0)
)
.reset_index()
)
Output:
Material SSP Scrapped
0 08-008099-00 0 2
1 10-000001-00 0 0
2 10-789001-00 10 7
Explanation
- Use
.set_index
to make 'Material' the index for both dfs and applybine_first
. - Add 'SSP' to
sap_scrapped
as '0' for missing values via.assign
. - Reset the index with
.reset_index
.
Data used
import pandas as pd
data = {'Material': {0: '08-008099-00', 1: '10-000001-00'},
'Scrapped': {0: 2, 1: 5}}
sap_scrapped = pd.DataFrame(data)
data2 = {'Material': {0: '10-000001-00', 1: '10-789001-00'},
'SSP': {0: 0, 1: 10}, 'Scrapped': {0: 0, 1: 7}}
sap_warehouse = pd.DataFrame(data2)
sap_warehouse
? Please edit to clarify and add desired output. See How to make good reproducible pandas examples. (See also minimal reproducible example.) – wjandrea Commented Mar 23 at 18:21response
here, which is confusing on first glance, but obviously that would be a different object depending on the context. – wjandrea Commented Mar 23 at 18:35