I'm trying to reduce a string column to its first two characters and attach the previous year's value in Hive.
I'm looking to substring the column as follows:
create table substring_income_dataset as
select year, substring(income_total, 1, 2), benefit_type, sum(household_income),
count(*)
from income_dataset
group by year, benefit_type, substring(income_total, 1, 2);
I'm also looking to use the lag function to include the previous year's value based on a primary key:
create table previous_year_income as
select *,
lag(benefit_type,1,0) over (partition by primary_key order by year) as previous_benefit_type,
lag(income_total,1,0) over (partition by primary_key order by year) as previous_income_type
from income_dataset;
Can somebody please suggest how I can combine the two?
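For reference, here is a rough, untested sketch of the kind of combined query I have in mind. It assumes that year is what orders the rows within each primary_key, and that lag can be applied to the substring expression directly. The table name previous_year_income_band and the aliases income_band and previous_income_band are placeholders I made up for illustration.

```sql
-- Sketch only: names marked "hypothetical" do not exist in my schema yet.
create table previous_year_income_band as          -- hypothetical table name
select
  primary_key,
  year,
  benefit_type,
  -- keep only the first two characters of income_total
  substring(income_total, 1, 2) as income_band,    -- hypothetical alias
  -- previous row's benefit_type for the same primary_key, ordered by year
  lag(benefit_type, 1) over (partition by primary_key order by year)
    as previous_benefit_type,
  -- previous row's two-character prefix for the same primary_key
  lag(substring(income_total, 1, 2), 1) over (partition by primary_key order by year)
    as previous_income_band                        -- hypothetical alias
from income_dataset;
```

With no third argument, lag returns NULL for the first year of each primary_key. The sum and count from the first query could presumably then be computed with a group by over this table, but I'm not sure that is the right order of operations.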