I'm trying to convert all my columns to numeric except the first column which I want to keep character. They are currently character-based and are missing. I tried the following code below.
/*fake example to show*/
data test_set;
input color $ char1 $ char2 $;
datalines;
blue
.
.
;
run;
data converted_data;
set test_set;
/* Create an array to store all column names */
array _all_vars {*} _ALL_;
/* Loop through the array starting from the second column (skip the first one) */
do i = 2 to dim(_all_vars);
_all_vars{i} = input(_all_vars{i}, 4.); /* Convert to numeric */
end;
drop i;
run;
I'm sure the solution is simple but I couldn't find it. Does anyone know why this doesn't work or another way to approach this? I need this to be dynamic where the only thing I know is the first column is character but I can't use the name of the columns themselves.
I'm trying to convert all my columns to numeric except the first column which I want to keep character. They are currently character-based and are missing. I tried the following code below.
/*fake example to show*/
data test_set;
input color $ char1 $ char2 $;
datalines;
blue
.
.
;
run;
data converted_data;
set test_set;
/* Create an array to store all column names */
array _all_vars {*} _ALL_;
/* Loop through the array starting from the second column (skip the first one) */
do i = 2 to dim(_all_vars);
_all_vars{i} = input(_all_vars{i}, 4.); /* Convert to numeric */
end;
drop i;
run;
I'm sure the solution is simple but I couldn't find it. Does anyone know why this doesn't work or another way to approach this? I need this to be dynamic where the only thing I know is the first column is character but I can't use the name of the columns themselves.
Share Improve this question asked Mar 14 at 19:18 Health_Code13Health_Code13 934 bronze badges 2- "they are currently character and missing" By that do you mean that the character variables have missing values in every observation? If so then just DROP them and then you can make new variables re-using the names. – Tom Commented Mar 14 at 21:34
- How did you get in this situation? My guess is you used PROC IMPORT. If you are reading in CSV files then switch to writing your own data step to read the file. – Tom Commented Mar 14 at 21:40
2 Answers
Reset to default 3You cannot change the TYPE of an existing variable. You will need to make NEW variables.
data converted_data;
set test_set;
nchar1 = input(char1,32.);
nchar2 = input(char2,32.);
run;
You could then rename all of them. Or drop the character variables and just rename the numeric ones.
data converted_data;
set test_set;
nchar1 = input(char1,32.);
nchar2 = input(char2,32.);
drop char1 char2;
rename nchar1=char1 nchar2=char2;
run;
To avoid having to type the names for a lot of variables you will need to use some code generation. You might be able to do it with macro variables and/or macro logic. But you could also just do it with simple SAS data step that writes the generated code to a file which can then be executed by using the %INCLUDE command.
proc contents data=test_set noprint out=contents; run;
proc sort data=contents; by varnum; run;
filename code temp;
data _null_;
file code;
set contents;
if varnum > 1 and type=2 ;
length nliteral $64;
nliteral=nliteral(name);
put '__' varnum '=input(' nliteral ',32.);'
/ 'drop ' nliteral ';'
/ 'rename __' varnum '=' nliteral ';'
;
run;
data converted_data;
set test_set;
%include code / source2 ;
run;
Which for your example results in a SAS log like this:
25 data converted_data;
26 set test_set;
27 %include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file ...\#LN00156.
28 +__2 =input(char1 ,32.);
29 +drop char1 ;
30 +rename __2 =char1 ;
31 +__3 =input(char2 ,32.);
32 +drop char2 ;
33 +rename __3 =char2 ;
NOTE: %INCLUDE (level 1) ending.
34 run;
NOTE: There were 1 observations read from the data set WORK.TEST_SET.
NOTE: The data set WORK.CONVERTED_DATA has 1 observations and 3 variables.
There are a few issues here:
- an array can only contain variables of the same type: no mixing of characters and numerics.
- you cannot change the type of a variable like that.
var = input(var, 4.)
will not turn a character numeric, it expects a numeric variable on the left-hand side and a character on the right-hand side butvar
can only have one type.
I'll add one more to that:
- there is as far as I know no way to create variables dynamically in a data step at runtime, meaning you can't rely on for example
dim()
or[*]
to figure out the number of variables you'll need.
A solution therefore is:
- Figure out how many variables you need to convert to pre-size the array
(alternatively: overdimension and drop later). - Create new numeric variables and fill those in from the characters.
/* Many ways to do this -- let's count variables using SASHELP.VCOLUMN */
proc sql noprint;
select count(distinct name) into :NCHARVAR from sashelp.vcolumn
where libname eq "WORK" and memname eq "TEST_SET" and type eq "char";
quit;
data converted_data;
set test_set;
array ALL_CHAR [*] _CHARACTER_;
array TO_NUM [&NCHARVAR]; /* Could give them a different name here. */
do i = 2 to dim(ALL_CHAR);
/* Consider using e.g. ??best. to handle invalid values. */
TO_NUM[i] = input(ALL_CHAR[i], 4.);
end;
drop i;
/* Optional: drop the first numeric placeholder (or fill from i-1) */
drop TO_NUM1;
run;