I am trying to select rows from a Julia DataFrame according to a query, and to update a single column.
So far I have tried two methods, using subset and filter. However, in both cases these appear to return a copy of the original dataframe, and therefore changing a value in these objects does not change the original dataframe data.
A short example:
An example dataframe has 4 columns. The first 3 columns are A, B and C. These are used to query the dataframe rows. The final column is enabled, which is either true or false.
using DataFrames
df = DataFrame(A=String[], B=String[], C=String[], enabled=Bool[])
push!(df, Dict(:A=>"A", :B=>"B", :C=>"C", :enabled=>true))
# one method, which does not work
selected_rows = subset(df, [:A, :B, :C] => ByRow((A, B, C) -> A == "A" && B == "B" && C == "C"))
selected_rows.enabled .= false  # df.enabled is unchanged
# another method, which also does not work
filtered_rows = filter([:A, :B, :C] => (A, B, C) -> A == "A" && B == "B" && C == "C", df)
filtered_rows.enabled .= false  # df.enabled is unchanged
I suspect no method similar to this will work. I tried to
- filter the dataframe by some column values
- modify the object returned by a filtering operation
However, it is difficult to see how a filtering operation would return anything other than a copy of the original dataframe. I don't know of a way to get a "view" object out of a filtering operation.
Therefore, I suspect the way this has to be done would be to
- find the index of rows matching a filter query
- update the values of a column using the index
However, I don't know how to do that either.
I have looked at the Julia DataFrames documentation and cannot find any functions in the API which seem like they might provide the behavior I am looking for.
asked Mar 12 at 15:04 by user2138149
1 Answer
The Julia DataFrames.jl documentation has a section called "getindex and view" which spells out which methods return a copy of the selected columns or subset rows and which return a view into the data. In addition, the documentation for subset details the view keyword argument, which might be of interest to you.
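To make the copy-versus-view distinction concrete, here is a minimal sketch on a two-row table made up for illustration. It assumes a DataFrames.jl 1.x release, where filter accepts the same view keyword as subset. The variable names are hypothetical and only there to record what happened at each step.

```julia
using DataFrames

df = DataFrame(A=["A", "X"], B=["B", "X"], C=["C", "X"], enabled=[true, false])
rows = df.A .== "A"

# Indexing with a row selector and `:` allocates a new DataFrame (a copy).
copied = df[rows, :]
copied.enabled .= false
copy_left_original_untouched = df.enabled == [true, false]

# `view` with the same selectors returns a SubDataFrame backed by `df`.
viewed = view(df, rows, :)
viewed_is_subdataframe = viewed isa SubDataFrame

# `filter` with `view=true` also returns a SubDataFrame,
# so writing through it changes `df` itself.
filtered = filter(:A => ==("A"), df; view=true)
filtered.enabled .= false
view_changed_original = df.enabled == [false, false]
```

If all three flags come out true, the copy left df alone and the view wrote through to it, which is the behaviour the question was missing.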
You have several options for the kind of operation you are trying to do. Here are some examples:
using DataFrames
reset() = DataFrame(A=["A","X"], B=["B","X"], C=["C","X"], enabled=[true,false])
# using simple indexing and @view
df = reset()
selected_rows = @view df[(df.A .== "A") .& (df.B .== "B") .& (df.C .== "C"), :]
@assert all(selected_rows.enabled .== true)
selected_rows.enabled .= false
@assert all(selected_rows.enabled .== false)
@assert all(df.enabled .== false) # original DataFrame modified
# using subset and view=true
df = reset()
selected_rows = subset(df, :A => ByRow(==("A")), :B => ByRow(==("B")), :C => ByRow(==("C")), view=true)
selected_rows.enabled .= false
@assert all(df.enabled .== false) # original DataFrame modified
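The two-step approach suggested in the question (find the indices of the matching rows, then update the column through those indices) can also be written directly. This is only a sketch on a three-row table invented for illustration. It uses findall on a Boolean mask, which is one of several ways to get the indices.

```julia
using DataFrames

df = DataFrame(A=["A", "X", "A"], B=["B", "X", "B"], C=["C", "X", "C"],
               enabled=[true, true, true])

# Step 1: find the indices of the rows matching the query.
mask = (df.A .== "A") .& (df.B .== "B") .& (df.C .== "C")
idx = findall(mask)

# Step 2: update the column in place through those indices.
df[idx, :enabled] .= false
```

The Boolean mask can be used as the row selector directly, so the findall step is only needed when the integer indices are wanted for something else.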
Does df[df.A .== "A" .&& df.B .== "B" .&& df.C .== "C", "enabled"] .= false do what you want? – Andre Wildberg, commented Mar 12 at 18:16