I am new to pandera and am still learning how it works. What is the easiest way to check that the datetime units of an index are nanoseconds and not milliseconds?
Ideally, I am looking for a compact way to declare this check inside the class-based API definitions. If Solution Attempt 2 is the best way to do this, I will happily accept that as an answer; I am mainly looking for a more experienced perspective.
Solution Attempt 1
First, I tried the approach that seemed most intuitive after studying the docs, but it did not produce the desired result: validating a frame with a millisecond index does not raise a schema error.
import pandas as pd
import pandera as pa
from pandera import DataFrameModel, Field
from pandera.typing import Index
from pandera.engines import pandas_engine

class DateIndexSchema(DataFrameModel):
    date: Index[pandas_engine.DateTime] = Field(nullable=False, dtype_kwargs={'unit': 'ns'})

df_wrong_index_type = pd.DataFrame(
    {'value': [100, 200]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02']).astype('datetime64[ms]'),
)
DateIndexSchema.validate(df_wrong_index_type)
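To confirm that the test frame really carries a millisecond index before blaming the schema, the index dtype can be inspected with pandas alone. This is a minimal sketch, assuming pandas 2.0 or later (where non-nanosecond units and the `unit` attribute are supported); the variable names are illustrative and not part of the original code.

```python
import pandas as pd

# Cast explicitly so the units do not depend on pandas' default resolution.
base = pd.to_datetime(["2023-01-01", "2023-01-02"])
ns_index = base.astype("datetime64[ns]")
ms_index = base.astype("datetime64[ms]")

# The dtype name and the `unit` attribute both expose the resolution.
ns_dtype_name = str(ns_index.dtype)
ms_dtype_name = str(ms_index.dtype)
print(ns_dtype_name, ns_index.unit)
print(ms_dtype_name, ms_index.unit)
```

If both lines print the same unit, the cast did not take effect in the installed pandas version, and no schema could tell the two frames apart.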
Solution Attempt 2
This solution works as expected, but it feels a bit verbose, which makes me think I am missing something obvious.
class DateIndexSchemaThrow(DataFrameModel):
    date: Index[pandas_engine.DateTime] = Field(nullable=False)

    @pa.dataframe_check
    def index_should_be_in_ns(cls, dataframe: pd.DataFrame) -> bool:
        if dataframe.index.dtype != "datetime64[ns]":
            return False
        return True
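Part of the verbosity comes from the `if`/`return` pair, which can be collapsed into a single comparison. The sketch below shows that comparison as a plain function so it can be tried without pandera; `index_is_ns` is a hypothetical helper name, and pandas 2.0 or later is assumed so that the millisecond cast is preserved.

```python
import pandas as pd


def index_is_ns(dataframe: pd.DataFrame) -> bool:
    """Return True only if the index dtype is exactly datetime64[ns]."""
    return bool(dataframe.index.dtype == "datetime64[ns]")


dates = pd.to_datetime(["2023-01-01", "2023-01-02"])
df_ns = pd.DataFrame({"value": [100, 200]}, index=dates.astype("datetime64[ns]"))
df_ms = pd.DataFrame({"value": [100, 200]}, index=dates.astype("datetime64[ms]"))

print(index_is_ns(df_ns), index_is_ns(df_ms))
```

The same one-line body could presumably be returned from the `index_should_be_in_ns` check, since it yields the same boolean as the original branches.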
Asked Mar 13 at 13:30 by J.K.; edited Mar 13 at 19:09.
1 Answer
I found a solution that suits my application. One can register custom checks with pandera.extensions, which allows compact class-based API declarations inside the Field constructor and in the Config overrides.
import pandas as pd
from pandas.api.types import is_datetime64_ns_dtype
from pandera import DataFrameModel, Field
from pandera.typing import Index
import pandera.extensions as extensions

extensions.register_check_method(is_datetime64_ns_dtype)

class DateIndexSchema(DataFrameModel):
    date: Index[pd.Timestamp] = Field(nullable=False, is_datetime64_ns_dtype=())

df_wrong_index_type = pd.DataFrame(
    {'value': [100, 200]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02']).astype('datetime64[ms]'),
)
DateIndexSchema.validate(df_wrong_index_type)
Now a SchemaError is reliably raised when the datetime index is not in nanoseconds.
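For reference, here is a small pandas-only sketch of what `is_datetime64_ns_dtype` itself reports, independent of how pandera invokes it. It assumes pandas 2.0 or later; the variable names are illustrative. Note that the function also accepts timezone-aware nanosecond dtypes, which may or may not be what a given schema wants.

```python
import pandas as pd
from pandas.api.types import is_datetime64_ns_dtype

dates = pd.to_datetime(["2023-01-01", "2023-01-02"])

# Naive nanosecond and millisecond indexes.
ns_ok = is_datetime64_ns_dtype(dates.astype("datetime64[ns]"))
ms_ok = is_datetime64_ns_dtype(dates.astype("datetime64[ms]"))

# A timezone-aware nanosecond dtype also counts as "ns".
tz_ok = is_datetime64_ns_dtype(pd.DatetimeTZDtype(unit="ns", tz="UTC"))

print(ns_ok, ms_ok, tz_ok)
```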