python - Check units of pandas DateTime index with pandera - Stack Overflow

I am new to pandera and am still learning how it works. What is the easiest way to check that the datetime unit of an index is nanoseconds rather than milliseconds?

Ideally, I would like a compact declaration of this check inside the class-based API definitions. If Solution Attempt 2 is the best way of doing this, I will happily accept that as an answer; I am mainly looking for a more experienced perspective.

Solution Attempt 1

First I tried the approach that looked most intuitive after studying the docs, but it did not produce the desired result: the millisecond index does not cause a schema error.

import pandas as pd
import pandera as pa
from pandera import DataFrameModel, Field
from pandera.typing import Index
from pandera.engines import pandas_engine

class DateIndexSchema(DataFrameModel):
    date: Index[pandas_engine.DateTime] = Field(nullable=False, dtype_kwargs={'unit': 'ns'})

df_wrong_index_type = pd.DataFrame(
    {'value': [100, 200]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02']).astype('datetime64[ms]'),
)

# Expected a SchemaError here, but validation passes for the datetime64[ms] index.
DateIndexSchema.validate(df_wrong_index_type)
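
For reference, the unit of the example index can be inspected with plain pandas and NumPy before involving pandera at all. This is only a minimal sketch, assuming pandas 2.0 or later (where a `DatetimeIndex` can hold non-nanosecond units) and a timezone-naive index; `datetime_unit` is a hypothetical helper name, not part of pandas or pandera.

```python
import numpy as np
import pandas as pd


def datetime_unit(index: pd.Index) -> str:
    """Return the NumPy unit code ('ns', 'ms', ...) of a tz-naive datetime index.

    Hypothetical helper for illustration; assumes index.dtype is a NumPy datetime64 dtype.
    """
    unit, _step = np.datetime_data(index.dtype)
    return unit


# Same dates as in the question, pinned to explicit units.
dates = pd.to_datetime(['2023-01-01', '2023-01-02'])
ns_index = dates.astype('datetime64[ns]')
ms_index = dates.astype('datetime64[ms]')

print(datetime_unit(ns_index))
print(datetime_unit(ms_index))
```

Under those assumptions the two calls should report different units, which confirms that the test frame really carries a millisecond index and that the missing schema error is not caused by pandas silently converting it back to nanoseconds.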

Solution Attempt 2

This solution works as expected, but it feels a bit verbose, and I suspect I am missing something obvious.

class DateIndexSchemaThrow(DataFrameModel):
    date: Index[pandas_engine.DateTime] = Field(nullable=False)

    @pa.dataframe_check
    def index_should_be_in_ns(cls, dataframe: pd.DataFrame) -> bool:
        # Validation fails unless the index dtype is exactly datetime64[ns].
        return dataframe.index.dtype == "datetime64[ns]"

Asked Mar 13 at 13:30 by J.K.; edited Mar 13 at 19:09

1 Answer

I found a solution that suits my application. Custom checks can be registered with pandera.extensions, which allows compact class-based API declarations inside the Field constructor and in the Config overrides.

import pandas as pd
from pandas.api.types import is_datetime64_ns_dtype

from pandera import DataFrameModel, Field
from pandera.typing import Index
import pandera.extensions as extensions

# Register the pandas predicate as a custom check; it becomes available under its function name.
extensions.register_check_method(is_datetime64_ns_dtype)

class DateIndexSchema(DataFrameModel):
    date: Index[pd.Timestamp] = Field(nullable=False, is_datetime64_ns_dtype=())

df_wrong_index_type = pd.DataFrame(
    {'value': [100, 200]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02']).astype('datetime64[ms]'),
)

# Raises a SchemaError because the index is datetime64[ms], not datetime64[ns].
DateIndexSchema.validate(df_wrong_index_type)

Now a SchemaError is reliably raised when the datetime unit of the index is not nanoseconds.
