
dask - Use Python streamz to parallel process many realtime updating files? - Stack Overflow


Using Python streamz and dask, I want to distribute the data of newly generated text files to threads, which will then process every new line appended to those files.

from streamz import Stream

# Emit the name of each new file matching the glob
source = Stream.filenames(r'C:\Documents\DataFiles\*.txt')
# recv = source.scatter().map(print)
# Map each filename to a new Stream.from_textfile sub-stream
recv = source.map(Stream.from_textfile)
recv.start()

In the above code, how do I get at the data coming from the streamz of the text files? For a single file I can access the lines and build dataframes like this, but I cannot join these per-file streams, nor use dask.distributed to scatter the work:

from streamz import Stream
import pandas as pd

# Tail one file and build a DataFrame from every 10 new lines
recv = Stream.from_textfile(r'C:\Users\DataFiles\stream_output.txt')
dataframe = recv.partition(10).map(pd.DataFrame)
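For clarity on what `recv.partition(10).map(pd.DataFrame)` does with the line stream, here is a plain-Python sketch of the batching step (the generator below is my own illustration, not the streamz implementation): buffer incoming items and emit a tuple once `n` have arrived; each emitted tuple is what the `.map(pd.DataFrame)` step would then turn into a DataFrame.

```python
# Sketch of Stream.partition(n) semantics as a plain generator
# (an analogue for illustration, not streamz internals).
def partition(lines, n=10):
    """Yield tuples of `n` consecutive items. A trailing incomplete
    batch is held back, just as streamz waits for the buffer to fill."""
    batch = []
    for item in lines:
        batch.append(item)
        if len(batch) == n:
            yield tuple(batch)
            batch = []

# In the streamz pipeline, each yielded tuple would be passed to
# pd.DataFrame by the .map(pd.DataFrame) step.
```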