I am trying to write a function that reads data using the transform decorator and returns the dataframe to the calling module. The data_read function defines data_output as a nested function inside it and applies the @transform decorator, but the approach doesn't work.
I tried the approach below and it isn't working. Can someone let me know what I am doing wrong? I added the last three lines just to check.
def data_read(rid, output_file):
    @transform(
        output_fi=Output(output_file),
        src_input=Input('{rid}'.format(rid=rid))
    )
    def data_output(output_fi, src_input):
        df = src_input.dataframe()
        return df
    return data_output
1 Answer
Where are you trying to call this function from? Workshop?
Unfortunately, this is not a supported pattern.
At a high level, you can use a Code Repository to write code. This code can do a lot of things:
- Be registered as a transform (1 to N datasets in and 1 to N datasets out; it defines how to go from datasets to datasets by transforming the data). You can also consume/create models and media sets, trigger API calls, etc. When you chain transforms (A > B > C ...) you essentially obtain a data pipeline that does things one after another (you need to schedule the different transforms to handle the orchestration). A minimal example is sketched after this list.
- Be registered as functions (small code snippets that can be called from different places). There are two big flavors of functions:
  - Functions to call from Workshop. These can be written in TypeScript, Python, ... You can think of them like AWS Lambdas but with full access to the rest of the platform (Models, APIs, Ontology, ...). Once you write and register such a function, you can call it anytime from many places (Actions, Workshop, Slate, other functions, AIP Logic, ...).
  - Functions (UDFs) to call from Pipeline Builder. These need to be written in Python. Here the goal is to define a bit of reusable logic for no-code/low-code interfaces (like Pipeline Builder) that themselves define pipeline logic.
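For reference, here is what a properly registered transform looks like. This is a minimal sketch assuming the standard transforms.api module; the dataset paths and the amount column are placeholders:

from transforms.api import transform_df, Input, Output

# A minimal transform: one dataset in, one dataset out. It must live at
# module level so the build system can discover and register it.
@transform_df(
    Output("/Project/folder/cleaned_orders"),  # placeholder output path
    src=Input("/Project/folder/raw_orders"),   # placeholder input path
)
def clean_orders(src):
    # src is a Spark DataFrame; any PySpark logic works here.
    return src.filter(src.amount > 0)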
Now, as to what you did: I believe you wrote a function (like the "Functions to call from Workshop" above) which itself defines a transform, and that unfortunately doesn't fit any of the "possibilities" listed above.
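Note that if your goal was only to parameterize a transform (not to fetch data back at runtime), a transform generator does work, because the decorator runs while the repository is being checked/built, not when some caller invokes the function at runtime. A hedged sketch, assuming the standard transforms.api module and placeholder identifiers:

from transforms.api import transform_df, Input, Output

def make_read_transform(rid, output_path):
    # @transform_df is applied at import/check time, which is the
    # supported way to parameterize transforms in a code repository.
    @transform_df(
        Output(output_path),
        src_input=Input(rid),
    )
    def data_output(src_input):
        return src_input
    return data_output

# The generated transform must be bound at module level so the build
# system can register it; the rid and path below are placeholders.
read_dataset = make_read_transform(
    "ri.foundry.main.dataset.0000", "/Project/folder/dataset_copy"
)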
What is the closest thing that would work?
I believe your goal was to process some dataframe and get the result back in some UI (e.g. in Workshop).
Unfortunately, that is not directly possible. There are good reasons for this: one is latency, as your compute could well be intensive or the dataset very large and take a long time to process. You would probably also not want this to run for each and every user opening your app, as that would trigger compute "per user", etc.
What you can do is sync your dataset as objects (which should be a reflection of your business's "real things": Customers, Orders, etc.), and then you will be able to display them from your applications (e.g. Workshop).
If your goal was to define a UDF to use in Pipeline Builder, then you should write a function that operates at row level.
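To illustrate the row-level shape of such logic, here is a plain Python sketch. The registration mechanics for Pipeline Builder UDFs are omitted, and normalize_name is a hypothetical helper:

def normalize_name(raw_name: str) -> str:
    # Operates on a single value from a single row: trim whitespace
    # and title-case it. Pipeline Builder would apply this per row.
    return raw_name.strip().title()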
Docs for functions: https://www.palantir.com/docs/foundry/functions/overview
Comment from furas (Mar 3 at 23:01): "it isn't working"? Do you get an error message? Show the full message in the question. Do you get wrong data? Show it in the question.