
Regex based split column to table in snowflake - Stack Overflow


I am using a Snowflake query to split a string column of a table into multiple columns, but I am not getting the expected output. I need some help.

Say the input is:

Activity type [DP - mcr modifyand quac endo; bio fert]; PharmSon BT acticity code [AYx765]

The output should be:

Activity type in column1
DP - mcr modifyand quac endo; bio fert in column2

PharmSon BT acticity code in column1
AYx765 in column2

Instead, I am getting:

Activity type in column1, column2 is null
bio fert] in column1, column2 is null
PharmSon BT acticity code in column1, AYx765 in column2

My query:
WITH parsed_data AS (
    -- Split FILATT data by semicolons and flatten the array into rows
    SELECT
        INTRNL AS EQ_INTRNL,
        'DEV' AS ENVIRONMENT,
        FILATT AS raw_data, 
        SPLIT(FILATT, ';') AS data_array -- Split the data by ';'
    FROM table1
),

exploded_data AS (
    -- Flatten the array so each item is in a separate row
    SELECT
        EQ_INTRNL,
        ENVIRONMENT,
        raw_data,
        TRIM(value) AS part, -- Each value is now in 'part'
        ROW_NUMBER() OVER (PARTITION BY EQ_INTRNL ORDER BY CURRENT_TIMESTAMP()) AS column1 -- Assign a sequential number, reset for each EQ_INTRNL
    FROM parsed_data,
    LATERAL FLATTEN(input => data_array)
),

extracted_columns AS (
    SELECT
        EQ_INTRNL,
        ENVIRONMENT,
        column1, -- The sequential number for the row
        -- Extract the identifier part (before '[')
        REGEXP_SUBSTR(part, '^[^\\[]+') AS column2,
        -- Extract the content inside brackets, excluding the brackets themselves
        REGEXP_SUBSTR(part, '\\[([^\\]]+)\\]', 1, 1, 'e') AS column3
    FROM exploded_data
)

SELECT
    EQ_INTRNL,
    column2,
    column3,
    ENVIRONMENT,
    column1
FROM extracted_columns 
WHERE column3 !=''
ORDER BY EQ_INTRNL, column1;

Asked Dec 11, 2024 at 9:41 by Jeet Chatterjee; edited Jan 8 at 16:37 by neeru0303.

Comment: Please edit the title of your question to be descriptive, unambiguous, and specific to what you are asking. For more guidance, see "How do I write a good title?". – DarkBee, Dec 11, 2024 at 9:44

1 Answer

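This should be possible with a UDTF (user-defined table function), which can be written in JavaScript or Python.

The Python version that follows matches each name [value] pair up to its closing bracket instead of splitting on ';', so a semicolon inside the brackets no longer breaks the value. It returns one row per pair and handles one or more pairs per input row.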



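create or replace function dev._neeru.split_data(data varchar)
returns table (data_part varchar, col1 varchar, col2 varchar)
language python
RUNTIME_VERSION = 3.9
handler='SplitData'
as $$
import re

# One "name [value]" chunk: everything up to the next ']' plus an optional ';'
PART_RE = re.compile(r'.*?\];?')
# Name before '[' and value between '[' and ']'
PAIR_RE = re.compile(r'(.*?)\[(.*?)\]')

class SplitData:
    def process(self, data):
        # Emit one output row per "name [value]" pair; NULL input yields no rows
        for data_part in PART_RE.findall(data or ''):
            cols = PAIR_RE.findall(data_part)
            if cols:  # skip chunks that have no opening bracket
                yield (data_part, cols[0][0], cols[0][1])
$$;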
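
The handler's regex logic can also be checked locally before the function is deployed. The sketch below is plain Python with no Snowflake dependency; split_pairs is a hypothetical helper name, not part of the UDTF, and it trims the values in Python where the query above uses trim() in SQL.

```python
import re

# Same two patterns as the UDTF handler, compiled once.
PART_RE = re.compile(r'.*?\];?')         # one "name [value]" chunk, up to its ']'
PAIR_RE = re.compile(r'(.*?)\[(.*?)\]')  # name before '[', value inside brackets


def split_pairs(data):
    """Return (name, value) tuples for every 'name [value]' pair in data.

    Hypothetical helper for local testing only.
    """
    pairs = []
    for part in PART_RE.findall(data):
        match = PAIR_RE.search(part)
        if match:  # skip chunks that have no opening bracket
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


sample = ('Activity type [DP - mcr modifyand quac endo; bio fert]; '
          'PharmSon BT acticity code [AYx765]')
for name, value in split_pairs(sample):
    print(f'{name!r} -> {value!r}')
```

Because the first pattern stops at the closing bracket and not at the semicolon, the '; bio fert' fragment stays inside the first value.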


with details as (
    select 'Activity type [DP - mcr modifyand quac endo; bio fert]; PharmSon BT acticity code [AYx765]' as raw_data
    union all
    select 'Name [John]; Age [15]' as raw_data
    union all
    select 'Name [Jack]' as raw_data
    union all
    select 'Name [Jack]; Activity type [DP - mcr modifyand quac endo; bio fert];  Age [15]' as raw_data
)
select
    raw_data,
    trim(col1) as col1,
    trim(col2) as col2
from
details,
table ( dev._neeru.split_data(details.raw_data) );

+------------------------------------------------------------------------------------------+-------------------------+--------------------------------------+
|RAW_DATA                                                                                  |COL1                     |COL2                                  |
+------------------------------------------------------------------------------------------+-------------------------+--------------------------------------+
|Activity type [DP - mcr modifyand quac endo; bio fert]; PharmSon BT acticity code [AYx765]|Activity type            |DP - mcr modifyand quac endo; bio fert|
|Activity type [DP - mcr modifyand quac endo; bio fert]; PharmSon BT acticity code [AYx765]|PharmSon BT acticity code|AYx765                                |
|Name [John]; Age [15]                                                                     |Name                     |John                                  |
|Name [John]; Age [15]                                                                     |Age                      |15                                    |
|Name [Jack]                                                                               |Name                     |Jack                                  |
|Name [Jack]; Activity type [DP - mcr modifyand quac endo; bio fert];  Age [15]            |Name                     |Jack                                  |
|Name [Jack]; Activity type [DP - mcr modifyand quac endo; bio fert];  Age [15]            |Activity type            |DP - mcr modifyand quac endo; bio fert|
|Name [Jack]; Activity type [DP - mcr modifyand quac endo; bio fert];  Age [15]            |Age                      |15                                    |
+------------------------------------------------------------------------------------------+-------------------------+--------------------------------------+
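
As for why the query in the question misbehaves: it splits on ';' before looking at the brackets, so a semicolon inside a bracketed value cuts that value in two. The sketch below mimics the SPLIT, TRIM and REGEXP_SUBSTR steps in plain Python. It assumes Python's re module treats these two patterns the same way Snowflake does, and extract is a hypothetical helper name.

```python
import re

raw = ('Activity type [DP - mcr modifyand quac endo; bio fert]; '
       'PharmSon BT acticity code [AYx765]')

# Step 1: roughly what SPLIT(FILATT, ';') followed by TRIM(value) produces
parts = [p.strip() for p in raw.split(';')]


def extract(part):
    """Apply the two patterns from the question to one split fragment."""
    name = re.match(r'[^\[]+', part)           # text before the first '['
    value = re.search(r'\[([^\]]+)\]', part)   # text inside a complete [...]
    return (name.group(0) if name else None,
            value.group(1) if value else None)


rows = [extract(p) for p in parts]
for row in rows:
    print(row)
```

The first fragment has an opening bracket but no closing one, and the second has a closing bracket but no opening one, so neither matches the bracket pattern. That gives the two null values reported in the question, with 'bio fert]' landing in the name column.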


