I'm trying to query a table to find all instances where a character repeats at least 5 times in a row.
I've tried:
Select Column
From Table
where Column REGEXP '(.)\1{4,}'
but it returns nothing.
The table includes the following entries that SHOULD be returned:
1.111111111111E31
00000000000000000
xxxxxxxxxxxxxxxxx
asked Mar 28 at 17:00 by KingTerror; edited Mar 28 at 18:32 by Lukasz Szozda
2 Answers
Using backreferences
Snowflake does not support backreferences in regular expression patterns (known as “squares” in formal language theory); however, backreferences are supported in the replacement string of the REGEXP_REPLACE function.
Second, when a backslash is used inside a single-quoted string, it has to be escaped:
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify \d, use \\d.
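Python string literals have a comparable pitfall, so a short sketch may help show why the single backslash matters. This is an analogy in Python only; it does not demonstrate how Snowflake parses its string constants, and the variable names are made up for illustration:

```python
# Analogy only: how Python (not Snowflake) treats backslashes in string literals.
unescaped = '(.)\1{4,}'   # '\1' is read as an octal escape, i.e. the single character chr(1)
escaped = '(.)\\1{4,}'    # '\\' yields one literal backslash, so a regex engine would see \1
raw = r'(.)\1{4,}'        # raw string: backslashes are kept exactly as written

print(len(unescaped), len(escaped))          # the unescaped literal is one character shorter
print('\\1' in unescaped, '\\1' in escaped)  # only the escaped literal still contains \1
print(escaped == raw)                        # doubling the backslash and using r'' agree
```

The pattern that reaches the regex engine only contains a backreference when the backslash survives the string-literal step.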
A possible workaround is to write a custom Python UDF, since Python's re module does support backreferences:
CREATE OR REPLACE FUNCTION regexp_test(r TEXT, t TEXT)
RETURNS BOOLEAN
LANGUAGE PYTHON
STRICT
RUNTIME_VERSION = '3.12'
HANDLER = 'main'
AS $$
import re

def main(r, t):
    # True if pattern r matches anywhere in t
    return bool(re.search(r, t))
$$;
WITH cte(Col) AS (
    SELECT '1.111111111111E31' UNION ALL
    SELECT '00000000000000000' UNION ALL
    SELECT 'xxxxxxxxxxxxxxxxx' UNION ALL
    SELECT '12345678'
)
SELECT *, regexp_test('(.)\\1{4,}', col)
FROM cte;
Output:
+-------------------+--------------------------------+
| COL               | REGEXP_TEST('(.)\\1{4,}', COL) |
+-------------------+--------------------------------+
| 1.111111111111E31 | TRUE                           |
| 00000000000000000 | TRUE                           |
| xxxxxxxxxxxxxxxxx | TRUE                           |
| 12345678          | FALSE                          |
+-------------------+--------------------------------+
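The handler logic can also be checked locally before creating the UDF. The following is a minimal sketch in plain Python, assuming the same sample values as the query above; the names pattern, samples and results are illustrative and not part of the answer:

```python
import re

def main(r, t):
    # Same logic as the UDF handler: True if pattern r matches anywhere in t.
    return bool(re.search(r, t))

# In Python source a raw string keeps the backslash, so no doubling is needed here.
pattern = r'(.)\1{4,}'

samples = ['1.111111111111E31', '00000000000000000', 'xxxxxxxxxxxxxxxxx', '12345678']
results = {s: main(pattern, s) for s in samples}

for value, matched in results.items():
    print(value, matched)
```

Only the SQL string constant needs the doubled backslash; once the pattern reaches Python it is the ordinary (.)\1{4,}.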
I would like to propose a pure SQL solution. It is written in the MySQL dialect, but I think it would not be difficult to rewrite the query for Snowflake.
WITH recursive
cte(col) AS (
    SELECT '' UNION ALL
    SELECT 'aaaa' UNION ALL
    SELECT 'aaaaa' UNION ALL
    SELECT 'zaaaaa' UNION ALL
    SELECT '1.111111111111E31' UNION ALL
    SELECT '00000000000000000' UNION ALL
    SELECT 'xxxxxxxxxxxxxxxxx' UNION ALL
    SELECT '12345678'
),
rcte(col, pos, fnd) as (
    select
        col, 1,
        case
            when length(col) >= 5 and
                 substr(col, 1, 5) = repeat(substr(col, 1, 1), 5) then 1
            else 0
        end
    from cte
    union all
    select
        col,
        pos + 1,
        case substr(col, pos + 1, 5)
            when repeat(substr(col, pos + 1, 1), 5) then 1
            else 0
        end
    from rcte
    where fnd = 0 and length(col) - pos >= 5
)
SELECT col
FROM rcte
where fnd = 1;
Result:
+-------------------+
| col               |
+-------------------+
| aaaaa             |
| 00000000000000000 |
| xxxxxxxxxxxxxxxxx |
| zaaaaa            |
| 1.111111111111E31 |
+-------------------+
Try it on db<>fiddle.
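The idea behind the recursive CTE is a sliding window: at each position, compare the next five characters with five copies of the first one. A rough Python sketch of that logic, using the same sample rows, might look like this (has_run and its parameter n are hypothetical names, not taken from the query):

```python
def has_run(text, n=5):
    # Slide a window of n characters over the text, much like the CTE advances pos.
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # The window is a run if it equals its first character repeated n times.
        if window == window[0] * n:
            return True
    return False

rows = ['', 'aaaa', 'aaaaa', 'zaaaaa', '1.111111111111E31',
        '00000000000000000', 'xxxxxxxxxxxxxxxxx', '12345678']
matches = [r for r in rows if has_run(r)]
print(matches)
```

Like the SQL version, the loop stops at the first run it finds and never looks at strings shorter than five characters.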
'(.)\\1{4}'. – Barmar, commented Mar 28 at 18:03