I'm trying to split street name, house number, and box number from a String.
Let's say the string is "SomeStreet 59A"
For this case I already have a solution with regex. I'm using this function:
address.split(/([0-9]+)/) // output ["SomeStreet ", "59", "A"]
The problem is that some addresses have unusual formats, so the above method does not work for strings like:
"Somestreet 59-65" // output ["Somestreet ", "59", "-", "65", ""] Not good
My question for this case is how to group the numbers to get this desired output:
["Somestreet", "59-65"]
Another weird example is:
"6' SomeStreet 59" // here "6' Somestreet" is the exact street-name.
Expected output: ["6' Somestreet", "59"]
"6' Somestreet 324/326 A/1" // Example with box number
Expected output: ["6' Somestreet", "324/326", "A/1"]
Bear in mind that this has to be in one executable function to loop through all of the addresses that I have.
asked Mar 30, 2021 at 8:17 by Yorbjörn, edited Apr 1, 2021 at 7:23 by Roko C. Buljan
There are so many different forms of street addresses that trying to come up with a simple function to parse them is futile. – Barmar, Mar 30, 2021 at 8:19
Try .split(/\s*(\d+(?!['’\d])(?:-\d+)?)/) (see demo) if all acceptable formats are those you listed in the question. – Wiktor Stribiżew, Mar 30, 2021 at 8:19
3 Answers
To support all string formats listed in the question, you can use
.match(/^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/)
or, with a greedy Group 1 that makes the last number in the string the house number,
.match(/^(.*)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/)
See the regex demo.
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible (if you need to match the last number as Group 2, the Number, you need to use .*, a greedy variant)
\s+ - one or more whitespaces
(\d+(?:[-.\/]\d+)?) - Group 2: one or more digits optionally followed with -, . or / and then one or more digits
(?:\s*(\S.*))? - an optional occurrence of zero or more whitespaces and Group 3: a non-whitespace char and the rest of the string
$ - end of string.
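To make the lazy-versus-greedy remark above concrete, here is a small sketch that runs both patterns on the same string. The input "Highway 7 Road 12" is made up (it is not from the question) and was picked only because it contains two candidate numbers; the names lazyRx, greedyRx and parts are illustrative.

```javascript
// Same pattern twice; only Group 1 differs: lazy (.*?) vs greedy (.*).
const lazyRx = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;
const greedyRx = /^(.*)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;

// Hypothetical helper: returns the three capture groups, or null on no match.
function parts(rx, text) {
  const m = text.match(rx);
  return m ? { street: m[1], number: m[2], box: m[3] } : null;
}

// Made-up input with two whitespace-separated numbers.
const sample = "Highway 7 Road 12";

console.log(parts(lazyRx, sample));
// { street: 'Highway', number: '7', box: 'Road 12' }
console.log(parts(greedyRx, sample));
// { street: 'Highway 7 Road', number: '12', box: undefined }
```

For every address listed in the question both variants give the same groups, so the choice only matters when a string contains more than one standalone number.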
See a JavaScript demo:
const texts = ['SomeStreet 59A','Somestreet 59-65',"6' SomeStreet 59", 'Somestreet 1.1', 'Somestreet 65 A/1', "6' Somestreet 324/326 A/1"];
const rx = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;
for (const text of texts) {
  const [_, street, number, box] = text.match(rx);
  console.log(text, '=>', {"Street":street, "Number":number, "Box":box});
}
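Since the question asks for one function that can be looped over all addresses, the pattern can be wrapped as sketched below. The destructuring in the demo would throw on a string that does not match (match() returns null), so this sketch returns null instead. The names ADDRESS_RX and parseAddress, the null-for-missing-box convention, and the "No number here" input are assumptions for illustration.

```javascript
// Pattern from the answer above (lazy Group 1).
const ADDRESS_RX = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;

// Hypothetical wrapper: returns { street, number, box }, or null when the
// string contains no whitespace-separated number.
function parseAddress(address) {
  const m = address.trim().match(ADDRESS_RX);
  if (m === null) return null;
  // box defaults to null when Group 3 did not participate in the match.
  const [, street, number, box = null] = m;
  return { street, number, box };
}

const addresses = ["SomeStreet 59A", "6' Somestreet 324/326 A/1", "No number here"];
for (const address of addresses) {
  console.log(address, "=>", parseAddress(address));
}
```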
If you don't mind a bit of string trimming afterwards, here's a solution:
.split(/(?= \d|\D+$)/)
or, to also account for 65 A/1 or 324/326 A/1:
.split(/(?= \d|\D+$|(?<!\D) )/)
Regex101 demo
[
"Some Street 59A",
"Some Street 59-69",
"Some Street 1.1",
"6' Street 45b",
"6' Some street 324/326 A/1",
"Some Street 65 A/1",
"42th Stack ave. 59-69",
].forEach(str => console.log( str.split(/(?= \d|\D+$|(?<!\D) )/) ));
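The "string trimming afterwards" mentioned above could look like the following sketch: the split happens in front of each space, so every part after the first keeps a leading space that trim() removes. SPLIT_RX and splitAddress are illustrative names, and the lookbehind (?<!\D) assumes an engine with ES2018 lookbehind support.

```javascript
// Split pattern from the answer above.
const SPLIT_RX = /(?= \d|\D+$|(?<!\D) )/;

// Hypothetical helper: split, then trim the leading space each later part keeps.
function splitAddress(address) {
  return address.split(SPLIT_RX).map(part => part.trim());
}

console.log(splitAddress("Some Street 59A"));
// [ 'Some Street', '59', 'A' ]
console.log(splitAddress("6' Some street 324/326 A/1"));
// [ "6' Some street", '324/326', 'A/1' ]
```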
If you want to keep the number (e.g. 59A) as a whole, here's another simple solution:
.split(/(?= \d| [\w\d/]+$)/);
Regex101 demo
[
"Some Street 59A",
"Some Street 59-69",
"Some Street 1.1",
"6' Street 45b",
"6' Some street 324/326 A/1",
"Some Street 65 A/1",
"42th Stack ave. 59-69",
].forEach(str => console.log( str.split(/(?= \d| [\w\d/]+$)/) ));
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Template for matching the root element -->
<xsl:template match="/">
<!-- Call the split-address template with the full address -->
<xsl:call-template name="split-address">
<xsl:with-param name="address" select="/root/address" />
</xsl:call-template>
</xsl:template>
<!-- Template to split the address into lines -->
<xsl:template name="split-address">
<xsl:param name="address" />
<xsl:choose>
<!-- If the length of the address is less than or equal to 35 characters, output it directly -->
<xsl:when test="string-length($address) &lt;= 35">
<xsl:value-of select="$address" />
</xsl:when>
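<!-- Otherwise, find the last space within the first 35 characters and split the line there -->
<xsl:otherwise>
<!-- Get the substring of the address up to the 35th character -->
<xsl:variable name="substring" select="substring($address, 1, 35)" />
<!-- Find the position of the last space in that substring (0 if there is none) -->
<xsl:variable name="split-pos">
<xsl:call-template name="last-space-pos">
<xsl:with-param name="text" select="$substring" />
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="number($split-pos) &gt; 0">
<!-- Output the text before the space, add a line break, and recurse on the text after the space -->
<xsl:value-of select="substring($address, 1, $split-pos - 1)" />
<xsl:text>&#10;</xsl:text>
<xsl:call-template name="split-address">
<xsl:with-param name="address" select="substring($address, $split-pos + 1)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!-- No space found: break hard after 35 characters so the recursion always terminates -->
<xsl:value-of select="$substring" />
<xsl:text>&#10;</xsl:text>
<xsl:call-template name="split-address">
<xsl:with-param name="address" select="substring($address, 36)" />
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- Helper template: returns the 1-based position of the last space in $text, or 0 if there is none -->
<xsl:template name="last-space-pos">
<xsl:param name="text" />
<xsl:param name="offset" select="0" />
<xsl:choose>
<xsl:when test="contains($text, ' ')">
<xsl:call-template name="last-space-pos">
<xsl:with-param name="text" select="substring-after($text, ' ')" />
<xsl:with-param name="offset" select="$offset + string-length(substring-before($text, ' ')) + 1" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$offset" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>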
</xsl:stylesheet>
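For readers more at home in JavaScript than XSLT, the wrapping idea described in the stylesheet comments (break at the last space that still fits the limit) can be sketched as follows. The function name wrapAddress, the maxLen parameter and the hard-break fallback for words longer than the limit are assumptions for illustration, not a line-by-line translation of the stylesheet.

```javascript
// Hypothetical helper: wraps an address into lines of at most maxLen characters.
function wrapAddress(address, maxLen = 35) {
  const lines = [];
  let rest = address;
  while (rest.length > maxLen) {
    // Last space at index <= maxLen, so the text before it has at most maxLen chars.
    const cut = rest.lastIndexOf(" ", maxLen);
    if (cut > 0) {
      lines.push(rest.slice(0, cut));
      rest = rest.slice(cut + 1); // drop the space itself
    } else {
      // No usable space: break hard so the loop always makes progress.
      lines.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
  }
  lines.push(rest);
  return lines;
}

console.log(wrapAddress("aaaa bbbb cccc", 9));
// [ 'aaaa bbbb', 'cccc' ]
```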