I am pretty new to Regex and am trying to learn.
I am creating a mail merge tool and would like to use RegExp to give me more flexibility and control. One of the placeholders that I replace is company_name.
I have a list of companies. Many have the company type in their name (e.g. My Company, Inc., or My Company LLC). I would like to use regex to standardize the results. However, I am not sure how to write it, other than to manually list each and every option. For example, each of these names should result in the same value at the end:
- My Company LLC
- My Company, LLC
- My Company, Inc.
- My Company, Inc
- MY Company Inc.
- My Company Inc
- My Company Co
- My Company
And on and on...
I believe I can use this to achieve my desired results:
var companyName = lead.company_name;
companyName = companyName.replace(/(, Inc.)|( Inc.)|(, LLC)/gi, '');
However, I was hoping there is a more efficient way to:
- Capture the variations
- Ensure the company type is always at the end
- Include commas and periods if they exist, but not have to list all options with and without
CAUTION:
I have to account for the possibility of the company type characters existing in the actual name (e.g. My Company Co) and only remove the organization type at the end.
Can this be done easily?
asked Apr 5, 2017 at 15:55 by davids, edited Apr 5, 2017 at 16:33
- "CAUTION: I have to account for the possibility of the company type characters existing in the actual name" does that mean that if those characters exist in the actual name, they should (or should not) be removed? – Sᴀᴍ Onᴇᴌᴀ Commented Apr 5, 2017 at 16:00
- Generally, for example, "Company, LLC" is the company name. If you remove "LLC", you're no longer using the correct name. – Ouroborus Commented Apr 5, 2017 at 16:03
- @Ouroborus, you are correct, but no one would type the full company name (with company type) in the body of an email, so it would be obvious that it is a generated email, or it would require extra manipulation to clean it up before sending. – davids Commented Apr 5, 2017 at 16:20
3 Answers
If each company name is a string on its own, you can try the following regex:
/,?\s*(llc|inc|co)\.?$/i
Explanation:
- Optional comma
- Optional whitespace
- Either one of LLC/Inc/Co (case-insensitive)
- Optional period
- All the above at the end of the string
const companyNames = [
  'My Company LLC',
  'My Company, LLC',
  'My Company, Inc.',
  'My Company, Inc',
  'MY Company Inc.',
  'My Company Inc',
  'My Company Co',
  'My Company',
];
console.log(companyNames.map(name => name.replace(/,?\s*(llc|inc|co)\.?$/i, '')));
I'd do:
companyName = companyName.replace(/,?\s*(?:\b(?:inc|LLC|co)\b\.?)?$/i, "");
Explanation:
/ : delimiter
,? : optional comma
\s* : optional whitespace
(?: : non capture group
\b : word boundary
(?:inc|LLC|co) : non capture group, one of the alternatives
\b : word boundary
\.? : a dot, optional
)? : end group, optional
$ : end of string
/i : delimiter, case insensitive
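To see why the word boundaries matter, here is a small sketch comparing a pattern without `\b` against one with it. The names `loose` and `bounded` and the sample name "Texaco" are illustrative assumptions, not taken from the question, and `bounded` is a simplified variant of the pattern above rather than a copy of it.

```javascript
// Without \b, "co" can match the tail of a longer word.
const loose = /,?\s*(?:inc|llc|co)\.?$/i;
// With \b before the suffix, the suffix has to start a new word.
const bounded = /,?\s*\b(?:inc|llc|co)\b\.?$/i;

// "Texaco" is a made-up test case for a name that merely ends in "co".
const samples = ['Texaco', 'My Company Co', 'My Company, Inc.'];

const looseResults = samples.map(name => name.replace(loose, ''));
const boundedResults = samples.map(name => name.replace(bounded, ''));

console.log(looseResults);   // [ 'Texa', 'My Company', 'My Company' ]
console.log(boundedResults); // [ 'Texaco', 'My Company', 'My Company' ]
```

The second pattern leaves "Texaco" untouched because there is no word boundary between the "a" and the "c".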
Yes, there's a more efficient way (if by efficient we mean shorter), though multi-conditional patterns like this often involve a trade-off between succinctness and readability.
It's a matter of sub-groups, which allow us to avoid repetition.
var rgx = /(, ?)?(LLC|Inc|Co)\.?$/i;
Let's break it down.
The first part, (, ?)?, says the company name may be followed by a combination of a comma and an optional space. So this allows no comma, a comma with no space after it, or a comma with a space after it. Note that a space without a comma (as in My Company LLC) is not consumed by this group, so the result keeps a trailing space that you can remove with .trim().
The second part, (LLC|Inc|Co), is a simple sub-group allowing the different type suffixes.
The final part, \.?, allows for an optional period at the end (we escape the period because in regex an unescaped period has special meaning: it matches any character except a line break).
Note also you don't need the g flag, since (presumably) no company name will have more than one type suffix. Also, the $ anchor is useful here as it ensures our match must be at the end of the company name, not merely somewhere within it.
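Putting the pieces from the answers together, a reusable helper might look like the minimal sketch below. The function name `stripCompanyType` and its `suffixes` parameter are hypothetical. The default list holds only the three types named in the question, on the assumption that you would extend it for your own data.

```javascript
// Hypothetical helper: builds the pattern from a list, so new suffixes
// can be added without rewriting the regex by hand.
function stripCompanyType(name, suffixes = ['LLC', 'Inc', 'Co']) {
  // Escape regex metacharacters in case a suffix contains any.
  const escaped = suffixes.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Optional comma, optional whitespace, a whole-word suffix,
  // optional period, optional trailing whitespace, end of string.
  const pattern = new RegExp(`,?\\s*\\b(?:${escaped.join('|')})\\b\\.?\\s*$`, 'i');
  return name.replace(pattern, '').trim();
}

console.log(stripCompanyType('My Company, LLC')); // "My Company"
console.log(stripCompanyType('My Company Co'));   // "My Company"
```

This only removes the suffix. It does not normalize letter case, so 'MY Company Inc.' becomes 'MY Company' rather than 'My Company'.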