I need to extract the quantity and unit from strings like this:
1 tbsp
1tbsp
300ml
300 ml
10grams
10 g
The quantities will always be numbers, followed by an optional space and then the unit. There may be 15 - 20 different units, which can come from a list that we define (perhaps an array).
The solution can be in either JavaScript or PHP, as I need to split them before storing them in a database, i.e. they need to be stored separately.
Thanks
EDIT: Sorry, to be clear: each new line represents a new string. That is, a string would only contain 10g OR 300ml, so we just need to split one unit and one quantity at a time.
asked Aug 7, 2010 at 15:10 by 32423hjh32423 (edited Aug 7, 2010 at 15:18)
- Will they always be in a list like this? Or will there sometimes be other text around? – hookedonwinter Commented Aug 7, 2010 at 15:15
- @hookedonwinter - just on their own. No other text. – 32423hjh32423 Commented Aug 7, 2010 at 15:19
3 Answers
Regex:
/(\d+)\s*(\D+)/
Code:
preg_match_all('/(\d+)\s*(\D+)/', $ingredients, $m);
$quantities = $m[1];
$units = array_map('trim', $m[2]);
$quantities and $units are:
Array
(
[0] => 1
[1] => 1
[2] => 300
[3] => 300
[4] => 10
[5] => 10
)
Array
(
[0] => tbsp
[1] => tbsp
[2] => ml
[3] => ml
[4] => grams
[5] => g
)
See: http://ideone.com/MSH8t
If you use this, you don't have to have a list of units ready. But it assumes your units contain no numeric characters and your quantities are whole numbers only.
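Since the question also allows JavaScript, here is a rough equivalent of the PHP snippet above. It is only a sketch: the function name splitAll and the sample input are made up for illustration, and it assumes a runtime with String.prototype.matchAll (ES2020).

```javascript
// Sketch: collect every "quantity unit" pair found in a block of text.
// splitAll is a hypothetical helper name, not something from the answer.
function splitAll(text) {
  const quantities = [];
  const units = [];
  // Same idea as the PHP pattern: digits, optional whitespace, then non-digits.
  for (const m of text.matchAll(/(\d+)\s*(\D+)/g)) {
    quantities.push(Number(m[1]));
    // \D+ also swallows the trailing newline, so trim it off.
    units.push(m[2].trim());
  }
  return { quantities, units };
}

const result = splitAll("1 tbsp\n1tbsp\n300ml\n300 ml\n10grams\n10 g");
console.log(result.quantities); // [1, 1, 300, 300, 10, 10]
console.log(result.units);      // ["tbsp", "tbsp", "ml", "ml", "grams", "g"]
```

As with the PHP version, nothing here checks that the unit is one you actually allow.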
Okay, what you can do is create an array of allowed units, and then use array_map to apply preg_quote on each unit in the array (so that if there are any characters in the unit that are special characters in a regular expression they will be escaped), and then construct a regular expression:
$units = array("tbsp", "ml", "g", "grams"); // add whatever other units are allowed
$pattern = '/^(\d+)\s*(' . join("|", array_map("preg_quote", $units)) . ')$/';
The $pattern will thus become something like /^(\d+)\s*(tbsp|ml|g|grams)$/, and then you can use it to detect things that look like units in your string:
$matches = array();
// assuming you have an array of measurement strings...
foreach ($measurement_strings as $measurement)
{
    // skip strings that don't match; otherwise $matches would be empty
    if (!preg_match($pattern, $measurement, $matches)) {
        continue;
    }
    list(, $quantity, $unit) = $matches;
    // ...
}
Because the pattern defines two capturing groups, for the quantity and unit respectively, you can then extract those out of the match and do what you want with them.
(I've updated my answer, based on the question update that each line is a separate string).
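For anyone doing the split on the JavaScript side instead, the same whitelist approach might look roughly like this. The names UNITS, escapeRegExp and parseMeasurement are my own for the sketch, and the unit list is just the example one from above. JavaScript has no built-in preg_quote, so a small escaping helper stands in for it.

```javascript
// Hypothetical unit list; extend it with whatever units are allowed.
const UNITS = ["tbsp", "ml", "g", "grams"];

// Stand-in for preg_quote: escape characters that are special in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the equivalent of /^(\d+)\s*(tbsp|ml|g|grams)$/
const UNIT_PATTERN = new RegExp(
  "^(\\d+)\\s*(" + UNITS.map(escapeRegExp).join("|") + ")$"
);

// Returns { quantity, unit }, or null when the string is not "<number><unit>".
function parseMeasurement(str) {
  const m = UNIT_PATTERN.exec(str);
  return m ? { quantity: Number(m[1]), unit: m[2] } : null;
}

console.log(parseMeasurement("10grams")); // { quantity: 10, unit: "grams" }
console.log(parseMeasurement("300 ml"));  // { quantity: 300, unit: "ml" }
console.log(parseMeasurement("5 cups"));  // null, "cups" is not in UNITS
```

Because the pattern is anchored with $, "10grams" still resolves to "grams" even though "g" comes first in the list: the shorter alternative fails at the end anchor and the regex engine backtracks to the longer one.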
Maybe something simple is enough, like this:
^([0-9]+)\s*([a-zA-Z]+)\s*$
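A minimal sketch of how that pattern could be used in JavaScript (splitSimple is a made-up name for the example). Note that it accepts any run of letters as the unit, so it does not validate against a list.

```javascript
// Digits, optional whitespace, letters only; trailing whitespace is tolerated.
const SIMPLE = /^([0-9]+)\s*([a-zA-Z]+)\s*$/;

// Returns [quantity, unit], or null if the string has a different shape.
function splitSimple(str) {
  const m = SIMPLE.exec(str);
  if (!m) return null;
  const [, quantity, unit] = m;
  return [Number(quantity), unit];
}

console.log(splitSimple("1 tbsp")); // [1, "tbsp"]
console.log(splitSimple("300ml ")); // [300, "ml"]
console.log(splitSimple("ml 300")); // null
```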