I am attempting to extract valid cell references and range references from a spreadsheet formula, using Google Apps Script (JavaScript).
A valid cell reference is one or two letters followed by a run of digits that does not start with a zero. The letters and the digits may each be preceded by a $ character. The whole reference can't be immediately preceded or followed by a letter, digit or underscore (in which case it may be part of a spreadsheet function name or the name of a named range) or by a colon (in which case it may be part of a range reference).
The range reference regex (rangeRefRe) seems to work well, but my cell reference regex (cellRefRe) fails to find a match. It would be great if someone could point out what I'm doing wrong.
function myFunction()
{
  var formula = '=A100+B$2:2+INDIRECT("A2:B")+$C3-SUM($D$1:$E5)';
  var fSegments = formula.split('"'); // I want to exclude references within double quotation marks
  var rangeRefRe = /[^0-9a-zA-Z_$]([0-9a-zA-Z$]+?:[0-9a-zA-Z$]+)(?![0-9a-zA-Z_])/g;
  var cellRefRe = /[^0-9a-zA-Z_$:](\${,1}[a-zA-Z]{1,2}\${,1}[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
  var refResult;
  var references = [];
  for (var i = 0; i < fSegments.length; i += 2)
  {
    while (refResult = rangeRefRe.exec(fSegments[i]))
    {
      references.push(refResult[1]);
    }
    while (refResult = cellRefRe.exec(fSegments[i]))
    {
      references.push(refResult[1]);
    }
  }
  Logger.log(references);
}
Asked Jan 18, 2014 at 2:50 by AdamL; edited Jan 18, 2014 at 2:56.
6 Answers
JavaScript doesn't support this part of your regex: {,1}. Without a lower bound it is not parsed as a quantifier, so it matches the literal characters {,1} instead. To allow 0 or 1 occurrences it would need to be {0,1}, or you can replace that with just ?:
/[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
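To see the fix in context, here is a minimal sketch that wraps the corrected regex in the same split-on-quotes loop the question uses. The function name extractCellRefs is made up for illustration; the regex is the one above, and only standalone cell references are collected (ranges are left to the separate range regex).

```javascript
// Hypothetical helper: collects standalone cell references from a formula,
// skipping anything inside double quotation marks (odd-numbered segments).
function extractCellRefs(formula) {
  // Corrected regex from the answer: \$? instead of the unsupported \${,1}
  var cellRefRe = /[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
  var segments = formula.split('"');
  var refs = [];
  for (var i = 0; i < segments.length; i += 2) {
    cellRefRe.lastIndex = 0; // restart the global regex for each segment
    var m;
    while ((m = cellRefRe.exec(segments[i])) !== null) {
      refs.push(m[1]);
    }
  }
  return refs;
}

// Without a lower bound, {,1} is treated as literal text rather than a quantifier.
var braceIsLiteral = /^a{,1}$/.test('a{,1}');
```

For the formula in the question this should return A100 and $C3 only: B$2 and $D$1 are rejected by the colon in the lookahead because they are parts of ranges, and A2 sits inside the quoted string.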
The question and answers were incredibly helpful but I ran into a few problems so here are some notes for future readers:
It might be good to add "(" to the characters the regex can't end in. The formula could contain a call to a custom function named "a1" or something similar. Adding left-parenthesis would prevent matching a call to such badly named custom functions.
While "A2:A" and "A1:2" are valid ranges, ranges like "A:2" are not.
I needed the references ordered in the way they appeared in the formula. A single regex for both ranges and cells would solve that problem.
Here's the regex I came up with:
/[^0-9a-zA-Z_$:]\$?([a-zA-Z]+(\$?[1-9]\d*(:(\$?[a-zA-Z]+)?\$?([1-9]\d*)?)?|((:\$?[a-zA-Z]+\$?([1-9]\d*)?))))(?![0-9a-zA-Z_(])/g;
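A short sketch of how that single regex might be used to collect cells and ranges in formula order. The function name extractRefsInOrder is an assumption for illustration. One detail worth noting: as written, capture group 1 begins after the optional leading $, so this sketch takes the whole match and drops its first (delimiter) character rather than reading group 1.

```javascript
// Hypothetical helper: returns cell and range references in the order they appear,
// ignoring text inside double quotation marks.
function extractRefsInOrder(formula) {
  // Combined cell/range regex from the notes above, unchanged
  var refRe = /[^0-9a-zA-Z_$:]\$?([a-zA-Z]+(\$?[1-9]\d*(:(\$?[a-zA-Z]+)?\$?([1-9]\d*)?)?|((:\$?[a-zA-Z]+\$?([1-9]\d*)?))))(?![0-9a-zA-Z_(])/g;
  var segments = formula.split('"');
  var refs = [];
  for (var i = 0; i < segments.length; i += 2) {
    refRe.lastIndex = 0; // restart the global regex for each segment
    var m;
    while ((m = refRe.exec(segments[i])) !== null) {
      // m[0] starts with the one delimiter character; strip it to keep a leading $
      refs.push(m[0].slice(1));
    }
  }
  return refs;
}
```

On the question's formula this should yield A100, B$2:2, $C3 and $D$1:$E5 in that order. A call such as a1(5) is skipped because of the left parenthesis in the lookahead, and A:2 does not match at all.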
A variation on the regex posted by Josh Dawson that also captures sheet names:
var formula = '=data!A100+B$2:2+INDIRECT("A2:B")+\'Sheet 1\'!$C3-SUM($D$1:$E5)';
var fSegments = formula.split('"'); // I want to exclude references within double quotation marks
var re = /[^0-9a-zA-Z_$:]((((\'.+\')|([a-zA-Z0-9]+))\!)?\$?([a-zA-Z]+(\$?[1-9]\d*)(:(\$?[a-zA-Z]+)?\$?([1-9]\d*)?)?|((:\$?[a-zA-Z]+\$?([1-9]\d*)?))))/g;
var refResult;
var references = [];
for (var i = 0; i < fSegments.length; i += 2) {
  while (refResult = re.exec(fSegments[i])) {
    references.push(refResult[1]);
  }
}
console.log(references);
The correct regex should be:
/[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/
I have been doing the same thing in R, and thought I'd add my method. It includes references to external workbooks. I did not include such things as B$2:2 as I have never seen them in the wild.
# Thanks to https://www.get-digital-help./2017/02/07/extract-cell-references-from-a-formula/
library(stringr)
formula <- "=data!A100+'[C:\\temp dir\\book.xlsx]Sheet 1'!$C3-SUM($D$1:$E5)"
book <- "\\[[a-zA-Z0-9][a-zA-Z0-9\\s\\+\\-\\&\\_\\.\\:\\\\]*\\]" # add any needed filepath characters
sheet <- "[a-zA-Z][a-zA-Z0-9\\s\\+\\-\\&\\_\\(\\)]*" # add any needed sheetname characters
range <- "\\$?[A-Z]+\\$?[0-9]+(:\\$?[A-Z]+\\$?[0-9]+)?(?!\\()" # not followed by (
pattern <- paste0("('?((", book, ")?", sheet, ")'?!)?", range)
pattern
#> [1] "('?((\\[[a-zA-Z0-9][a-zA-Z0-9\\s\\+\\-\\&\\_\\.\\:\\\\]*\\])?[a-zA-Z][a-zA-Z0-9\\s\\+\\-\\&\\_\\(\\)]*)'?!)?\\$?[A-Z]+\\$?[0-9]+(:\\$?[A-Z]+\\$?[0-9]+)?"
str_extract_all(formula, pattern, simplify=TRUE) # matrix
#> [,1] [,2] [,3]
#> [1,] "data!A100" "'[C:\\temp dir\\book.xlsx]Sheet 1'!$C3" "$D$1:$E5"
Created on 2019-03-14 by the reprex package (v0.2.1)
For extracting cell references from Excel formulas, the following could be a valid solution.
A sheet name made up only of word characters (\w) needs no apostrophes around it (Sheet1! or Sheet_1!).
A sheet name containing any non-word character (\W) must be wrapped in apostrophes ('Sheet 1'! or 'Sheet.1'!).
sheet = ('[^']+'|\w+)!
cell = \$?[a-zA-Z]{1,3}\$?[1-9][0-9]{0,6}(:\$?[a-zA-Z]{1,3}\$?[1-9][0-9]{0,6})?
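As a rough illustration, the two pieces can be joined into one JavaScript regex with an optional sheet prefix. The names refRe and extractSheetRefs are made up for this sketch. The row part is written as [1-9][0-9]{0,6} so that rows containing a zero, such as A10, match in full. Unlike the earlier answers, this pattern has no guard against function names or text inside quoted strings, so treat it as a starting point only.

```javascript
// Sheet prefix: either a quoted name ('Sheet 1'!) or a bare word name (Sheet1!)
var sheet = "(?:'[^']+'|\\w+)!";
// Cell or range: up to three column letters, then a row number not starting with 0
var cell = "\\$?[a-zA-Z]{1,3}\\$?[1-9][0-9]{0,6}" +
           "(?::\\$?[a-zA-Z]{1,3}\\$?[1-9][0-9]{0,6})?";
// The sheet prefix is optional; the g flag lets match() return every reference
var refRe = new RegExp("(?:" + sheet + ")?" + cell, "g");

// Hypothetical helper: returns all references, with sheet prefix where present.
function extractSheetRefs(formula) {
  return formula.match(refRe) || [];
}
```

For example, extractSheetRefs("=data!A100+'Sheet 1'!$C3-SUM($D$1:$E5)") should return data!A100, 'Sheet 1'!$C3 and $D$1:$E5.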