
How do I cross join two, multi-column tables in Google Sheets — without string hacking or Apps Script? - Stack Overflow


(I will post my own answer, but I want to pose the question for the sake of others in a similar situation; and to encourage alternative answers.)

Sometimes I have two tables and I need to cross join them, i.e., generate the combinations of each row of the first table with each row of the second table. For example, suppose I have one table of chefs and the dishes they plated for a competition, and a table of evaluators and the attributes that each specializes in. (I'll specify them as formulas for easy pasting into Google Sheets.)

table1 = {
  { "Albion"    , "Artichoke Soufflé Omelett" };
  { "Burgess"   , "Lemony Braised Chicken" };
  { "Hamad"     , "Mabo Dofu Smoothie" };
  { "Berengari" , "Chicken-Fried Plantains" };
  { "Sengupta"  , "Smoky Vegan Corn Salad" }
}

table2 = {
  { "Cho"       , "flavor" };
  { "Nikkelson" , "texture" };
  { "Rodríguez" , "process" }
}

Then the cross join of these two tables has 15 rows beginning with:

output = {
  { "Albion"    , "Artichoke Soufflé Omelett" , "Cho"       , "flavor" }
  { "Albion"    , "Artichoke Soufflé Omelett" , "Nikkelson" , "texture" }
  { "Albion"    , "Artichoke Soufflé Omelett" , "Rodríguez" , "process" }
  { "Burgess"   , "Lemony Braised Chicken"    , "Cho"       , "flavor" }
  { "Burgess"   , "Lemony Braised Chicken"    , "Nikkelson" , "texture" }
  { "Burgess"   , "Lemony Braised Chicken"    , "Rodríguez" , "process" }
  ...
}
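For reference only, and outside Google Sheets, the expected result can be stated as a short Python sketch. It is not a solution to the question (which asks for a formula or Named Function); it only pins down the row order the output should have. The function name `cross_join` is made up for this illustration.

```python
from itertools import product

table1 = [
    ("Albion", "Artichoke Soufflé Omelett"),
    ("Burgess", "Lemony Braised Chicken"),
    ("Hamad", "Mabo Dofu Smoothie"),
    ("Berengari", "Chicken-Fried Plantains"),
    ("Sengupta", "Smoky Vegan Corn Salad"),
]
table2 = [
    ("Cho", "flavor"),
    ("Nikkelson", "texture"),
    ("Rodríguez", "process"),
]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right`.

    The left row varies slowest, so all pairings for the first
    left row come before any pairing for the second left row.
    """
    return [l + r for l, r in product(left, right)]


output = cross_join(table1, table2)
for row in output[:6]:
    print(row)
```

The result has len(table1) * len(table2) rows and len(table1[0]) + len(table2[0]) columns, and every value keeps its original type because nothing is converted to a string along the way.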

I am looking for a formula or Named Function to do this — not Google Apps Script. In addition, I want to avoid "string-hacking" methods in which you serialize arrays into strings with weird delimiters, then do text manipulation and concatenation that generates the desired structure, and finally deserialize into the reshaped array. As doubleunary pointed out in their answer to a similar question, such methods have side effects when they convert numeric types to strings. Personally I found they can have unpredictable behavior if any of the data contain certain emoji. I also find them difficult to troubleshoot and maintain.

My question is similar to the three below, but they were only about cross joining single-column tables.

  • Generate all possible combinations for Columns(cross join or Cartesian product)
  • How to cross join 2 lists?
  • Google sheets - cross join / cartesian join from two separate columns

There is a previous question that doesn't ask about multi-column tables explicitly but includes one in its sample data; however, the OP of that question allowed Apps Script:

  • How to perform Cartesian Join with Google Scripts & Google Sheets?

This question might be similar, but I can't tell because the OP did not include sample data in the question and the linked spreadsheet no longer exists.

  • Google Sheets Cross Join Function Tables with More than Two Columns

Asked Mar 12 at 23:30 by Simon Garcia; edited Mar 13 at 15:02.
  • I believe a simple combination of two BYROWs and one HSTACK can answer this too. – PatrickdC Commented Mar 12 at 23:31
  • "This question is similar to a few others, but they were only about crossing single-column tables." Not sure what this means. But "Google spreadsheet "=QUERY" join() equivalent function?" looks relevant, and it contains an excellent guide, "Mastering Join-formulas in Google Sheets". – Tedinoz Commented Mar 13 at 7:09
  • @Tedinoz Thanks for the leads, but the question and guide you linked only refer to left join. Apologies, I should have clarified I was referring to cross-joining single-column tables. Cross join is also referred to as Cartesian product. – Simon Garcia Commented Mar 13 at 14:52
  • "This question might be similar": very similar. Refer to my answer. – Tedinoz Commented Mar 14 at 5:06

4 Answers


Use BYROW and HSTACK

Since you want "to avoid "string-hacking" methods", you may use HSTACK: it joins arrays side by side (here, one row from each table) without converting the values into strings.

You may use:

= LET(
  table1, A2:B6, table2, E2:F4,
  WRAPROWS(
    TOROW( BYROW( table1, LAMBDA( x,
      TOROW( BYROW( table2, LAMBDA( y,
        HSTACK( x, y)
      ) ) ) 
    ) ) ),
    COLUMNS( table1 ) + COLUMNS( table2 )
  )
)

Given the following tables:

Table 1 (A2:B6)
"Albion" "Artichoke Soufflé Omelett"
"Burgess" "Lemony Braised Chicken"
"Hamad" "Mabo Dofu Smoothie"
"Berengari" "Chicken-Fried Plantains"
"Sengupta" "Smoky Vegan Corn Salad"

and

Table 2 (E2:F4)
"Cho" "flavor"
"Nikkelson" "texture"
"Rodríguez" "process"

I got:

Output
"Albion" "Artichoke Soufflé Omelett" "Cho" "flavor"
"Albion" "Artichoke Soufflé Omelett" "Nikkelson" "texture"
"Albion" "Artichoke Soufflé Omelett" "Rodríguez" "process"
"Burgess" "Lemony Braised Chicken" "Cho" "flavor"
"Burgess" "Lemony Braised Chicken" "Nikkelson" "texture"
"Burgess" "Lemony Braised Chicken" "Rodríguez" "process"
"Hamad" "Mabo Dofu Smoothie" "Cho" "flavor"
"Hamad" "Mabo Dofu Smoothie" "Nikkelson" "texture"
"Hamad" "Mabo Dofu Smoothie" "Rodríguez" "process"
"Berengari" "Chicken-Fried Plantains" "Cho" "flavor"
"Berengari" "Chicken-Fried Plantains" "Nikkelson" "texture"
"Berengari" "Chicken-Fried Plantains" "Rodríguez" "process"
"Sengupta" "Smoky Vegan Corn Salad" "Cho" "flavor"
"Sengupta" "Smoky Vegan Corn Salad" "Nikkelson" "texture"
"Sengupta" "Smoky Vegan Corn Salad" "Rodríguez" "process"
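As a rough illustration outside Sheets, the nesting in the formula above can be mirrored in Python: two loops stand in for the two BYROWs, list concatenation stands in for HSTACK, extending one flat list stands in for TOROW, and slicing by the combined width stands in for WRAPROWS. The function names and the numeric sample values are made up for this sketch; it is an analogy, not a description of how Sheets evaluates the formula.

```python
def hstack(x, y):
    """Join two rows side by side, keeping each value's type."""
    return list(x) + list(y)


def cross_join_wrap(table1, table2):
    width = len(table1[0]) + len(table2[0])
    flat = []
    for x in table1:          # outer BYROW over table1
        for y in table2:      # inner BYROW over table2
            # TOROW: everything ends up in one long flat row
            flat.extend(hstack(x, y))
    # WRAPROWS: cut the flat row into rows of `width` values
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Hypothetical sample with a numeric column, to show that
# numbers stay numbers because no strings are involved.
rows = cross_join_wrap(
    [["Albion", 1], ["Burgess", 2]],
    [["Cho", "flavor"], ["Nikkelson", "texture"]],
)
for row in rows:
    print(row)
```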

References:

  • BYROW
  • HSTACK
  • TOROW
  • WRAPROWS

Use MAKEARRAY to construct each set of columns separately

I think PatrickdC's answer is better overall because it is more elegant and easier to understand and troubleshoot. I'm posting my solution as an alternative for the curious, with an explanation of how it works.

Formula

= LET(
  table1_height, rows( table1 ),
  table2_height, rows( table2 ),
  {
    WRAPCOLS(
      FLATTEN( 
        MAKEARRAY( 
          1, 
          table2_height, 
          LAMBDA( row, col, FLATTEN( TRANSPOSE(table1) ) ) ) 
      ),
      table2_height * table1_height
    ),
    WRAPROWS(
      FLATTEN(
        MAKEARRAY( 
          table1_height, 
          1, 
          LAMBDA( row, col, TRANSPOSE( FLATTEN(table2)) ) )
      ),
      COLUMNS( table2 )
    )
  }
)

How it works

It uses MAKEARRAY, FLATTEN, and TRANSPOSE to make table2_height duplicates of each row of table1 vertically, flattens them into a single array, and then uses WRAPCOLS to reshape that into an array with the same width as table1. It does the same with table2, except that it makes table1_height duplicates of each row of table2 horizontally before flattening them and reshaping with WRAPROWS. The final array is constructed by concatenating the two arrays horizontally.
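The net effect of the two halves can be sketched in Python. This models only the result of each half (a left block where every table1 row repeats, and a right block where table2 cycles), not the MAKEARRAY/FLATTEN mechanics themselves; the helper names are invented for the illustration.

```python
def repeat_rows(table, times):
    """Each row appears `times` times in succession: a, a, a, b, b, b, ..."""
    return [row for row in table for _ in range(times)]


def tile_rows(table, times):
    """The whole table is repeated `times` times: a, b, c, a, b, c, ..."""
    return [row for _ in range(times) for row in table]


def cross_join_blocks(table1, table2):
    left = repeat_rows(table1, len(table2))    # same width as table1
    right = tile_rows(table2, len(table1))     # same width as table2
    # Both blocks have len(table1) * len(table2) rows, so they can be
    # concatenated horizontally row by row.
    return [list(l) + list(r) for l, r in zip(left, right)]


result = cross_join_blocks(
    [["Albion", "Soufflé"], ["Burgess", "Chicken"]],
    [["Cho", "flavor"], ["Nikkelson", "texture"], ["Rodríguez", "process"]],
)
for row in result:
    print(row)
```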

Speed

If you are working with source tables large enough that the cross join generates more than a thousand elements, then speed matters. I've tested both this solution and PatrickdC's, and both are quite zippy until the output array's size approaches 10^5 total elements. (Beyond that, it's time to turn to other tools if your work situation allows it.)

The question refers to Google Sheets Cross Join Function Tables with More than Two Columns noting that it "might be similar, but I can't tell because the OP did not include sample data in the question and the linked spreadsheet no longer exists."

This answer is based on the referenced question. Credit to @MattKing for the original answer:

Enter this formula in cell G2:

=ARRAYFORMULA(
 {
  HLOOKUP(
   {"A","B"},{"A","B";A2:B6},SEQUENCE((COUNTA(D2:D))*(COUNTA(A2:A)),1,0)/(COUNTA(D2:D))+2
  )
  ,
  HLOOKUP(
   {"E","F"},{"E","F";D2:E4},MOD(SEQUENCE((COUNTA(D2:D))*(COUNTA(A2:A)),1,0),(COUNTA(D2:D)))+2
  )
 }
)

SAMPLE PLUS OUTPUT

I haven't converted any data to markdown because the results are identical to @PatrickdC's.


LOGIC

  • The result uses HLOOKUP (wrapped in ARRAYFORMULA) to create two ranges stacked side-by-side
  • The index in HLOOKUP#1 = SEQUENCE((COUNTA(D2:D))*(COUNTA(A2:A)),1,0)/(COUNTA(D2:D))+2
    • this creates 15 INDEX values (five groups of three): "2,2,2,3,3,3,4,4,4,5,5,5,6,6,6" which generates three rows for each of the five rows in table 1.
  • HLOOKUP#1 creates a 15-row array in which each row of table 1 appears three times in succession.

  • The index in HLOOKUP#2 is slightly different:

MOD(SEQUENCE((COUNTA(D2:D))*(COUNTA(A2:A)),1,0),(COUNTA(D2:D)))+2

  • this creates 15 INDEX values (five groups of three): "2,3,4,2,3,4,2,3,4,2,3,4,2,3,4" which generates five rows for each row in table 2, created in sequence.

  • HLOOKUP#2 creates a 15-row array that cycles through the three rows of table 2 five times.

  • The two arrays are wrapped in curly brackets {} separated by a comma to create the result.

  • The answer in the referenced question included QUERY because there the result depended on "date in column 5 is greater than or equal to the date in column 2." This is not relevant in this case, so the QUERY is dropped.
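The two index sequences described in the list above can be reproduced with a small Python sketch. One assumption is labeled in the code: the formula divides with `/`, which yields fractions, and the sketch uses integer division on the assumption that HLOOKUP only uses the whole-number part of the index, which is what the listed values 2,2,2,3,3,3,... imply. The function name is made up.

```python
def lookup_indexes(rows1, rows2):
    """Return the row indexes used by HLOOKUP#1 and HLOOKUP#2.

    k runs over 0 .. rows1*rows2 - 1, like SEQUENCE(rows1*rows2, 1, 0).
    The +2 skips the header row that is stacked on top of each table.
    """
    total = rows1 * rows2
    # Assumption: only the integer part of k / rows2 matters to HLOOKUP.
    first = [k // rows2 + 2 for k in range(total)]
    # MOD(k, rows2) cycles through the rows of table 2.
    second = [k % rows2 + 2 for k in range(total)]
    return first, second


first, second = lookup_indexes(5, 3)
print(first)
print(second)
```

Pairing the two sequences position by position visits every (table 1 row, table 2 row) combination exactly once, which is why stacking the two HLOOKUP results side by side gives the cross join.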

This answer builds on Tedinoz's answer and generalizes it to arbitrary-width tables. Please refer to their explanation of how it implements the actual cross-join logic; I will explain just the changes needed to generalize it.

Here is the formula:

= LET(
  table1, A2:B6, table2, D2:E4,
  header1, TRANSPOSE(SEQUENCE(COLUMNS(table1))),
  header2, TRANSPOSE(SEQUENCE(COLUMNS(table2))),
  ARRAYFORMULA(
   {
    HLOOKUP(
     header1,
     { header1; table1 },
     SEQUENCE((ROWS(table2))*(ROWS(table1)),1,0)/(ROWS(table2))+2
    ),
    HLOOKUP(
     header2,
     { header2; table2 },
     MOD(SEQUENCE((ROWS(table2))*(ROWS(table1)),1,0),(ROWS(table2)))+2
    )
   }
  )
)

And a listing of the changes and what they do:

  • Use LET to define variables table1 and table2 that reference the source ranges. This is mostly for convenience and readability, and my testing confirms it does not affect performance.
  • Use ROWS to determine the height of each table, which is more robust than using COUNTA. The heights are needed to generate correctly sized index arrays for the HLOOKUPs, regardless of each table's shape. This change actually improves speed.
  • Use TRANSPOSE(SEQUENCE(COLUMNS(table1))) to create a single-row array {1, 2, 3, ...} and define it as header1, instead of manually creating the array { "A", "B" } in the formula. This ensures the array has as many elements as there are columns in table1.

Explanation of header1 and header2. The array header1 is needed both to construct the search array ({ header1; table1 }) and as the search key to HLOOKUP, which causes it to return an entire row (and not just a single value) for each iteration of the ARRAYFORMULA. (A very neat trick inherited from MattKing's answer to a similar question.) Likewise for header2.
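To see why passing the whole header as the search key returns a whole row, here is a deliberately simplified Python model of the lookup. It assumes exact matching on the header row and ignores HLOOKUP's sorted/approximate mode; the function `hlookup` below is a stand-in written for this sketch, not the Sheets implementation.

```python
def hlookup(keys, grid, index):
    """Simplified exact-match model of HLOOKUP with an array of keys.

    For each key, find its column in the header row (grid[0]) and
    return the value in that column at the 1-based row `index`.
    """
    header = grid[0]
    return [grid[index - 1][header.index(key)] for key in keys]


table1 = [
    ["Albion", "Artichoke Soufflé Omelett"],
    ["Burgess", "Lemony Braised Chicken"],
]
# Like TRANSPOSE(SEQUENCE(COLUMNS(table1))): one label per column.
header1 = list(range(1, len(table1[0]) + 1))
# Like { header1; table1 }: the header stacked on top of the data.
grid = [header1] + table1

print(hlookup(header1, grid, 2))   # all columns of the first data row
print(hlookup([2], grid, 2))       # a single key returns a single value
```

Because header1 always has exactly one key per column, the lookup returns complete rows no matter how wide the table is.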

To turn this into a Named Function, copy the formula into the Formula definition box, but delete the line table1, A2:B6, table2, D2:E4, and create argument placeholders named table1 and table2.
