最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

How to get the cartesian product (cross join) of two sets via Excel Formulas - Stack Overflow

programmeradmin1浏览0评论

I need to get the cartesian product of two "sets" via Excel 365 Formulas (not VBA, not Power Query). For instance, my two sets are the two left tables, and the expected cartesian product is the right table:

Cartesian product of two sets in Excel 365

How to get the cartesian product with one Excel formula?

If possible, I'd like a general answer, to work for any two sets, whatever values they contain, whatever the number of columns is in the two sets.

The two sets:

Column1 Column2 Column1 Column2
4500005010 A PO start D
4500005011 B header merge E
4500005012 C PO activate F

I need to get the cartesian product of two "sets" via Excel 365 Formulas (not VBA, not Power Query). For instance, my two sets are the two left tables, and the expected cartesian product is the right table:

Cartesian product of two sets in Excel 365

How to get the cartesian product with one Excel formula?

If possible, I'd like a general answer, to work for any two sets, whatever values they contain, whatever the number of columns is in the two sets.

The two sets:

Column1 Column2 Column1 Column2
4500005010 A PO start D
4500005011 B header merge E
4500005012 C PO activate F
Share Improve this question edited Nov 22, 2024 at 7:12 Sandra Rossi asked Nov 20, 2024 at 19:17 Sandra RossiSandra Rossi 13.7k6 gold badges25 silver badges56 bronze badges 1
  • This doesn't require a LAMBDA() helper: =TEXTSPLIT(TEXTAFTER("|"&TOCOL(DataOne[Column1]&"|"&DataOne[Column2]&"|"&TOROW(DataTwo[Column1]&"|"&DataTwo[Column2])),"|",{1,2,3,4}),"|") – Mayukh Bhattacharya Commented Nov 20, 2024 at 19:24
Add a comment  | 

4 Answers 4

Reset to default 3

A generalized option could be:

=CROSSJOIN(Table1,Table2)

Where CROSSJOIN is a custom function defined in Name Manager:

=LAMBDA(array1,array2,
    LET(
        i, SEQUENCE(ROWS(array1)),
        j, SEQUENCE(, ROWS(array2)),
        HSTACK(
            CHOOSEROWS(array1, TOCOL(IF(j, i))),
            CHOOSEROWS(array2, TOCOL(IF(i, j)))
        )
    )
)

Source: https://stackoverflow/a/78244826/22459970

If you also wanted the option to handle header rows, CROSSJOIN can be defined as follows:

=LAMBDA(array1,array2,[headers],
    LET(
        incl_h, CHOOSE(headers + 1, 0, 1, 0, 1),
        show_h, CHOOSE(headers + 1, 0, 0, 1, 1),
        a, DROP(array1, incl_h),
        b, DROP(array2, incl_h),
        i, SEQUENCE(ROWS(a)),
        j, SEQUENCE(, ROWS(b)),
        v, HSTACK(
            CHOOSEROWS(a, TOCOL(IF(j, i))),
            CHOOSEROWS(b, TOCOL(IF(i, j)))
        ),
        IF(
            show_h,
            VSTACK(
                IF(
                    incl_h,
                    HSTACK(
                        TAKE(array1, 1),
                        TAKE(array2, 1)
                    ),
                    HSTACK(
                        "tbl1.Col" & SEQUENCE(, COLUMNS(a)),
                        "tbl2.Col" & SEQUENCE(, COLUMNS(b))
                    )
                ),
                v
            ),
            v
        )
    )
)

The optional [headers] argument will behave in a similar fashion to that of the [field_headers] argument of GROUPBY, where the following options are accepted:

  • 0 | No
  • 1 | Yes but don't show
  • 2 | No but generate
  • 3 | Yes and show

For example:

=CROSSJOIN(Table1,Table2,2)

Or:

=CROSSJOIN(Table1[#All],Table2[#All],3)

Please note, the line breaks and spaces/indentations were used for improved readability. They can be removed when defining the function in Name Manager.

Without using LAMBDA() helper function:


=LET(
     a, TOROW(DataTwo[Column1]&"|"&DataTwo[Column2]),
     b, TOCOL(DataOne[Column1]&"|"&DataOne[Column2]&"|"&a),
     TEXTSPLIT(TEXTAFTER("|"&b,"|",SEQUENCE(,4)),"|"))

• Or, Option Two:

=LET(
     a, ROWS(DataTwo),
     b, ROWS(DataOne),
     c, CHOOSEROWS(DataOne,INT((SEQUENCE(a*b)-1)/a)+1),
     d, CHOOSEROWS(DataTwo,MOD((SEQUENCE(a*b)-1),a)+1),
     HSTACK(c,d))

Change variables at the end to evaluate:

=LET(
     a, ROWS(DataTwo),
     b, ROWS(DataOne),
     c, SEQUENCE(a*b),
     d, CHOOSEROWS(DataOne,INT((c-1)/a)+1),
     e, CHOOSEROWS(DataTwo,MOD((c-1),a)+1),
     f, HSTACK(d,e),
     f)

There's a solution based on MAKEARRAY:

=MAKEARRAY(
  ROWS(Set_1)*ROWS(Set_2),
  COLUMNS(Set_1)+COLUMNS(Set_2),
  LAMBDA(row,col,
         LET(set_num,IF(col<=COLUMNS(Set_1),1,2),CHOOSE(set_num,
            INDEX(Set_1,QUOTIENT(row-1,3)+1,col),
            INDEX(Set_2,MOD(row-1,3)+1,col-COLUMNS(Set_1))))))

Cartesian Product

=LET(data1,Table11,data2,Table12,
    Repeat,LAMBDA(fs,ss,by_col,
        TOCOL(IF(TOROW(ss),fs),,by_col)),
    fs,SEQUENCE(ROWS(data1)),
    ss,SEQUENCE(ROWS(data2)),
    HSTACK(CHOOSEROWS(data1,Repeat(fs,ss,0)),
        CHOOSEROWS(data2,Repeat(ss,fs,1))))

TOCOL/IF/SEQUENCE

Efficiency

  • Not once have I encountered this combo being more efficient than the INT/MOD combo. Not when using them as they are (no visible difference) but when combined with other functions.
  • In this particular case, on two tables of 1000 rows by 5 columns, this combo took 10s while the INT/MOD solution by Mayukh Bhattacharya (the nice one with c, SEQUENCE(a*b),) took 14s. Your MAKEARRAY solution took 23s.

IF/SEQUENCE

  • To get more familiar with this combination, feel free to download Repeat Column from my OneDrive.
发布评论

评论列表(0)

  1. 暂无评论