I'm developing custom subset-sum algorithms and have encountered a puzzling issue: it seems difficult to generate truly "hard" subset-sum instances (i.e., forcing exponential computational effort) without using very large integers (e.g., greater than about 2^22).
I'd specifically like to know:
- Are there known constructions or instance generators for subset-sum that reliably force exponential complexity—particularly against common subset-sum algorithms or custom heuristics—using only moderately sized integers (≤2^22)?
- Is the hardness of subset-sum instances inherently tied to the size of integers involved, or is it possible to create computationally difficult instances purely through numerical structure and relationships, even with smaller numbers?
For context, here are some attempts I've made at generating potentially hard instances (feedback or improvements welcome):
import math
import random

def generate_exponential_instance(n):
    # Uniform random elements up to 2**22; the target is taken from a
    # random nonempty subset, so the instance is guaranteed to be solvable.
    max_element = 2 ** 22
    A = [random.randint(1, max_element) for _ in range(n)]
    while True:
        mask = [random.choice([0, 1]) for _ in range(n)]
        if sum(mask) != 0:
            break
    target = sum(A[i] * mask[i] for i in range(n))
    return A, target

def generate_dense_high_values_instance(n):
    # Tightly clustered elements just below 2**22; the target is the sum
    # of a random half of them.
    base = 2 ** 22 - random.randint(0, 100)
    A = [base + random.randint(0, 20) for _ in range(n)]
    target = sum(random.sample(A, k=n // 2))
    return A, target

def generate_merkle_hellman_instance(n, max_step=20):
    # Superincreasing private key disguised by modular multiplication,
    # as in the Merkle-Hellman knapsack cryptosystem.
    total = 0
    private_key = []
    for _ in range(n):
        next_val = total + random.randint(1, max_step)
        private_key.append(next_val)
        total += next_val
    q = random.randint(total + 1, 2 * total)
    r = random.randint(2, q - 1)
    while math.gcd(r, q) != 1:  # r must be invertible mod q
        r = random.randint(2, q - 1)
    public_key = [(r * w) % q for w in private_key]
    message = [random.randint(0, 1) for _ in range(n)]
    ciphertext = sum(b * k for b, k in zip(message, public_key))
    return public_key, ciphertext
2 Answers
We know subset-sum to be solvable in pseudopolynomial time. "Pseudopolynomial time" means the worst-case running time is bounded by a polynomial in the input length and the largest numeric value in the input. Because a string of L bits can encode numbers of size O(2^L), pseudopolynomial-time algorithms really take exponential time (that is, exponential in the input length), but they are still considered better than "usual" exponential-time algorithms, since you can avoid the exponential behavior by simply using small numbers.
For example, even this simple algorithm for subset-sum:
def exists_subset_sum(num_set, target):
    reachable = {0}
    for num in num_set:
        reachable |= {prev + num for prev in reachable}
    return target in reachable
has pseudopolynomial running time. The key observation is that at the i-th iteration the reachable set contains only values in the interval [-i*M, i*M], where M is max(abs(n) for n in num_set). The size of the set is therefore always O(L*M) (where L is the size of the input in bits), and operations on it take time polynomial in L and M, so the whole algorithm takes time polynomial in L and M. Technically, since M can be exponential in L, the time complexity of exists_subset_sum in terms of L alone is exponential. But you only get that exponential behavior if you let M grow exponentially; if you increase L without increasing M, you get at worst polynomial growth.
You'll notice that many algorithms for solving subset-sum are also pseudopolynomial.
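The bound above is easy to observe directly: when the elements are capped at some small M, the reachable set saturates an interval instead of doubling at each iteration. A quick sanity check (a variant of exists_subset_sum that returns the whole set of achievable sums):

```python
import random

def reachable_sums(num_set):
    # Same loop as exists_subset_sum, but returning the full set of
    # achievable subset sums (for nonnegative inputs).
    reachable = {0}
    for num in num_set:
        reachable |= {prev + num for prev in reachable}
    return reachable

random.seed(1)
n, M = 200, 50
nums = [random.randint(1, M) for _ in range(n)]

sums = reachable_sums(nums)
# Every value lies in [0, sum(nums)] and sum(nums) <= n*M, so the set
# holds at most n*M + 1 = 10001 values -- nowhere near the 2^200
# subsets of a 200-element input.
assert len(sums) <= n * M + 1
```

Despite n = 200 (where 2^n is astronomical), the loop finishes instantly because M caps the set's growth.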
Part of what may be confusing you is that this problem is hard because, in deciding whether a problem is in NP or not, the size of an integer is measured in bits, not in its value. Given k inputs with a maximum value of N, it is pretty easy to solve the problem in time polynomial in kN. More than a few would-be papers "proving" P = NP stem from people thinking they have solved this problem.
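That polynomial-in-kN solution is the standard dynamic program: a table of target + 1 booleans swept once per input, so O(k * target) time, polynomial in the value but exponential in its bit-length (a minimal sketch):

```python
def subset_sum_dp(nums, target):
    # reach[s] is True iff some subset of the numbers seen so far sums
    # to s. Each of the k numbers sweeps the (target + 1)-entry table
    # once, giving O(k * target) time -- pseudopolynomial.
    reach = [False] * (target + 1)
    reach[0] = True
    for x in nums:
        for s in range(target, x - 1, -1):
            reach[s] = reach[s] or reach[s - x]
    return reach[target]
```

For example, subset_sum_dp([3, 34, 4, 12, 5, 2], 9) returns True (4 + 5 = 9), while the same set with target 30 returns False.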
The "size" of an integer N is lg(N), not N. For this problem in particular, the worst-case complexity depends on the number of inputs and the total size (in bits) of all of the inputs. Hard problems require either lots of inputs or long inputs.
Try as your input 100 random 100-bit numbers, and make your target be half their sum.
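A minimal generator in that spirit (the function name is my own; OR-ing in the top bit keeps every element a full 100 bits, and a target of half the total sits where the number of candidate subsets is largest):

```python
import random

def generate_random_dense_instance(n=100, bits=100):
    # n random numbers of exactly `bits` bits each; the target is half
    # the total sum, the densest region of possible subset sums.
    A = [random.getrandbits(bits) | (1 << (bits - 1)) for _ in range(n)]
    target = sum(A) // 2
    return A, target
```

With 100-bit elements, the pseudopolynomial DP table would need ~2^100 entries, so value-based algorithms are out and only genuinely exponential search remains.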