17 Dec, 2011, David Haley wrote in the 42nd comment:

So even setting aside difference in languages, I wouldn't rush to discard Twisol's 'obvious' approach.

Yeah, except for the fact that the first, fast one didn't work. :wink:

I care little for who writes some code the most quickly. Often in algorithms, one can throw together a solution that

IMHO you should look into formal algorithms theory before you venture too far down this particular argument. Some of the things you've said, as Runter indicated, suggest that you're not really familiar with the field.

For example,

Even the most amazing solution is useless if it takes so much time and resources that history moves to obsolete it.

This is a pretty ridiculous thing to say. If the solution is the most amazing possible, then necessarily, there is no more amazing solution. So if it's slow, well, tough luck: you've found a problem that is formally difficult to solve.

What does it even mean for history to obsolete the most amazing solution? Are you going to find a better solution? Then your solution wasn't the most amazing, now was it?

This argument might not make sense to you if you don't understand algorithmic proofs, but to be honest, that's kind of my point.

Another example:

That said, I do want to put it out there that it's wrong to assume "math people" are the smartest thing in the universe, or that math itself abstracts reality in a perfect way.

When it comes to formal algorithmic proof, unless you're going to claim that their proofs are wrong, then, well, the "math people" have in fact cornered many solutions and

Of course, there are not answers to all of these problems. The most famous question in this field is perhaps whether or not the P class of problems is equal to the NP class. So sure, it's not like algorithmic analysis gives all the answers. But it sure as hell gives some, and very frankly, you should do a little homework before venturing into the field of formal algorithmic analysis and proofs.

Sorry if I'm being a little harsh to you, but your statements made me a little grumpy. :sad:

I meant that the optimal answer is obvious to you because you're already aware of it. I wasn't - and it took some considerable thought to come up with the model I derived. (It's obvious in hindsight… which is quite the point I'm making here.)

Oh, sorry. FWIW I think your solution is indeed the more obvious one. I don't find the canonical one to be terribly obvious, until, of course, you've puzzled it through.

I don't understand how you know you can stop when you hit a negative sum. If you continue past that point, you could find a sufficiently large entry to counterbalance that loss and actually gain some value.

Here's a sketch of a formal proof…

Consider that you have a sum, x, that ends at index i.

Consider that index i+1 contains a number y such that x + y < 0. (in other words, y is a sufficiently large negative number to counter the sum x)

Now consider the number at index i+2 (or, more generally, keep skipping until you find a positive number).

There are two cases:

1. array[i+2] is so big that it brings the sum back to positive.

2. array[i+2] is not big enough to bring the sum back to positive.

Case 1: if array[i+2] is so big that it brings the sum back to positive, then array[i+2] > the sum over array[0…i+1]. Therefore, the sum consisting of array[i+2] alone is larger than the sum up through i+2. So we need only compare the sum starting at array[i+2] with the sum array[0…i].

Case 2: there are two subcases here, really.

- there is no larger sum afterward: in this case, adding array[i+1] to the sum is counterproductive because it makes the sum negative, and by assumption, array[i+2] isn't big enough to compensate (and if it were, we still don't want array[i+1]).

- there is a larger sum afterward: in this case, the larger sum afterward will only be even larger if we don't add array[i+1].

Hopefully this will clear up why the canonical solution works. Happy to elaborate further… and, please do keep in mind that it's 1:15am as I write this proof *sketch*. :smile:

With less info (not being told it was an algorithms problem), and no info on when it was solved (on the newest machines, multithreading can give huge gains on those kinds of problems), I would not have said it was obvious.

As an example, with multithreading, getting the max of an array is way faster.

You are most welcome to provide a solution that is "way faster" and provides "huge gain". I'll be eagerly awaiting your answer…

17 Dec, 2011, David Haley wrote in the 43rd comment:

Twisol, to elaborate slightly, I think the difference you're missing is that finding a negative **number** is not a problem, indeed you are correct that you might counterbalance that, but it's when the **whole sum** goes negative that things change.

17 Dec, 2011, Tyche wrote in the 44th comment:

I don't understand how you know you can stop when you hit a negative sum. If you continue past that point, you could find a sufficiently large entry to counterbalance that loss and actually gain some value.

It wouldn't be "contiguous".

17 Dec, 2011, Sharmair wrote in the 45th comment:

In this thread, Twisol was the first one to go down to milliseconds, so in practical terms his solution may be on par with the canonical one. People with extremely mathematical minds (used to know one) may see a faster solution right away, and there may be a hundred-times faster solution than the canonical one that no human mind has puzzled out yet, but one that would take a genius and half a lifetime. Point is the time frame for delivery should be part of evaluating how good a solution is, not just efficiency. So even setting aside difference in languages, I wouldn't rush to discard Twisol's 'obvious' approach.

Ah, not sure what thread you are reading, but I posted my 31 microsecond (.031 millisecond) solution 2nd, and Twisol was 3rd. Though I am not really sure who or what you are really griping about here. But as for me, it took more time to figure out how to get the timing data than to write the code to do the scan. I don't think I thought about it for more than a half hour or so, and writing it took only a few minutes, so I don't really see where you are going with the time thing.

And as for Twisol's method, though it might be overkill for this problem, it might have uses with other, more complex problems, and I hope it is given some thought by him or others working on a problem it is better suited for.

17 Dec, 2011, Rarva.Riendf wrote in the 46th comment:

You are most welcome to provide a solution that is "way faster" and provides "huge gain". I'll be eagerly awaiting your answer…

Huh? What is the link between me saying that it was obvious in THIS case that the gain was in reducing the number of passes, and me providing a faster solution than the known working solution… (that is impossible). I can speed up my result with multithreading easily enough, by parsing half the data to find its max on one core and the other half on another core, then taking the max of both results. I will still be way slower than the known solution…

17 Dec, 2011, Sharmair wrote in the 47th comment:

Well back to the real subject at hand. When I was thinking of my posted solution,


I was also giving some thought to a brute force solution to maybe start at (though I worked out the single-pass solution before I got to the computer to actually code). Then later I figured out a fully brute force solution. Well, I decided to code these two solutions (though I guess it was in the reverse order from what the OP had in mind). All three solutions came up with the same max range/sum.

First, the full brute force. This goes through each range and then sums it, keeping track of the max as you go (I suppose you COULD still make it worse by maybe storing each sum and then later searching for the max, or by summing the same ranges twice when the ends come up in the reverse order). The scan code for this was:

int rstart = 0;
int rend = 0;
int maxsum = 0;
for(int start = 0; start < size; ++start){
    for(int end = start; end < size; ++end){
        int sum = 0;
        for(int ct = start; ct <= end; ++ct)
            sum += data[ct];
        if(sum > maxsum){
            rstart = start;
            rend = end;
            maxsum = sum;
        }
    }
}

The 2nd is the refined brute force solution. This also looks at all the ranges, but it starts at each start location, adds to the sum, and looks for the max as it adds each new element (so it does not resum nearly as much):

int rstart = 0;
int rend = 0;
int maxsum = 0;
for(int start = 0; start < size; ++start){
    int sum = 0;
    for(int end = start; end < size; ++end){
        sum += data[end];
        if(sum > maxsum){
            rstart = start;
            rend = end;
            maxsum = sum;
        }
    }
}

The 3rd you know, and I call it here the single scan solution. I ran these with the given array of 5000 elements, and also looking at just the first 2500 elements of the array. The results of a typical run (5000-element run first, then 2500):

Full brute force      86.062000   10.500000 sec   seems to be O(n^3)
Refined brute force    0.063000    0.015000 sec   seems to be O(n^2)
Single scan            0.000031    0.000015 sec   seems to be O(n)

17 Dec, 2011, Deimos wrote in the 48th comment:

@plamzi: Humans did not invent mathematical rules/laws. We simply came up with words and symbols to describe the already-existing concepts. At best, you could say we discover mathematical rules/laws. Sorry for the pedantism.

17 Dec, 2011, plamzi wrote in the 49th comment:

In this thread, Twisol was the first one to go down to milliseconds, so in practical terms his solution may be on par with the canonical one. People with extremely mathematical minds (used to know one) may see a faster solution right away, and there may be a hundred-times faster solution than the canonical one that no human mind has puzzled out yet, but one that would take a genius and half a lifetime. Point is the time frame for delivery should be part of evaluating how good a solution is, not just efficiency. So even setting aside difference in languages, I wouldn't rush to discard Twisol's 'obvious' approach.

Ah, not sure what thread you are reading, but I posted my 31 microsecond (.031 millisecond) solution 2nd, and Twisol was 3rd. Though I am not really sure who or what you are really griping about here. But as for me, it took more time to figure out how to get the timing data than to write the code to do the scan. I don't think I thought about it for more than a half hour or so, and writing it took only a few minutes, so I don't really see where you are going with the time thing.

And as for Twisol's method, though it might be overkill for this problem, it might have uses with other, more complex problems, and I hope it is given some thought by him or others working on a problem it is better suited for.

I was reading the thread in which Twisol was the first to post a matching solution together with time in milliseconds (#25). Your post at #18 didn't list any benchmarks. If you had the answer first and your code ran faster, then kudos.

For the record, I wasn't declaring Twisol a winner, but was merely trying to save him some work :)

@plamzi: Humans did not invent mathematical rules/laws. We simply came up with words and symbols to describe the already-existing concepts. At best, you could say we discover mathematical rules/laws. Sorry for the pedantism.

Absolutely true, and no need to apologize for your pedantism. But isn't it an article of faith to believe that our words and symbols describe those concepts perfectly?

Many scientists believe that math is the language of creation, and that could well be true, but at this point in time, all of our science is built on assumptions about reality:

Quoted from http://en.wikipedia.org/wiki/Axiom :

"… an axiom is a logical statement that is assumed to be true…"

Incidentally, even if we assume that math is a perfect abstraction of reality, Runter's certainty that there is no faster solution than the canonical one seems to me to be based on the logical fallacy that the smartest math people in the universe have already been born. It also makes the unsafe assumption that the smartest math people already born have worked on this exact problem.

I have neither faith, nor certainty, but I do have lots of love for the scientific method :)

17 Dec, 2011, Nich wrote in the 50th comment:

I think that you're using the word "assumed" in a different sense than the article is. I'm not a mathematician, but my understanding is that at the lowest levels, mathematical axioms need neither be proven nor taken on faith. They are **given**, and any proofs derived from them hold *if* the axioms hold. There is no way to falsify the axioms, because they are given to a particular branch of math.

Also, math isn't science (though science is grounded in math). Math is not limited to only describing worlds that exist, or could exist, in reality. So while you're correct in saying that our understanding of the world is grounded in assumptions, math doesn't work like that. When doing math, it's perfectly correct to choose your assumptions differently than they exist in the real world. The scientific method doesn't apply generally in math, because math is more interested in proving things than in hypothesizing about them. While you can come up with a hypothesis in math (e.g. P ≠ NP), it's usually not useful unless it can eventually be proven.

I can't find a proof saying that the canonical algorithm for this problem is minimal, but if such a proof exists, it's not a matter of faith to say that a faster algorithm will never be found, as long as the assumptions of the proof are maintained.

A trivial proof that the algorithm can never run in sub-linear time is that it must examine each element for inclusion at least once. Thus, a super bright mathematician could find a way to shave off a constant amount to make the algorithm faster, but running the algorithm on a sequence will always take proportionally twice as long as on a sequence of half the size.

This of course includes the assumption that we can't examine all the elements in constant time, but that assumption is a given. If a computer were invented that could examine sequences of any length in constant time, it wouldn't invalidate the proof. Rather, another proof, possibly with a different result, would be constructed given the new assumption.

17 Dec, 2011, Deimos wrote in the 51st comment:

@plamzi: I don't know about math being the language of creation (what exactly does that entail, anyway?), however, I do believe that we have a firm grasp on algorithmic complexity, thanks to our understanding of mathematical concepts. As it applies to this discussion, it is possible to prove that a particular problem can be solved in no less than O(foo) time. And it can also be proven that method X of solving that problem is O(bar). So, if O(bar) == O(foo), then we can be sure that that method is the "best", and it then becomes the canonical solution.

17 Dec, 2011, Twisol wrote in the 52nd comment:

Hopefully this will clear up why the canonical solution works. Happy to elaborate further… and, please do keep in mind that it's 1:15am as I write this proof *sketch*. :smile:

So I gave this some thought while going to sleep, and I think I understand what's going on. You could look at it like this: We're looking for the greatest subvector. If we take the entire region on either side of the ideal subvector, we find two negative regions. This is because, if either one was positive, it would be included in the solution.

If we're iterating through a single time, we want to find the boundaries between these positive and negative regions. So you find the first positive number to start at, and begin summing. This is our first maxima. Once the sum becomes negative, we've reached the right side of our maxima. And since it became negative, and since the part before the region is already known to be negative, we know we have a single maxima, and we begin anew from the current position. The earlier part of the vector is now known to be negative, so it can't contribute to the new region… and this process continues until we've covered the entire vector.

So the trick to this algorithm is that it looks for [neg, pos] sequences of

Now, what I'd really be interested in is a proof that maxima can't overlap with other maxima. Wait - I think putting it like that, I just got it. :P

It's interesting because this is almost like calculus. We have some function f(x), and we want to know over what bounds we have the maximal area. So, we're taking consecutive integrals of the function every time the area dips its head above the x-axis… Wow. I think if you'd explained it to me that way from the start, I'd have understood it perfectly. That's actually really cool!

17 Dec, 2011, David Haley wrote in the 53rd comment:

Huh ? what is the link between me telling that it was obvious in THIS case that the gain was to look in the number of pass, and me providing a faster solution that the known working solution….(that is impossible). I can speed up my result with multithreading, easily enough, by parsing half the data to find its max, another core for the next half, then finding the max of both result. I will still be way slower than using the known solution….

Well, you said that you could speed things up with multithreading. So I'd like to know how. What if the largest subarray spans the two halves? Your algorithm will need extra work to account for that case. It's certainly not impossible. Like I said, if you think you can speed up your result with multithreading, please do say how you will do so while producing a correct result. There's little use in waving hands saying it's possible when discussing algorithmic efficiency and correctness. Until a solution is presented, it's just talk. :smile:

Your post at #18 didn't list any benchmarks.

Yes, it did. You should read it more closely.

And please, please, do some homework before coming in to tell people how algorithmic analysis works. You don't know the field and you are waxing philosophical about proofs and the nature of reality based on ignorance. It's fine to not know the field, it is after all a somewhat advanced area of computer science, but in that case, you shouldn't tell people how it works.

We're looking for the greatest subvector. If we take the entire region on either side of the ideal subvector, we find two negative regions. This is because, if either one was positive, it would be included in the solution.

Yep! Same for your next paragraphs… that's what's happening, and why we can stop once we hit a "range boundary".

It's interesting because this is almost like calculus. We have some function f(x), and we want to know over what bounds we have the maximal area. So, we're taking consecutive integrals of the function every time the area dips its head above the x-axis…

That's certainly an interesting way of thinking about it… I suppose that if your input data were defined by a function, then you could apply standard calculus to find the maximal area, rather than scan the numbers. Perhaps that would be even faster than the linear scan. (Of course, it's a different problem…)

17 Dec, 2011, Rarva.Riendf wrote in the 54th comment:

Well, you said that you could speed things up with multithreading. So I'd like to know how. What if the largest subarray spans the two halves? Your algorithm will need extra work to account for that case. It's certainly not impossible. Like I said, if you think you can speed up your result with multithreading, please do say how you will do so while producing a correct result. There's little use in waving hands saying it's possible when discussing algorithmic efficiency and correctness. Until a solution is presented, it's just talk. :smile:

Look at my algo; you will clearly see that when I am talking about the max, it is the max of the calculated sums. I calculate the sums, find the max, calculate the sums, find the max, etc. The only thing I can speed up by multithreading is finding the max. I even showed how it worked with an example.

17 Dec, 2011, David Haley wrote in the 55th comment:

Your algorithm doesn't make a lot of sense, and you've said nothing about how you deal with the largest subarray sum spanning several chopped up data pieces. I'm not sure why you talk about finding "the max" because the problem is not finding one maximum number, it's finding the subarray with the largest sum.

17 Dec, 2011, Rarva.Riendf wrote in the 56th comment:

Your algorithm doesn't make a lot of sense,

Well if you did not get it, no wonder you do not get the multithreading part.

1 2 3 4 5      <- max is 5, then I shift the vector:
  1 2 3 4         and add it:
1 3 5 7 9      <- max is 9, then shift again and add:
    1 2 3
1 3 6 9 12     <- max is 12 (5+4+3)

I do this because it is easy to make all the additions at once in binary (&), and the same for shifting the vector contents (>>).

Finding the max has to be done at each step, and that is why I can heavily parallelize.

By the end of the loop this algo will have calculated all the possible sums.

17 Dec, 2011, David Haley wrote in the 57th comment:

It's easy to say that somebody doesn't understand a process that you actually haven't laid out. :wink:

For example,

sum (data, (switch(data, sizeof(int))

what does that even mean?

17 Dec, 2011, Rarva.Riendf wrote in the 58th comment:

It's easy to say that somebody doesn't understand a process that you actually haven't laid out. :wink:

For example,

sum (data, (switch(data, sizeof(int))

what does that even mean?

Not much by itself, but with the examples I gave before to explain how it worked, I thought it would be enough; guess not.

If you understand it now, you see why there is no problem multithreading the part where I need to find the max, as it is simply no more than that: finding the max element in an array.

18 Dec, 2011, Runter wrote in the 59th comment:

Re: the mysterious multithreaded algorithm

Multithreading never makes an algorithm less costly. As we've discussed already, any speed gains here would be because you've doubled or quadrupled the number of processors working on the problem. In and of itself that's not really part of the algorithm. And it isn't even always going to be the case. So we shouldn't talk about hardware architecture as part of the algorithm, because it's not. Imagine a computer that has only a single processor. Now it's more obvious we're talking about the nuances of concurrency: there will be no gains here in terms of processing power, only overheads. The fact that it uses threading to achieve it is irrelevant to the algorithm. In other words, any "multithreading algorithm that's faster" suggestion means you've taken gains from hardware and attributed them to software architecture, and you've confused the two things.

Also, re: anything too slow means something faster is out there (lol?)

If I gave you an array of length, say, 2**32 (4294967296), and wanted you to solve even the most basic problem on it…like sum the whole thing… it would be slow, and then you'd be stuck (i guess?) saying that some day someone is going to find a solution 100 times faster. Then I'd be in that same thread saying it's nonsense and nobody no matter the size of their math credentials is going to do the impossible and take the proven to be fastest algorithm and improve it. At some point, by Law of the Universe, you've simplified something down from an algorithmic standpoint to where there is nothing more efficient. At that point, the only way to solve the problem faster is to execute the algorithm faster. I.e. hardware advances, upgrades, multiple processors, whatever. But that's not based on algorithm, as previously stated.


18 Dec, 2011, Rarva.Riendf wrote in the 60th comment:

'The fact that it uses threading to achieve it is irrelevant to the algorithm'

In the canonical solution it is irrelevant; in my solution the speed gain can be significant. And sometimes the best algo for a single thread is not the best if you can parallelize. And there is nothing mysterious about parallelizing the scan of an array to find the max element…

And my solution is n^2, so it will not be faster than the best solution anyway, but mine can be partly parallelized…

