13 Aug, 2009, Silenus wrote in the 21st comment:
Votes: 0
Thanks David. I will take a look at the course. I suspect I may still consider building some constructs into the language for managing processes (Erlang-style) and a simple message-passing system. It may be slightly beyond my current skill level to implement a message-passing-plus-queues system, but I suspect I am close to that level now. Perhaps with a bit of effort (studying the Erlang VM) I could do this.
13 Aug, 2009, David Haley wrote in the 22nd comment:
Votes: 0
I don't think it's wise to be building in support for concurrency until you get a pretty good understanding of the issues involved. It's kind of hard to design something if you don't understand what it is you're designing, after all. For example, the special bit about Erlang isn't that it passes messages; it's that program semantics are very constrained around message passing. Message passing on its own is not a solution to synchronization.
13 Aug, 2009, Silenus wrote in the 23rd comment:
Votes: 0
Hmm. Perhaps, but I am wondering if it really is that complex? I assume one could get a good grasp of the subject by reading the research literature on message passing, e.g. by taking a look at Actors. What I am thinking of doing is implementing something similar to Erlang's process model, i.e. having processes all operate in independent address spaces and use message passing as the means of communicating between them. I also have a book on how to program in Erlang, so I am not sure what additional complexity you are referring to. I thought that message passing was an alternate model for concurrency, distinct from threads, or am I wrong about this?

I don't feel this really is a "real" design problem but more of a semi-cloning problem. Like most aspects of the language I am implementing, there is hardly anything really new. It's just a blend of different features which I have come to understand better as I write the code.
13 Aug, 2009, David Haley wrote in the 24th comment:
Votes: 0
Until you have written and debugged large concurrent systems, I think it is difficult to have a true appreciation of the complexity involved in concurrent programming. The theory is in fact extremely simple: access to shared state needs to be synchronized.

What I meant about message passing is that, on its own, it is not a sufficient, guaranteed solution to concurrency problems. A very simple example is when you need to pass several messages between two objects while making sure that nobody else comes in and modifies state unexpectedly. To make things even more fun, throw in side effects during message handling.
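The multi-message problem David is pointing at can be sketched in a few lines of Python (an illustrative toy, not code from either project): each message is handled atomically, yet a get-then-set protocol spanning two messages still loses an update when clients interleave.

```python
class AccountActor:
    """Toy actor: each message is handled atomically, one at a time."""
    def __init__(self):
        self.balance = 100

    def handle(self, msg):
        if msg[0] == "get":
            return self.balance
        if msg[0] == "set":
            self.balance = msg[1]

acct = AccountActor()
# Two clients each deposit 10 using a get-then-set protocol.
# Interleaving: A gets, B gets, A sets, B sets.
seen_a = acct.handle(("get",))      # A reads 100
seen_b = acct.handle(("get",))      # B reads 100
acct.handle(("set", seen_a + 10))   # A writes 110
acct.handle(("set", seen_b + 10))   # B writes 110 -- A's deposit is lost

print(acct.balance)  # 110, not the 120 you'd want
```

No message was mishandled, yet the invariant across the message *sequence* was violated; that is why message passing alone does not replace thinking about synchronization.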
14 Aug, 2009, Silenus wrote in the 25th comment:
Votes: 0
Thanks David.

Well, to some extent I am aware of these issues and how Erlang copes with them construct-wise, which implementation-wise is probably sufficient for me. Unfortunately, developing another application solely for the purpose of gaining experience with a large concurrent system seems like a fair amount of work; I have no such systems available, and writing one, as you noted, is rather non-trivial. I could perhaps dream up some reasonably concurrent application and partially implement it to gain a better idea of how these things might work, but I think, given what I have in mind, this isn't really needed. I have read a bit about how this is typically done but haven't really been in a position to try it.
14 Aug, 2009, David Haley wrote in the 26th comment:
Votes: 0
Well, the homework assignment I linked to is a good starting point, at least. But to some extent, yes, you'll probably have to bite the bullet and just write the code. Trying to design a language with concurrency semantics without really knowing about concurrency problems in practice, or how other languages tackle them, is kind of like trying to write a language that optimizes hardware usage without being aware of the issues involved in talking to hardware or how assembly works, etc.
14 Aug, 2009, Silenus wrote in the 27th comment:
Votes: 0
Thanks David. If you are trying to understand these semantics, should you program things in a multitude of different concurrency models? I am thinking maybe I should try writing some code in Erlang and perhaps doing the same thing with POSIX threads and compare them. I assume that with large amounts of shared data structures threading may be more efficient, whereas otherwise message passing might be easier to deal with, since you don't have to be concerned with instruction-interleaving errors (which, given the large number of possible permutations, are extremely hard to debug).
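The two styles being compared can be put side by side in a small Python sketch (illustrative only; the thread's actual context is C/C++ and Erlang): shared state guarded by a lock versus no shared state at all, with workers sending messages to a single owner through a queue.

```python
import threading
from queue import Queue

ITERS, WORKERS = 1000, 4

# Style 1: shared state guarded by a lock (the pthreads-like model).
counter = 0
lock = threading.Lock()

def locked_worker():
    global counter
    for _ in range(ITERS):
        with lock:              # every touch of shared state is synchronized
            counter += 1

threads = [threading.Thread(target=locked_worker) for _ in range(WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

# Style 2: message passing -- workers share nothing and send
# increments to one owner via a queue.
inbox = Queue()

def sending_worker():
    for _ in range(ITERS):
        inbox.put(1)            # communicate instead of sharing

threads = [threading.Thread(target=sending_worker) for _ in range(WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

total = 0
while not inbox.empty():
    total += inbox.get()        # a single consumer owns the state

print(counter, total)  # both 4000
```

Both arrive at the same answer; the difference is where the synchronization lives: spread across every access in style 1, concentrated inside the queue in style 2.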
14 Aug, 2009, David Haley wrote in the 28th comment:
Votes: 0
Part of the 'joy' of learning concurrent programming is precisely wrapping your head around the extremely hard-to-debug problems. Anyhow, my advice is as it has been up to now: the best way to gain practical familiarity with these issues is to go forth and practice. :smile:
15 Aug, 2009, Silenus wrote in the 29th comment:
Votes: 0
Thanks David. Out of curiosity, how much overhead is typically associated with checking and acquiring locks? I was looking over the Bakery Algorithm just now, and though it seems quite simple with atomic registers, I am uncertain what would happen on a multi-core system during the various memory-read latencies on the various boolean flags (or other shared flags), and how it would interact with the cache hierarchy. How are such guarantees enforced in a typical multi-processor system?
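For reference, the algorithm under discussion is short enough to sketch. Below is a toy Python version of Lamport's bakery lock (illustrative: CPython's GIL makes loads and stores effectively sequentially consistent, so this shows the logic only; a C version on a real multi-core machine would need memory barriers, which is exactly the cache/ordering question being asked).

```python
import sys
import threading

sys.setswitchinterval(0.0005)   # switch threads often so the busy-waits resolve quickly

N_THREADS, ITERS = 2, 100
choosing = [False] * N_THREADS
number = [0] * N_THREADS
counter = 0                      # the shared state the bakery lock protects

def bakery_lock(i):
    choosing[i] = True
    number[i] = 1 + max(number)  # take a ticket
    choosing[i] = False
    for j in range(N_THREADS):
        if j == i:
            continue
        while choosing[j]:       # wait for j to finish picking its ticket
            pass
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass                 # lower ticket goes first; ties broken by thread id

def bakery_unlock(i):
    number[i] = 0

def worker(i):
    global counter
    for _ in range(ITERS):
        bakery_lock(i)
        counter += 1             # critical section
        bakery_unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 200
```

Every wait is a busy-wait on another thread's flags, which is why the question about where those flags live in the cache hierarchy matters so much for its real-world cost.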
15 Aug, 2009, David Haley wrote in the 30th comment:
Votes: 0
Overhead depends on all sorts of things including the language implementation, the OS, etc. It's kind of hard to answer in general. Generally you don't want to be acquiring and releasing locks over and over again in an inner loop if you can avoid it. In functions that don't run often, the cost is basically noise – most of the time.
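David's inner-loop point is easy to measure directly. A hedged Python micro-benchmark (illustrative; absolute numbers depend heavily on the language, OS, and whether the lock is contended, as he says):

```python
import threading
import timeit

lock = threading.Lock()
N = 100_000

def no_lock():
    total = 0
    for _ in range(N):
        total += 1
    return total

def with_lock():
    total = 0
    for _ in range(N):
        with lock:               # uncontended acquire/release on every iteration
            total += 1
    return total

t_plain = timeit.timeit(no_lock, number=1)
t_locked = timeit.timeit(with_lock, number=1)
print(f"plain: {t_plain:.4f}s  locked: {t_locked:.4f}s")
# With a body this trivial, the per-iteration acquire/release tends to
# dominate; amortized over a function that runs rarely, it is noise.
```

The same cost that dominates a tight loop disappears into the noise for code that runs a few hundred times per second, which is the distinction David draws.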
15 Aug, 2009, flumpy wrote in the 31st comment:
Votes: 0
David Haley said:
Overhead depends on all sorts of things including the language implementation, the OS, etc. It's kind of hard to answer in general. Generally you don't want to be acquiring and releasing locks over and over again in an inner loop if you can avoid it. In functions that don't run often, the cost is basically noise – most of the time.


… that's pretty much what I said to 'im..
15 Aug, 2009, Silenus wrote in the 32nd comment:
Votes: 0
I spoke to someone else a bit about this issue, and I think it confirms some of my suppositions about how things operate at the hardware level, in some respects, and the cost potentially involved: the flag sets in Lamport's Bakery algorithm would result in a necessary write-through to main memory (or the first shared cache) so the flags could be read by the other processor. It seems to me that this might have to occur multiple times, and reads would have to come from certain caches or main memory during the course of the algorithm, resulting in some significant slowdown (since you would always have to pull data from a potentially distant shared memory source instead of the nearest cache with the value).

Flumpy and I discussed this to some extent w.r.t. a design decision he made in Groovymud. My inexpert opinion was that what he did may result in a lot of synchronization overhead because it may be too fine-grained. He can tell you the rest if he's interested in an opinion :D.
16 Aug, 2009, David Haley wrote in the 33rd comment:
Votes: 0
As usual I think you're worrying about far too much at once. You're worrying about things like processor cache usage, when you're working in a dynamic language to begin with – sheesh! :tongue: If you want to squeeze out every last cycle of efficiency, you have chosen the wrong language to be writing in. In other words, if you're going to worry about every last hundredth of an inch on the canvas, you should not have chosen to paint with such a wide brush in the first place.

"Significant slowdown" is remarkably relative. If you're talking about an inner loop that is executed millions of times per second, you can legitimately worry about adding a few extra cycles. If you're talking about something that runs a few hundred times or less per second, then if you worry about a cycle here or there, you are missing not just the forest but the entire planet for the single tree.

Especially since you are self-admittedly inexpert, I cannot say more strongly that you should gain practical experience first and benchmark things before trying to divine what the performance consequences will be. I kind of feel like we're going in circles though. :tongue:
16 Aug, 2009, Silenus wrote in the 34th comment:
Votes: 0
I don't think I am particularly worried about these potential consequences; I am just interested in understanding them. I believe I may have mentioned that I am working in C/C++, and I am hoping this understanding carries over to other projects I am working on (which potentially have extremely tight inner loops). As I mentioned, in developing this particular server/language application I am interested in making things relatively efficient, since I am not certain at present whether the mud system which runs on top of it will be mud-like in the traditional sense.

Basically I am looking for a deeper understanding of the actual mechanics of the system and the potential consequences. Obviously I can benchmark after it's implemented and attempt to optimize or refactor the code to gain speedup after the fact. However, a little foresight and understanding, I feel, might make it possible to eliminate some of these problems before I write the code. It's like asking what the purpose of asymptotic analysis is, if not, on some level, to predict the efficacy of a given algorithm prior to implementing it. In principle one could implement something using whatever algorithm and finally decide, whoops, it doesn't work (several orders of magnitude too slow!), wasting the several months it took to get something benchmarkable on an approach that was impossible in the first place. Also, I suspect that benchmarking/profiling might tell you to some extent where the problem is (and thus hint at why), but without an understanding of the hardware side you won't have a good model of precisely why.
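The asymptotic-analysis point can be made concrete with a tiny operation-counting sketch (illustrative; not from the discussion): predicting cost by counting operations before you ever benchmark wall-clock time.

```python
data = list(range(1024))
target = 1023  # worst case for a linear scan

# Count comparisons in a linear scan: O(n).
linear_ops = 0
for x in data:
    linear_ops += 1
    if x == target:
        break

# Count probes in a binary search over the same sorted data: O(log n).
binary_ops = 0
lo, hi = 0, len(data)
while lo < hi:
    binary_ops += 1
    mid = (lo + hi) // 2
    if data[mid] < target:
        lo = mid + 1
    else:
        hi = mid

print(linear_ops, binary_ops)  # 1024 vs 10 == log2(1024)
```

The analysis predicts the hundredfold gap before any implementation exists; profiling afterwards would locate the slow spot, but the model tells you in advance that one approach cannot scale.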

And yes, we are beginning to talk in circles :-). I think we just disagree a little on this :-).