Today I took 10 minutes to answer a 9-year-old Stack Overflow question: How do I collect return values from Parallel.ForEach? I thought the improved solution deserved a quick blog post since it hasn't been mentioned there.
The accepted answer proposes using a concurrent collection to store the results:
```csharp
ConcurrentBag<string> resultCollection = new ConcurrentBag<string>();
ParallelLoopResult result = Parallel.ForEach(words, word =>
{
    resultCollection.Add(AddB(word));
});
// Do something with the result
```
A concurrent collection is necessarily expensive since internally it relies on thread locks. There are 89 mentions of “lock” in the source code of ConcurrentBag<T>.
However, in the particular case where the size of the collection is known upfront (and this is often the case in practice), an array the size of the inputs can be used to store the outputs. Each loop iteration then receives its very own index, used both to read the input and to store the output. There is no risk of concurrent access collision here, since no two iterations ever write to the same array element:
```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Init words with index
const int NB_WORDS = 1000;
var inputs = new string[NB_WORDS];
for (var i = 0; i < NB_WORDS; i++) { inputs[i] = i.ToString(); }

var outputs = new string[NB_WORDS];

Parallel.For(0, NB_WORDS, index =>
{
    string word = inputs[index];
    string result = word + word; // Operation on word
    outputs[index] = result; // No need for a concurrent collection to store the result!
});

Debug.Assert(outputs.All(result => !string.IsNullOrEmpty(result)));
```
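Since the original question is about Parallel.ForEach rather than Parallel.For, the same idea can also be sketched with the ForEach overload whose body receives the element's index as a long. This is only a minimal sketch, written with C# top-level statements: the `words` list and the `word + word` operation are placeholders, and it assumes the source collection can be counted upfront to size the output array.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Placeholder input: any collection whose size is known upfront
List<string> words = Enumerable.Range(0, 1000).Select(i => i.ToString()).ToList();
var outputs = new string[words.Count];

// This overload passes the element, the loop state, and the element's index (a long)
Parallel.ForEach(words, (word, loopState, index) =>
{
    outputs[index] = word + word; // Each iteration writes to its own slot
});

Debug.Assert(outputs.All(result => !string.IsNullOrEmpty(result)));
```

The loop state parameter is unused here; it is only present because the index is exposed through the three-parameter form of the body.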
There is a bonus with this solution: outputs are stored in the same order as inputs, while the order is not deterministic when using a concurrent collection.
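The ordering point can be checked with a small sketch like the one below. It compares both approaches against a sequentially computed reference; the `expected` array and the two boolean names are introduced here only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

const int NB_WORDS = 1000;
string[] inputs = Enumerable.Range(0, NB_WORDS).Select(i => i.ToString()).ToArray();

// Reference result computed sequentially, in input order
string[] expected = inputs.Select(word => word + word).ToArray();

// Array approach: slot i always holds the result for inputs[i]
var outputs = new string[NB_WORDS];
Parallel.For(0, NB_WORDS, index => { outputs[index] = inputs[index] + inputs[index]; });
bool arrayKeepsOrder = outputs.SequenceEqual(expected);

// Concurrent collection approach: same elements, but enumeration order is not guaranteed,
// so both sides are sorted before being compared
var bag = new ConcurrentBag<string>();
Parallel.For(0, NB_WORDS, index => { bag.Add(inputs[index] + inputs[index]); });
bool bagHasSameElements = bag.OrderBy(s => s, StringComparer.Ordinal)
    .SequenceEqual(expected.OrderBy(s => s, StringComparer.Ordinal));

Console.WriteLine($"Array keeps input order: {arrayKeepsOrder}");
Console.WriteLine($"Bag holds the same elements once sorted: {bagHasSameElements}");
```

Both checks should come out true. The difference is that the array needs no sorting or extra bookkeeping to match results back to their inputs.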
This is a trick we have used for more than a decade in the NDepend code base, since before the TPL (Task Parallel Library) was released. The rule of thumb is to avoid concurrent access when possible, and with a bit of astuteness this is often possible.
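To get a rough feel for the difference on your own machine, both approaches can be timed with a Stopwatch. This is only a quick sketch: the input size is arbitrary, and timings depend heavily on hardware, runtime version, and JIT warm-up. A dedicated tool such as BenchmarkDotNet would be a better choice for a serious measurement.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

const int NB_WORDS = 1_000_000; // Arbitrary size chosen for illustration
var inputs = new string[NB_WORDS];
for (var i = 0; i < NB_WORDS; i++) { inputs[i] = i.ToString(); }

// Approach 1: results collected in a concurrent collection
var stopwatch = Stopwatch.StartNew();
var bag = new ConcurrentBag<string>();
Parallel.For(0, NB_WORDS, index => { bag.Add(inputs[index] + inputs[index]); });
stopwatch.Stop();
Console.WriteLine($"ConcurrentBag: {stopwatch.ElapsedMilliseconds} ms");

// Approach 2: results written to a pre-sized array, one slot per index
stopwatch.Restart();
var outputs = new string[NB_WORDS];
Parallel.For(0, NB_WORDS, index => { outputs[index] = inputs[index] + inputs[index]; });
stopwatch.Stop();
Console.WriteLine($"Array: {stopwatch.ElapsedMilliseconds} ms");
```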