/Software Development

Performance testing LINQ to Objects Except Method

When .NET introduced LINQ and lambdas, I was blown away by the efficiency improvements in writing code. But, after reading countless articles online about how horribly slow and inefficient LINQ (even to objects) was compared with just plain for-loops, I’ve always felt guilty about using it.

The other day, however, I wanted to see how much slower linq would be in comparing 2 lists of objects to find all of the items from list1 that were not in list2. LINQ has an extension method called ”Except” just for this.

I coded the solution in LINQ, and then coded a few “solutions” using pre-LINQ techniques…spoiler alert: LINQ was the best for large sets of objects. The objects in the lists look like this:

private class TempFileInfo: IEquatable<TempFileInfo> { public ulong Id { get; set; } public string Name { get; set; } public ulong ParentId { get; set; } public string VirtualPath { get; set; } public string FileName { get; set; }

public bool Equals(TempFileInfo other) { if (other == null) return false; if (!EqualityComparer<ulong>.Default.Equals(Id, other.Id)) return false; if (!EqualityComparer<string>.Default.Equals(Name, other.Name)) return false; if (!EqualityComparer<ulong>.Default.Equals(ParentId, other.ParentId)) return false; if (!EqualityComparer<string>.Default.Equals(VirtualPath, other.VirtualPath)) return false; if (!EqualityComparer<string>.Default.Equals(FileName, other.FileName)) return false; return true; }

public override int GetHashCode() { int hash = 0; hash ^= EqualityComparer<ulong>.Default.GetHashCode(Id); hash ^= EqualityComparer<string>.Default.GetHashCode(Name); hash ^= EqualityComparer<ulong>.Default.GetHashCode(ParentId); hash ^= EqualityComparer<string>.Default.GetHashCode(VirtualPath); hash ^= EqualityComparer<string>.Default.GetHashCode(FileName); return hash; } } 

And then I have a test method that does all of the comparisons:

 [TestMethod] public void perfTest() { var localList = new List<TempFileInfo>(); var remoteList = new List<TempFileInfo>(); var diffs = new Dictionary<TempFileInfo, bool>();

var listLength = 1000;

for (int counter = 0; counter < listLength; counter ++) { var guidText = Guid.NewGuid().ToString(); localList.Add(new TempFileInfo() {Id = (ulong) counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path"}); remoteList.Add(new TempFileInfo() {Id = (ulong) counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path"}); }

var uniqueName = "different"; var differentFile = new TempFileInfo() { Id = 0, Name = uniqueName, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path" };

localList.Add(differentFile);

var watch = new Stopwatch();

//linq way watch.Start(); var linqDiffs= localList.Except(remoteList).ToList(); watch.Stop();

var linqDuration = watch.Elapsed.Duration(); watch.Reset();

//dictionary way watch.Start(); foreach (var a in localList) diffs.Add(a, true); foreach (var b in remoteList) diffs.Remove(b); watch.Stop();

var dictionaryDuration = watch.Elapsed.Duration();

watch.Reset(); diffs.Clear();

//contains way watch.Start(); foreach (var localInfo in localList) { if (!remoteList.Contains(localInfo)) { diffs.Add(localInfo, true); } } watch.Stop();

var containsDuration = watch.Elapsed.Duration();

watch.Reset(); diffs.Clear();

//double loop watch.Start(); foreach (var localInfo in localList) { var contains = false; foreach (var remoteInfo in remoteList) { contains = localInfo.Name == remoteInfo.Name; if (contains) break; } if (!contains) diffs.Add(localInfo, true); } watch.Stop();

var foreachDuration = watch.Elapsed.Duration();

watch.Reset(); diffs.Clear();

//remove way (if localList can be modified) watch.Start(); foreach (var remoteFi in remoteList) { localList.Remove(remoteFi); } watch.Stop(); var removeDuration = watch.Elapsed.Duration(); }

Let’s review the results

1K_Except_Durations

For just 1K items (listLength variable), LINQ isn’t very impressive

10K_LINQ_Duration

For 10K items…hmm, LINQ is working it’s way up to #1

If you plan to run this yourself, prepare to wait a while… the nested-looping techniques are effectively O(N^2)… so the time it takes to run is exponential.

100K_LINQ_Duration

And what about 100K items? LINQ wins hands down

If you have time to kill, you can try it with higher numbers… but waiting 5+minutes for the .Contains() method was long enough for me!

Conclusion

No need to feel guilty about using LINQ… where it matters (huge sets), it’s the best choice anyway.

Bogdan Varlamov

Bogdan Varlamov

I believe technology makes life more worthwhile :)

Read More