Windows Server 2008 R2 Failover Cluster Communication Failure

Failover Cluster communication can fail for a number of reasons. The most common are network infrastructure outages, NIC drivers and firmware, TCP offload and RSS settings, and antivirus software. Some time ago I had a problem with a Windows Server 2008 R2 Failover Cluster. All nodes were inaccessible: Remote Desktop Connection did not work, and the only access was from the console. It was obvious that something had gone wrong with network communication. For the cluster to resume normal operation, all nodes had to be restarted.

I started troubleshooting by checking the System event log, but I couldn’t find any indication that the private or public network had failed for whatever reason.
I had to look further, so I checked NIC teaming (not present) and antivirus (not present, since the cluster operates in a highly isolated environment), and confirmed that TCP offload and RSS were disabled (more details at http://support.microsoft.com/kb/951037/en-us). The cluster log didn’t help either, beyond telling me there was some network communication problem. As a best practice I upgraded the NIC drivers to rule out driver-related problems, but I still didn’t know why communication had failed.
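For anyone who wants to repeat the TCP offload and RSS check on several nodes, here is a minimal Python sketch. It assumes the settings are read with `netsh int tcp show global` and that the output uses `Name : value` lines with the labels "Receive-Side Scaling State" and "Chimney Offload State"; the function names are my own, and the labels may differ on other Windows versions or display languages.

```python
import subprocess


def parse_netsh_globals(output):
    """Turn 'Name : value' lines from netsh output into a dict."""
    settings = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        # Skip headings, separators and lines without a value.
        if sep and value.strip():
            settings[name.strip()] = value.strip()
    return settings


def offload_features_enabled(
    settings,
    names=("Receive-Side Scaling State", "Chimney Offload State"),
):
    """Return the listed settings that are present and not 'disabled'."""
    return [
        name
        for name in names
        if name in settings and settings[name].lower() != "disabled"
    ]


def read_netsh_globals():
    """Run netsh on the local Windows node and parse its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_netsh_globals(result.stdout)
```

On a cluster node, `offload_features_enabled(read_netsh_globals())` would list any of the two settings that are still active. An empty list means both are reported as disabled.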
I tried to figure out where else to look and started going through more specific event logs, until I came across the NetworkProfile log in Event Viewer\Applications and Services Logs\Microsoft\Windows\NetworkProfile\Operational. There I noticed the following entries:

Log Name: Microsoft-Windows-NetworkProfile/Operational
Source: Microsoft-Windows-NetworkProfile
Date: 12.5.2013 2:17:14
Event ID: 10000
Task Category: None
Level: Information
Keywords: (35184372088832)
User: LOCAL SERVICE
Computer: **********
Description:
Network Connected
 Name: Identifying...
 Desc: Identifying...
 Type: Unmanaged
 State: Connected
 Category: Public

Log Name: Microsoft-Windows-NetworkProfile/Operational
Source: Microsoft-Windows-NetworkProfile
Date: 12.5.2013 2:17:14
Event ID: 4002
Task Category: Wait for Identification
Level: Information
Keywords: Response Time,(35184372088832)
User: LOCAL SERVICE
Computer: **********
Description:
Transitioning to State: Identified Network Interface Guid: {**********}

Log Name: Microsoft-Windows-NetworkProfile/Operational
Source: Microsoft-Windows-NetworkProfile
Date: 12.5.2013 2:17:14
Event ID: 4003
Task Category: Wait for Identification
Level: Information
Keywords: Response Time,(35184372088832)
User: LOCAL SERVICE
Computer: **********
Description:
Transitioning to State: Unidentified Network Interface Guid: {**********}

After doing some research I realized that these messages meant the network profile was changing from Domain to Public, which in the end blocked all cluster communication.
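If you have to go through a long NetworkProfile log, a small script can find these entries for you. The Python sketch below assumes the events were copied out of Event Viewer as plain text in the same layout as the entries above, with each event starting at a "Log Name:" line. It flags "Network Connected" events (Event ID 10000) whose category is Public. The function names and the shortened sample are mine and only for illustration.

```python
def parse_events(text):
    """Split exported event text into one dict per event."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Log Name:"):
            # Every event in the export begins with a "Log Name:" line.
            current = {}
            events.append(current)
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Keep the first value seen for each field name.
        current.setdefault(key.strip(), value.strip())
    return events


def public_profile_events(events):
    """Return 'Network Connected' events that report the Public category."""
    return [
        event
        for event in events
        if event.get("Event ID") == "10000"
        and event.get("Category") == "Public"
    ]


# Shortened sample based on the entries shown above.
SAMPLE = """\
Log Name: Microsoft-Windows-NetworkProfile/Operational
Date: 12.5.2013 2:17:14
Event ID: 10000
Description:
Network Connected
 Name: Identifying...
 State: Connected
 Category: Public
Log Name: Microsoft-Windows-NetworkProfile/Operational
Date: 12.5.2013 2:17:14
Event ID: 4003
Description:
Transitioning to State: Unidentified Network Interface Guid: {**********}
"""

if __name__ == "__main__":
    for event in public_profile_events(parse_events(SAMPLE)):
        print(event["Date"], "network connected with category", event["Category"])
```

On a domain-joined cluster node, any hit from this filter is worth a closer look, since the node should normally stay on the Domain profile.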

Solution: After some more research I came across the fix described in a Microsoft Knowledge Base article: The network location profile changes from “Domain” to “Public” in Windows 7 or in Windows Server 2008 R2. I installed the hotfix and haven’t noticed any problems since.

Here are some more options to explore if you experience cluster communication problems:
– A Windows Server 2008 R2 failover cluster loses quorum when an asymmetric communication failure occurs.
– A transient communication failure causes a Windows Server 2008 R2 failover cluster to stop working.