Here are some tedious but necessary steps to take when you encounter a memory DIMM with multiple ECC errors. If you skip them, Cisco will ask you to perform them anyway, which wastes time when you want the replacement part delivered immediately.
- Log in to the UCS command line interface (PuTTY preferred)
- Reset memory errors using the commands below:
- CLI# scope server x/y (x = chassis number, y = slot number)
- CLI# reset-all-memory-errors
- CLI# commit-buffer
- CLI# clear sel
- CLI# commit-buffer
- CLI# scope cimc
- CLI# reset
- CLI# commit-buffer
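If you repeat this for several blades, it can help to generate the command sequence rather than retype it. The following is a minimal Python sketch that only builds the text of the commands listed above for a given chassis and slot. It does not connect to UCS; the function name `build_reset_commands` and the example chassis/slot values are assumptions made for illustration.

```python
from typing import List


def build_reset_commands(chassis: int, slot: int) -> List[str]:
    """Return the UCS CLI commands from the steps above, in order.

    This only produces text to paste into an SSH session; it does not
    talk to UCS Manager itself.
    """
    if chassis < 1 or slot < 1:
        raise ValueError("chassis and slot numbers must be positive")
    return [
        f"scope server {chassis}/{slot}",  # x = chassis number, y = slot number
        "reset-all-memory-errors",
        "commit-buffer",
        "clear sel",
        "commit-buffer",
        "scope cimc",
        "reset",
        "commit-buffer",
    ]


if __name__ == "__main__":
    # Hypothetical example: the blade in chassis 1, slot 3.
    for command in build_reset_commands(1, 3):
        print(command)
```

Paste the printed lines into the CLI session one at a time so you can watch for errors after each commit.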
Once all these commands have been executed, watch for the errors to come back.
If they do not reappear within 24-48 hours (I have never experienced this), you are in the clear and the DIMM does not need to be replaced. Otherwise, collect the logs again and provide them to Cisco support to expedite the process.
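The 24-48 hour check can be written down as a small rule. This is a rough Python sketch, assuming you have noted the time you cleared the errors and copied the timestamps of any later ECC faults by hand. The function name `errors_returned` and the 48-hour default are illustrative choices, not anything provided by UCS.

```python
from datetime import datetime, timedelta
from typing import Iterable


def errors_returned(
    cleared_at: datetime,
    error_times: Iterable[datetime],
    window_hours: int = 48,
) -> bool:
    """Return True if any ECC error was logged within the watch window.

    cleared_at  -- when the reset commands were committed
    error_times -- timestamps of ECC errors seen on the same DIMM
    """
    deadline = cleared_at + timedelta(hours=window_hours)
    # Only errors logged after the reset and before the deadline count.
    return any(cleared_at < t <= deadline for t in error_times)


if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    cleared = datetime(2024, 1, 1, 9, 0)
    seen = [datetime(2024, 1, 2, 14, 30)]
    if errors_returned(cleared, seen):
        print("Errors came back: collect logs and send them to Cisco support")
    else:
        print("No errors in the window: the DIMM does not need replacing")
```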
Have you ever had a DIMM with ECC errors fix itself after clearing them? Comment below!
Never!!!. Great blog post Danny.
Larry Mauch, Enterprise Account Executive, Verge.io
Haha, thank you Larry!