Nvidia GPU Memory Testing Guide and Apple T2 platform

From Repair Wiki
{{Explanatory Guide
|Device=Nvidia GPUs
|Type=Troubleshooting/Diagnostics
|Difficulty=2. Medium
}}
So, your card has all its voltages and you have verified that the BIOS circuit is working as it should, but you still get no output from the card. Or there is output, but you get artifacts, crashes under load, or other abnormal behavior. Then you probably have a faulty memory chip, and you've come to the right place.


'''Replacing memory chips is a difficult procedure that requires BGA soldering experience and the proper equipment. If you do not have the tools or the experience, let an expert do it for you.'''


[https://youtu.be/xWtrgq1G1fM Video example.]
==Nvidia MOdular Diagnostic Software (aka Nvidia MODS)==
[https://pdfcoffee.com/modspdf-pdf-free.html MODS] is a very powerful tool that tests Nvidia cards for many kinds of faults. It includes a standalone tool called MATS that tests memory specifically. If you have access to it, this guide shows how to use MATS to identify faulty memory chips.
==Memory Channel Labeling==
[[File:Nvidia memory labeling pascal.jpg|link=https://repair.wiki/w/File:Nvidia_memory_labeling_pascal.jpg|thumb|Memory labeling example Pascal (Figure 1)]]
As shown in Figure 1, each channel consists of two memory chips, 0 and 1. For a card with N GB of VRAM built from 1 GB chips, there are N/2 channels. In this example, the 8 GB GTX 1080 has four memory channels (256-bit).


Memory chips are numbered counterclockwise, starting from the corner OPPOSITE the golden arrow on the core: A1, A0, B1, B0, ... up to X1, X0 (X being the last channel).
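The counting rule above can be sketched in Python, for example to print the expected label order before matching chips on the board. This is a minimal sketch, assuming every chip is 1 GB and each channel has exactly two chips (true for the GTX 1080 in Figure 1, not for every card); the helper names <code>channel_count</code> and <code>chip_labels</code> are hypothetical, and nothing here talks to the card.

```python
import string


def channel_count(vram_gb: int, chip_gb: int = 1, chips_per_channel: int = 2) -> int:
    """Number of memory channels, assuming all chips have the same capacity."""
    chips = vram_gb // chip_gb
    return chips // chips_per_channel


def chip_labels(channels: int) -> list[str]:
    """Chip labels in counting order: A1, A0, B1, B0, ...

    The order follows the guide: counterclockwise, starting from the corner
    opposite the golden arrow on the core.
    """
    if not 1 <= channels <= 26:
        raise ValueError("expected between 1 and 26 channels")
    labels = []
    for letter in string.ascii_uppercase[:channels]:
        labels += [f"{letter}1", f"{letter}0"]  # chip 1 comes before chip 0
    return labels


if __name__ == "__main__":
    n = channel_count(8)  # 8 GB GTX 1080 with 1 GB chips
    print(n)  # 4
    print(chip_labels(n))  # ['A1', 'A0', 'B1', 'B0', 'C1', 'C0', 'D1', 'D0']
```

Always confirm the layout on the actual board; the sketch only reproduces the A1, A0, B1, B0, ... order described above.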
==Using MATS with a card that has no output==
You'll need either a CPU with integrated graphics (an Intel CPU with an iGPU, which includes most models since Sandy Bridge but not the F-series, or an AMD APU) or a secondary video card to get video output.


After booting into MODS, type the following commands to start testing the memory:

<code>./mods gputest.js -skip_rm_state_init -mfg</code>

and then:

<code>./mats -n [card index] -e [memory size to test in MB]</code>

The index should be 1 if you are using integrated graphics, or a secondary dedicated GPU with a CPU that has no integrated graphics.

The memory size to test should be at least 5 MB; 50 MB is recommended. Larger values take longer to finish.

After the test finishes, the results are saved to a report.txt file. Alternatively, you can append <code>|less</code> to the second command to show the results on screen as soon as the test ends.
==Using MATS with a card that has output==
This is a bit easier, since you don't have to enter the first command or an index: just enter <code>./mats -e [memory size to test in MB]</code> and the test will run. You can still append <code>|less</code> to show the report on screen.
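Both workflows boil down to one or two commands. The Python sketch below only assembles the command strings given in this guide (it does not run MODS or MATS), which can help if you test many cards. The helper names, the default index of 1, and the 50 MB default are assumptions taken from the recommendations above; only the <code>-n</code> and <code>-e</code> flags shown on this page are used.

```python
import shlex
from typing import List, Optional

# Command and limits as given in this guide.
MODS_INIT_COMMAND = "./mods gputest.js -skip_rm_state_init -mfg"
MIN_TEST_MB = 5
RECOMMENDED_TEST_MB = 50


def mats_command(test_mb: int = RECOMMENDED_TEST_MB,
                 card_index: Optional[int] = None,
                 page_output: bool = False) -> str:
    """Build a MATS command line; card_index is only needed for a card with no output."""
    if test_mb < MIN_TEST_MB:
        raise ValueError(f"test at least {MIN_TEST_MB} MB, got {test_mb}")
    args = ["./mats"]
    if card_index is not None:
        args += ["-n", str(card_index)]
    args += ["-e", str(test_mb)]
    command = shlex.join(args)
    if page_output:
        command += " | less"  # show the report on screen
    return command


def mats_steps(card_has_output: bool, test_mb: int = RECOMMENDED_TEST_MB,
               card_index: int = 1) -> List[str]:
    """Commands to type, in order, for either workflow (hypothetical helper)."""
    if card_has_output:
        return [mats_command(test_mb)]
    return [MODS_INIT_COMMAND, mats_command(test_mb, card_index)]


if __name__ == "__main__":
    for line in mats_steps(card_has_output=False):
        print(line)
```

For a card with no output this prints the MODS initialization command followed by <code>./mats -n 1 -e 50</code>; for a card with output, only the MATS command is returned.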
==Identifying the faulty memory bank(s)==
[[File:Mats example.jpg|link=https://repair.wiki/w/File:Mats_example.jpg|thumb|Example report on an RTX 2060 (Figure 2)]]
[[File:2060 memory example.jpg|link=https://repair.wiki/w/File:2060_memory_example.jpg|thumb|RTX 2060 faulty memory chips (Figure 3)]]
In the example report in Figure 2, MATS found errors on D1 and C0, which correspond to the memory chips marked in Figure 3.


Usually, only one chip fails, which causes the card to output no picture or to display artifacts. In this case, however, two chips had problems, which would normally point to a fault in the IMC (Integrated Memory Controller) inside the core. Luckily, this particular card had been dropped by the user, so the problem was mechanical: taking the memory chips off, cleaning the pads, and resoldering the chips fixed it.

If you get errors on all channels, though, the cause is either the IMC or a power-related issue that has either killed all the memory chips or is not supplying enough power to them. The failing bits can sometimes tell you whether the issue is the memory itself or the IMC, but replace the memory to make sure.
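The rules of thumb from the last two paragraphs can be written down as a tiny triage helper. This is a hypothetical sketch, not part of MATS: it assumes you have copied the failing chip labels (like D1 and C0) from the report by hand and know how many channels the card has, and its verdicts are only hints. As said above, replacing the memory is the way to be sure.

```python
import re
from typing import Iterable

LABEL = re.compile(r"[A-Z][01]")  # channel letter + chip number, e.g. "D1"


def triage(failing_chips: Iterable[str], total_channels: int) -> str:
    """Rough hint from failing chip labels, following this guide's rules of thumb."""
    chips = {label.strip().upper() for label in failing_chips}
    for chip in chips:
        if not LABEL.fullmatch(chip):
            raise ValueError(f"not a chip label: {chip!r}")
    channels = {chip[0] for chip in chips}

    if not chips:
        return "no failing chips reported"
    if len(chips) == 1:
        return f"single chip {chips.pop()}: most likely a faulty memory chip"
    if len(channels) >= total_channels:
        return "all channels failing: suspect the IMC or the memory power supply"
    return "several chips failing: possible IMC fault, but rule out physical damage first"


if __name__ == "__main__":
    print(triage(["D1", "C0"], total_channels=4))  # hypothetical 4-channel card
```

The "rule out physical damage first" hint reflects the dropped RTX 2060 above, where resoldering fixed two failing chips.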
==MODS/MATS version compatibility==
{| class="wikitable"
|+
!MODS/MATS version
!Supported cards
|-
|367.xxx
|GTX 1000 and below
|-
|400.xxx
|RTX 2000 and below (inc. GTX 16XX series)
|-
|455.xxx
|RTX 3000 and below
|}
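A lookup sketch for the table above, in Python. It assumes model names are written like <code>RTX 2060</code> or <code>GTX 1660 Ti</code>; the generation ranks are hypothetical values used only to express "and below", and cards older than the GTX 1000 series are deliberately not modeled, since the table does not name them.

```python
from typing import List

# From the compatibility table: each MODS/MATS version and the newest
# generation it supports ("and below"). Ranks are hypothetical, higher = newer.
MODS_VERSIONS = [("367.xxx", 1), ("400.xxx", 2), ("455.xxx", 3)]
SERIES_RANK = {
    "GTX 10": 1,  # GTX 1000
    "GTX 16": 2,  # GTX 16XX, supported from 400.xxx
    "RTX 20": 2,  # RTX 2000
    "RTX 30": 3,  # RTX 3000
}


def series_key(model: str) -> str:
    """Turn 'RTX 2060' or 'GTX 1660 Ti' into a series key like 'RTX 20'."""
    parts = model.upper().split()
    if len(parts) < 2:
        raise ValueError(f"expected e.g. 'RTX 2060', got {model!r}")
    return f"{parts[0]} {parts[1][:2]}"


def compatible_versions(model: str) -> List[str]:
    """MODS/MATS versions from the table that support this card."""
    key = series_key(model)
    if key not in SERIES_RANK:
        raise ValueError(f"{model!r} is not a series listed in the table")
    rank = SERIES_RANK[key]
    return [version for version, newest in MODS_VERSIONS if rank <= newest]


if __name__ == "__main__":
    print(compatible_versions("RTX 2060"))  # ['400.xxx', '455.xxx']
```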



[[Category:Nvidia Computer Components]]

Revision as of 23:25, 31 March 2024

Apple T2 platform
{{Explanatory Guide
|Device=Apple Laptops
|Type=Troubleshooting/Diagnostics
|Difficulty=3. Hard
}}



== Devices using T2 / T8012 SoC ==
* All Intel MacBooks 2018-2020
* Mac Pro 2019
* iMac 2020, iMac Pro
* Mac mini 2018

== Theory ==

Apple introduced the T2 (T8012) with the iMac Pro in late 2017 and phased it out in 2020, when its functions were fully integrated into the M1 SoC. The T2 is essentially a second processor working at the low level of the board, similar to the SIO/EC on PC laptops; it serves as a board supervisor. For repair, its most important aspects are its power state control and its role in the power sequence. The chip is very close to the iPhone A10 SoC and was likely introduced as a stepping stone to the Apple Silicon platform (its OS is even called BridgeOS, which is a huge hint). Understanding the T2 platform is key to understanding the M1+ platform, as well as the basics of Apple hardware design, and understanding the T2/iBoot/BridgeOS platform will also help you read the power sequence effectively.

Main functions:

* SMC block: battery charging, sensor management, power enables, and Intel S0-S5 state control.
* SSD controller: The T2/M1 MacBook SSD is essentially a kind of RAID array built from custom Toshiba/Hynix SiP (System in Package) SSDs. Each SSD on the board (often called "NAND", which is technically incorrect) contains its own RAID configuration block. To read more, refer to the (upcoming) MacBook SSD Repair page.
* Encryption processor: T2 contains Apple's own Secure Enclave Processor (SEP), which is used for encryption, payment verification, authentication, and Touch ID.
* eSPI controller: The Intel EFI image is stored as a file on the T2 firmware service partition and is fed to the PCH over the eSPI interface. For example, to clean the ME region after an Intel SoC/PCH replacement, you need to run DFU Revive in Apple Configurator; this effectively rebuilds the EFI image on the BridgeOS partition.
* Camera, keyboard/trackpad, Touch Bar, and audio controller: since the T2 is basically a repurposed iPhone chip, Apple reuses its audio codec, screen controller, and USB interfaces.
* Power controller: T2 is paired with a universal configurable power IC called CALPE. This chip integrates dozens of buck regulators and LDOs. Unlike older platforms, it has no direct enable signals; instead, it is controlled over I2C. T2 itself also controls the GPU power sequence and the gMUX.
* Debug interfaces: T2 can also output the iBoot log, as well as the PCH/EFI log, over USB. Quanta used this with the so-called Potassium cable, which worked as a debug terminal for reading the boot log of the T2 and Intel PCH. Unfortunately, since the 2018 iBoot leak, Apple has encoded all log messages as stripped hashes that cannot be decoded. The PCH log, however, is extremely useful, since the EFI block where booting stalls can pinpoint the issue to RAM, the GPU, or another component. It also shows the full AHT log (the diagnostics launched by holding D at startup), listing all sensors and their values.

== Firmware ==

T2 firmware consists of three main parts:

'''LLB (Low Level Bootloader):''' a stripped-down version of iBoot stored in the on-board SPI flash (which, for some reason, is called the SoC ROM).

The most important parts of LLB are the SMC firmware (sensors, power control, battery charging) and the SSD/ANS2 (Apple NAND Storage 2) firmware.

'''BridgeOS:''' a larger, modified version of watchOS. It is stored on the SSD, works at a higher level, and provides an interface to macOS over USB.

'''SEP:''' the Secure Enclave Processor is a separate core inside the T2 that runs its own, completely separate firmware. The only aspect relevant to repair is possible SEP firmware corruption, which in some cases causes error 9 during a DFU restore.

The T2 uses the same BootROM as other checkm8-vulnerable devices. The checkm8 exploit can be used to upload a patched ramdisk with SSH access, and it appears to be used by OnTrack's internal data recovery tool. Development iBoot builds are also available, which may help when troubleshooting hard cases.

== Power Sequence ==

The T2 platform power sequence is one of the most confusing and difficult parts of MacBook repair. Although it mixes iPhone/iPad naming conventions with power stages, it is actually quite simple, and much clearer than, say, Lenovo's ThinkEngine nightmare.

The power sequence can be divided into three main stages:

# "Finite state" stage, which does not require any activity from the T2:
#* Main G3H power state generators, PPBUS, PP3V3_G3H_RTC.
# T2-Calpe power stage:
#* Main T2 power states, "S2", "SLP" (not relevant to the Intel platform!), and the Intel S5-S3-S0 power sequence.
# Intel power stage (controlled by the PCH/CPU System Agent):
#* Rails such as CPU VCORE, VCCGT, etc.
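The three stages can be modeled as an ordered list, which makes one practical habit explicit: find the earliest stage that is missing and start troubleshooting there. This is a minimal Python sketch, assuming each stage depends on the previous one being up and that you check each stage by hand with a multimeter; the <code>Stage</code> type and <code>first_failed_stage</code> helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    number: int
    name: str
    controlled_by: str
    examples: Tuple[str, ...]


# The three stages described above, in order.
STAGES = (
    Stage(1, "Finite state", "no T2 activity required",
          ("G3H power state generators", "PPBUS", "PP3V3_G3H_RTC")),
    Stage(2, "T2-Calpe", "T2, driving Calpe over I2C",
          ("T2 power states S2/SLP", "Intel S5-S3-S0 sequence")),
    Stage(3, "Intel power", "PCH/CPU System Agent",
          ("CPU VCORE", "VCCGT")),
)


def first_failed_stage(stage_ok: Dict[int, bool]) -> Optional[Stage]:
    """Earliest stage not confirmed good; later stages are assumed to depend on it."""
    for stage in STAGES:
        if not stage_ok.get(stage.number, False):
            return stage
    return None  # every stage checked out


if __name__ == "__main__":
    stage = first_failed_stage({1: True, 2: False})
    print(f"Start at stage {stage.number} ({stage.name}), controlled by {stage.controlled_by}")
```

Stages that have not been checked yet count as failed, so the helper never points past a stage you have not verified.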