The Unexpected Flaw
Can we trick compilers into emitting binaries that do not match the logic visible in the source code?
That is the question Nicholas Boucher of the University of Cambridge set out to answer. Since the 1960s, it has been widely assumed that a working compiler faithfully translates high-level programming languages into low-level representations, so that the binary it emits matches the source it was given. Boucher, however, asks whether a correctly functioning compiler can be manipulated, through the source code it is fed, into producing vulnerable binaries.
Not-So-Humble Beginnings
Tim Erlin, Vice President at Tripwire, has pointed out that exploitable vulnerabilities are often discovered by attackers long before the general public learns of them. There is no better example of this than the infamous SolarWinds case. The methods used in SolarWinds and in Trojan Source are eerily similar, even if they are not exact replicas.
A giant in the tech world, SolarWinds built up a vast client base over its history, ranging from U.S. public entities such as the Department of Homeland Security to private giants such as Microsoft and Cisco. It is safe to say that any security breach on its side would spell disaster.
And that is exactly what happened. During the early months of 2020, hackers compromised the company's Orion software. By slipping malware into Orion's routine updates, they used its own distribution network to reach roughly 18,000 customers.
This supply-chain attack, which opened a backdoor into thousands of companies and their valuable data, set a grim precedent for the new method Nicholas Boucher has been investigating.
An Invisible Attack
This new attack method, which Boucher dubs Trojan Source, recreates this kind of supply-chain scenario. How does it do this? By exploiting both compilers and human error.
Tracked as CVE-2021-42574, it abuses Unicode bidirectional control characters, which exist to change the order in which text is displayed. With them, an attacker can craft source code whose logic, as rendered on screen, differs from the logic that compilers and interpreters actually ingest. In essence, this method encodes vulnerabilities into the source code itself, without needing a malicious compiler, so that they can slip past a human reviewer unnoticed.
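To make the "two views of one line" idea concrete, here is a minimal Python sketch, loosely modelled on the "stretched string" technique described in the Trojan Source research. The variable name `access_level` and the exact placement of the control characters are illustrative assumptions, not the paper's verbatim payload. The sketch does not try to reproduce what a given editor draws. It only shows what the parser sees: the text that looks like a comment is really part of a string literal, padded with invisible formatting characters.

```python
import ast
import unicodedata

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Hypothetical one-line program. The BiDi controls sit *inside* the string
# literal, so a BiDi-aware editor may reorder how the tail of the line is
# drawn, while the parser simply reads the code points in logical order.
source = ('access_level = "user' + RLO + " " + RLI
          + "# Check if admin" + PDI + " " + RLI + '"\n')

tree = ast.parse(source)
# The string constant the interpreter will actually assign.
value = tree.body[0].value.value

# Invisible formatting characters (Unicode category "Cf") hidden in the literal.
hidden = [unicodedata.name(c) for c in value if unicodedata.category(c) == "Cf"]

print(value == "user")              # False: the "comment" belongs to the string
print("# Check if admin" in value)  # True
print(hidden)
```

A reviewer who trusts the rendered line might believe `access_level` is compared against, or set to, plain `"user"`. The parser disagrees, and the parser is what determines the program's behaviour.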
A second variant relies on homoglyphs: distinct characters that look nearly identical to the human eye, which can be used to hide these differences. Homoglyphs are a legitimate part of Unicode, so this variant, tracked as CVE-2021-42694, is not malicious in itself. The risk is that it could be abused to cause breaches on a very large scale.
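A homoglyph attack can be sketched in a few lines of Python. In this hypothetical example, the function names `sayHello` and `say\u041Dello` are assumptions chosen for illustration. The second name swaps the Latin "H" for the Cyrillic capital letter EN (U+041D), so the two render almost identically but are distinct identifiers. Python normalizes identifiers with NFKC, but that normalization does not merge look-alike letters from different scripts.

```python
import unicodedata

genuine = "sayHello"
spoofed = "say\u041Dello"  # "H" swapped for CYRILLIC CAPITAL LETTER EN (U+041D)

# Define two functions whose names look the same on screen but differ in bytes.
namespace = {}
exec(f"def {genuine}(): return 'genuine'\n"
     f"def {spoofed}(): return 'impostor'\n", namespace)

def non_ascii(identifier):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
            for i, c in enumerate(identifier) if ord(c) > 127]

print(genuine == spoofed)                         # False
print(namespace[genuine](), namespace[spoofed]())  # genuine impostor
print(non_ascii(spoofed))
```

A call site that uses the spoofed name silently invokes the impostor. Listing non-ASCII code points, as `non_ascii` does, is one simple way a reviewer or tool could surface the difference.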
How Big Is The Risk?
As with the SolarWinds case of 2020, the affected parties could be numerous. If malicious code slips past human reviewers, downstream software will most likely be affected. Because the attack works against any programming language whose compilers or interpreters accept Unicode source text, almost every major language is exposed: C++, JavaScript, Java, Rust, Go, and Python have all been confirmed as potential victims of Trojan Source.
In short, it affects almost every compiler that translates human-readable code into computer-executable code. However, there’s no consensus on the severity of this risk, as it’s still too early to tell.
How Do We Protect Ourselves?
There are a few simple countermeasures that researchers have proposed so far to counter Trojan Source. They are very preliminary, but banning text-directionality control characters in both language specifications and the compilers that implement those languages could help.
For instance, compilers that support Unicode could emit warnings for unterminated bidirectional (BiDi) control characters in comments and string literals. Combined with code editors that make BiDi and mixed-script characters visible with symbols or escapes, such warnings might act as a deterrent against this risk.
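Here is a minimal Python sketch of the compiler-warning idea, applied to Python source through the standard `tokenize` module. The helper names `unterminated_bidi` and `bidi_warnings` are hypothetical. The matching rules are a deliberate simplification of the Unicode Bidirectional Algorithm (UAX #9): a PDF closes the innermost embedding or override, and a PDI closes the innermost isolate along with anything opened inside it. On Python 3.12+, f-strings are tokenized in pieces, so the check only sees their literal segments.

```python
import io
import tokenize

EMBEDDINGS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF, PDI = "\u202C", "\u2069"                          # their terminators

def unterminated_bidi(text):
    """Return the BiDi openers still unclosed at the end of `text` (simplified UAX #9)."""
    stack = []
    for ch in text:
        if ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
        elif ch == PDF:
            # A PDF may not close anything across an isolate boundary.
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
        elif ch == PDI:
            # A PDI closes the innermost isolate and every embedding opened inside it.
            if any(c in ISOLATES for c in stack):
                while stack.pop() not in ISOLATES:
                    pass
    return stack

# Tokens that carry free text; FSTRING_MIDDLE only exists on Python 3.12+.
TEXT_TOKENS = {tokenize.COMMENT, tokenize.STRING}
if hasattr(tokenize, "FSTRING_MIDDLE"):
    TEXT_TOKENS.add(tokenize.FSTRING_MIDDLE)

def bidi_warnings(source):
    """Warn about comments and string literals that leave BiDi controls open."""
    warnings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in TEXT_TOKENS:
            left_open = unterminated_bidi(tok.string)
            if left_open:
                line, col = tok.start
                codes = ", ".join(f"U+{ord(c):04X}" for c in left_open)
                warnings.append(f"line {line}, col {col}: unterminated BiDi control(s) {codes}")
    return warnings

clean = "x = 1  # \u2067balanced\u2069\n"
sneaky = "x = 1  # \u202Eoverride never closed\n"
print(bidi_warnings(clean))   # []
print(bidi_warnings(sneaky))
```

Balanced controls are left alone here, because legitimate right-to-left text in comments is common. Flagging only unterminated sequences keeps false positives low while catching the pattern that lets reordering spill past the end of a comment or string.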
Of course, vendors are already working on patches. The Rust project, for example, released Rust 1.56.1, which detects affected code and refuses to compile it. The CERT Coordination Center, backed by the U.S. Cybersecurity and Infrastructure Security Agency, has given affected vendors access to VINCE, a platform meant to facilitate communication between researchers and interested parties so that they can coordinate a measured response.
All in all, there is a definite risk. However, a careful implementation of these measures should resolve the issue over time. Applying the patches from the vendors you rely on and disallowing BiDi control characters in source code will certainly help. The more we limit our exposure, the less of a threat this flaw becomes.
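Disallowing BiDi directives outright can be as simple as a scan run in CI or a pre-commit hook. The Python sketch below takes that stricter stance: it reports every occurrence of the nine explicit BiDi formatting characters, whether or not they are balanced, and renders them as visible escapes for reviewers. The helper names `find_bidi` and `make_visible` are hypothetical, and wiring them into a particular CI system is left out as an assumption about your setup.

```python
import unicodedata

# The nine explicit BiDi formatting characters discussed in the Trojan Source work.
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_bidi(text):
    """Yield (line, column, code point, name) for each BiDi control, 1-based.

    Line numbers follow str.splitlines(), which assumes ordinary line breaks.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)

def make_visible(line):
    """Replace each BiDi control with a <U+XXXX> marker so it shows up in review."""
    return "".join(f"<U+{ord(c):04X}>" if c in BIDI_CONTROLS else c for c in line)

sample = "is_admin = False  # \u202Etrue\u202C\n"
for hit in find_bidi(sample):
    print(hit)
print(make_visible(sample.rstrip("\n")))
```

A CI job could fail whenever `find_bidi` yields anything. Projects that genuinely need right-to-left text in comments might prefer the softer, unterminated-only check instead.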