As machine learning models become larger and more complex, they require faster and more energy-efficient hardware to perform calculations. Traditional digital computers struggle to keep up.
An analog optical neural network can perform the same tasks as a digital one, such as image classification or speech recognition, but because the calculations are performed using light instead of electrical signals, optical neural networks can operate many times faster while consuming less power.
However, these analog devices are prone to hardware errors that can make calculations inaccurate. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network with many connected components, errors can accumulate quickly.
Even with error correction techniques, due to the fundamental properties of the devices that make up the optical neural network, a certain amount of error cannot be avoided. A network large enough to be used in the real world would be too error-prone to work.
MIT researchers have overcome this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that form the network's architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.
Their work could enable a very fast, energy-efficient analog neural network that operates with the same accuracy as a digital one. With this technique, as an optical circuit becomes larger, the amount of error in its computations actually decreases.
“This is remarkable, as it runs counter to the intuition of analog systems, where larger circuits are supposed to have higher errors, so that errors set a limit on scalability. This present paper allows us to address the scalability question of these systems with an unambiguous ‘yes,’” said lead author Ryan Hamerly, a visiting scientist in the MIT Research Laboratory of Electronics (RLE) and Quantum Photonics Laboratory and senior scientist at NTT Research.
Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and member of RLE. The study was published today in Nature Communications.
Multiplication with light
An optical neural network is made up of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Neural network data is encoded into light, which is fired into the optical neural network from a laser.
A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts that interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can use the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.
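As a minimal sketch of this idea, assuming ideal 50:50 beam splitters and a textbook 2×2 transfer-matrix model (the function names and the phase convention below are illustrative, not taken from the paper), a single MZI can be written as a product of small matrices, and sending light through it is a matrix-vector multiplication:

```python
import numpy as np

def beam_splitter():
    """Ideal 50:50 beam splitter (one common textbook convention)."""
    return np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def phase_shifter(angle):
    """Tunable phase shift applied to the top arm only."""
    return np.array([[np.exp(1j * angle), 0], [0, 1]])

def mzi(theta, phi):
    """Assumed layout: input phase phi, splitter, internal phase theta, splitter."""
    return beam_splitter() @ phase_shifter(theta) @ beam_splitter() @ phase_shifter(phi)

light_in = np.array([1.0, 0.0])  # all light enters the top port

for theta in (0.0, np.pi / 2, np.pi):
    light_out = mzi(theta, 0.0) @ light_in  # propagation is a matrix multiplication
    power = np.abs(light_out) ** 2          # detected power at each output port
    print(f"theta={theta:.2f}  top={power[0]:.3f}  bottom={power[1]:.3f}")
```

In this convention, tuning `theta` from 0 to π moves the light from the bottom output port to the top one, which is the sense in which the device behaves like a tunable mirror. A mesh of many such devices multiplies a longer input vector by a larger matrix.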
But the errors that can occur in each MZI add up quickly as light travels from one device to the next. One can avoid some errors by identifying them in advance and tuning the MZIs so that earlier errors are canceled out by later devices in the array.
“It’s a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to ascertain because you only have access to the inputs and outputs of your chip,” said Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”
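The simplest version of correcting a known error can be sketched for a single device. The example below assumes a lossless MZI whose two beam splitters deviate from 50:50 by known angles `alpha` and `beta` (hypothetical values chosen for illustration) and retunes the internal phase so the device still hits a target splitting ratio. It is not the authors' algorithm, which works across a whole mesh.

```python
import numpy as np

def top_power(theta, alpha, beta):
    """Top-port output power of a lossless MZI when light enters the top port.

    alpha and beta are the (assumed known) deviations of the two beam
    splitters from an ideal 50:50 split, in radians.
    """
    a, b = np.pi / 4 + alpha, np.pi / 4 + beta
    amp = np.cos(a) * np.cos(b) * np.exp(1j * theta) - np.sin(a) * np.sin(b)
    return abs(amp) ** 2

def corrected_theta(target, alpha, beta):
    """Phase setting that gives `target` top-port power despite the errors."""
    a, b = np.pi / 4 + alpha, np.pi / 4 + beta
    # top_power(theta) = offset - swing * cos(theta); solve for theta.
    offset = (np.cos(a) * np.cos(b)) ** 2 + (np.sin(a) * np.sin(b)) ** 2
    swing = 2 * np.cos(a) * np.cos(b) * np.sin(a) * np.sin(b)
    cos_theta = (offset - target) / swing
    if abs(cos_theta) > 1:
        raise ValueError("target is outside the range this device can reach")
    return np.arccos(cos_theta)

alpha, beta = 0.05, 0.03   # illustrative splitter errors (radians)
target = 0.5               # want an even split between the two outputs

naive = top_power(np.pi / 2, alpha, beta)  # setting that would be right for an ideal device
fixed = top_power(corrected_theta(target, alpha, beta), alpha, beta)
print(f"uncorrected: {naive:.4f}   corrected: {fixed:.4f}")
```

The correction only works because `alpha` and `beta` are passed in directly. On a real chip those values are hidden, which is the difficulty Hamerly describes. Some targets also cannot be reached at all, in which case the function raises an error.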
Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and tune the MZIs accordingly, but even this did not remove all of the error.
Due to the fundamental nature of an MZI, there are instances where it is impossible to tune a device so that all of the light flows out the bottom port to the next MZI. If the device loses a fraction of the light at each step and the array is very large, only a tiny amount of power will be left by the end.
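This limit can be seen in a small numerical sketch. It assumes a simplified, lossless model in which each of the two beam splitters is off from 50:50 by a small angle (`alpha` and `beta`, hypothetical values chosen for illustration), and it treats every stage as falling short by the same amount. None of the numbers come from the paper.

```python
import numpy as np

def splitter(err):
    """Beam splitter whose splitting angle is off from 50:50 by `err` radians."""
    a = np.pi / 4 + err
    return np.array([[np.cos(a), 1j * np.sin(a)],
                     [1j * np.sin(a), np.cos(a)]])

def bottom_power(theta, alpha, beta):
    """Fraction of top-port input light that leaves through the bottom port."""
    phase = np.diag([np.exp(1j * theta), 1.0])
    out = splitter(beta) @ phase @ splitter(alpha) @ np.array([1.0, 0.0])
    return abs(out[1]) ** 2

alpha, beta = 0.05, 0.03                  # illustrative splitter errors (radians)
thetas = np.linspace(0, 2 * np.pi, 2001)  # sweep the only tunable knob
best = max(bottom_power(t, alpha, beta) for t in thetas)

print(f"best bottom-port power: {best:.4f}")  # stays below 1 for any setting
print(f"analytic bound cos^2(alpha + beta): {np.cos(alpha + beta) ** 2:.4f}")
for n in (10, 100, 1000):
    print(f"fraction routed correctly after {n} such stages: {best ** n:.4f}")
```

Under these assumptions no phase setting sends all of the light out the bottom port, and the small shortfall at each stage compounds as the array grows.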
“Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he said.
So the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it a 3-MZI because it has three beam splitters instead of two. Because of the way this additional beam splitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all of the light out through its bottom port.
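A rough numerical sketch suggests why a third splitter helps. It assumes a simplified lossless model in which three imperfect splitters are interleaved with two tunable phase shifters (`theta` and `phi`). The layout, the error values, and the function names are illustrative assumptions and may not match the exact design in the paper.

```python
import numpy as np

def two_splitters(theta, alpha, beta):
    """Amplitudes (top, bottom) after splitter, phase theta, splitter.

    Light enters the top port; alpha and beta are splitter errors in radians.
    """
    a, b = np.pi / 4 + alpha, np.pi / 4 + beta
    top = np.cos(a) * np.exp(1j * theta)  # first splitter, then phase on the top arm
    bot = 1j * np.sin(a)
    return (np.cos(b) * top + 1j * np.sin(b) * bot,
            1j * np.sin(b) * top + np.cos(b) * bot)

def bottom_power_mzi(theta, alpha, beta):
    """Conventional two-splitter MZI: power leaving the bottom port."""
    _, bot = two_splitters(theta, alpha, beta)
    return np.abs(bot) ** 2

def bottom_power_3mzi(theta, phi, alpha, beta, gamma):
    """Assumed 3-MZI: the stage above, then phase phi, then a third splitter."""
    top, bot = two_splitters(theta, alpha, beta)
    c = np.pi / 4 + gamma
    top = top * np.exp(1j * phi)
    return np.abs(1j * np.sin(c) * top + np.cos(c) * bot) ** 2

alpha, beta, gamma = 0.05, 0.03, 0.04     # illustrative splitter errors (radians)
angles = np.linspace(0, 2 * np.pi, 721)   # 0.5-degree grid for each phase
theta, phi = np.meshgrid(angles, angles)

best_mzi = bottom_power_mzi(angles, alpha, beta).max()
best_3mzi = bottom_power_3mzi(theta, phi, alpha, beta, gamma).max()
print(f"best bottom-port power, two splitters:   {best_mzi:.4f}")
print(f"best bottom-port power, three splitters: {best_3mzi:.4f}")
```

In this toy model the two-splitter device stays measurably short of full transfer, while the three-splitter device gets within the resolution of the grid search of sending all of the light out the bottom port, even though all three splitters are imperfect.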
Importantly, the additional beam splitter is only a few micrometers in size and is a passive component, so it does not require any extra wiring. Adding the extra beam splitters does not significantly change the size of the chip.
Bigger chip, fewer errors
When the researchers ran simulations to test their design, they found that it could remove a large amount of uncorrectable error that interfered with accuracy. And as the optical neural network gets larger, the error rate in the device actually decreases – the opposite of what happens in a device with conventional MZIs.
Using 3-MZIs, they could potentially create a device large enough for practical use, with error reduced by a factor of 20, Hamerly said.
The researchers also developed a variant of the MZI design specifically for correlated errors. These arise from manufacturing imperfections: if the thickness of a chip is slightly off, the MZIs may all be off by about the same amount, so the errors are all about the same. They found a way to change the configuration of an MZI to make it robust to these types of errors. This technique also increases the bandwidth of the optical neural network, so it can run three times faster.
Now that they have demonstrated these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue working toward an optical neural network that they can deploy effectively in the real world.
This research was funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research.