A review of floating-point numbers in zero-knowledge proof systems.
Floating-point numbers play a vital role in many fields, including artificial intelligence and scientific computing, because they allow compact and efficient representation and manipulation of real numbers. However, no zero-knowledge virtual machine (ZKVM) or domain-specific language (DSL) implements them directly yet. In most cases, floating-point numbers are either unsupported or replaced with alternative representations such as fixed-point, quantized, or soft-float numbers. Unfortunately, these substitutes often bring slower performance, reduced accuracy, or both. This blog will delve into why floating-point numbers matter for zero-knowledge systems and review approaches for integrating them.
What are floating-point numbers?
Floating-point numbers are a way to represent real numbers in computer systems. They comprise two primary components, a significand and an exponent, plus a sign bit. The significand carries the significant digits of the number, while the exponent determines its scale or magnitude. Together, these components allow a wide range of numbers, both very large and very small, to be represented with a finite number of bits.
Computers represent floating-point numbers using the IEEE 754 standard, which defines the formats, operations, and rounding rules of floating-point arithmetic. A standardized representation is essential because it ensures consistent behavior and results across platforms and programming languages.
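To make that layout concrete, here is a small sketch (plain Rust, unrelated to any ZK system) that extracts the sign, exponent, and significand fields of a 32-bit float according to the IEEE 754 single-precision layout of 1 sign bit, 8 exponent bits, and 23 significand bits:

```rust
// Decompose an IEEE 754 single-precision float into its three fields.
// Layout: 1 sign bit | 8 exponent bits (biased by 127) | 23 significand bits.
fn decompose(x: f32) -> (u32, i32, u32) {
    let bits = x.to_bits();
    let sign = bits >> 31;
    let exponent = ((bits >> 23) & 0xff) as i32 - 127; // remove the bias
    let significand = bits & 0x007f_ffff;              // fraction, with an implicit leading 1
    (sign, exponent, significand)
}

fn main() {
    // 6.5 = 1.625 * 2^2, so expect sign 0, exponent 2, fraction 0.625 * 2^23
    println!("{:?}", decompose(6.5_f32)); // (0, 2, 5242880)
}
```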
Why can't we use them in ZK?
Most zero-knowledge systems operate over finite fields, which makes representing floating-point numbers efficiently a challenge. The majority of approaches handle floating-point numbers through circuit quantization, a computationally expensive process that works but reduces accuracy.
An alternative approach for more accurate floating-point operations is bit decomposition. However, this method has its own drawback: it leads to longer proving and verification times. Consequently, there is still a need for more efficient and accurate ways to incorporate floating-point numbers into zero-knowledge systems.
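For a rough sense of why bit decomposition is costly, here is a sketch (plain Rust, not any particular proof system) of the relation a circuit has to enforce when it decomposes a value into bits: one booleanity constraint per bit plus a recomposition check, so the constraint count scales with the bit width of every emulated float operation.

```rust
// The relation a circuit enforces for a bit decomposition of x into n bits:
// each bit must be 0 or 1 (one constraint per bit), and the bits must sum
// back to x (one linear constraint). Emulating 32/64-bit float arithmetic this
// way drags dozens of such constraints into every single operation.
fn decompose_bits(x: u64, n: usize) -> Vec<u64> {
    (0..n).map(|i| (x >> i) & 1).collect()
}

fn recomposes(x: u64, bits: &[u64]) -> bool {
    bits.iter().all(|&b| b == 0 || b == 1)                                         // booleanity
        && bits.iter().enumerate().fold(0u64, |acc, (i, &b)| acc | (b << i)) == x  // recomposition
}

fn main() {
    let x = 1234u64;
    let bits = decompose_bits(x, 64);
    println!("{}", recomposes(x, &bits)); // true, at the cost of 64 bit constraints
}
```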
Why do we want floating point in ZK?
If we ever want to build zero-knowledge proofs involving science, audio and signal processing, or artificial intelligence (ZKML), we are going to need efficient methods for dealing with floating point.
Let's look into the emerging space of ZKML for some ideas! There are two notable groups: the EZKL team and Daniel Kang's group. EZKL simply scales and truncates the numbers in the user's model. This produces lower-precision representations, which cost model accuracy because of the reduced numerical range and precision, and can lead to quantization errors, rounding errors, or even convergence problems during training. Low precision may also cause numerical instability. On the other hand, with ezkl users can convert their models into a ZKML system far more easily, which increases usability across a variety of use cases. We look forward to future benchmarks that further quantify the cost of ZK-ing ML models this way.
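To illustrate the trade-off, here is a toy sketch of scale-and-truncate quantization applied to a tiny dot product (the scale of 2^8 is an arbitrary choice for illustration, not EZKL's actual configuration):

```rust
// Scale-and-truncate quantization on a tiny dot product. The integer result
// drifts from the float result; across millions of multiply-accumulates in a
// real model this drift becomes the accuracy loss discussed above.
fn quantize(x: f64, scale: f64) -> i64 {
    (x * scale).trunc() as i64
}

fn main() {
    let scale = 256.0; // 8 fractional bits (a made-up example setting)
    let w = [0.127, -0.503, 0.916];
    let x = [1.250, 0.333, -2.100];

    let exact: f64 = w.iter().zip(&x).map(|(a, b)| a * b).sum();
    let quantized: i64 = w
        .iter()
        .zip(&x)
        .map(|(a, b)| quantize(*a, scale) * quantize(*b, scale))
        .sum();
    let recovered = quantized as f64 / (scale * scale);

    // exact = -1.93235, quantized = -1.92715: already off in the second decimal
    println!("exact = {exact:.5}, quantized = {recovered:.5}");
}
```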
Daniel Kang's group starts from models that are already quantized. This allows them to target the specific problems they want to solve; ImageNet is one example where the results have been very good. They use Halo2, a proving system developed for Zcash, and compare it to previous DNN zk-SNARK works that use sum-check or Groth16. It turns out that pairing a performance-friendly proving scheme with a system designed for the use case does yield fast and accurate ZKML! They achieved "79% accuracy while being verifiable within 10s". The code and examples can be found on his Github here.
One issue with the pre-quantized-model approach is that it is somewhat analogous to writing custom circuits for each use case. The ML use case has to be made to work well with a pre-quantized model, and gadgets may need to be tailored as well; it is not a one-size-fits-all approach. For the more popular AI systems currently in use, it may not be possible to use this method at all: any accuracy loss could significantly degrade output quality and service.
There is one other method I have not mentioned yet: avoiding weights and floating point altogether. This is what Zero Gravity does. Zero Gravity harks back to the days before the modern NN paradigm with its "Weightless Neural Network": "A Weightless Neural Network (WNN) is entirely combinatorial. Its input is a bitstring (e.g. encoding an image), and their output is one of several predefined classes, e.g. corresponding to the ten digits." They use Bloom filters to "offer a space efficient method of representing the data of a RAM cell".
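For intuition, here is a minimal Bloom filter sketch (plain Rust, not the Zero Gravity code; the bit count m and hash count k are whatever the designer picks): inserting an item sets k bit positions, and a query answers "possibly in the set" only if all k positions are set, which is exactly where false positives come from.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Minimal Bloom filter: m bits, k hash functions. False positives are possible,
// false negatives are not.
struct Bloom {
    bits: Vec<bool>,
    k: u64,
}

impl Bloom {
    fn new(m: usize, k: u64) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    // i-th hash of `item`: hash the item together with the salt i.
    fn index(&self, item: &[u8], i: u64) -> usize {
        let mut h = DefaultHasher::new();
        (item, i).hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    fn insert(&mut self, item: &[u8]) {
        for i in 0..self.k {
            let idx = self.index(item, i);
            self.bits[idx] = true;
        }
    }

    // May answer true for items never inserted (a false positive).
    fn maybe_contains(&self, item: &[u8]) -> bool {
        (0..self.k).all(|i| self.bits[self.index(item, i)])
    }
}

fn main() {
    let mut filter = Bloom::new(1024, 3); // m = 1024 bits, k = 3 hashes (toy numbers)
    filter.insert(b"pattern-seen-by-this-RAM-cell");
    println!("{}", filter.maybe_contains(b"pattern-seen-by-this-RAM-cell")); // true
    println!("{}", filter.maybe_contains(b"unseen-pattern"));                // false (almost certainly)
}
```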
Zero Gravity is a recent hackathon project with the source code available here. Is it feasible to use this approach in production? Probably not, mainly because of security concerns. If the prover has any influence over the inputs (x) fed into the neural network, they can manipulate those inputs to influence the result of the proof. Specifically, it is relatively easy to find an input x that makes the Bloom filter misbehave and produce a false positive, i.e., report "x is in the set" when it is not.
To make this method secure, the Bloom filter would need to be configured to have a cryptographically low false-positive probability. However, achieving such a low false-positive rate would significantly increase the computational cost and resource consumption, making it impractical in many real-world applications.
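To put a number on that, the standard false-positive estimate for a Bloom filter is p ≈ (1 − e^(−kn/m))^k. With an optimally chosen k, pushing p down to something like 2^-128 requires roughly 128 / ln 2 ≈ 185 bits per stored element and 128 hash evaluations per query, which gives a feel for the cost the circuit would have to absorb:

```rust
// Back-of-the-envelope Bloom filter sizing under the standard estimate
// p ≈ (1 - e^(-k n / m))^k. With the optimal k = (m/n) ln 2, achieving p = 2^-s
// needs about s / ln 2 bits per stored element and s hash evaluations per query.
fn optimal_bits_per_element(security_bits: f64) -> f64 {
    security_bits / std::f64::consts::LN_2
}

fn main() {
    // For s = 128: ~184.7 bits per element, far beyond a normal classifier-grade filter.
    println!("{:.1}", optimal_bits_per_element(128.0));
}
```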
What do ZKVMs and DSLs do?
ZKVMs face many of the same challenges as ZKML when it comes to handling floating-point numbers. In most cases, a ZKVM either quantizes FP numbers or does not support them at all. Risc0, for instance, employs the soft-float method, which emulates FP arithmetic with integers in software. This approach works, but it is relatively slow and not practical for applications like ZKML.
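To get a feel for what soft-float emulation involves, here is a stripped-down sketch of multiplying two normal, finite f32 values using only integer operations (a real soft-float implementation also handles IEEE rounding, subnormals, infinities, and NaN). Every one of these integer steps becomes cycles, and ultimately constraints, inside a ZKVM, which is where the slowdown comes from:

```rust
// Minimal soft-float multiplication for normal, finite f32 inputs whose result
// is also normal. Unlike a real soft-float library, this truncates instead of
// rounding and skips all special cases, yet still needs a pile of integer ops.
fn softfloat_mul(a: f32, b: f32) -> f32 {
    let (ba, bb) = (a.to_bits(), b.to_bits());
    let sign = (ba ^ bb) & 0x8000_0000;
    let ea = ((ba >> 23) & 0xff) as i32;
    let eb = ((bb >> 23) & 0xff) as i32;
    let ma = (ba & 0x007f_ffff) | 0x0080_0000; // restore the implicit leading 1
    let mb = (bb & 0x007f_ffff) | 0x0080_0000;

    let mut m = (ma as u64) * (mb as u64);     // up to a 48-bit product
    let mut e = ea + eb - 127;                 // combine the biased exponents
    if (m & (1u64 << 47)) != 0 {               // renormalize the significand
        m >>= 24;
        e += 1;
    } else {
        m >>= 23;
    }
    f32::from_bits(sign | ((e as u32) << 23) | (m as u32 & 0x007f_ffff))
}

fn main() {
    println!("{}", softfloat_mul(1.5, 2.0)); // 3
}
```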
Another popular DSL, Circom, relies on circuit quantization: numbers are scaled and then a floor function discards the fractional part, which loses accuracy. Some systems resort to fixed-point schemes as an alternative. However, none of these approaches has proven to be an ideal way to bring FP numbers into zero-knowledge systems.
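A quick sketch of where the floor hurts (generic fixed-point arithmetic, not Circom's actual library code): multiplying two scaled values doubles the scale, so the result has to be shifted back down and floored, and that flooring throws away precision at every multiplication.

```rust
// Fixed-point multiply with a power-of-two scale: the product of two scaled
// values carries scale^2, so we shift back down and floor, losing the low bits.
const SHIFT: u32 = 16; // 16 fractional bits (an arbitrary example choice)

fn to_fixed(x: f64) -> i64 { (x * (1i64 << SHIFT) as f64).floor() as i64 }
fn to_float(x: i64) -> f64 { x as f64 / (1i64 << SHIFT) as f64 }

fn fixed_mul(a: i64, b: i64) -> i64 {
    (a * b) >> SHIFT // floor division by 2^SHIFT discards the low bits
}

fn main() {
    let (a, b) = (to_fixed(3.14159), to_fixed(2.71828));
    // prints 8.53969; the exact product is 8.53972..., the gap is floored-away precision
    println!("{:.5}", to_float(fixed_mul(a, b)));
}
```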
The question is whether people should redesign their scientific and AI problems to fit the limitations of zero-knowledge systems, or whether there is a better way to handle FP numbers within them. The search for a more effective solution continues, with researchers exploring novel techniques and optimizations to integrate floating-point numbers into zero-knowledge systems without sacrificing accuracy or computational efficiency.
The future of FP in ZK?
Recent work on ZK-FP systems is very promising. A paper published in late 2022, "Succinct Zero Knowledge for Floating Point Computations", demonstrates a scheme with better performance than naïve bit decomposition. They achieve this with a batch range proof system: instead of checking a bit decomposition, they check that the output value lies within a specific accuracy range, and if it differs only by a small amount, that is just as good as checking the bit decomposition. They claim their method is ~57× faster than previous methods.
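My rough reading of the core trick, sketched below with hypothetical names (truncation shown for simplicity; the paper handles IEEE rounding and batches many such checks into one range proof): to show that a claimed significand m is the correctly truncated top of an exact double-width product p shifted down by s bits, the circuit only needs one range check on the remainder, p − m·2^s ∈ [0, 2^s), instead of a full bit decomposition of p.

```rust
// One range check on the remainder replaces bit-decomposing the exact product.
// (Hypothetical sketch; truncation rounding only.)
fn rounding_is_consistent(p: u128, m: u128, s: u32) -> bool {
    let shifted = m << s;
    p >= shifted && p - shifted < (1u128 << s)
}

fn main() {
    // 48-bit product of two 24-bit significands, truncated back down by 23 bits
    let p: u128 = 0xC0_0000 * 0xA6_66_66; // 1.5 and ~1.3 in a 2^23 fixed scale
    let m = p >> 23;
    println!("{}", rounding_is_consistent(p, m, 23)); // true
}
```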
Why has this not been implemented yet? I think it would be a worthwhile effort to build an implementation and benchmark it to test the paper's claims. Two things are worth noting about this work: 1) the method has increased soundness error, and 2) the paper completely ignores lookup arguments, which could work quite well with bit decomposition (certainly better than the naïve case).
Lookup arguments are a technique used in ZK systems to prove that a given value belongs to a particular set, or that an operation was executed correctly, by reference to a predefined lookup table. They aim to make ZK proofs more efficient and practical for complex computations by compressing specific operations.
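To give a concrete flavor (a generic sketch, not any particular proof system's API): a range check such as "x fits in 32 bits" can be expressed as four lookups of 8-bit limbs into a 256-entry table instead of 32 boolean constraints, and the same idea extends to tabulating pieces of FP operations such as rounding or normalization.

```rust
use std::collections::HashSet;

// Lookup-style range check: split x into four 8-bit limbs and require each limb
// to appear in a table of all byte values. In a real proof system the table
// membership is enforced by the lookup argument; here it is a plain set lookup.
fn range_check_u32_via_lookups(x: u64) -> bool {
    let table: HashSet<u64> = (0..256).collect(); // the 256-entry byte table
    let limbs = [x & 0xff, (x >> 8) & 0xff, (x >> 16) & 0xff, (x >> 24) & 0xff];
    let recomposed = limbs
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &l)| acc | (l << (8 * i)));
    limbs.iter().all(|l| table.contains(l)) // 4 lookups instead of 32 bit constraints
        && recomposed == x                  // and the limbs must recompose to x
}

fn main() {
    println!("{}", range_check_u32_via_lookups(0xDEAD_BEEF)); // true
    println!("{}", range_check_u32_via_lookups(1u64 << 40));  // false: wider than 32 bits
}
```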
We expect many advances to come from research and implementations using lookup arguments for FP in ZK. Our ZKWasm implementation is in the early stages of being able to support floating point numbers in ZK at optimal efficiency. Our first public implementation aims to be better than anything that currently exists, and we are doing the research and work to make this happen! We hope that our implementations of ZK-FP will be a positive contribution to open-source efforts and other ZK projects which need to support these operations.
*Expect updates in the future! 👋