<h1>Introduction to zkOracle</h1><p>This blog post summarizes my (Hyper Oracle CTO Norman’s) talk from zkDay at ETH Denver, which was originally titled "zkWASM, zkOracle and Programmability." Let's dive right in and start with the term "oracle." In the context of an oracle network, there are two types of oracles: input oracles and output oracles. Input oracles import data from off-chain to on-chain, while output oracles export data from on-chain to off-chain. We use this terminology to distinguish between categories of oracles.</p><p>Input and output oracles share a common characteristic: they need to handle many complex computations. This is where the value of the oracle network comes from. When we use an oracle network like The Graph's indexing service, we need to trust that the oracle node runs the correct computations and provides accurate data. However, oracles shouldn't work this way: users shouldn't have to trust a centralized provider and its services without any way to verify them.</p><p>It would be even better if we could verify and trust the data we get without recomputing the entire computation. Fortunately, zero-knowledge proofs can help with this. That's why we want to introduce a new kind of oracle network, the zkOracle network, which we define as an oracle network with verifiable pre-commit computation.</p><p>Another simple way to understand this: we want the zkOracle network to handle most of the complex computation and generate a proof for it. This way, users can verify the data provided by a zkOracle node by merely checking the corresponding zk proof. As a result, users don't need to trust any centralized provider or service, such as a trusted node, to use the oracle service. This lays the foundation for a fully decentralized oracle network.</p><h2>Hyper Oracle Network - a Programmable zkOracle Network</h2><p>To implement this idea, we built the Hyper Oracle Network. 
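</p>
<p>Before diving into the architecture, here is the zkOracle trust model above as a toy sketch. All the names and the "proof" format are hypothetical stand-ins, not Hyper Oracle's actual API; the point is only that the user checks a proof instead of trusting the node that produced the data:</p>

```python
# Toy sketch of the zkOracle trust model (hypothetical names and "proof" format,
# not Hyper Oracle's actual API): the user verifies before using the data.

from dataclasses import dataclass

@dataclass
class OracleResponse:
    data: bytes    # the Resulting Data computed off-chain by the oracle node
    proof: bytes   # zk proof that the pre-committed computation ran correctly

def verify(proof: bytes, data: bytes) -> bool:
    # Stand-in for the verification step; a real verifier checks the zk proof
    # against the committed computation and the claimed output data.
    return proof == b"proof-of:" + data

def use_oracle(resp: OracleResponse) -> bytes:
    # Accept data only if its proof verifies; no trusted node required.
    if not verify(resp.proof, resp.data):
        raise ValueError("proof rejected: untrusted oracle response")
    return resp.data

resp = OracleResponse(data=b"eth_price=3000", proof=b"proof-of:eth_price=3000")
print(use_oracle(resp))  # b'eth_price=3000'
```

<p>The heavy computation runs once in the oracle network, while the verification stays cheap for every user.</p>
<p>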
I want to give you a quick overview of the high-level architecture of our Hyper Oracle Node. Essentially, there are three key components: Hyper Oracle Engine, Meta Apps, and zkGraph.</p><p class="ql-align-center"><span style="color: transparent;"><img src="https://mirror.xyz/_next/image?url=https%3A%2F%2Fimages.mirror-media.xyz%2Fpublication-images%2FpMHohcNhc1d0kRFZpUPxU.png&w=3840&q=75" alt="Hyper Oracle Node" height="1124" width="2000"></span></p><p>Hyper Oracle Node</p><p>The Hyper Oracle Engine takes in the on-chain block data, processes it with predefined off-chain computations, and outputs the Resulting Data. Simultaneously, it generates a zk proof of the computations’ validity.</p><p>The output data then goes to the Meta Apps, such as zkIndexing and zkAutomation.</p><p>zkIndexing provides a trustless indexing service for off-chain use cases. Developers can submit customized indexing logic to the Hyper Oracle network, and users can query any indexed data from any node running the zkIndexing service. They can then verify that data with the on-chain verification contract before using it.</p><p>zkAutomation gives users the ability to define an auto-trigger that automatically sends transactions to the blockchain, triggering the target contract with a specific payload under certain conditions. Users can customize both the trigger condition check and the payload generation rules by submitting a customized zkGraph to the network.</p><p>The main difference between zkAutomation and traditional automation services is that zkAutomation verifies the triggering call on-chain before actually calling the destination contract, while traditional automations trust the oracle node and don't verify the completeness and soundness of the triggering calls.</p><p>zkGraph is a novel structure introduced in Hyper Oracle, adding an important feature, programmability, to its zkOracle network nature. 
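</p>
<p>As a rough illustration of the kind of logic a zkGraph might express, here is a sketch with hypothetical shapes, written in Python purely for readability (real zkGraphs are not written in Python): the developer defines which events to watch and how to process them, and the network computes and proves the result.</p>

```python
# Hypothetical sketch of zkGraph-style logic (Python for readability only;
# real zkGraphs use the zkGraph format, not this API).

from dataclasses import dataclass

@dataclass
class Event:
    address: str   # emitting contract
    name: str      # event name
    value: int     # decoded payload, simplified to a single field

TARGET_CONTRACT = "0xToken"   # hypothetical contract address to watch
TARGET_EVENT = "Transfer"

def handle_event(events: list[Event]) -> int:
    # Customized off-chain computation: total Transfer volume in a block.
    return sum(e.value for e in events
               if e.address == TARGET_CONTRACT and e.name == TARGET_EVENT)

block_events = [Event("0xToken", "Transfer", 100),
                Event("0xToken", "Approval", 1),
                Event("0xOther", "Transfer", 50)]
print(handle_event(block_events))  # 100
```

<p>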
Developers can provide almost arbitrary customized off-chain data processing logic, structure it into a zkGraph, and the Hyper Oracle network will handle all the computations and proof generation for them. Users can then use and verify the resulting data.</p><p>You may be wondering what's inside the HO Engine. Let’s take a look.</p><h2>Hyper Oracle Node Engine and zkWASM</h2><p class="ql-align-center"><span style="color: transparent;"><img src="https://mirror.xyz/_next/image?url=https%3A%2F%2Fimages.mirror-media.xyz%2Fpublication-images%2FmBO8Zf0Ig1GmfeaRQ5_Th.png&w=3840&q=75" alt="Hyper Oracle Node Engine" height="1124" width="2000"></span></p><p>Hyper Oracle Node Engine</p><p>As you can see, there are three main components inside the Hyper Oracle Engine: Filter, handleEvent, and zkWASM. The Filter is fixed logic built into our Hyper Oracle node, which extracts the target events, transactions, and state variables from the block data. The handleEvent is defined in zkGraph, taking in the filtered results as a data source, performing off-chain computations, and outputting the resulting data. We also support other handlers, such as handleState and handleTx, but we use handleEvent here as an example.</p><p>You may notice something interesting here. Usually, to generate a zk proof, we need to program the circuit representing a fixed set of constraints. But in Hyper Oracle, we cannot even predict what the customized handleEvent logic submitted by the developers will be, making it hard to generate proofs. That's why we needed to build a zkVM.</p><p>Currently, we've built a zkWASM, a type of zkVM that supports WASM binaries. When a developer submits a zkGraph, the network compiles the Filter and the handleEvent into a WASM binary. Then, we send both the Block Data and Resulting Data to zkWASM, which generates a zk proof for us. 
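</p>
<p>To see why a VM-level approach covers unpredictable logic, here is a toy analogy (not real proving code and not Hyper Oracle's API): the interpreter itself is a single fixed program, so constraining that one program covers every binary it runs.</p>

```python
# Toy analogy for why a zkVM helps (not real proving code): a fixed circuit
# proves one predetermined computation, but an interpreter is itself a single
# fixed program, so constraining it once covers arbitrary developer logic.

def toy_vm(bytecode, stdin):
    # A tiny stack machine standing in for the WASM runtime.
    stack = [stdin]
    for op, *args in bytecode:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
    return stack.pop()

def prove_vm_run(bytecode, stdin):
    # Stand-in for zkWASM: here the "proof" is just the claimed I/O; the real
    # system emits a zk proof over the whole execution trace of the VM.
    return {"binary": bytecode, "stdin": stdin, "stdout": toy_vm(bytecode, stdin)}

# Different developer programs, one prover: no per-program circuit needed.
double_then_inc = [("push", 2), ("mul",), ("push", 1), ("add",)]
print(prove_vm_run(double_then_inc, 10)["stdout"])  # 21
```

<p>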
zkWASM can also generate and deploy corresponding verification contracts on Ethereum.</p><p>This is roughly how the Hyper Oracle Node Engine works.</p><h2>Filter and Handler</h2><p>Let's dive into the Filter and handleEvent parts.</p><p class="ql-align-center"><span style="color: transparent;"><img src="https://mirror.xyz/_next/image?url=https%3A%2F%2Fimages.mirror-media.xyz%2Fpublication-images%2F278ywhtdC71msSarNVWAD.png&w=3840&q=75" alt="Data Flow Diagram (left) and Pseudo Code (right) of Event Proof" height="1664" width="3006"></span></p><p>Data Flow Diagram (left) and Pseudo Code (right) of Event Proof</p><p>This is the data flow and the pseudo code for the Filter and handleEvent. We start from the RLP-encoded raw receipt data in the middle of the diagram. First, we decode it and extract the event data. Second, we filter it to get the target events. Then, we send those to the customized logic, i.e., handleEvent, and finally get the resulting data. Also, we bind the raw receipt data to the block hash, with the help of the MPT path proof and the block header hashing.</p><h2>zkWASM</h2><p>Now let's talk about zkWASM. We need zkWASM because we want to generate proofs for the unpredictable customized logic. zkWASM is the key to enabling programmability in a zkOracle network.</p><p>I'd like to give a brief definition of WASM. WASM is a binary format, and a WASM binary can run on the WASM virtual machine. We can define the WASM runtime as a tuple of three elements. The first one is the WASM executable image, including the code section and the initial memory. The second is the entry point, which defines where the execution starts, and the third one is the standard input and output.</p><pre class="ql-syntax" spellcheck="false">WASM Virtual Machine
I: WASM executable image
C: code image
H: initial memory
E: the entry point
IO: the (stdin, stdout)
WASM run-time := (I(C, H), E, IO)
</pre><p>We also need to define the internal state and the execution trace. Generally speaking, an internal state consists of the instruction pointer, i.e., the program counter, the current calling frame, the memory state, the stack state, and the global variables set. Each step <em>t_i</em> in the execution trace is then a transition function from the previous state to the next state. With this definition, we can define the conditions that a valid WASM execution should meet.</p><pre class="ql-syntax" spellcheck="false">WASM run-time state S := (𝑖𝑎𝑑𝑑𝑟, F, M, G, SP, I, IO)
𝑖𝑎𝑑𝑑𝑟 : the current instruction address
F: calling frame
M: memory state
G: global variables set
SP: stack state
Exec Trace: [𝑡_0, 𝑡_1, 𝑡_2, 𝑡_3, · · · ], s_{i+1} = 𝑡_i(s_i)
Valid WASM should meet:
𝑡_0 enforces the semantics of 𝑜𝑝(E)
𝑠_{𝑘+1} = 𝑡_𝑘(𝑠_𝑘), 𝑡_𝑘 enforces the semantics of 𝑜𝑝(𝑠_𝑘.𝑖𝑎𝑑𝑑𝑟)
the last state 𝑠_𝑒: 𝑠_𝑒.F.𝑑𝑒𝑝𝑡ℎ = 0
</pre><p>With those definitions, let's start with the building blocks. First, we use a polynomial lookup table (plookup) to enforce the basic integer types: like other programming languages, WASM has specific integer formats such as i32 and i64, and we build a lookup table to enforce the corresponding range checks.</p><pre class="ql-syntax" spellcheck="false">Basic Int Types
i32 / i64
𝑥 < 2^32 (or 𝑥 < 2^64)
Tn = [0, 2^n-1]
𝑝𝑙𝑜𝑜𝑘𝑢𝑝(Tn, x) = 0
</pre><p>Then, we transform the code section and the initial memory into two lookup tables, <em>Tc</em> and <em>Th</em>. Later on, we can use them to validate a given sequence of executed instructions and a given memory access log.</p><pre class="ql-syntax" spellcheck="false">Representing Map With PLookup
WASM run-time := (I(C, H), E, IO)
C: 𝐴𝑑𝑑𝑟𝑒𝑠𝑠 -> 𝑂𝑝𝑐𝑜𝑑𝑒
H : 𝐴𝑑𝑑𝑟𝑒𝑠𝑠 -> U64
Build tables
Tc : 𝐴𝑑𝑑𝑟𝑒𝑠𝑠 × 𝑂𝑝𝑐𝑜𝑑𝑒
Th : 𝐴𝑑𝑑𝑟𝑒𝑠𝑠 × U64
By doing this:
if (𝑖𝑎𝑑𝑑𝑟, 𝑜𝑝) ∈ Tc -> the opcode on 𝑖𝑎𝑑𝑑𝑟 is 𝑜𝑝
if (𝑎𝑑𝑑𝑟, 𝑣) ∈ Th -> the initial value at address 𝑎𝑑𝑑𝑟 is 𝑣
</pre><p>With the initial memory lookup table established, we can then validate dynamic state accesses with a similar trick. The basic idea is to enforce the semantics of the three state access types that appear in operations, namely init, write, and read, and the constraints for each are not difficult to define.</p><pre class="ql-syntax" spellcheck="false">Valid Dynamic State Accessing with PLookup
- recall: [𝑡_0, 𝑡_1, 𝑡_2, 𝑡_3, · · · ], s_{i+1} = 𝑡_i (s_i)
- select all 𝑡_i, s.t. at least 1 read/write access to cur state S (i.e. memory M, stack SP or global G)
- define access log item: (𝑡𝑖𝑑,𝑚𝑖𝑑, 𝑎𝑐𝑐𝑒𝑠𝑠𝑇𝑦𝑝𝑒, 𝑎𝑑𝑑𝑟𝑒𝑠𝑠, 𝑣𝑎𝑙𝑢𝑒)
- 𝑡𝑖𝑑 and 𝑚𝑖𝑑 indicate the index of the access operation
- 𝑎𝑐𝑐𝑒𝑠𝑠𝑇𝑦𝑝𝑒
Init memory: (𝑡𝑖𝑑,𝑚𝑖𝑑,𝑖𝑛𝑖𝑡, 𝑎𝑑𝑑𝑟, 𝑣) := {𝑠.𝑎𝑑𝑑𝑟 = 𝑣; }
Write value 𝑣 to memory: (𝑡𝑖𝑑,𝑚𝑖𝑑,𝑤𝑟𝑖𝑡𝑒, 𝑎𝑑𝑑𝑟, 𝑣) := {𝑠.𝑎𝑑𝑑𝑟 = 𝑣; }
Read from memory: (𝑡𝑖𝑑,𝑚𝑖𝑑, 𝑟𝑒𝑎𝑑, 𝑎𝑑𝑑𝑟, 𝑣) := {𝑎𝑠𝑠𝑒𝑟𝑡(𝑠.𝑎𝑑𝑑𝑟 ≡ 𝑣); }
- basic idea: enforce the semantic of init, write, read
</pre><p>However, we encountered a problem when building zkWASM. During the execution of a given WASM binary, the memory addresses accessed are usually distributed randomly, making it quite hard to track the data and describe the constraints. After rounds of testing, our final solution is quite simple: we group the access logs of each execution step by the memory addresses being accessed. Based on this sorted and grouped access log table, we can clearly track each access and write down the constraints.</p><p class="ql-align-center"><span style="color: transparent;"><img src="https://mirror.xyz/_next/image?url=https%3A%2F%2Fimages.mirror-media.xyz%2Fpublication-images%2FguLra8t-YcSawnM0e6drE.png&w=3840&q=75" alt="Valid Dynamic State Accessing with Lookup" height="912" width="1716"></span></p><p>Valid Dynamic State Accessing with Lookup</p><p>Here's the final architecture of zkWASM. Let's break it down into four steps.</p><p>The first step is the Image Setup. We send a WASM image binary to the zkWASM virtual machine. When we receive the binary, we divide it into sections, keeping only the sections that determine the execution trace: the initial memory section, the code section, and the global data section. Then, in a setup stage, we transform the code section and the data section into the two lookup tables <em>Tc</em> and <em>Th</em>, which correspond to the image circuit and the init memory circuit.</p><p class="ql-align-center"><span style="color: transparent;"><img src="https://mirror.xyz/_next/image?url=https%3A%2F%2Fimages.mirror-media.xyz%2Fpublication-images%2Fmj1dGdPxp_7JoIJaBWEaJ.png&w=3840&q=75" alt="ZKWASM Architecture" height="984" width="1750"></span></p><p>ZKWASM Architecture</p><p>In the second step, we receive a sequence of valid inputs for running the given WASM binary. Then, we can generate the execution trace using the standard WASM runtime interpreter. 
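</p>
<p>The sorted access-log trick described above can be sketched as follows. This is a simplified model with hypothetical tuple shapes: one flat memory, entries of the form (𝑡𝑖𝑑, 𝑎𝑐𝑐𝑒𝑠𝑠𝑇𝑦𝑝𝑒, 𝑎𝑑𝑑𝑟𝑒𝑠𝑠, 𝑣𝑎𝑙𝑢𝑒), ignoring 𝑚𝑖𝑑 and the stack/global distinction:</p>

```python
# Sketch of the sorted access-log check (simplified, hypothetical shapes).
# Sorting by (address, tid) makes every access to one address contiguous,
# so the init/write/read semantics become easy to enforce per address group.

def validate_access_log(log):
    last = {}  # address -> last value set by init/write
    for tid, access_type, addr, value in sorted(log, key=lambda e: (e[2], e[0])):
        if access_type in ("init", "write"):
            last[addr] = value              # init/write: s.addr = v
        elif access_type == "read":
            if last.get(addr) != value:     # read: assert(s.addr == v)
                return False
    return True

ok_log = [(0, "init", 0x10, 7), (1, "read", 0x10, 7),
          (2, "write", 0x10, 9), (3, "read", 0x10, 9)]
bad_log = [(0, "init", 0x10, 7), (1, "read", 0x10, 8)]  # read of a stale value

print(validate_access_log(ok_log))   # True
print(validate_access_log(bad_log))  # False
```

<p>In the real system this check is expressed as circuit constraints over the sorted table rather than as imperative code.</p>
<p>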
To be clear, we do not need to trust the interpreter, because we have very strict constraints on the semantics of each instruction. Even if the interpreter produces a wrong execution trace, the constraints will be broken, and the verification won't pass.</p><p>In the third step, after we have the execution trace, we fill up our five lookup tables based on it, combining the memory, global, and stack access log tables into one huge memory access circuit.</p><p>Now that we have the circuit tables and the predefined constraints between them, we can generate the proof for the whole execution trace.</p><p>With this architecture in place, the only thing left is to implement the circuit for each instruction. We won't go into the details here.</p><p>Additionally, I'd like to mention a technique we used to solve the long execution trace problem, called the partition batching technique. When we generate a huge execution trace, the circuit tables we build might be too big, causing too much overhead and even exceeding the performance limit. Our solution is very simple: we partition the long execution trace, generate a proof for each partition, and then batch the proofs together.</p><p>Now, when you look back at the title, I hope you have a basic understanding of the deep connections among those three terms.</p><p>We've talked about what a zkOracle network is, how to introduce programmability in zkOracle with the help of zkWASM, and how to build a zkWASM. This is part of the work we are building at Hyper Oracle. It is a long journey, but we will eventually get there and provide the Unstoppable Decentralized Programmable zkOracle Network for web3.</p><p>Follow us on <a href="http://www.twitter.com/hyperoracle" rel="noopener noreferrer" target="_blank">Twitter</a> and join our <a href="https://discord.gg/PW3Ncdvu" rel="noopener noreferrer" target="_blank">Discord</a> to stay up to date with Hyper Oracle. 
Learn more about Hyper Oracle on our <a href="http://www.hyperoracle.io/" rel="noopener noreferrer" target="_blank">website</a>.</p>