Alternative cryptography: Code-based cryptography

If you want to understand what post-quantum cryptography is and why it matters, see the first part of this series.

On August 24, 2023, the National Institute of Standards and Technology (NIST) published its draft post-quantum algorithm standards. The mechanisms behind these algorithms were explained in part 3 (lattice-based cryptography) and part 2 (hash-based signatures) of this series. This raises a question: with NIST already offering practical post-quantum replacements for the RSA and ECC algorithms, why explore other technologies? The reason is that lattice-based cryptography is relatively new. Having an alternative that uses error correction codes as a cryptographic primitive would be valuable in case a general solution is ever found for the underlying lattice problems, or for the various derived module-based lattice problems.

Essentials

Error correction codes are digital codes used to reliably transmit data over unreliable channels. By encoding data into larger blocks that contain redundant bits, transmission over noisy channels becomes more reliable. These larger blocks, called code words, take only specific valid values within their space. If a noisy channel corrupts some bits, the result is generally not a valid code word, which makes the error detectable. Furthermore, if valid code words are spaced far enough apart, it is possible to deduce which code word was most likely sent and correct the error. This spacing is measured by the Hamming distance: the number of bits by which two code words differ. Throughout this article, 'distance' and 'closeness' refer to Hamming distance. Closely related is the Hamming weight, the number of bits set to one in a value. The error values intentionally introduced in the encryption schemes below are chosen with Hamming weights that are large enough to be secure but still correctable.
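
As a quick illustration, this short Python snippet computes Hamming weight and Hamming distance, using plain integers as bit vectors (the specific values are arbitrary examples):

```python
def hamming_weight(x: int) -> int:
    """Number of bits set to 1 in x."""
    return bin(x).count("1")

def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ (the weight of x XOR y)."""
    return hamming_weight(x ^ y)

# Two arbitrary 7-bit values:
w1 = 0b1100001
w2 = 0b0100101
print(hamming_weight(w1))        # 3
print(hamming_distance(w1, w2))  # 2
```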

This technique has been understood for decades and has been formalized into a well-developed mathematical framework. The transformation from a base value (a) to be transmitted into a code word (w) sent down the channel is done by treating the base value as a bit vector and multiplying it by a matrix G, called the generator:

w = a . G

Generators are built by combining two matrices: I (the identity matrix, with ones on the diagonal and zeros everywhere else) and P (the parity matrix, containing the error coding values).

This arrangement, called standard form (G = [I||P]), maps a base value (a) to a larger code word (w) consisting of a concatenated with parity information (p). Not every value of the same size as w is a valid code word mapping back to some a, so w can be used to detect corruption whenever the value received from the channel (w') is not a valid code word. Moreover, if w is sufficiently larger than a, the larger gaps between valid code words, combined with a small number of errors, make it possible to use the corrupted value w' to determine the original w and, from it, the original a. To correct w', the P matrix is used to compute a new matrix called the parity check matrix, H.

                | p1 p4 p7 p10 1 0 0 |
 H = [P^T||I] = | p2 p5 p8 p11 0 1 0 |
                | p3 p6 p9 p12 0 0 1 |

H is then used to compute s, known as the syndrome:

s = H . w’

If w' is a valid code word, the syndrome (s) is zero, which is why H is called the parity check matrix. If s is non-zero and the error in w' is within the correctable bound, s can be used to recover the error: w' = w + e, so the original w = w' + e (in binary arithmetic, addition is XOR and is identical to subtraction, so e + e = 0). Unfortunately, in the general case, mapping s back to e requires enumerating all possible errors, computing the syndrome each one would produce, storing them in a table, and then looking up the error by its syndrome. With a carefully chosen H, however, a function can be constructed that returns e directly from s. Once the error e is known, the exact w can be computed and the original a recovered. This is how a function Dg is derived from H that, in effect, transforms a corrupted w' back into the original a.
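
To make this concrete, here is a minimal Python sketch of the flow just described, using numpy and a tiny 3-bit repetition code of my own choosing as the stand-in code. G and H are in standard form, and the syndrome table is built by enumerating every single-bit error, exactly as described above for the general case:

```python
import numpy as np

# All arithmetic is over GF(2), i.e. mod 2.
P = np.array([[1, 1]])                       # parity part: 1 data bit -> 2 parity bits
k, r = P.shape                               # k data bits, r parity bits
n = k + r                                    # code word length
G = np.hstack([np.eye(k, dtype=int), P])     # G = [I||P]
H = np.hstack([P.T, np.eye(r, dtype=int)])   # H = [P^T||I]

def encode(a):
    return (a @ G) % 2                       # w = a . G

def syndrome(w):
    return (H @ w) % 2                       # s = H . w'

# Build the syndrome -> error table by enumerating all single-bit errors.
table = {}
for i in range(n):
    e = np.zeros(n, dtype=int)
    e[i] = 1
    table[tuple(syndrome(e))] = e

def decode(w_prime):                         # plays the role of Dg
    s = syndrome(w_prime)
    if not s.any():                          # zero syndrome: already a valid code word
        return w_prime[:k]
    e = table[tuple(s)]                      # look up the error from its syndrome
    return ((w_prime + e) % 2)[:k]           # w = w' + e; the first k bits are a

a = np.array([1])
w = encode(a)                                # [1 1 1]
w_prime = w.copy()
w_prime[2] ^= 1                              # one bit flipped by the noisy channel
print(decode(w_prime))                       # [1]  -- the original a is recovered
```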

Streamlining syndrome-to-error mapping

Hamming codes

There are several schemes for constructing H so that the error can be determined efficiently from the syndrome. The most common approach builds a parity matrix (P) that adds about log2(n) parity bits, where n is the size of the code word w. A parity check matrix for this kind of encoding looks like the following:

     | 1 1 1 0 1 0 0 |
 H = | 1 1 0 1 0 1 0 |
     | 1 0 1 1 0 0 1 |

With the generator, G, structured as:

     | 1 0 0 0 1 1 1 |
 G = | 0 1 0 0 1 1 0 |
     | 0 0 1 0 1 0 1 |
     | 0 0 0 1 0 1 1 |

After calculating the syndrome, the resulting value s identifies the single bit that was flipped. For example, suppose the original message is a = 1 1 0 0. Multiplying a by G gives w = 1 1 0 0 0 0 1. If the first (highest) bit is corrupted (w' = 0 1 0 0 0 0 1), then s = Hw' = | 1 1 1 | = 7. The same method applies when other bits are flipped: the syndrome value directly identifies the corrupted bit position, except when the last data bit or the first parity bit is corrupted, in which case the two results are swapped. Because roughly half of the bits in each row of the parity check matrix (H) are 1, this scheme lends itself to efficient implementations of both encoding and error correction. Although popular, Hamming codes are not the most efficient encoding for correcting multi-bit errors with minimal code word sizes; that distinction belongs to Goppa codes.
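
The small numpy snippet below reproduces this walk-through with the G and H shown above, confirming that w = 1 1 0 0 0 0 1 and that corrupting the leading bit yields the syndrome 1 1 1 (7):

```python
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

a = np.array([1, 1, 0, 0])
w = (a @ G) % 2
print(w)                       # [1 1 0 0 0 0 1]

w_prime = w.copy()
w_prime[0] ^= 1                # corrupt the first (highest) bit
s = (H @ w_prime) % 2
print(s)                       # [1 1 1] = 7, i.e. the first column of H, pointing at the flipped bit
```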

Goppa codes

Goppa codes are designed to give an optimal trade-off between the maximum number of correctable errors and the size difference between w and a. They are constructed by choosing various base values and functions, combined through powers of those values and multiplications with the polynomials of those functions. Determining the error involves transforming the syndrome into a polynomial, and then turning that polynomial into an oracle function that is evaluated for each error bit to decide whether it is set. The full details are beyond the scope of this article, but the complete algorithm is readily available here.

Low density parity check codes

While Goppa codes are efficient, they are complex. The low density parity check (LDPC) approach instead restricts the number of '1' bits in the parity check matrix. This simpler structure allows efficient decoding while still correcting multiple errors. Because there are few '1' bits in the matrix, each error bit is connected to only a few syndrome bits, which makes probabilistic algorithms possible. A 'bit flipping decoder' starts with an initial estimate of the error bits and then iteratively refines it based on the probability that individual guesses are wrong, converging toward the correct answer. The algorithm is probabilistic and can fail, but a very low failure probability is readily achievable.
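
Below is a rough Python sketch of the bit-flipping idea: flip whichever bits sit in the largest number of unsatisfied parity checks, and repeat until the syndrome is zero. A real LDPC code uses a large, sparse H; the small Hamming matrix from the previous section merely stands in here so the loop has something concrete to run on:

```python
import numpy as np

# Small stand-in parity check matrix (the Hamming H from the previous section).
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def bit_flip_decode(H, w, max_iters=20):
    w = w.copy()
    for _ in range(max_iters):
        s = (H @ w) % 2                # which parity checks are currently unsatisfied
        if not s.any():
            return w                   # all checks pass: w is a valid code word
        unsatisfied = s @ H            # per bit: how many unsatisfied checks it touches
        w[unsatisfied == unsatisfied.max()] ^= 1   # flip the most suspect bit(s)
    return None                        # did not converge (possible, but should be rare)

w = np.array([1, 1, 0, 0, 0, 0, 1])    # a valid code word from the example above
w_prime = w.copy()
w_prime[0] ^= 1                        # one bit corrupted in the channel
print(bit_flip_decode(H, w_prime))     # [1 1 0 0 0 0 1]
```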

Turning Goppa codes into a public key encryption scheme

In 1978, Robert McEliece turned Goppa codes into a cryptosystem along the following lines:

1. Generate a key pair:

  1. Create a Goppa code H (with an invertible function Dg)
  2. Put H in standard form: H = [P^T||I]
  3. Compute G = [I||P]
  4. Choose an invertible permutation matrix Pr (which reorders columns without changing their values) and an additional invertible random matrix S
  5. Compute Gp = SGPr
  6. The public key is the matrix Gp
  7. The private key consists of Dg (the invertible function for G derived from H), S^-1 and Pr^-1

2. To encrypt a message:

  1. Compute c = mGp + e, where e is a noise vector and m is the message represented as a bit vector
  2. Send ciphertext c

3. To decrypt the message:

  1. Compute m = Dg(cPr^-1)S^-1

The matrices S and Pr hide the structure of the original matrix G from an attacker, preventing recovery of H and thus preventing use of the Goppa algorithm to remove the error. Note that the encryption operation is very similar to lattice operations, with Gp acting as a special form of lattice matrix.
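
The following toy Python sketch walks through the key generation, encryption, and decryption steps above. The Goppa code is replaced by the tiny (7,4) Hamming code from earlier (so only single-bit errors are correctable), and S is a fixed invertible matrix rather than a random one; this is only meant to make the algebra concrete, not to be a secure or faithful implementation:

```python
import numpy as np

# The (7,4) Hamming code from earlier stands in for the Goppa code.
P = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])
k, r = P.shape
n = k + r
G = np.hstack([np.eye(k, dtype=int), P])     # G = [I||P]
H = np.hstack([P.T, np.eye(r, dtype=int)])   # H = [P^T||I]

def Dg(w):
    """Toy decoder: correct at most one bit error, return the k message bits."""
    s = (H @ w) % 2
    if s.any():
        i = np.where((H.T == s).all(axis=1))[0][0]   # column of H matching the syndrome
        w = w.copy()
        w[i] ^= 1
    return w[:k]

# Key generation: hide G behind an invertible S and a permutation Pr.
S = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
S_inv = np.array([[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]])  # S.S^-1 = I (mod 2)
perm = np.random.default_rng(0).permutation(n)
Pr = np.eye(n, dtype=int)[perm]              # permutation matrix
Pr_inv = Pr.T                                # a permutation's inverse is its transpose
Gp = (S @ G @ Pr) % 2                        # public key

# Encryption: c = mGp + e, with e of weight 1 (all this toy code can correct).
m = np.array([1, 0, 1, 1])
e = np.zeros(n, dtype=int)
e[5] = 1
c = (m @ Gp + e) % 2

# Decryption: m = Dg(cPr^-1)S^-1
recovered = (Dg((c @ Pr_inv) % 2) @ S_inv) % 2
print(recovered)                             # [1 0 1 1]
```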

The main drawback of this scheme is the very large public key, Gp, which is on the order of megabytes. On the other hand, it has withstood decades of cryptanalysis, which gives good confidence in its security. A slightly modified version is a Round 4 candidate in the NIST post-quantum "competition" under the name Classic McEliece, described below.

Many alternative error correction codes have been proposed to replace Goppa codes in the McEliece scheme, but several of them have been broken. Attempts to build a signature scheme on top of error correction codes have also suffered breaks.

Code-based encryption systems in NIST Round 4

At the end of Round 3 of the NIST Post-Quantum Cryptography Standardization process, four algorithms were selected for standardization: Crystals-Kyber, Crystals-Dilithium, Falcon, and Sphincs+. Crystals-Kyber is a lattice-based key encapsulation mechanism (KEM), Crystals-Dilithium and Falcon are lattice-based signatures, and Sphincs+ is a stateless hash-based signature scheme. These are covered in detail in part 3 and part 2 of this series. The remaining Round 3 signature algorithms (Rainbow, GeMSS, and Picnic) were broken, and the remaining non-lattice-based KEM algorithms moved on to Round 4: Classic McEliece, BIKE, HQC, and SIKE. SIKE has since been broken, leaving the other three as code-based candidates. The descriptions below follow the differing nomenclature and formatting used in each standard's specification.

Classic McEliece

Classic McEliece is a return to Robert McEliece's original scheme. Instead of encoding a message with the generator (G), it uses the parity check matrix (H) to produce a syndrome (s) from a random error value (e). That e is then run through a key derivation function (KDF) to produce the encapsulated key. Decapsulation recovers the e value from the syndrome and feeds it into the same KDF. If decoding fails, the KDF is run on a substitute value instead, masking the failure so that no decryption oracle is exposed. Because the parity check matrix is kept in standard form (I || T), with I the familiar identity matrix, the public key only needs to contain the T part, as it is designated in the specification. The steps for Classic McEliece are:

1. Generate a key pair:

  1. Create a Goppa code H (with an invertible function Γ)
  2. Put H in standard form: H = [I||T]
  3. The public key is the matrix T
  4. The private key consists of Γ and s, a fixed per-key random noise value

2. To encapsulate a key:

  1. Compute C = [I||T]e, where e is a noise vector
  2. Compute K = H(1, e, C) using a hash function
  3. Output the key K and send the ciphertext C

3. To decapsulate the key:

  1. Set b to 1
  2. Use Γ to recover e from C. If e cannot be recovered, set b to 0 and set e to s
  3. Compute K = H(b, e, C)
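
A toy Python sketch of this encapsulate/decapsulate flow is shown below, again with the single-error-correcting (7,4) Hamming code standing in for a real Goppa code and SHA-256 standing in for the hash specified by the standard. Only the names T, e, C, K, b and s follow the steps above; everything else is illustrative:

```python
import hashlib
import numpy as np

# Public key: the non-identity part T of H = [I||T] (here a (7,4) Hamming code).
T = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 1]])
r = T.shape[0]
n = r + T.shape[1]
H = np.hstack([np.eye(r, dtype=int), T])     # H = [I||T]

def kdf(b, e, C):
    """Stand-in KDF: hash the flag bit, the error vector and the ciphertext."""
    return hashlib.sha256(bytes([b]) + e.tobytes() + C.tobytes()).hexdigest()

def gamma(C):
    """Toy decoder Γ: recover a weight-1 error e from its syndrome C, or None."""
    match = np.where((H.T == C).all(axis=1))[0]
    if len(match) == 0:
        return None
    e = np.zeros(n, dtype=int)
    e[match[0]] = 1
    return e

s = np.zeros(n, dtype=int)      # per-key value used on decoding failure (fixed here)

# Encapsulate: random weight-1 error, C = [I||T]e, K = Hash(1, e, C).
e = np.zeros(n, dtype=int)
e[np.random.default_rng(1).integers(n)] = 1
C = (H @ e) % 2
K_sender = kdf(1, e, C)

# Decapsulate: recover e from C with the private decoder; fall back to s on failure.
e_dec = gamma(C)
b = 1 if e_dec is not None else 0
if e_dec is None:
    e_dec = s
K_receiver = kdf(b, e_dec, C)

print(K_sender == K_receiver)   # True
```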

Classic McEliece’s public keys
