RtRPC

Feature Name: Pleiades Wire Protocol v1.0 (rtRPC)

Status: draft

Start Date: 23 August 2023

Authors: Sienna Lloyd [mailto:sienna@linux.com sienna@r3t.io]

= Summary =

Pleiades v3's internal architecture is removing its dependency on gRPC and other RPC frameworks to gain technical independence and flexibility. This comes with the added burden of defining a dedicated wire protocol. This document defines the layout of the v1 wire protocol and how to implement it successfully. Generated clients and servers are well outside the scope of this document; however, this protocol enables easy code generation.

[!info] The intent of this protocol is to be simple enough that novice systems programmers can implement clients while also being powerful enough to meet long-term needs.

= Motivation =

"[!tldr] gRPC sucks"

gRPC is slow, heavy, and focused on supporting its largest consumers. The technology is old and stale, and in some languages, such as Go, completely hardcoded implementations are the default. Other RPC frameworks are either minimal in their support or lack substantial features. The Pleiades v3 internal architecture can't continue to progress while also maintaining ties to gRPC.

The technical motivation is that less is more. To effectively meet the performance requirements of Pleiades at scale, the networking protocol must also be performant. gRPC is built on HTTP/2, whereas the v1 wire protocol will be UDP-based via QUIC. This means Pleiades nodes and clients can connect and immediately send data via 0-RTT without a round trip, and can use bidirectional streaming over individual streams, muxing over multiple streams, or any other pattern.

[!info] For more information on QUIC, read RFC 9000. It's long but worth the read.

By removing the dependency on gRPC, we also open up access to Pleiades overall, limiting that access only by QUIC and protocol buffers. While this new format removes gRPC, it continues to use protocol buffers. Protocol buffers are an industry standard for data encoding, and moving away from them would only make data interfaces more difficult.

This design also allows Pleiades to have a very simple, but heavily muxed service implementation for RPC-style services.

= Technical design =

The Pleiades Wire Protocol v1 (PWP) is simple in architecture, but detailed in implementation. As an important piece of context, PWP is based around the concept of streaming, instead of call and response. Stream programming is a different functional architecture than call and response, and as such different architectural decisions are made.

Generally, there are a few core constructs in PWP:


 * stream pairs
 * magic bytes
 * payloads
 * contexts

Stream pairs are sets of two bidi streams within a single QUIC connection. The first stream allows for negotiation of the second stream. As QUIC stream IDs are 62-bit integers, a connection can carry up to $$2^{60}$$ streams of each type, so it's much simpler to set up separate request and response streams, i.e. a stream pair, than to mux over a single stream.
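As a small illustration of how QUIC tells streams apart, RFC 9000 encodes the initiator and the directionality of a stream in the two least significant bits of its 62-bit stream ID. The sketch below decodes those bits. The names `StreamKind` and `classify` are made up for this example and are not part of PWP, and the idea that a client-opened stream pair would use IDs 0 and 4 is only an assumption.

```go
package main

import "fmt"

// StreamKind describes the two low bits of a QUIC stream ID (RFC 9000, section 2.1).
// The type name is hypothetical and not part of PWP.
type StreamKind struct {
	ServerInitiated bool // bit 0: 0 = client-initiated, 1 = server-initiated
	Unidirectional  bool // bit 1: 0 = bidirectional, 1 = unidirectional
}

// maxStreamID is the largest valid stream ID: stream IDs are 62-bit integers.
const maxStreamID uint64 = 1<<62 - 1

// classify decodes the kind of a stream from its ID.
// It returns an error for values that do not fit in 62 bits.
func classify(id uint64) (StreamKind, error) {
	if id > maxStreamID {
		return StreamKind{}, fmt.Errorf("stream ID %d exceeds 62 bits", id)
	}
	return StreamKind{
		ServerInitiated: id&0x1 == 1,
		Unidirectional:  id&0x2 == 2,
	}, nil
}

func main() {
	// Client-initiated bidirectional streams use IDs 0, 4, 8, ...
	// so a stream pair opened by a client could (as an assumption) use IDs 0 and 4.
	for _, id := range []uint64{0, 4, 1, 2} {
		kind, err := classify(id)
		if err != nil {
			panic(err)
		}
		fmt.Printf("stream %d: %+v\n", id, kind)
	}
}
```

Because two bits of every ID are spent on the stream type, each of the four types has $$2^{60}$$ IDs available, which is still far more than a stream-pair design is likely to need.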

Magic bytes are fairly straightforward: they provide basic context within a stream and act as control points.

Payloads are just that, and come in two forms: metadata bytes and messages. Metadata bytes are short, simple metadata response payloads that answer simple questions. Messages are protobuf-encoded payloads that contain application-level requests and responses.

Contexts are administrative references used to understand and debug a stream pair. Contexts are generally abstract, but can be concretely implemented.

With these core constructs, an entire RPC-style contract can be built with minimal effort on top of QUIC. QUIC provides the base streaming abstractions for us, and there's very little we have to do to set that up. A key takeaway about the core technical design is the distinct lack of framing. Framing involves significant overhead and assumes significant inconsistencies in the transport. As QUIC provides ordered streams with retry buffers, Pleiades is guaranteed to receive composited messages in order, so structured framing would add high maintenance costs for no real value. However, frame synchronization is a key architectural idea that is being kept.

[!info] For more information on framing, see the Wikipedia article on frame design. Frame synchronization in PWP is less about frames and more about stream synchronization. Ultimately, the difference between frame synchronization and stream synchronization is the chunk of data that is parsed. In more classical framing, such as Ethernet frames or TCP segments, there are standard transmission sizes that tell the reader how much information to read, parse, and return. Framing requires larger buffers, more memory allocation, and more processor cycles to manage. In leaky or inconsistent environments this is a reasonable tradeoff, but the value of QUIC is that it abstracts this for us at the lowest levels. To a client, a QUIC stream is a guaranteed-delivery data stream - QUIC is a hardline into a switch compared with TCP's wifi connection.

As PWP is a streaming protocol, not a call and response protocol, the frameless design allows very small signals to be transmitted across the wire while still providing massive control contexts. As an example, with only 16 total bytes transmitted, a server and client will have established an entire service construct ready for application-level messages to be passed back and forth. Including version checking adds an extra 2 bytes, bringing the total to 18 bytes. As a comparison, HTTP/2 frame headers are 9 bytes each, so the frame headers of one request and one response alone require 18 bytes without any payload; there's no inferable context, and the call and response has already completed. HTTP/3 uses the same semantics; however, it is implemented on top of QUIC instead of TCP.
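To make the HTTP/2 comparison concrete, the sketch below packs an HTTP/2 frame header into its fixed 9-byte wire form as laid out in RFC 9113, section 4.1. Reading the 18-byte figure as one request frame header plus one response frame header is an assumption of this example, and the function name `encodeHTTP2FrameHeader` is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// http2FrameHeaderLen is the fixed size of an HTTP/2 frame header (RFC 9113, section 4.1).
const http2FrameHeaderLen = 9

// encodeHTTP2FrameHeader packs the header fields into their 9-byte wire form:
// 24-bit payload length, 8-bit type, 8-bit flags, 1 reserved bit, 31-bit stream ID.
// Only the low 24 bits of length are kept.
func encodeHTTP2FrameHeader(length uint32, frameType, flags uint8, streamID uint32) [http2FrameHeaderLen]byte {
	var h [http2FrameHeaderLen]byte
	h[0] = byte(length >> 16)
	h[1] = byte(length >> 8)
	h[2] = byte(length)
	h[3] = frameType
	h[4] = flags
	binary.BigEndian.PutUint32(h[5:], streamID&0x7fffffff) // clear the reserved bit
	return h
}

func main() {
	// One HEADERS frame (type 0x1, END_HEADERS flag 0x4) for the request
	// and one for the response, both on stream 1, with empty payloads.
	req := encodeHTTP2FrameHeader(0, 0x1, 0x4, 1)
	resp := encodeHTTP2FrameHeader(0, 0x1, 0x4, 1)
	fmt.Println(len(req) + len(resp)) // prints 18
}
```

Those 18 bytes carry no application data at all, whereas the same budget in PWP is described above as covering the whole service setup including version checking.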

The value of using a streaming protocol comes from timing and per-client throughput decisions. For example, a client could open a connection, create the initial stream, send the magic byte, wait for the response stream magic byte and its respective payload, send the service type request, wait for the response, and continue operations in a synchronous fashion. There is nothing wrong with that client design, and it would work well for mobile devices or low-end clients with performance limitations. However, a client could also do everything from opening the connection to sending the first RPC message without ever waiting for a response from the server. This allows for immediate communication at transmission speeds; all the client has to do is process the payloads in the order it receives them, and it will achieve the same end result in a fraction of the time.

[!tldr] Context is a construct from graph computing. Context is the localized relevance of something as it relates to a command or operation. Contexts in PWP are set by the magic bytes, and can change the set of operations an implementing client is using.

== Stream Pairs ==

[!todo] Finish this section

== Magic Bytes ==
Magic bytes are contextual bytes of information that help clients and servers understand the varying states of a stream. Each magic byte is strictly a value that represents the state of the overall stream. Below is the table of magic bytes for PWP v1.0.

Each of these magic bytes represents a different set of information. For the most part, it should be easy to understand how they work.

== Payloads ==
These payloads are split into two core types: metadata and messages.

Metadata payloads are specific to each of the magic bytes and act as simple responses to simple requests. The maximum size of a metadata payload is 8 bytes, the width of a 64-bit integer.
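The 8-byte limit maps naturally onto a 64-bit integer. The sketch below encodes and decodes a metadata value under that limit. Big-endian (network) byte order, the acceptance of payloads shorter than 8 bytes, and the function names are all assumptions made for illustration; the protocol text here does not specify them.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxMetadataLen is the maximum metadata payload size in bytes.
const maxMetadataLen = 8

// encodeMetadata writes v as a fixed 8-byte payload.
// Big-endian byte order is an assumption of this sketch.
func encodeMetadata(v uint64) []byte {
	b := make([]byte, maxMetadataLen)
	binary.BigEndian.PutUint64(b, v)
	return b
}

// decodeMetadata accepts 1 to 8 bytes and left-pads shorter payloads with zeros.
// Anything longer than 8 bytes is rejected.
func decodeMetadata(b []byte) (uint64, error) {
	if len(b) == 0 || len(b) > maxMetadataLen {
		return 0, errors.New("metadata payload must be 1 to 8 bytes")
	}
	var padded [maxMetadataLen]byte
	copy(padded[maxMetadataLen-len(b):], b)
	return binary.BigEndian.Uint64(padded[:]), nil
}

func main() {
	payload := encodeMetadata(1)
	v, err := decodeMetadata(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(payload), v) // prints: 8 1
}
```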

Messages are application-specific payloads used in requests, responses, and application-level stream messages. Messages are protocol buffers encoded for specific bundled services.

=== Metadata Payloads ===
These payloads create, enrich, or change the contexts of a stream pair for clients and servers. Some values are dictated by the protocol, whereas some

Contexts
= todo =

Versioning
= todo =

Drawbacks
...

Rationale and Alternatives
...

= Explain it to folk outside your team =

Audience: PMs, doc writers, end-users, Pleiades contributors in other areas of the project.

= Unresolved questions =

Audience: all participants to the RFC review.