Error handling in QUIC code
===========================

Current situation with TLS
--------------------------

Errors are put on the error stack (strictly a queue, but the term "error
stack" is used throughout the code base) during libssl API calls. In most
(if not all) cases they should appear there only if the API call returns an
error value. `SSL_get_error()` depends on the stack being clean before the
API call in order to determine correctly whether the call caused a library
error or a system (I/O) error.

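The dependency on a clean stack can be illustrated with a toy model. This is
a sketch only: every `model_*` name is hypothetical, and the classification
rule is a deliberately simplified stand-in, not the actual `SSL_get_error()`
logic.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not OpenSSL definitions. */
enum model_lib { MODEL_LIB_NONE = 0, MODEL_LIB_SYS, MODEL_LIB_SSL };
enum model_result { MODEL_ERROR_NONE, MODEL_ERROR_SYSCALL, MODEL_ERROR_SSL };

#define MODEL_QUEUE_MAX 16

struct model_err_queue {
    enum model_lib entries[MODEL_QUEUE_MAX];
    size_t count;
};

/* Append an entry; silently drops it when the queue is full. */
static void model_err_push(struct model_err_queue *q, enum model_lib lib)
{
    if (q->count < MODEL_QUEUE_MAX)
        q->entries[q->count++] = lib;
}

/* Remove all entries, as an application would do before an API call. */
static void model_err_clear(struct model_err_queue *q)
{
    q->count = 0;
}

/*
 * Classify a failed call by looking at the oldest queued entry.
 * In this simplified model an empty queue after a failure is treated
 * as a system (I/O) error.
 */
static enum model_result model_get_error(const struct model_err_queue *q,
                                         int ret)
{
    if (ret > 0)
        return MODEL_ERROR_NONE;
    if (q->count == 0 || q->entries[0] == MODEL_LIB_SYS)
        return MODEL_ERROR_SYSCALL;
    return MODEL_ERROR_SSL;
}
```

In this model, a stale library entry left over from an earlier call makes a
later I/O failure look like a library error. Clearing the queue before the
call removes the ambiguity.
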
Error stacks are thread-local. libssl API calls from separate threads push
errors to separate error stacks. It is unusual to invoke libssl APIs with
the same SSL object from different threads, but even if that happens, it is
not a problem because applications are supposed to check for errors
immediately after the API call on the same thread. There is no such thing as
a thread-assisted mode of operation with TLS.

Constraints
-----------

We need to keep using the existing ERR API, as doing otherwise would
complicate existing applications and break our API compatibility promise.
Even the ERR_STATE structure is public, although deprecated, and thus its
layout and semantics cannot be changed.

Access to the error stack is not protected by a lock (because the stack is
thread-local). This complicates _moving errors between threads_.

Error stack entries contain allocated data, so copying entries between
threads implies either duplicating that data or losing it.

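The duplicate-or-lose trade-off can be sketched as a deep copy of a single
entry. The `entry_*` names and the two-field layout below are hypothetical
and much simpler than the real ERR_STATE contents. They only illustrate why
a plain struct copy between threads would leave two owners of one buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout; the real error stack fields differ. */
struct entry_err {
    unsigned long code;
    char *data;             /* heap-allocated detail text, may be NULL */
};

/*
 * Duplicate src into dst so that each thread owns its own data buffer.
 * Returns 1 on success. Returns 0 if the allocation fails, in which case
 * the code is kept but the detail text is lost.
 */
static int entry_err_dup(struct entry_err *dst, const struct entry_err *src)
{
    dst->code = src->code;
    dst->data = NULL;
    if (src->data != NULL) {
        size_t len = strlen(src->data) + 1;

        dst->data = malloc(len);
        if (dst->data == NULL)
            return 0;
        memcpy(dst->data, src->data, len);
    }
    return 1;
}

/* Release the detail text owned by an entry. */
static void entry_err_free(struct entry_err *e)
{
    free(e->data);
    e->data = NULL;
}
```

Because the copy owns its own buffer, freeing the source entry on one thread
does not invalidate the copy held by another.
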
Assumptions
-----------

This document assumes the actual error state of the QUIC connection (or
stream, for stream-level errors) is handled separately from the auxiliary
error reason entries on the error stack.

We can assume the internal assistance thread is well behaved with regard
to the error stack.

We assume there are two types of errors that can be raised in QUIC library
calls and in the subordinate libcrypto (and provider) calls. The first type
is an intermittent error that does not really affect the state of the QUIC
connection, for example EAGAIN returned from a syscall, or the
unavailability of some algorithm where there are other algorithms to try.
The second type is a permanent error that affects the error state of the
QUIC connection. Operations on QUIC streams (SSL_write(), SSL_read()) can
also trigger errors. Depending on their effect, these errors are either
permanent, if they cause the QUIC connection to enter an error state, or,
if they affect only the stream, they are left on the error stack of the
thread that called SSL_write() or SSL_read() on the stream.

Design
------

The return value of SSL_get_error() on QUIC connections or streams does not
depend on the error stack contents.

Intermittent errors are handled within the library and cleared from the
error stack before returning to the user.

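One way to keep intermittent errors away from the user is a mark-and-pop
pattern, similar in spirit to the existing `ERR_set_mark()` and
`ERR_pop_to_mark()` functions. The sketch below uses a hypothetical
`marked_*` stack and an invented error code. It is an assumption about how
the clearing could be done, not a description of the QUIC code itself.

```c
#include <stddef.h>

#define MARKED_STACK_MAX 16
#define MARKED_R_ALGORITHM_UNAVAILABLE 0x1001UL /* invented code */

/* Hypothetical error stack with a single mark position. */
struct marked_err_stack {
    unsigned long codes[MARKED_STACK_MAX];
    size_t count;
    size_t mark;
};

static void marked_push(struct marked_err_stack *s, unsigned long code)
{
    if (s->count < MARKED_STACK_MAX)
        s->codes[s->count++] = code;
}

/* Remember the current top of the stack. */
static void marked_set_mark(struct marked_err_stack *s)
{
    s->mark = s->count;
}

/* Drop every entry pushed since the mark was set. */
static void marked_pop_to_mark(struct marked_err_stack *s)
{
    if (s->count > s->mark)
        s->count = s->mark;
}

/*
 * Hypothetical operation: the first algorithm always fails and raises an
 * error. If a fallback is available the failure is intermittent and is
 * hidden from the caller; otherwise the entry is left on the stack.
 * Returns 1 on success, 0 on failure.
 */
static int marked_try_algorithms(struct marked_err_stack *s,
                                 int fallback_available)
{
    marked_set_mark(s);
    marked_push(s, MARKED_R_ALGORITHM_UNAVAILABLE);
    if (fallback_available) {
        marked_pop_to_mark(s);
        return 1;
    }
    return 0;
}
```

After a successful fallback the caller sees an empty stack. When no fallback
exists, the entry stays on the stack so that it can be reported.
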
Permanent errors that occur within the assist thread, within SSL_tick()
processing, or when calling SSL_read()/SSL_write() on a stream need to be
replicated for SSL_read()/SSL_write() calls on other streams.

Implementation
--------------

There is an error stack in QUIC_CHANNEL which serves as temporary storage
for errors happening in the internal assistance thread. When a permanent
error is detected, the error stack entries are moved to this error stack in
QUIC_CHANNEL.

When returning to an application from an SSL_read()/SSL_write() call with
a permanent connection error, entries from the QUIC_CHANNEL error stack
are copied to the thread-local error stack. They are always kept on
the QUIC_CHANNEL error stack as well for possible further calls from
an application. An additional error reason
SSL_R_QUIC_CONNECTION_TERMINATED is added to the stack.

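The move-then-copy flow described above can be sketched with plain arrays of
error codes. All `chan_*` names, the structure layout, and the numeric value
of `CHAN_R_CONNECTION_TERMINATED` are hypothetical. The sketch ignores
locking and allocated entry data and only shows the ownership pattern.

```c
#include <stddef.h>

#define CHAN_STACK_MAX 16
#define CHAN_R_CONNECTION_TERMINATED 0xFFFFUL /* placeholder value */

/* Hypothetical error stack holding bare error codes. */
struct chan_err_stack {
    unsigned long codes[CHAN_STACK_MAX];
    size_t count;
};

/* Hypothetical stand-in for the channel and its stored errors. */
struct chan_channel {
    struct chan_err_stack errs;
    int terminated;
};

/* Returns 1 if the code was stored, 0 if the stack is full. */
static int chan_push(struct chan_err_stack *s, unsigned long code)
{
    if (s->count >= CHAN_STACK_MAX)
        return 0;
    s->codes[s->count++] = code;
    return 1;
}

/*
 * Move: on a permanent error, transfer the entries raised on one thread
 * into the channel. The source stack is left empty.
 */
static void chan_save(struct chan_channel *ch, struct chan_err_stack *from)
{
    size_t i;

    for (i = 0; i < from->count; i++)
        chan_push(&ch->errs, from->codes[i]);
    from->count = 0;
    ch->terminated = 1;
}

/*
 * Copy: when a read or write call fails, replicate the channel entries
 * onto the caller's stack and append the extra reason. The channel keeps
 * its own entries for later callers.
 */
static void chan_report(const struct chan_channel *ch,
                        struct chan_err_stack *to)
{
    size_t i;

    for (i = 0; i < ch->errs.count; i++)
        chan_push(to, ch->errs.codes[i]);
    chan_push(to, CHAN_R_CONNECTION_TERMINATED);
}
```

Calling `chan_report()` for two different caller stacks gives both the same
entries followed by the extra reason, while the channel's own stack is left
unchanged.
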
SSL_tick() return value
-----------------------

The return value of SSL_tick() does not depend on whether there is
a permanent error on the connection. The only case in which SSL_tick() may
return an error is a fatal error during its own processing, such as
a memory allocation failure, after which further SSL_tick() calls make
no sense.

Multi-stream-multi-thread mode
------------------------------

Nothing needs to be handled specially for multi-stream-multi-thread mode,
because the error stack entries are always copied from the QUIC_CHANNEL
after the failure. If multiple threads are calling SSL_read()/SSL_write()
simultaneously, they all get the same error stack entries to report to
the user.