design-notes.md: Formatting markers for markdown #1

Open
wants to merge 1 commit into
base: master
84 changes: 65 additions & 19 deletions design-notes.md
@@ -164,11 +164,13 @@ In case of #accept, we need to fulfill and return a new promise each time we acc

Perhaps the idea above can be generalized for other commands too like read & write. If given a block then continue running and return a Promise for every X bytes read/written.

```ruby
read = NonblockingCommand.new(mailbox: outbox, with_block: true/false) do
# not sure...
# if with_block is true, then generate a new promise for every iteration. If no block,
# then this is a one-shot.
end
```

I might need to back up here and write the end user code with an ideal API. That API can then drive some of these decisions.

@@ -247,6 +249,8 @@ The `Sync` module is the namespace dedicated to all classes handling blocking IO
While the underlying operating system may accept flags that enable non-blocking behavior, this class has no special support for that operation. It would need to be handled explicitly by the programmer. For nicer non-blocking or asynchronous IO work, see `Async`.

## Inheritance Tree

```
Sync
|
|---------------------------------------------------------------------
@@ -272,11 +276,13 @@ FCNTL Stat Config Timer Block Stream IOCTL
|- RAW
|- UNIX
|- Pair
```

All SyncIO works on ASCII-8BIT. To use Encodings, create an IO::Transcoder and pass in the SyncIO::Block or SyncIO::Stream instance. Do all read/write operations on the transcoder instance.
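A rough sketch of that layering, using StringIO as a stand-in for a SyncIO::Stream. `SketchTranscoder` is a hypothetical name and not the eventual IO::Transcoder API; it only shows bytes arriving as ASCII-8BIT and being re-tagged by the wrapper.

```ruby
require "stringio"

# Hypothetical stand-in for IO::Transcoder. It wraps any object whose #read
# returns raw bytes and labels those bytes with the requested encoding.
class SketchTranscoder
  def initialize(io, encoding: Encoding::UTF_8)
    @io = io
    @encoding = encoding
  end

  # Read +length+ bytes from the wrapped IO and tag them with @encoding.
  # Returns nil at EOF, just like the wrapped #read.
  def read(length)
    bytes = @io.read(length)
    return nil if bytes.nil?

    bytes.dup.force_encoding(@encoding)
  end
end

raw = StringIO.new("caf\u00E9".b) # String#b gives an ASCII-8BIT copy
text = SketchTranscoder.new(raw).read(5)
puts "#{text.size} chars, #{text.bytesize} bytes, valid=#{text.valid_encoding?}"
```

A real transcoder would also have to cope with a read that ends in the middle of a multi-byte character; this sketch ignores that case.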

AsyncIO will inherit from SyncIO::Block, SyncIO::FCNTL, and SyncIO::Stat to avoid duplication of effort. None of these operations are truly non-blocking/asynchronous so they will be handed off to a thread pool for completion. The AsyncIO::Stream and AsyncIO::Timer classes can be truly non-blocking so this code may or may not inherit generic functionality from the SyncIO parent classes. To be determined.

```
Async
|
|-----------------------------------------------------------------------------
@@ -299,7 +305,9 @@ Block Stream Timer FCNTL IOCTL Stat "Enum
|- RAW
|- Pair

```

```
IO
|
|---------------------------------------------------------------------------------
@@ -310,6 +318,7 @@ Internal Async Sync Config Constants Tra
|-- Private
|
|-- Platforms -- Functions, et al
```

Each SyncIO or AsyncIO io object will have a stat function so we can retrieve information on the open file descriptor. That method will likely delegate to Stat.fstat which will return an instance of Stat. This way we stat the file once and can (at our leisure) get various details from it such as birthtime, file type, etc. without re-stat'ing the file for each inquiry. Stat.lstat and Stat.stat will also exist and behave similarly.
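Current Ruby already works this way with `IO#stat` and `File::Stat`, so it makes a handy reference for the "stat once, ask many questions" idea. This is a sketch against the built-in classes, not the planned Stat API; `stat_summary` is a name made up for illustration.

```ruby
require "tempfile"

# One fstat(2) call via IO#stat, then several questions answered from the
# same File::Stat object without touching the file again.
def stat_summary(io)
  stat = io.stat
  {
    size: stat.size,   # bytes
    type: stat.ftype,  # "file", "directory", ...
    mtime: stat.mtime  # Time of last modification
  }
end

Tempfile.create("stat-sketch") do |file|
  file.write("hello")
  file.flush
  p stat_summary(file)
end
```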

@@ -343,26 +352,32 @@ NEXT
* do NOT get sidetracked by fcntl, rearranging inheritance hierarchies, etc. Implement the MINIMUM necessary to get the above bullets done!! Refactor later!!
* audit for imperative shell, functional core

```ruby
def write_append(&blk)
offset = Stat.fstat(@fd).length # get file size
loop do
bytes = yield(self, offset)
offset += bytes
end
end
```

Use like:

```ruby
write_append do |io, current_offset|
bytes_written = io.write(offset: current_offset, buffer: some_string)
bytes_written # must pass back number of bytes written to adjust offset
end
```
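For comparison, here is a runnable approximation built on current Ruby's `IO#stat` and `IO#pwrite` (Ruby 2.5+, POSIX platforms). It is only a sketch: `write_append_sketch` is a made-up name, and the rule that the block returns 0 or nil to stop the loop is my assumption, since the version above never terminates.

```ruby
require "tempfile"

# Start at the current end of file and let the block write at the offset we
# track for it. The block must return the number of bytes it wrote.
# Assumption: returning 0 (or nil) means "nothing left to write".
def write_append_sketch(file)
  offset = file.stat.size
  loop do
    bytes = yield(file, offset)
    break if bytes.nil? || bytes.zero?

    offset += bytes
  end
  offset # final file size as far as this method knows
end

Tempfile.create("append-sketch") do |file|
  file.write("head")
  file.flush

  chunks = %w[a bc]
  final = write_append_sketch(file) do |io, current_offset|
    chunk = chunks.shift
    chunk ? io.pwrite(chunk, current_offset) : 0
  end
  puts "final offset #{final}, contents #{File.binread(file.path).inspect}"
end
```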

NOTE
To prevent writes from being lost when the Ruby runtime is exiting, the main thread will need to somehow join on the IOLoop and signal it to exit upon completion. I imagine this means the IOLoop will need to deregister all of its FDs and let the loop cycle at least once to flush any pending writes. This could be tricky.
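One possible shape for that shutdown handshake, sketched with a plain Thread and Queue standing in for the real IOLoop. Everything here (`SketchIOLoop`, the STOP sentinel, the array used as a sink) is hypothetical; it only shows the "signal, drain, join" ordering.

```ruby
# Hypothetical IOLoop stand-in: a worker thread draining a queue of writes.
class SketchIOLoop
  STOP = Object.new # sentinel that tells the loop to exit

  def initialize(sink)
    @sink = sink
    @queue = Queue.new
    @thread = Thread.new { run }
  end

  # Enqueue a write; the loop thread performs it later.
  def write(string)
    @queue << string
  end

  # Queue the sentinel behind all pending writes, then wait for the loop.
  # Because the queue is FIFO, every earlier write is handled before exit.
  def shutdown
    @queue << STOP
    @thread.join
  end

  private

  def run
    loop do
      item = @queue.pop
      break if item.equal?(STOP)

      @sink << item
    end
  end
end

sink = []
io_loop = SketchIOLoop.new(sink)
3.times { |i| io_loop.write("line #{i}") }
io_loop.shutdown
p sink
```

In a real program the main thread could register `at_exit { io_loop.shutdown }` so the join happens even when the script simply runs off the end.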

UNIX domain sockets do NOT provide out-of-band data like TCP sockets. Neither do UDP sockets.

All socket combinations:

AF_INET, SOCK_STREAM, IPPROTO_TCP => TCP v4 socket (sockaddr_in)
AF_INET, SOCK_DGRAM, IPPROTO_UDP => UDP v4 socket (sockaddr_in)
AF_INET6, SOCK_STREAM, IPPROTO_TCP => TCP v6 socket (sockaddr_in6)
@@ -394,6 +409,7 @@ Each of these internal behavior objects responds to all methods though they will
things based upon the state.

Delegate a message to the internal context safely.
```ruby
def safe_delegation #(&blk)
@mutex.synchronize do
yield(@context)
@@ -405,8 +421,11 @@ def change_context_to(klass, **args)
# only set at instantiation
@context = klass.new(**args)
end
```

Called like:

```ruby
def bind(addr)
safe_delegation do |context|
rc, errno = context.bind(addr)
@@ -416,13 +435,15 @@ def bind(addr)
[rc, errno]
end
end
```

By convention the @context ivar is only directly accessed in #safe_delegation and #change_context_to. Those are the only two methods that can read or modify the ivar directly. Everywhere else, that ivar is used only within a block, where it's passed in as a block argument. This prevents those blocks from resetting @context (though they could... but it would defy the convention).

We will probably need to nest the behavior change down one level. The outer shell should have no notion of current state. It delegates to the behavior object by sending a message to it. The return value may include a new behavior to take the place of the old behavior. So, the shell needs to manage the mutex so that only one state/behavior change can be in flight at a time.

Example:

```ruby
class SocketShell
def initialize(**args)
...
@@ -464,6 +485,7 @@ class Open
[behavior, rc, errno]
end
end
```
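Since part of the example above is elided, here is a separate, self-contained sketch of the same idea: the shell owns the mutex, forwards the message, and swaps in whatever behavior comes back. The class names (`ShellSketch`, `ClosedBehavior`, `OpenBehavior`) and the choice of errno values are mine, purely for illustration.

```ruby
# Each behavior method returns [next_behavior, rc, errno].
class ClosedBehavior
  def open
    [OpenBehavior.new, 0, nil]
  end

  def close
    [self, -1, Errno::EBADF::Errno] # nothing to close
  end
end

class OpenBehavior
  def open
    [self, -1, Errno::EBUSY::Errno] # already open
  end

  def close
    [ClosedBehavior.new, 0, nil]
  end
end

# The shell has no notion of "open" or "closed"; it only delegates.
class ShellSketch
  def initialize
    @mutex = Mutex.new
    @behavior = ClosedBehavior.new
  end

  def open
    transition(:open)
  end

  def close
    transition(:close)
  end

  private

  # Only one state change can be in flight at a time.
  def transition(message)
    @mutex.synchronize do
      @behavior, rc, errno = @behavior.public_send(message)
      [rc, errno]
    end
  end
end

shell = ShellSketch.new
p shell.open
p shell.open
p shell.close
```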

GETADDRINFO
This is a real workhorse of a function.
@@ -474,7 +496,9 @@ ai_protocol: IPPROTO_UDP or IPPROTO_TCP or IPPROTO_RAW
SOCKET CREATION
The call to socket(2) allows for the programmer to pass the domain, type, and protocol directly. However, I am making the choice to not expose this particular method. To create a socket, the creation must use the values from a valid addrinfo struct. So the process to create a socket will look something like this:

```ruby
socket = Socket.open(hostname: host, port: service, tcp4: true)
```
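For reference, the same "addrinfo first, socket second" flow can be written today with Ruby's standard `socket` library. This is a sketch against the existing `Addrinfo` and `Socket` classes, not the proposed `Socket.open` API; `socket_from_addrinfo` is a made-up helper name.

```ruby
require "socket"

# Resolve first, then build the socket only from values found in the
# resulting addrinfo. Returns [addrinfo, socket].
def socket_from_addrinfo(host, port)
  candidates = Addrinfo.getaddrinfo(host, port, Socket::AF_INET, Socket::SOCK_STREAM)
  ai = candidates.first
  raise ArgumentError, "no addrinfo for #{host}:#{port}" if ai.nil?

  [ai, Socket.new(ai.afamily, ai.socktype, ai.protocol)]
end

# A numeric host avoids a DNS lookup; nothing is connected here.
ai, sock = socket_from_addrinfo("127.0.0.1", 80)
puts ai.inspect
sock.close
```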

https://stackoverflow.com/questions/5385312/ipproto-ip-vs-ipproto-tcp-ipproto-udp

@@ -505,22 +529,23 @@ Note that TruffleRuby just improved #gets and #each by making sure to only insta
The Transcode class(es) will provide their own #each methods that work on characters. When reading Unicode, we know that 16-bit chars and 32-bit chars always take the same amount of space (ignoring UTF-16 surrogate pairs). Converting between a char count and a byte count is a simple multiplication. UTF-8 is trickier since a char can be anywhere from 1 to 4 bytes long. To read 80 chars requires reading AT LEAST 80 bytes and perhaps as many as 320 bytes.
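A quick check of those bounds in plain Ruby. The helper names are made up; the point is only that the UTF-8 byte count for 80 chars ranges from 80 to 320 while UTF-32 is always 4 bytes per char.

```ruby
# Bytes needed to hold +count+ copies of +char+ in UTF-8.
def utf8_bytes_for(char, count)
  (char * count).bytesize
end

# Same question for a fixed-width encoding.
def utf32_bytes_for(char, count)
  (char * count).encode(Encoding::UTF_32LE).bytesize
end

puts utf8_bytes_for("a", 80)          # 1 byte per char
puts utf8_bytes_for("\u00E9", 80)     # 2 bytes per char
puts utf8_bytes_for("\u{1F600}", 80)  # 4 bytes per char
puts utf32_bytes_for("a", 80)         # always 4 bytes per char
```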

The methods in original Ruby IO that support enumeration of the IO stream are:

- IO.foreach
- IO.readlines
- IO#each
- IO#each_byte
- IO#each_char
- IO#each_codepoint
- IO#each_line
- IO#getbyte
- IO#getc
- IO#gets
- IO#readbyte
- IO#readchar
- IO#readline
- IO#readlines
- IO#ungetbyte
- IO#ungetc

Several of these are duplicates of each other and differ only in how EOF is handled (return nil or raise EOFError). Additionally, several of them take multiple options whose meaning depends on their position in the method argument list (each, readlines). Lastly, some methods differ in that they update a global variable after each line (or not).
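The EOF split is easy to see with StringIO, which mirrors the IO enumeration methods. A small sketch (the `eof_behavior` helper is just for illustration):

```ruby
require "stringio"

# Run +method+ on an IO that is already at EOF and report what happened.
def eof_behavior(method)
  io = StringIO.new("")
  io.public_send(method).inspect
rescue EOFError
  "raises EOFError"
end

%i[gets readline getc readchar getbyte readbyte].each do |m|
  puts "#{m}: #{eof_behavior(m)}"
end
```

So the pairs gets/readline, getc/readchar and getbyte/readbyte do the same work and differ only in the EOF policy.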

@@ -532,13 +557,15 @@ IO.readlines, IO#each_char, IO#each_codepoint, IO#each_line, IO#getc, IO#gets, I

The basis of all is IO#each. In current Ruby, IO#each works on lines/strings. For a system that works with ASCII_8BIT this kind of doesn't make sense. Let's suggest that #each only works on bytes and will read those bytes with some +limit+. The other methods can compose their operations on top of the work that #each performs.

```ruby
def each(offset:, limit:)
...
end

def each_byte(offset:)
each(offset: offset, limit: 1) { |byte| yield(byte) }
end
```

The above looks nice, but those in the know will see a performance problem. There will be lots of method overhead to read a single byte at a time from a stream. Since we always specify the starting offset, the #each method could intelligently buffer data internally. It will maintain state local to its own method. We might waste some effort reading more than we need to read but that will be negligible over time. Anyone who is truly worried about that can just issue #read(offset:, length:) commands directly which will do NO buffering.
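A minimal sketch of the buffering idea. It assumes a reader callable that behaves like #read(offset:, length:) and returns a binary string (empty at EOF); `BufferedByteReader` and its chunk size are hypothetical and only here to show one read serving many yields.

```ruby
# Wraps any callable with the shape read(offset:, length:) and hands out
# bytes one at a time while reading from the source in larger chunks.
class BufferedByteReader
  def initialize(reader, chunk: 4096)
    @reader = reader
    @chunk = chunk
  end

  def each_byte(offset:)
    loop do
      buffer = @reader.call(offset: offset, length: @chunk)
      break if buffer.nil? || buffer.empty? # EOF

      buffer.each_byte { |byte| yield(byte) }
      offset += buffer.bytesize
    end
  end
end

data = "hello world".b
reads = 0
reader = lambda do |offset:, length:|
  reads += 1
  data.byteslice(offset, length) || ""
end

bytes = []
BufferedByteReader.new(reader, chunk: 4).each_byte(offset: 0) { |b| bytes << b }
puts "#{bytes.size} bytes yielded using #{reads} reads"
```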

@@ -558,6 +585,7 @@ Create a LimitReader class that provides the #each functionality. It should be a
FFI Select & FDSet
FFI can't wrap macros. Luckily the select(2) macros are fairly simple and the fd_set struct is the same on all platforms. FD_SETSIZE is a max of 1024 and defaults to 64 on some platforms. We'll always allocate 1024.

```ruby
class FDSet < FFI::Struct
# Not sure how the bits are laid out in memory, so some experimentation will be necessary
layout \
Expand Down Expand Up @@ -601,6 +629,7 @@ class FDSet < FFI::Struct
nibble_index = (index % 8) - 1 # remember to use 0-based indexing
end
end
```
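Before wiring this into FFI, the bit arithmetic can be tried out in plain Ruby. The sketch below assumes a byte-wise layout where descriptor N lives in byte N / 8 at bit N % 8, which matches what a little-endian machine sees but still needs the experimentation mentioned above. `FDBitsSketch` is a made-up name and does not call select(2).

```ruby
# Pure-Ruby model of the FD_SET / FD_CLR / FD_ISSET / FD_ZERO macros.
# Assumption: bit N is stored in byte (N / 8) at bit position (N % 8).
class FDBitsSketch
  FD_SETSIZE = 1024

  def initialize
    zero
  end

  # FD_ZERO
  def zero
    @bytes = Array.new(FD_SETSIZE / 8, 0)
  end

  # FD_SET
  def set(fd)
    byte, bit = locate(fd)
    @bytes[byte] |= (1 << bit)
  end

  # FD_CLR
  def clear(fd)
    byte, bit = locate(fd)
    @bytes[byte] &= ~(1 << bit)
  end

  # FD_ISSET
  def set?(fd)
    byte, bit = locate(fd)
    (@bytes[byte] & (1 << bit)) != 0
  end

  # Raw 128-byte image, e.g. for copying into an FFI::MemoryPointer.
  def to_binary
    @bytes.pack("C*")
  end

  private

  def locate(fd)
    raise ArgumentError, "fd out of range" unless (0...FD_SETSIZE).cover?(fd)

    fd.divmod(8) # both results are already zero-based
  end
end

set = FDBitsSketch.new
set.set(0)
set.set(9)
puts set.to_binary.bytes.first(2).inspect
```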

ASYNC SLEEP & TIMERS
Here's how it should work...
@@ -625,16 +654,19 @@ It's really hard to get away from exceptions though in a language that allows so

So let's puzzle through this. If the above is accepted as fact ("mutiny"), then we can't avoid exceptions in all cases even when the programmer sets the policy to ReturnCode. Let's say that only direct IO calls (open, read, write, close, and their ilk) conform to the ReturnCode protocol. All other code (that only does a syscall *indirectly* by calling a method that conforms to ReturnCode) will interpret those codes and raise them as exceptions. Maybe I can make it super easy on myself by providing something like ReturnCode#to_exception so it transforms itself.

```ruby
def foo
return_code = somesyscall(...)
raise return_code.to_exception if return_code.error?
end
```
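One way #to_exception might be built: `SystemCallError.new(message, errno)` in current Ruby already returns the matching Errno subclass when the number is known. The class below is only a sketch; `ReturnCodeSketch` and its rc/errno fields are assumptions about what a ReturnCode carries.

```ruby
# Hypothetical ReturnCode: carries the syscall's rc and errno and can turn
# itself into the matching Errno::* exception on request.
class ReturnCodeSketch
  attr_reader :rc, :errno

  def initialize(rc:, errno: nil)
    @rc = rc
    @errno = errno
  end

  def error?
    rc.negative?
  end

  # SystemCallError.new maps a known errno to its Errno subclass.
  def to_exception(context = "syscall")
    SystemCallError.new(context, errno)
  end
end

code = ReturnCodeSketch.new(rc: -1, errno: Errno::ENOENT::Errno)
begin
  raise code.to_exception("open") if code.error?
rescue Errno::ENOENT => e
  puts "rescued #{e.class}: #{e.message}"
end
```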

SYSCALL ERRORS
Oh, Ruby, your inconsistency is killing me. See this note from the IOError rdoc: "Note that some IO failures raise SystemCallErrors and these are not subclasses of IOError:"

Yep, IO can raise Errno errors, IOErrors, and of course, EOFErrors. Time to pull all of these under one umbrella and make sense of them.

```ruby
class IO
module Error
class Errno < SystemCallError # just like current Ruby
@@ -647,17 +679,20 @@ class IO
end
end
end
```

SPECS
All specs should be written as shared specs. That is, any spec that passes in the Sync classes should also pass for the Async classes. This goal may not be possible to achieve; we'll see.

UNICODE
FYI, you can use Array#pack and String#unpack to build arbitrary UTF-8 strings. I ran this code to convert the numbers 0 to 255 into UTF-8 and then back into integers so I could see the encoding.

```ruby
array = (0..255).to_a
utf8 = array.pack("U*")
puts "size #{utf8.size}, bytesize #{utf8.bytesize}" # => size 256, bytesize 384
bytes = utf8.unpack("C*")
```

Turns out that representing 0 to 255 (integer) in UTF-8 takes 384 bytes. From 0 to 127 UTF-8 can represent normal ASCII in a single byte. For 128 to 255, it takes a second byte to represent the codepoint.

@@ -701,10 +736,13 @@ As a side benefit, __read__ will be available for truly unbuffered reads. Anyone

NON-POSIX FUNCTIONS
OSX has kqueue. Linux has epoll. Both have #writev, #readv, and a bunch of others that are not part of POSIX. These non-POSIX functions could be added to a PlatformsMixin... Platforms::Mixin::Linux and Platforms::Mixin::BSD or whatever. During instantiation of a class, it could do something like:

```ruby
class Bar < Foo
include Platforms::Mixin::Linux if Platforms.linux?
include Platforms::Mixin::BSD if Platforms.bsd?
end
```

This would allow us to provide support for platform-specific functions at runtime. I'm sure there's more than one way to skin this cat, so play around with a few options.
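One of those options, sketched so it actually runs. It detects the platform from `RbConfig::CONFIG["host_os"]` and relies on the fact that a later `include` sits ahead of an earlier one in the ancestor chain, so a portable default can be overridden. All module and method names here (`PlatformsSketch`, `poller`) are hypothetical.

```ruby
require "rbconfig"

module PlatformsSketch
  def self.host_os
    RbConfig::CONFIG["host_os"]
  end

  def self.linux?
    host_os.match?(/linux/i)
  end

  def self.bsd?
    host_os.match?(/darwin|bsd/i)
  end

  module Mixin
    # Fallback available everywhere.
    module Portable
      def poller
        :select
      end
    end

    module Linux
      def poller
        :epoll
      end
    end

    module BSD
      def poller
        :kqueue
      end
    end
  end
end

class PollerSketch
  include PlatformsSketch::Mixin::Portable
  # Included later, so these win method lookup over Portable.
  include PlatformsSketch::Mixin::Linux if PlatformsSketch.linux?
  include PlatformsSketch::Mixin::BSD if PlatformsSketch.bsd?
end

puts PollerSketch.new.poller
```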

@@ -720,7 +758,7 @@ Returns a Config::File object. Given a `fd` argument, the `path` argument is ign

Given a `path` argument with a trailing slash, the slash will be removed.

```ruby
config = SyncIO::Config.file_from(fd: 1, path: '/path/will/be/ignored')
io = SyncIO.new(file_config: config)
```
@@ -775,7 +813,8 @@ If source_offset is given, it specifies the start position of the copy.

Both source and destination are closed upon completion.

Raises an IOError in the following situations:

* source does not exist
* destination location is not writable
* destination runs out of space
@@ -804,6 +843,7 @@ One of the inspirations for writing this library was to have a solid foundation
FIBER SCHEDULER
While thinking about producing an "echo server" example, I realized that we couldn't easily spool up a bunch of "connect" requests. Here's the potential pseudo-code.

```
1 10.times do |i|
2 socket[i].connect(addr: addr) do |sock, remote_addr| # connect will suspend fiber
3 rc, errno, string = sock.recv # will suspend fiber
@@ -812,7 +852,8 @@ While thinking about producing an "echo server" example, I realized that we coul
6 # exit block...
7 end
8 end

```

I marked each line above where the fiber will suspend. Walking through this, we will suspend on line 2 when we call #connect. When it returns, we'll spin up a new Fiber to execute the block and transfer to it. That fiber will suspend on lines 3 and 5. However, those suspensions will not pick up any execution from line 2 to line 8 and then back to top of loop. So we'll be processing the #connects in lock-step form. I don't like this.

Instead, we will suspend on line 2 as before inside the #connect call. When it returns a connection, we will mark the fiber that the #connect ran in as "eligible for more work". We'll then create a new fiber for the connect block and transfer to it. When that fiber suspends on line 3, we enter the Fiber Scheduler. We'll package up the request and send it to the IOLoop. Normally at this point we would block waiting for a response.
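A toy model of the interleaving I'm after, using plain fibers and a round-robin loop in place of the real scheduler and IOLoop. Each `Fiber.yield` marks a spot where the real code would suspend on IO; the names and the `:done` convention are made up for this sketch.

```ruby
# Build +count+ fibers that each "connect", "recv" and "send", suspending
# between steps, then run them round-robin. Returns the order of events.
def interleaved_connects(count)
  log = []

  fibers = Array.new(count) do |i|
    Fiber.new do
      log << "connect #{i}"
      Fiber.yield # would suspend inside #connect
      log << "recv #{i}"
      Fiber.yield # would suspend inside #recv
      log << "send #{i}"
      :done       # final value tells the loop this fiber is finished
    end
  end

  # Resume every pending fiber once per pass; drop the ones that finished.
  pending = fibers
  pending = pending.reject { |fiber| fiber.resume == :done } until pending.empty?

  log
end

puts interleaved_connects(3)
```

The log shows all the connects starting before any recv runs, which is the opposite of the lock-step behavior described above.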
@@ -830,6 +871,7 @@ Obviously we need a way to build `fiber_list` in the scheduler. Perhaps every fi
STRUCTS
Need some simple objects to return errors. e.g.

```ruby
class Err
attr_reader :rc, :errno

@@ -838,15 +880,19 @@ class Err
@errno = errno
end
end
```

Also need simple objects to return useful data to #each blocks, #accept blocks, etc.

e.g.

```ruby
class EachResult
attr_reader :string, :offset

...
end
```

These need to be cheap and fast to allocate and populate. Structs might be a good choice though I wish they supported keyword args pre-2.5.0.
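On Ruby 2.5 and later, `keyword_init: true` gives the Struct version directly. A sketch of what the two result objects could look like that way (the `...Sketch` constant names are mine, to avoid clashing with the classes above):

```ruby
# Struct-based equivalents of Err and EachResult with keyword arguments.
ErrSketch = Struct.new(:rc, :errno, keyword_init: true)
EachResultSketch = Struct.new(:string, :offset, keyword_init: true)

err = ErrSketch.new(rc: -1, errno: Errno::ENOENT::Errno)
result = EachResultSketch.new(string: "abc", offset: 3)

puts err.inspect
puts result.inspect
```

Unlike the hand-written classes, these also expose setters, so freezing the instance is worth considering if the results should be read-only.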
