-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async usercall interface for SGX enclaves #515
Async usercall interface for SGX enclaves #515
Conversation
12728f3
to
f917b37
Compare
c2eae70
to
7dfaf7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most things I found are minor, or can/should be picked up as part of another ticket
8c87b85
to
9a71137
Compare
Added one last comment related to |
Short Questions (I have not read all the changes):
At the moment I'm not sure what exactly is meant with |
Good question @DragonDev1906, I've updated the description of this PR to make things more clear. Let me know if you still have questions. This PR doesn't have examples, but we'll add some once the changes to mio and tokio have been upstreamed and things are easier to be used. |
Nice, I've had a few issues with dependencies that rely on I do have two more questions (though I'm not sure if this is the right place to ask them):
I plan to test the throughput of those options, but perhaps you already have some experience or suggestions which option may be the slowest/most inefficient. Especially the library mode and if such a system would even benefit of the async usercall interface changes. (It could also be useful to have such a comparison of communication options somewhere in the docs). (so many questions, sorry) |
No worries @DragonDev1906
No, without changes to your code, this PR doesn't have any impact for you.
If you use the changes in this PR to build an async enclave, your code will be a bit more readable. Biggest change would be that you don't need to enter/exit the enclave to request new commands/return responses. If the enclave is compute expensive, the performance benefit of that may be minimal. Async code works best when it no longer blocks on I/O, but can do something useful while it waits for some event. Based on your description, you may already be doing that with a custom runner.
See previous answer
Yes that only makes sense if the enclave runs async code
That seems unrelated to whether you right sync or async code. |
Just to see if I understood that correctly: The changes in this PR (when using the new async interface) are going to mean that multiple usercalls can/will be batched into a single ECALL (enter/exit), with the ability to use async code to send multiple usercalls without waiting for the response. But there still needs to be at lest one ECALL (for the entire batch) before the runner can process the usercall and the same for the way back, correct? Just some info if you're interested, @raoulstrackx:
Yeah, my enclave is not waiting for any responses for requests sent out (that's handled outside the enclave) and only blocks while trying to read new commands (currently via TCP) or writing results (also via TCP), but new commands don't depend on previous results unless something goes wrong.
I've thought about implementing in a "enclave requests the data and waits for the response" way, where async usercalls would likely be a big performance benefit and/or be a lot more readable. My conclusion to that was that there is a rather big trade-of:
I'm not yet sure if this architecture is going to bite me at some point. It's good to know that there will be an efficient way to implement it in a "enclave asks for data" way should the need arise to do that because a complete separation of fetching and logic gets too difficult. |
@DragonDev1906 sorry I forgot to reply to your comment.
Strictly speaking: yes, but I think you misunderstood a bit how EDP is expecting to be used. The idea is to run an entire application in the enclave. So the single ecall you refer to, is coming from the enclave-runner that calls the enclave for the very first time. This eventually leads to the enclave calling your For questions/comments not specifically related to this PR. Let's switch to the #rust-sgx channel in the Runtime-Encryption Slack workspace |
fn make_progress(&self, deferred: &[Identified<Usercall>]) -> usize { | ||
let sent = self.core.try_send_multiple_usercalls(deferred); | ||
if sent == 0 { | ||
self.core.send_usercall(deferred[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a potential runtime error here if someone calls make_progress
with an empty slice for deferred. I don't think there's currently a place that calls this function with an empty slice, but someone could change the code in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
return; | ||
} | ||
let sent = self.make_progress(&self.deferred); | ||
let mut not_sent = self.deferred.split_off(sent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we're using the methodology here of doing split_off/clear/append
rather than using drain(0..sent)
or something like a VecDequeue
?
I suspect that the answer is that the common case is expected to be that make_progress()
should normally be able to process the entire queue, which makes the split_off()
and append()
effectively no-ops. Which is fine, but it's probably useful to leave a comment for the next person who looks at this code and wonders the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it works, but your drain(..sent)
seems more readable. I'll update.
} | ||
let mut wrote = 0; | ||
for buf in bufs { | ||
wrote += self.write(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If self.write()
ever returns 0, we can break out of the loop. This might be an efficiency concern if the slice of bufs
is long and we're frequently writing to a nearly-full buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is even a strong requirement to immediately return when write
ever returns 0; following writes may succeed and thus increase the wrote
counter. The user will then incorrectly assume that the first slices were written while the latter ones weren't. That may not be the case.
intel-sgx/async-usercalls/src/lib.rs
Outdated
n => &returns[..n], | ||
}; | ||
// 2. try to lock the mutex, if successful, receive all pending callbacks and put them in the hash map | ||
let mut guard = match self.callbacks.try_lock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear to me why the try_lock()
is useful here. It would seem simpler, as efficient, and possibly more correct to write this as something like:
let mut guard = self.callbacks.lock().unwrap();
for (id, cb) in self.callback_rx.try_iter() {
guard.insert(id, cb);
}
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is meant as an optimization; when self.callbacks.try_lock()
fails, it means multiple threads are competing for this resource. The code then avoids receiving more callbacks. I can imagine that self.callback_rx.try_iter()
is a potentially time costly operation.
But I don't understand why this is correct in all cases. When the system is under very heavy load, the result of the usercall may be received in step 1 before the callback is added to self.callbacks
. This eventually leads to the callback never being called. There's also a memory leak: Eventually the callback is received in step 2, but never gets removed from the hashmap.
@mzohreva Are we missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, this should be simplified.
pub fn insert(&mut self, value: T) -> u32 { | ||
let initial_id = self.next_id; | ||
loop { | ||
let id = self.next_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an extremely unlikely but severe potential performance problem with this approach for allocating ids. Suppose you have 2^32 - 1
calls that never terminate, and then a single other call which is fast but you never have more than one of those outstanding at once. Each time you make the fast call, you'll need to do 2^32 map lookups to find the available id.
I don't think we need to worry about this in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does make me wonder if we might want to have some way to limit the length of the call queue to something smaller than 2^32, with either blocking if the queue is full, or switching to making the call synchronously in this case. This might not be useful for DSM, but I can imagine some type of EDP application in which enclave threads could be generating asynchronous calls faster than they could be serviced, and you might want to have the ability to either throttle the enclave threads when the queue gets full or switch to synchronous ocalls, instead of growing the queue without bounds and getting a panic when you run out of memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a potential problem, but can/should be fixed as a separate issue. @arai-fortanix what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing later seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue #550 for this
2394fe3
to
9017716
Compare
Credits for this commit go to: Mohsen: fortanix#404 YxC: fortanix#441 This commit is an attempt to have the async-usercalls finally merged into the main codebase (master branch).
Address code review comments
273c394
to
2c49170
Compare
This change ports the old tests from .travis.yml to the new github actions used on the current master branch. @raoulstrackx
2bf3f6c
to
51ea191
Compare
(CI failing otherwise)
Entering and exiting an SGX enclave is performance costly. It's much more efficient to continue executing within the enclave and communicate with the
enclave-runner
by passing messages. The tokio runtime can be used for such asynchronous communication.This PR provides very basic support for this in EDP, but changes to mio and tokio still need to be upstreamed. These changes are fully backwards compatible; your existing enclaves will continue to run as expected.
Credits for this PR go to:
Mohsen: #404
YxC: #441
This commit is an attempt to have the async-usercalls finally merged into the main codebase (master branch).