Your response is exactly the kind of response I was hoping to get. I want to see as many holes as possible punched into any design so it can be strengthened and modified as needed. I would be concerned if no one challenged the design. All ideas and concerns are welcome. And I appreciate your passion. It is part of what makes a strong community.
It may get too confusing trying to interlace responses within your responses, so I’ll address the major ones, combine where appropriate and number the points so they can be referred to in responses.
1 DDoS server: this would be true of the polling design or the pool design. Both are centralized systems. There would be only one URL for a Foundation server, either to poll for challenges or to connect to the pool. In either design an assumption can be made that all known secure nodes were up during a server attack and should be paid (conditions need to be defined).
2 DDoS node: in the polling design, having a web server sitting on each node for the server to check for the challenge response creates a whole new attack point. There is no additional open port in the pool design, since the node pool app initiates the connection to the server.
3 Uptime-availability: the polling design only describes the node looking for a challenge, with the node itself providing no identity and the server not tracking who is polling. Let’s say a challenge is the only way to identify a node and there is only one challenge a day. What is there to stop someone from checking for a challenge and only running the node during the challenge (not sure why they would, unless the system is used for other purposes), or even writing an app to mimic a node? To combat that, the number of challenges would likely be increased. Another approach is to have the server track each node that polls it by IP address and associate it with an identity to prove it is up, but that does not prove the node itself is running.
In the pool design, identity (t_address of node and a challenge during registration) is checked during initial connection. The server knows who the node is since the connection is maintained. Node information (like block height and number of peers) is checked periodically to ensure the node is running. Regarding the point of tracking uptime to the second: that is not really the point, although possible I guess. The system would track downtime (disconnects) and calculations would include allowances for things like maintenance. We don’t want to penalize nodes.
4 Uptime-tracking: I am not sure where the assumption came from on tracking disconnects each second in the pool design. The server would simply notice a disconnect event, create a record with a start time and, when the node reconnects, add a stop time to the record. It could even allow a ‘grace period’ and ignore disconnects shorter than a few minutes to take into account things like network issues or reboots. I would anticipate very few disconnect records.
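To make the disconnect-record idea concrete, here is a minimal sketch in Python. It is not the spec: the class names, the 180-second grace period and the `downtime` helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GRACE_SECONDS = 180  # assumed grace period ("a few minutes"); not a spec value


@dataclass
class DowntimeRecord:
    start: float                  # time the disconnect was noticed
    stop: Optional[float] = None  # filled in when the node reconnects


@dataclass
class NodeUptime:
    """Hypothetical per-node tracker: one record per disconnect event."""
    records: List[DowntimeRecord] = field(default_factory=list)

    def on_disconnect(self, now: float) -> None:
        # Ignore a duplicate event while a record is still open.
        if self.records and self.records[-1].stop is None:
            return
        self.records.append(DowntimeRecord(start=now))

    def on_reconnect(self, now: float) -> None:
        if not self.records or self.records[-1].stop is not None:
            return  # nothing open to close
        record = self.records[-1]
        if now - record.start < GRACE_SECONDS:
            self.records.pop()  # short blip (reboot, network issue): forget it
        else:
            record.stop = now

    def downtime(self, period_start: float, period_end: float) -> float:
        """Seconds of recorded downtime that fall inside the period."""
        total = 0.0
        for r in self.records:
            # A still-open record counts as down until the end of the period.
            stop = r.stop if r.stop is not None else period_end
            overlap = min(stop, period_end) - max(r.start, period_start)
            if overlap > 0:
                total += overlap
        return total
```

A 60-second drop leaves no record at all, while a 10-minute outage leaves exactly one record with a start and a stop time. Nothing is sampled per second.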
5 Node health: in a polling design, some information like block height or peer count could be added and transferred in the challenge. Or perhaps in the polling itself, but some identity would need to be added. The pool design keeps tabs on these items or anything else that would help with the health of the system.
6 RPC on the node: in both designs the tracking app makes calls to the node via RPC. The code is limited to reading information for node metrics and challenges and submitting a shielded transaction of 0 amount. The control is at the node level and nothing can be run outside of what is hard-coded. Having an instruction to perform a challenge sent via a secure socket or pulled via https does not change how it is executed on the node.
7 Server polling own address: yes, this is a little confusing.
In the polling design, the information (URL of the node to find the response it created) is in a shielded transaction memo sent to the server. After a challenge period ends, the server has to loop through each of the new transactions in its z_address to get the URL of each node. It then needs to query the node at that address, retrieve the response, validate it and log the results.
In the pool design, when the node is done with the challenge it sends back the transaction id. The server looks up the transaction by id, validates it and logs the result. Additionally, if the challenge cannot be completed by the node, the error is sent back to the server, logged and made available for display so it can be addressed.
8 Self registering: the amount of configuration would be similar in either design. Self registering simply means that when the pool app running on the node starts up, it connects to the server, the server determines it has not seen the node before, the server sends it a challenge to perform, and the server validates the result and registers the node. Data like block height, software versions, and peer count are also used to determine if it is a secure node. Some sort of SSL validation will be included once that is finalized.
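As a rough sketch of that registration flow (Python, for illustration only): the `PoolServer` and `NodeMetrics` names, the `run_challenge` callback, and the block-lag and peer-count thresholds are all hypothetical. The real checks and limits still need to be defined.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NodeMetrics:
    t_address: str     # identity the node presents on connect
    block_height: int
    peer_count: int


class PoolServer:
    """Hypothetical server-side registry; thresholds are placeholders."""

    def __init__(self, run_challenge: Callable[[str], bool],
                 network_height: int,
                 max_block_lag: int = 5, min_peers: int = 8):
        self.run_challenge = run_challenge  # sends a challenge, returns True if valid
        self.network_height = network_height
        self.max_block_lag = max_block_lag
        self.min_peers = min_peers
        self.registered: Dict[str, NodeMetrics] = {}

    def on_connect(self, metrics: NodeMetrics) -> str:
        # A node the server has seen before just has its metrics refreshed.
        if metrics.t_address in self.registered:
            self.registered[metrics.t_address] = metrics
            return "known"
        # New node: check basic health before spending a challenge on it.
        if self.network_height - metrics.block_height > self.max_block_lag:
            return "rejected: behind on blocks"
        if metrics.peer_count < self.min_peers:
            return "rejected: too few peers"
        if not self.run_challenge(metrics.t_address):
            return "rejected: challenge failed"
        self.registered[metrics.t_address] = metrics
        return "registered"
```

The node operator does nothing beyond starting the pool app. The first connection triggers the challenge and every later connection is recognized by its t_address.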
9 Using the blockchain for administration: Let’s assume there need to be six challenges a day (one every four hours) to ensure node availability in the polling design. At 1000 nodes that is 6000 transactions a day. At 5000 nodes, 30,000 transactions a day, all going into the wallet on the server.
Depending on the number of nodes, this could exceed the volume of standard transactions. It also just does not seem like good practice to flood the blockchain with lots of administration data. Some is needed, but these numbers are excessive. The pool design does not depend on the blockchain to determine uptime. It could also be set to send a challenge only every few days, or when the node has been down for a time, since the challenge is only used to prove capability.
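The arithmetic behind those numbers, as a small Python sketch. The function name is made up, and the 72-hour interval used for the pool design is only an assumption standing in for "every few days".

```python
def daily_challenge_transactions(nodes: int, interval_hours: float) -> float:
    """Challenge transactions per day if every node is challenged once per interval."""
    return nodes * 24 / interval_hours


for nodes in (1000, 5000):
    polling = daily_challenge_transactions(nodes, 4)   # one challenge every four hours
    pool = daily_challenge_transactions(nodes, 72)     # assumed: one every three days
    print(f"{nodes} nodes: polling {polling:.0f}/day, pool about {pool:.0f}/day")
```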
10 Scalability: The polling design creates periods of peak activity, since every node responds within the same challenge period. The pool design spreads out activity. In fact most of the time it is just sitting there monitoring the sockets and getting node metrics. Challenges can be spread out over the course of a day or even a week. The target number of nodes could be handled with one server at the outset.
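One possible way to spread challenges over a window, sketched in Python. This is just an illustration, not something from the spec: the idea of deriving each node's slot from a hash of its t_address, and the `challenge_offset` name, are my own assumptions.

```python
import hashlib


def challenge_offset(t_address: str, window_seconds: int = 86400) -> int:
    """Seconds into the window at which this node gets its challenge.

    Hashing the address gives each node a stable, roughly uniform slot,
    so challenges do not all land at the same moment.
    """
    digest = hashlib.sha256(t_address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds
```

A node keeps the same slot from one window to the next, and new nodes can join without reshuffling anyone else. Passing `window_seconds=7 * 86400` would spread the same challenges over a week.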
11 Two way communication: To address the little bit of FUD on this one: the pool design could use a package like socket.io to provide and maintain the underlying encrypted socket connections. It has ‘heartbeat’ and timeout features built in. The package was downloaded from npm over 5.7 million times in the last month, which indicates that many applications rely on it in production settings. The pool design is not being built completely from scratch at that level; it would use well-known and supported libraries. And no, it is not like a Windows network.
Overall I believe the pool design addresses many of the implementation issues/challenges I found while writing the detailed spec.
I hope I have addressed your questions and provided clear explanations. Let me know if I missed anything. This has also been a good exercise for me in being challenged to rethink aspects of each design. Keep the questions, concerns and ideas coming!