Ideas for testing fellow routers (2020-05-18)

Active or passive or both?

Passive is probably better: active testing crowds out other traffic, or fails when the node is already passing lots of data.
Very simple test: have we received an RC (router contact) from the router recently?
Less simple, but not too complicated: stats on path builds to a peer.
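A minimal sketch of the two simple tests, assuming each node records when it last received a peer's RC and counts path builds to that peer. `PeerStats`, `rc_is_fresh`, `path_build_success_rate` and the one-hour `max_age` are hypothetical illustrations, not lokinet's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerStats:
    last_rc_time: float            # unix time we last received this peer's RC
    path_builds_attempted: int = 0
    path_builds_succeeded: int = 0


def rc_is_fresh(stats: PeerStats, now: float, max_age: float = 3600.0) -> bool:
    """Very simple test: have we received an RC from this peer recently?"""
    return now - stats.last_rc_time <= max_age


def path_build_success_rate(stats: PeerStats) -> Optional[float]:
    """Fraction of path builds to this peer that succeeded, or None if there were none."""
    if stats.path_builds_attempted == 0:
        return None  # no data is not the same thing as failure
    return stats.path_builds_succeeded / stats.path_builds_attempted


peer = PeerStats(last_rc_time=1_000.0, path_builds_attempted=20, path_builds_succeeded=17)
print(rc_is_fresh(peer, now=2_000.0))   # True: the RC is 1000 s old
print(path_build_success_rate(peer))    # 0.85
```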
Passive metrics need an aggregate to be measured properly: just seeing that one router isn't accepting enough data doesn't tell us that that router is the problem.
Need to flesh out responsibilities. If lokinet just says "good/bad", that might be worse than lokinet building a "score" somehow (or perhaps just sending the aggregate numbers and letting the quorum decide).
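One way to turn many passive observations into a score rather than a good/bad bit, sketched under assumptions: each observing peer reports how many bytes it offered the target and how many were accepted, and the score is the median acceptance ratio across observers, so a single bad link (which may be the observer's own fault) can't sink the target. The report format, `aggregate_score` and `min_observers` are hypothetical.

```python
from statistics import median
from typing import Dict, Optional, Tuple


def aggregate_score(reports: Dict[str, Tuple[int, int]], min_observers: int = 5) -> Optional[float]:
    """Score a target router from many peers' observations.

    `reports` maps an observer id to (bytes offered to the target, bytes it accepted).
    Returns the median acceptance ratio, or None if too few observers reported.
    """
    ratios = [accepted / offered for offered, accepted in reports.values() if offered > 0]
    if len(ratios) < min_observers:
        return None  # not enough independent observations to blame the target
    return median(ratios)


reports = {
    "a": (1000, 950), "b": (1000, 960), "c": (1000, 940),
    "d": (1000, 970), "e": (1000, 100),  # one bad link, possibly e's own fault
}
print(aggregate_score(reports))  # 0.95
```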

Quorum testing:
- At the beginning of the window, the quorum decides on a random set of nodes to test. [Maybe not necessary with the long windows discussed below: having the nodes predetermined 30 minutes ahead is okay when the testing window is days long.]
- Relays send aggregate data for the last 2/4/24/48/168 hours. [More discussion of the testing window below.]
- lokinet reports the data to lokid, and lokid shares it with the other quorum members.
- The quorum decides based on the aggregate data.
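A sketch of the two quorum steps, assuming every quorum member derives the "random" node set from the same shared seed (a block hash, say) so they all agree without extra messages, and that the final decision is a simple majority vote. Python's `random.Random` is not cryptographic and is used only for illustration; `select_test_targets`, `quorum_verdict` and the majority rule are hypothetical.

```python
import random
from typing import Dict, List


def select_test_targets(seed: bytes, nodes: List[str], k: int) -> List[str]:
    """Deterministically pick k nodes; every member with the same seed gets the same set."""
    rng = random.Random(seed)            # NOT cryptographic; illustration only
    return rng.sample(sorted(nodes), k)  # sort so each member's input order doesn't matter


def quorum_verdict(votes: Dict[str, bool]) -> bool:
    """Pass unless a strict majority of quorum members voted 'fail' (False)."""
    fails = sum(1 for passed in votes.values() if not passed)
    return 2 * fails <= len(votes)


nodes = ["sn1", "sn2", "sn3", "sn4", "sn5", "sn6"]
a = select_test_targets(b"shared-block-hash", nodes, 3)
b = select_test_targets(b"shared-block-hash", list(reversed(nodes)), 3)
print(a == b)                                                  # True
print(quorum_verdict({"q1": True, "q2": False, "q3": False}))  # False
```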

General problem:

Big concern: DDoS. Tests create an incentive to DDoS a node to get it kicked off the network.

Alternative:

Reward good behaviour.
Maybe: reward = 50% of the SN reward + 50% of the SN reward * (weighted average of "score").
- The score could come back up over time.
- A DDoS doesn't bump someone off, it just reduces their rewards for a while, so there is less incentive to DDoS an individual node.
- However, this doesn't get a bad node off the network, which we want to happen.
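The reward formula can be sketched as below, assuming per-window scores in [0, 1] and weights that favour recent windows so a node's score can come back up over time. The weights and function names are hypothetical.

```python
from typing import Sequence


def weighted_score(window_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-window scores (each assumed to lie in [0, 1])."""
    if len(window_scores) != len(weights) or not weights:
        raise ValueError("need one weight per window score")
    return sum(s * w for s, w in zip(window_scores, weights)) / sum(weights)


def sn_reward(base_reward: float, score: float) -> float:
    """50% of the reward is unconditional; the other 50% scales with the score."""
    score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
    return 0.5 * base_reward + 0.5 * base_reward * score


# Oldest to newest window; heavier weights on recent windows let a score recover.
score = weighted_score([0.2, 0.6, 1.0], weights=[1, 2, 4])
print(round(sn_reward(100.0, score), 2))  # 88.57
```

Even a node with a score of 0 keeps half its reward, which is what limits the payoff from DDoSing it.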

Using send/resend data:

Count the amount of data successfully sent, and the amount that had to be resent.
^ This may be a problem with bursting: how do we distinguish between a node that sends a continuous trickle of 10 kbps and resends 50% of it, versus one that idles most of the time with an occasional 100 Mbps spike that needs 50% resent? (The former is bad; the latter is not a problem.)
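A worked example of the bursting problem, assuming hypothetical per-second counters of sent and resent bytes: both nodes have exactly the same aggregate resend ratio. Measuring the fraction of all time spent resending heavily is one possible, untested way to tell them apart; the 25% threshold is arbitrary, and this does nothing about colluding nodes faking traffic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    sent: int    # bytes sent during this one-second bucket
    resent: int  # bytes that had to be resent during it


def resend_ratio(buckets: List[Bucket]) -> float:
    """Aggregate resent/sent over the whole window."""
    sent = sum(b.sent for b in buckets)
    return sum(b.resent for b in buckets) / sent if sent else 0.0


def bad_time_fraction(buckets: List[Bucket], threshold: float = 0.25) -> float:
    """Fraction of all buckets (idle ones included) whose resend ratio exceeds threshold."""
    bad = sum(1 for b in buckets if b.sent and b.resent / b.sent > threshold)
    return bad / len(buckets)


HOUR = 3600
# 10 kbps = 1,250 bytes/s, sent continuously, half of it resent.
trickle = [Bucket(sent=1_250, resent=625) for _ in range(HOUR)]
# Idle except for one second at 100 Mbps = 12,500,000 bytes/s, half of it resent.
spike = [Bucket(0, 0) for _ in range(HOUR - 1)] + [Bucket(12_500_000, 6_250_000)]

print(resend_ratio(trickle), resend_ratio(spike))  # 0.5 0.5: the aggregate can't tell them apart
print(bad_time_fraction(trickle), bad_time_fraction(spike))
```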

We don't have full network information, so we can't know conclusively that a node is "busy". Even if we did have full network information, two colluding nodes could lie for each other, pretending to route lots of data through each other so that both look busy all the time.

Using some thresholds on aggregates:

Build some "success" measure from sends/resends (sent data = good, resent data = bad).

The testing timeframe needs to be long enough to make deregistration expensive, but not so long that we leave DDoS routers active and hurting the network. 48 hours of data? 1 week? A longer window means clients need to deal with bad routers themselves in the meantime.
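A sketch of a thresholded success measure, assuming hourly (successfully sent, resent) byte counts and defining success as sent / (sent + resent). The example shows the window trade-off: two bad days fail a 48-hour window but are diluted by a 168-hour one. The 0.75 threshold and all names are hypothetical.

```python
from typing import List, Tuple

Hour = Tuple[int, int]  # (bytes sent successfully, bytes that had to be resent)


def success_ratio(hourly: List[Hour], hours: int) -> float:
    """sent / (sent + resent) over the most recent `hours` hourly samples."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    window = hourly[-hours:]
    sent = sum(s for s, _ in window)
    resent = sum(r for _, r in window)
    total = sent + resent
    return sent / total if total else 1.0  # no traffic: nothing to penalize


def fails_threshold(hourly: List[Hour], hours: int, min_success: float) -> bool:
    return success_ratio(hourly, hours) < min_success


# Five good days followed by two bad ones.
history = [(900, 100)] * 120 + [(500, 500)] * 48
print(fails_threshold(history, hours=48, min_success=0.75))   # True  (ratio 0.5)
print(fails_threshold(history, hours=168, min_success=0.75))  # False (ratio ~0.79)
```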

Using decommissions:

If a node gets a "bad" (consensus) vote, it gets decommissioned.
Extend the decommission period? Maybe up to a week?
Need some way to notify a node of intermediate failures: if a node is failing, we need a way to tell the node that it is failing.
Need to push some bad-path handling to clients: clients need to be able to recognize that a path is failing and switch to another one, both for their own half of the path and by using a different introset entry (because we don't know which half of the path the problem is on).
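A sketch of client-side failover, assuming a path is identified by the pair (our own path, remote introset entry). Since the client can't tell which half failed, it marks both halves suspect and prefers a pair where neither half is suspect. `mark_failed`, `choose_pair` and the ids are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (our own path id, remote introset entry id)


def mark_failed(pair: Pair, bad_paths: Set[str], bad_intros: Set[str]) -> None:
    """We can't tell which half failed, so treat both halves as suspect."""
    path, intro = pair
    bad_paths.add(path)
    bad_intros.add(intro)


def choose_pair(paths: List[str], intros: List[str],
                bad_paths: Set[str], bad_intros: Set[str]) -> Optional[Pair]:
    """Prefer a pair with two unsuspected halves, then one, else give up (None)."""
    pairs = list(product(paths, intros))
    for wanted in (2, 1):
        for p, i in pairs:
            if (p not in bad_paths) + (i not in bad_intros) >= wanted:
                return (p, i)
    return None


bad_paths, bad_intros = set(), set()
first = choose_pair(["p1", "p2"], ["i1", "i2"], bad_paths, bad_intros)  # ("p1", "i1")
mark_failed(first, bad_paths, bad_intros)
print(choose_pair(["p1", "p2"], ["i1", "i2"], bad_paths, bad_intros))   # ('p2', 'i2')
```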
Reporting information. Two ideas:
- lokinet self-reporting: lokinet can use information about its own queues to figure out how much data it is routing and how much it has dropped. This isn't perfect, but it can be useful information for an admin. (Perhaps it could also periodically report this to lokid for display in lokid status.)
- "Advisory" tests: say we require sustained shittiness over a 1-week period to take action; we could then also run tests on 1 day's worth of data and, if the quorum decides that the node has been shitty for a day, the quorum contacts the SN to deliver an advisory notice. (Alternatively, we could P2P-blast it so that LokiSNbot or lokisn.com can see it and report on it.)
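For the self-reporting idea, a minimal sketch of queue counters, assuming lokinet can observe whether each packet offered to an outbound queue was enqueued or dropped. `QueueStats` and the report format are hypothetical, not lokinet's or lokid's actual interfaces.

```python
from dataclasses import asdict, dataclass


@dataclass
class QueueStats:
    routed_bytes: int = 0
    dropped_bytes: int = 0

    def record(self, size: int, enqueued: bool) -> None:
        """Call for every packet offered to an outbound queue."""
        if enqueued:
            self.routed_bytes += size
        else:
            self.dropped_bytes += size  # queue full: packet dropped

    def drop_rate(self) -> float:
        total = self.routed_bytes + self.dropped_bytes
        return self.dropped_bytes / total if total else 0.0

    def report(self) -> dict:
        """Snapshot an admin (or lokid status) could display; format is hypothetical."""
        return {**asdict(self), "drop_rate": self.drop_rate()}


stats = QueueStats()
stats.record(1000, enqueued=True)
stats.record(250, enqueued=False)
print(stats.report())  # {'routed_bytes': 1000, 'dropped_bytes': 250, 'drop_rate': 0.2}
```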
For the 8.x HF: don't trigger decomms except for simple tests (RC + maybe path builds), but perhaps start doing advisory tests and sending (non-binding) notifications. (It would be very helpful to P2P-broadcast these.)
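A sketch of the advisory/binding split, assuming hourly (sent, resent) counts, reading "sustained over a 1-week period" as every one of the last 7 days failing, and gating binding decommissions behind a flag that stays off for the 8.x HF. The threshold, names and day-splitting rule are hypothetical.

```python
from enum import Enum
from typing import List, Tuple

Hour = Tuple[int, int]  # (bytes sent successfully, bytes resent)
DAY, WEEK = 24, 168


class Action(Enum):
    NONE = "none"
    ADVISORY = "advisory"          # non-binding notice to the SN operator
    DECOMMISSION = "decommission"  # binding; disabled for now


def success(window: List[Hour]) -> float:
    sent = sum(s for s, _ in window)
    resent = sum(r for _, r in window)
    return sent / (sent + resent) if sent + resent else 1.0


def evaluate(hourly: List[Hour], min_success: float = 0.8,
             binding_enabled: bool = False) -> Action:
    """Advisory after one bad day; decommission only after a full week of bad days."""
    if success(hourly[-DAY:]) >= min_success:
        return Action.NONE
    week = hourly[-WEEK:]
    days = [week[i:i + DAY] for i in range(0, len(week), DAY)]
    sustained = len(week) == WEEK and all(success(d) < min_success for d in days)
    if sustained and binding_enabled:
        return Action.DECOMMISSION
    return Action.ADVISORY


bad_week = [(500, 500)] * WEEK
print(evaluate(bad_week))                        # Action.ADVISORY
print(evaluate(bad_week, binding_enabled=True))  # Action.DECOMMISSION
```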

Alternative punishments to decommissioning:

Sending an SNode to the end of the reward queue is a way of punishing a node only financially; it is a sort of temporary punishment.
Another system might reward nodes based on their relative performance. Nodes with a flawless reputation might earn their full reward, while those with more "dings" might only receive part of their reward.
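Both alternatives can be sketched briefly, assuming the reward queue is an ordered list of node ids and that "dings" are simple per-node counts. Scaling each node's share against the worst node, with an arbitrary 50% floor, is one hypothetical reading of "relative performance"; names and numbers are illustrative only.

```python
from collections import deque
from typing import Deque, Dict


def send_to_back(queue: Deque[str], node: str) -> None:
    """Financial-only punishment: move the node to the end of the reward queue."""
    queue.remove(node)
    queue.append(node)


def relative_reward_fraction(dings: Dict[str, int], node: str, floor: float = 0.5) -> float:
    """1.0 for a flawless node, scaling down to `floor` for the most-dinged node."""
    worst = max(dings.values())
    if worst == 0:
        return 1.0  # nobody has any dings
    return 1.0 - (1.0 - floor) * dings[node] / worst


queue = deque(["a", "b", "c"])
send_to_back(queue, "a")
print(list(queue))  # ['b', 'c', 'a']

dings = {"a": 0, "b": 2, "c": 4}
print([relative_reward_fraction(dings, n) for n in "abc"])  # [1.0, 0.75, 0.5]
```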