From: Jason Croft Sent: Tuesday, April 19, 2011 12:22 PM To: Gupta, Indranil Subject: 525 review 04/19 The Byzantine Generals Problem This paper examines failures in a system using the analogy of a Byzantine army trying to organize an attack on a city. It first proposes a problem in which generals must reach agreement in the presence of traitors, requiring (a) that all loyal generals decide upon the same plan of action, and (b) that a small number of traitors cannot cause the loyal generals to adopt a bad plan. The authors then rephrase the problem such that there is one commanding general sending an order to lieutenant generals. In this case, all loyal lieutenants must obey the same order, and if the commanding general is loyal then every loyal lieutenant obeys the order he sends. These two conditions are referred to as the interactive consistency conditions. The authors prove there is no solution to this problem unless more than two-thirds of the generals are loyal, and therefore no solution exists if there are three generals, one of which is a traitor. That is, for m traitors, there must be at least 3m+1 generals in total. A solution is also possible with signed, unforgeable messages. This assumes a loyal general's signature cannot be forged, that any changes to messages can be detected, and that anyone can verify the general's signature. The authors also discuss applications of this problem to reliable systems, where generals are processes and loyal generals are non-faulty processes. However, there is an assumption that all messages sent by a non-faulty process are delivered correctly. As the authors note, communication lines can fail, and determining whether a message was not delivered because a process failed or because the link between two processes failed requires a mechanism such as timeouts. In addition, the authors do not discuss the possibility of a system or algorithm where no nodes can misbehave or fail.
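The 3m+1 bound above can be illustrated with a small simulation. This is a minimal sketch of the recursive majority-vote idea behind the paper's oral-messages algorithm OM(m), not the authors' pseudocode; the traitor model (a traitor flips the order for every other recipient) is my own simplifying assumption.

```python
from collections import Counter

def majority(values):
    # Deterministic majority vote; ties default to RETREAT,
    # matching the paper's default-value convention.
    counts = Counter(values)
    top, n = counts.most_common(1)[0]
    if list(counts.values()).count(n) > 1:
        return "RETREAT"
    return top

def om(m, commander, lieutenants, order, traitors):
    """Oral Messages sketch OM(m). Returns {lieutenant: decision}.
    Assumption: a traitorous commander flips the order it sends to
    every other (odd-indexed) recipient."""
    sent = {}
    for i, lt in enumerate(lieutenants):
        v = order
        if commander in traitors and i % 2 == 1:
            v = "RETREAT" if order == "ATTACK" else "ATTACK"
        sent[lt] = v
    if m == 0:
        return sent
    decisions = {}
    for lt in lieutenants:
        # Each lieutenant combines its own value with the values the
        # other lieutenants relay via OM(m-1), then takes a majority.
        relayed = [sent[lt]]
        for other in lieutenants:
            if other == lt:
                continue
            sub = om(m - 1, other,
                     [l for l in lieutenants if l != other],
                     sent[other], traitors)
            relayed.append(sub[lt])
        decisions[lt] = majority(relayed)
    return decisions
```

With four generals and one traitorous lieutenant, both loyal lieutenants still decide on the commander's order; with a traitorous commander, all lieutenants still agree with each other, as the interactive consistency conditions require.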
The solutions also have significant overhead in terms of the number of messages that must be sent between nodes. Airavat: Security and Privacy for MapReduce Airavat builds on mandatory access control (MAC) and differential privacy to ensure untrusted MapReduce computations on sensitive data do not leak private information, providing confidentiality, integrity, and privacy guarantees. The goal is to prevent malicious computation providers from violating the privacy policies a data provider imposes on the data, and thereby to prevent leaking information about individual items in the data. The system is implemented as a modification to MapReduce and the Java virtual machine, and runs on top of SELinux. One strength of Airavat's design is its use of SELinux to prevent leakage of information through system resources such as network connections or pipes. This may not completely eliminate the threat of covert channels, but within the scope of their work it was a good design addition that considers such attacks. However, their design protects only against a malicious computation provider, not a malicious cloud provider. Differential privacy also imposes some limitations on Airavat and its uses, such as the finite privacy budget. As the authors mention, one computation provider could exhaust this budget on a dataset for all other computation providers and use more than its fair share. While there is some estimation of effective parameters, a large number of parameters must be set for Airavat to work properly. This increases the probability of misconfigurations, or of configurations that severely limit the computations that can be performed on the data. From: harshitha.menon SpamElide on behalf of Harshitha Menon Sent: Tuesday, April 19, 2011 11:59 AM To: Gupta, Indranil Subject: 525 review 04/19 Byzantine Generals Problem Reliable computer systems must handle malfunctioning components that give conflicting information to different parts of the system.
This problem has been expressed as the Byzantine Generals problem, and the idea is to find an algorithm to ensure that the loyal generals come to a consensus even when there are traitors. The generals can communicate with each other through messengers. It is shown that this problem is solvable only if more than two-thirds of the generals are loyal. This means that a system can handle faults and come to consensus only if more than two-thirds of the processes are non-faulty. Pros: -The problem is portrayed in an interesting way using analogies. -A novel solution is given to the Byzantine generals problem. -Formal proofs are given. Cons: -The algorithm is expensive in terms of time complexity and number of messages. The number of messages is O(N^m). -The algorithm handles failures but does not identify their source. -It doesn't handle the case where messages are forgeable. PeerReview PeerReview is a system that provides accountability in distributed systems. PeerReview ensures that a Byzantine fault observed by a correct node is eventually detected and linked to the faulty process. Nodes can also defend themselves against false accusations. PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes. Pros: -The initial solution proposed gives strong guarantees but doesn't scale to large systems; by relaxing completeness in favor of probabilistic detection guarantees, it can scale to very large systems. -As an extra benefit, this system could be used to identify race-condition bugs which are otherwise difficult to identify. -The experiments show the applicability of the system. -It provides completeness and accuracy. Cons: -If the processes that identify Byzantine failures are themselves faulty, this cannot be handled.
-Faults have to be observable, else they cannot be detected. -Since all the inputs and outputs of each node have to be logged, disk space usage increases. -How will this system perform under a high churn rate? -The network traffic is significant when the number of witnesses is more than 1. From: Anupam Das Sent: Tuesday, April 19, 2011 11:59 AM To: Gupta, Indranil Subject: 525 review 04/19 i. PeerReview: Practical Accountability for Distributed Systems This paper presents a system called PeerReview which provides accountability for distributed systems where nodes are under multiple administrative domains. The system is called PeerReview because it does not rely on any centralized entity; rather, all the nodes in the system help each other in identifying faulty nodes. PeerReview can be incorporated into any practical system as long as the operations conducted by correct nodes are deterministic. PeerReview also requires that faulty nodes cannot forge the signatures of any correct node. Moreover, each node maintains a set of nodes called witnesses which is assumed to contain at least one correct node. In this system every correct node maintains a tamper-evident log using a hash chain protocol. After every successful transmission a correct node expects an acknowledgement from the receiver, and if it fails to receive one it starts to suspect the receiver. At this point the sender requests the witness set of the corresponding receiver to provide evidence about the receiver's current status. The witness set members periodically ask the node being monitored for a portion of its log, and this log is then fed into the reference implementation to check for any deviant behavior. If any deviation is found then the witness set members generate evidence reports which are conveyed to other nodes in the system. PeerReview can only detect a faulty node if the fault results in one or more messages that causally affect some other correct node.
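The tamper-evident hash-chain log mentioned above can be sketched in a few lines. This is an illustration of the idea, not PeerReview's actual log format: each record's hash covers the previous record's hash, so a node cannot rewrite history without invalidating every later entry.

```python
import hashlib

GENESIS = "0" * 64  # hash value before the first log entry

def chain_append(log, entry):
    """Append an entry to a tamper-evident log. Each record binds the
    entry to the hash of the previous record, so any later modification
    changes every subsequent hash."""
    prev = log[-1][1] if log else GENESIS
    h = hashlib.sha256((prev + entry).encode()).hexdigest()
    log.append((entry, h))
    return h  # top-of-chain hash; PeerReview would sign this as an authenticator

def chain_verify(log):
    """Recompute the chain from the start; any tampered entry breaks it."""
    prev = GENESIS
    for entry, h in log:
        if hashlib.sha256((prev + entry).encode()).hexdigest() != h:
            return False
        prev = h
    return True
```

Rewriting even the first entry makes verification fail, which is what lets a witness detect a node that tries to present inconsistent histories.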
Nodes that are unwilling to send any acknowledgement are marked suspected and are eventually challenged to produce a response. This provides correct nodes with the opportunity to exonerate themselves from any false accusations in case of network failure. The paper provides experimental results which show that PeerReview is scalable and that its cost is a function of the number of witnesses for each node. Pros: 1. The challenge/response protocol allows correct nodes to defend themselves from any false accusation. This reduces the rate of false positives. 2. It is eventually complete and accurate. 3. The number of witnesses for each node can be tuned to provide a tradeoff between accuracy and completeness. 4. It has been applied to three different types of real distributed systems. Cons: 1. PeerReview operations introduce additional network traffic and computational overhead. 2. PeerReview can only work with systems whose actions are deterministic. 3. PeerReview requires at least one correct node in the witness set. This assumption might not hold in some scenarios. 4. The paper does not say how often the witness members of a given node investigate the corresponding node; they only say periodically. Is this period dynamic or static? 5. In case of high churn PeerReview consumes additional bandwidth for initializing new witnesses. Some experimental results regarding this would have been helpful. ----------------------------------------------------------------------------------------------------------------------- ii. Airavat: Security and Privacy for MapReduce Airavat is a MapReduce-based system which provides security and privacy guarantees against data leakage. Clouds provide a great environment where distributed computations (aggregations like sum, count, max, etc.) can contribute useful information to both the research and industrial communities. However, users are always afraid that their personal information will leak out.
Airavat allows these distributed computations/aggregations to be done without exposing any private data. Airavat uses Mandatory Access Control (MAC) and differential privacy to achieve this. Airavat uses MAC to prevent information leakage through system resources. MAC ensures that untrusted code does not leak private data through channels like pipes and network connections. But MAC does not ensure end-to-end privacy in a cloud computing environment, so to provide this, Airavat uses differential privacy to hide the actual information of particular individuals. The current implementation of Airavat supports both trusted and untrusted Mappers, but Reducers must be trusted (i.e., selected from one of the Reducers provided by Airavat). The authors used SELinux to enforce MAC on the HDFS (Hadoop DFS) and they also modified the JVM to make mappers independent (using invocation numbers to identify current and previous mappers). They also modified the reducer to provide differential privacy. From the data provider's perspective, they must provide several privacy parameters like the privacy group and privacy budget. The computation provider must also provide information like the mapper output range and the reducer code (written in the Airavat programming model). Airavat was tested on top of EC2 and was shown to be 32% slower than conventional MapReduce. Pros: 1. Differential privacy prevents data leakage in the output of computations. 2. The authors provide some ways to estimate the sensitivity of computations. 3. A complete implementation is provided. 4. Experimented on a number of different problems. Cons: 1. The set of reducer functions is restricted to SUM, COUNT and THRESHOLD. 2. Differential privacy works best only for low-sensitivity computations, so it restricts the set of possible computations. 3. Data providers are burdened with providing many privacy parameters. Many of these parameters are hard to estimate. 4. The overhead of 32% is quite significant.
Could this be lowered at the cost of lower security or privacy? -----Anupam From: trowerm SpamElide on behalf of Matt Trower Sent: Tuesday, April 19, 2011 11:54 AM To: indy SpamElide Subject: 04/19 525 review Airavat This paper presents a new security layer for use on top of a MapReduce-like infrastructure. The authors are concerned with the privacy of data used in large data mining operations which often take place on cloud systems such as Amazon's AWS. In order to have a proper MAC security implementation, all output data from a MapReduce job would have to inherit the union of the labels of all input data. This means that any user would have to have access permissions for all of the input data in order to read the output. This would make the output inaccessible to almost anyone, so the authors concentrate on how the output data can be declassified in a systematic way, helping companies avoid having to audit all users of their data. The systematic declassification is done using a new technique, differential privacy. The idea behind differential privacy is to measure the effect each input has on the output; in slightly more formal terms, what is the probability of producing the same output with one input removed? A nice example of this is taking a sum. If all input values are 0 or 1 and the output is 1000, then changing one input variable is unlikely to have a large effect on the system. The authors define a system where the MapReduce author must declare the ranges of his function, from which Airavat decides, using differential privacy, how much information is being leaked. There is then an economic system that allows the person querying to leak only so much information. This is a very nice system with reasonable performance tradeoffs. The authors decided to throw random inputs into the data in the case that some input is outside of the defined range, which makes less sense to me than not declassifying that data.
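The sum example above can be made concrete with a Laplace-mechanism sketch (my own illustration, not Airavat's code): inputs are clamped to the declared range so one record can shift the true sum by at most the range width, and noise scaled to that sensitivity is added before release. The fixed seed is just for a deterministic demo.

```python
import random

def noisy_sum(values, lo, hi, epsilon, rng=None):
    """Differentially private sum, Laplace-mechanism style.
    Clamping each input to [lo, hi] bounds the sensitivity of the sum
    at (hi - lo); noise is then scaled to sensitivity / epsilon."""
    rng = rng or random.Random(0)  # demo seed, an assumption of this sketch
    clamped = [min(max(v, lo), hi) for v in values]
    sensitivity = hi - lo
    # The difference of two exponentials with rate epsilon/sensitivity
    # is Laplace-distributed with scale sensitivity/epsilon.
    rate = epsilon / sensitivity
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return sum(clamped) + noise
```

Note how clamping also realizes the range enforcement the review describes: an out-of-range value (say 1000 when the range is [0, 1]) contributes at most 1 to the released sum, so it cannot act as a high-magnitude signal.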
As stated in the paper, information leakage is still an issue, and the problem was in some sense moved to controlling access to the economic system. PeerReview PeerReview is a fault detection system aimed at multiple administrative domains such as inter-domain routing. PeerReview maintains a hashed log of all actions taken by each node, which can be verified for correctness by other nodes in the system. This allows nodes to defend themselves against false accusations. The authors take time to ensure that the logs are tamper-evident and consistent. My main complaint about this system is that each node is required to have a reference implementation of the other nodes which can be initialized and run with some state. This seems like a great deal of overhead for detecting a failed node. From: Long Kai Sent: Tuesday, April 19, 2011 11:54 AM To: Gupta, Indranil Subject: 525 review 04/19 CS 525 Reviews 04/19 Long Kai (longkai1) The Byzantine Generals Problem Summary: This paper presents analysis and solutions for Byzantine failures. Machines in the system can fail in any arbitrary way, and a reliable computer system needs to be able to cope with this kind of failure. The problem is abstractly described as the Byzantine Generals Problem, in which a group of generals needs to reach agreement using only messages between them. There are some traitors in the group who are trying to confuse the others. The problem is to find an algorithm to reach agreement among the loyal generals. The paper concludes that more than two-thirds of the generals must be loyal to guarantee the reliability of the system. The paper presents several solutions to the Byzantine Generals Problem. In these algorithms, each lieutenant may have to wait for messages that originated at the commander and were then relayed via m other lieutenants, so message paths can be up to length m+1. Algorithms OM(m) and SM(m) also involve sending up to (n - 1)(n - 2) ... (n - m - 1) messages.
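That message-count product can be checked with a small recurrence (my own sketch, not from the paper): OM(m) sends n-1 messages directly, and then each of the n-1 lieutenants runs OM(m-1) among the remaining n-1 generals.

```python
def om_messages(n, m):
    """Total messages sent by OM(m) with n generals.
    Unrolling the recurrence gives the paper's product form:
    (n-1) + (n-1)(n-2) + ... + (n-1)(n-2)...(n-m-1)."""
    if m == 0:
        return n - 1  # commander sends its value to each lieutenant
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)
```

For the smallest solvable cases: n=4, m=1 gives 3 + 3*2 = 9 messages, and n=7, m=2 gives 6 + 30 + 120 = 156, showing how quickly the exponential blow-up sets in.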
Comments: This paper theoretically analyzes Byzantine failures and gives reliable solutions to the problem. However, the algorithms are very expensive in terms of both time and messages. Certain constraints can be put on the system to make the solution less expensive. Airavat: Security and Privacy for MapReduce Summary: This paper presents Airavat, a MapReduce-based system for distributed computations which provides end-to-end confidentiality, integrity, and privacy guarantees using a combination of mandatory access control and differential privacy. It provides strong security and privacy guarantees for distributed computations on sensitive data. Users can perform computations on the data, but Airavat confines these computations, preventing information leakage beyond the data provider's policy. Mandatory access control (MAC) is a useful building block for securing distributed computations. MAC-based operating systems enforce a single access control policy for the entire system. This policy, which cannot be overridden by users, prevents information leakage via storage channels. However, access control alone does not achieve end-to-end privacy in cloud computing environments, where the input data may originate from multiple sources. The output of the computation may leak sensitive information about the inputs. The output of an aggregate computation must be "declassified," but only when it is safe to do so. Airavat integrates mandatory access control with differential privacy, enabling many privacy-preserving MapReduce computations without the need to audit untrusted code. The paper also demonstrates the practicality of Airavat by evaluating it on a variety of case studies. Comments: This system provides security and privacy guarantees for distributed computations on sensitive data at the endpoints. However, the data can still be leaked in the cloud.
This is because multiple machines are involved in the computation, and a malicious worker can send the intermediate data outside the system, which threatens the privacy of the input data. Even short of that, temporary data is stored on the workers, and that data can be fetched even after the computation is done. -- Larry From: Chi-Yao Hong Sent: Tuesday, April 19, 2011 9:52 AM To: Gupta, Indranil Subject: 525 review 4/12 ---- The Byzantine Generals problem, TOPLAS 1982---- This paper studies the Byzantine generals problem, which is an abstraction of the problem of reaching consensus agreement in a distributed system where communication can fail in an arbitrary manner. Notice that "failure" comes in two types: 1) fail-stop failure: the message could be lost; 2) corruption: the message could be incorrect. The authors model this problem and study it in different settings. For example, when data corruption is impossible, the problem is solvable for any number of traitors, who could intentionally send incorrect messages to bias the result. When forgery is possible, the authors show impossibility results for m traitors and fewer than 3m+1 generals, and propose feasible solutions for the case of >=3m+1 generals. Pros: 1. A high-impact paper that deals with Byzantine faults, a general failure model in distributed systems. The impossibility results are neat and technically sound. 2. It is interesting to consider unforgeable messages, which nowadays can be achieved by using public-key cryptography (with reasonable overheads). Cons: 1. The Byzantine fault model might be too general: considering all possible malicious behaviors makes this a worst-case analysis. It might be more interesting to relax the model to consider other types of faults that are more likely to happen in distributed systems, e.g., random errors. 2. How about a probabilistic version of this problem?
The impossibility results for deterministic agreement might be less interesting than probabilistic ones. For example, for a given number of traitors and generals and a given communication graph, what is the probability of achieving agreement successfully within bounded time? ---- PeerReview: Practical Accountability for Distributed Systems, SOSP'07 ---- It is generally considered hard to detect node faults in a distributed system. This paper uses the idea of accountability, maintaining evidence of nodes' actions. By comparing nodes' actions with protocol behaviors, the proposed system PeerReview is able to detect unexpected behaviors such as Byzantine faults. To secure the system, PeerReview uses strong identities, where each node has a public/private keypair bound to a unique node id such that faulty nodes cannot forge the signature of a correct node. To ensure the correctness of logs, PeerReview uses a commitment protocol to ensure that the sender of each message obtains verifiable evidence that the receiver logged the message. Comments: 1. The use of logs and authentication significantly improves data integrity. The challenge/response protocol allows a correct node to monitor all the suspected nodes. Clearly the design provides the nice security properties that the authors argue for in the Introduction section. 2. The design of the probabilistic guarantee is a promising way to reduce the control traffic generated by PeerReview by relaxing the deterministic consistency guarantees. However, the authors did not evaluate the bandwidth benefits of this scheme. 3. As the capability of detection is highly dependent on the protocol, it would be interesting to see the percentage of nodes that cannot be detected by PeerReview for some popular distributed protocols, say, Paxos, to see if the proposed scheme could provide strong security properties under real protocols.
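The unforgeable-signature assumption behind those strong identities follows a simple verify-or-reject pattern. PeerReview itself uses asymmetric keypairs; the HMAC below is only a stand-in I chose to keep the sketch self-contained, since forging a tag likewise requires knowing the key.

```python
import hmac
import hashlib

def sign(key: bytes, message: bytes) -> str:
    # Stand-in for a node's signature over a message or log authenticator.
    # A real deployment would use public-key signatures so that anyone
    # can verify without holding the signing key.
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, sig: str) -> bool:
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(sign(key, message), sig)
```

A tampered message (or a signature produced without the key) fails verification, which is what lets evidence reports carry weight: a signed log entry is non-repudiable.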
-- Chi-Yao Hong Computer Science, Illinois http://cyhong.projects.cs.illinois.edu From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman Sent: Tuesday, April 19, 2011 9:24 AM To: Gupta, Indranil Subject: 525 review 04/19 Airavat: Security and Privacy for MapReduce Summary: In the era of mapreduce cloud programming, security and privacy are of utmost importance. In fact, they have been identified as among the most important research challenges for further increasing the viability of cloud computing. Third-party outsourced computation over huge amounts of private data of individual users is ripe for privacy breaches. An untrusted mapreduce program can easily leak private information of a user. Existing approaches based on user de-identification have been shown to be inadequate, as evident from large-scale privacy fiascoes like the AOL search logs and the Netflix dataset. Auditing all mapreduce programs is not feasible. Instead the target of this paper is to confine code rather than audit it. It proposes Airavat, a novel privacy-preserving framework for mapreduce computations over untrusted code. Airavat guarantees bounded information leakage of individual data for mapreduce using differential privacy. This however comes at the cost of reduced accuracy. The Airavat framework runs on the cloud infrastructure and consists of a modified mapreduce, DFS, JVM, and SELinux. In this model, the data provider uploads data with certain privacy parameters. The computation provider writes possibly untrusted mapper code. Finally Airavat runs the computation and protects data privacy. In Airavat, mapreduce consists of untrusted mappers and a limited set of trusted reducers. It confines untrusted code through a combination of mandatory access control and differential privacy. A range enforcer ensures that mapper outputs stay within the declared range even when the mapper is malicious. Random noise is added to mapper outputs before passing them to the reducer.
This minimizes the effect of any individual input on the output, at the cost of reduced accuracy. Pros: (1) First work to add MAC and differential privacy to mapreduce. (2) Proposes a new framework for privacy-preserving mapreduce computations. (3) Confines untrusted code. (4) The differential privacy mechanism allows control over the privacy-accuracy trade-off via three privacy parameters. It also bounds information leakage and quantifies it mathematically. Cons: (1) Airavat cannot guarantee privacy for computations where malicious users can generate output keys. (2) Also, for computations that require key outputs, Airavat can only work with trusted mappers. Future Work: (1) An obvious extension would be to incorporate untrusted reducers. PeerReview: Practical Accountability for Distributed Systems Summary: This paper proposes PeerReview, a general system for ensuring accountability in distributed systems in a practical manner. The goal is to ensure that whenever a node is faulty, the system eventually generates a proof of misbehavior for that node which is accessible to all correct nodes. PeerReview only deals with faults that causally affect correct nodes via messages. It does not deal with faults that only affect internal node behavior. The system assumes deterministic state machines, reference implementations at each node, eventual communication between correct nodes, and availability of signed messages. All nodes keep a log of their I/O and messages. Each node has a set of witnesses who periodically audit that node's log to detect any deviations. Other nodes can check evidence reports from the witnesses to identify faulty nodes. Log tampering is prevented using signed hash chains. A node is audited by checking a replay of the state machine against the log. PeerReview guarantees that all observable faults are eventually detected and that good nodes are never falsely exposed.
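The audit-by-replay step just described can be sketched as follows. This is a hypothetical interface of my own, assuming the deterministic state machine the paper requires: the witness feeds the node's logged inputs through the reference implementation and flags any mismatch with the logged outputs.

```python
def audit(log, reference_step, init_state):
    """Replay a node's logged (input, claimed_output) pairs through the
    reference implementation. Because the state machine is deterministic,
    any mismatch is evidence that the node deviated from the protocol."""
    state = init_state
    for inp, claimed_out in log:
        state, out = reference_step(state, inp)
        if out != claimed_out:
            return False  # deviation from reference behavior: expose the node
    return True
```

Usage with a toy reference implementation (a running-sum service that echoes its total) shows a faithful log passing the audit while a log with a fabricated output is exposed.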
The authors tested the system on top of three representative applications: an NFS file server, overlay multicast, and P2P email. In summary, the paper proposes accountability as a new approach to dealing with faults, and builds a practical system, PeerReview, to implement this idea. Pros: (1) The system is general and applicable to many different failure scenarios. (2) It provides eventual detection of faulty nodes and ensures zero false positives. (3) It does not require any formal specification of protocols to ensure correctness. (4) It works for different types of faults. (5) Proposes accountability as a new approach to dealing with faults. Cons: (1) PeerReview requires deterministic behavior from all nodes, which might not be realistic. Many distributed protocols are randomized in nature; as a result PeerReview might not work for them. I don't think saving the seed of a random number generator can capture all the non-determinism of a randomized system. (2) The paper talks about reducing the log overhead by truncating logs. The proposed system works well when complete logs are generated; the log truncation heuristics mentioned in the paper don't seem plausible. (3) I think the generality of the system is also a weakness. It goes against the principle of "making the common case fast." It is not clear whether PeerReview can compete against systems highly optimized for specific scenarios. Trying to cover all possible scenarios might turn out to be a pitfall for the proposed system. (4) I think the cryptographic overhead of PeerReview is prohibitive. (5) The system becomes scalable only with probabilistic guarantees, which conflicts with its emphasis on deterministic detection guarantees. (6) What happens if the witness nodes collude against the node they are witnessing? Even though this is a low-probability event, it could still happen. I don't think PeerReview can deal with this situation.
-- Muntasir From: Andrew Harris Sent: Tuesday, April 19, 2011 4:51 AM To: Gupta, Indranil Subject: 525 review 04/19 Review of "PeerReview: Practical Accountability for Distributed Systems", Haeberlen et al, and "Airavat: Security and Privacy for MapReduce", Roy et al PeerReview is a heuristic-based fault detection system for Byzantine failures in distributed systems. It relies on some number of nodes in a given network monitoring applications on those nodes, and producing and sharing activity logs of those applications with other nodes. From these logs, a node may probabilistically determine whether a given application is acting within its "known good" activity range. For applications that deviate, PeerReview detects the deviation and presents it to the implementing system, which then presumably ends or otherwise halts the application. The group shows a handful of reference implementations, and offers an analysis of the scalability of their system for very large distributed networks. First, there is a clear drawback to PeerReview's log-playback analysis for abnormally behaving nodes. Consider a node that always displays erratic behavior; in this case, one could only detect "deviations" in the sense of repeated actions that should not have been repeated. It is possible that one could do an entropy analysis to see if the erratic behavior demonstrated by a node fits the expected level of entropy. The system gets around this heuristic limitation in its theoretical model by specifying that all state machines involved are deterministic, which may not be a realistic assumption for a general-purpose machine. In practice, this heuristic is based on multiple-node observation of the process, to establish a baseline of activity. But again, this can fail in the presence of an always-erratic node, without deeper entropy analysis.
A drawback of relying on other nodes for reports is the assumption that these reports are accurate; a malicious subset of a PeerReview network with some incentive to promote a malicious application may falsify its logs as to the application's activity. This would introduce a conflict in the heuristics for other nodes, again breaking the heuristic overall. Perhaps the only saving grace here in the face of malicious nodes is the assumption that these are Byzantine failures, which would themselves fail anyway for a high number of malicious nodes. For a low number of malicious nodes inserting bad logs, though, the probability of fault detection would decrease, diminishing the usefulness of this system. Airavat is a distributed key-value aggregation server system that employs the MapReduce programming paradigm to perform a sort of single-machine multiparty computation. Untrusted clients can pool their data within the Airavat server, which strips out identifying metadata and broadcasts back out to clients (or wherever needed) the result of a pooled calculation over the pieces of received data. The system is built on a mandatory access control scheme on an SELinux implementation, which minimizes data leaks. The group describes the theory behind their approach, and offers some microbenchmarks which suggest that the added computation overhead for pooling and reducing sensitive data is tolerable for many applications, especially considering the privacy benefits introduced. The system suffers from a single point of failure, as it relies on just one trusted machine (or a finite number of trusted machines, depending on implementation) to carry out the shared computation between some number of entities. If the trusted machine is itself malicious, it could leak pieces of information that in normal operation would be scrubbed.
Borrowing from multiparty computation work, this system could be trivially extended to handle multiple shuffled and encrypted pieces of information, thereby removing the need to scrub metadata from the raw information at computation time. In this same line of thinking, it is strange that the system did not employ any sort of certificate checking for incoming data (or for the computation machine), which seems like a trivial addition that would enable greater trust in the overall system. Airavat also doesn't handle "gaming" of inputs, instead simply assuming that input labels are correct and non-malicious. This may not be an appropriate assumption for real-world implementations, and may lead to data leaks with sufficient numbers of malicious clients in collusion. From: nicholas.nj.jordan SpamElide on behalf of Nicholas Jordan Sent: Tuesday, April 19, 2011 3:52 AM To: Gupta, Indranil Subject: 525 review 04/19 Nicholas Jordan cs525 review 04/19 Airavat: Security and Privacy for MapReduce By: Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel As they so elegantly state on page 2, "We aim to prevent malicious computation providers from violating the privacy policy of the data provider(s) by leaking information about individual data items." They use a differential privacy mechanism to ensure this. One interesting solution to data leakage is that they have the mapper declare a range for its output values, e.g., (0, 5). For example, suppose a MapReduce job is aggregating a total over 25 participants, each supposed to be in the range (0, 5). A malicious mapper could, when it sees a specific user, map that key to 1 million; then, when we see a total of 1,000,100, we know the mapper has seen the key it was looking for. But because a range is declared, when the mapper tries to return a value out of bounds, Airavat instead returns a value inside the range. Even if you do map a key to a value out of bounds, a value within that range will be returned.
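The range-enforcement behavior just described can be sketched in a couple of lines. This is an illustration of the review's description, not Airavat's implementation; the random replacement mirrors the "returns a value inside the range" behavior.

```python
import random

def enforce_range(value, lo, hi, rng=None):
    """If a mapper emits a value outside its declared range, replace it
    with a random value inside the range, so an out-of-range value
    (like the 1-million marker above) cannot act as a covert signal."""
    if lo <= value <= hi:
        return value
    rng = rng or random.Random()
    return rng.uniform(lo, hi)
```

In-range values pass through untouched, while the malicious 1,000,000 marker is collapsed back into (0, 5) and can no longer dominate the aggregate.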
They also limit the number of queries on the data to prevent discovery of the original input data. They run Airavat on SELinux to implement their restrictions on access to I/O, and edit the Java JVM so untrusted mappers can’t use any timing-channel mechanism to leak data. Pros: - provides privacy, with little to no change to mapper code Cons: - 30% overhead Additional Comments: I understand that you are trading accuracy for the privacy of data. It seems that the bigger your data set is, the more privacy you have, because a single user affects less of the output if removed. They showed results that were really close to 100% even with the added noise, so this seems a viable solution to protect the privacy of your input data. PeerReview: Practical Accountability for Distributed Systems By: Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel PeerReview is a strong accountability system for distributed systems that is resilient to Byzantine faults. At the core, each node keeps an append-only log that is made tamper-evident with the use of keys and hashing. Each node has a set of witnesses that will come into agreement that a node is either exposed or suspected. A node is suspected when it doesn’t respond to a message; this characteristic is needed due to the nature of the asynchronous system. A node is declared exposed if it responds incorrectly to a challenge. Pros: - has many useful applications, such as the live P2P video streaming example. You can force users to cooperate in forwarding a stream, or notice a user that is not forwarding or is altering the streamed content. Cons: - Throughput dropped by 1/3 in the NFS example when using 2 witnesses. Additional comments: They used PeerReview in a variety of scenarios. The NFS scenario is a real performance hit, but the challenge/response messages can be reserved to be sent during down time. Since it has strong accountability, it brings a big overhead, but you can relax the accountability to scale better.
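Returning to the accuracy-versus-privacy trade-off in the Airavat comments above: the mechanism behind it is Laplace noise scaled to sensitivity/epsilon. The sketch below is illustrative only (parameter names are assumptions, and Airavat's actual implementation differs); it shows why a bigger dataset gives better relative accuracy at the same privacy level.

```python
import random

def noisy_sum(values, sensitivity, epsilon, rng=random):
    """True sum plus Laplace(sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # A Laplace(0, scale) sample is the difference of two exponentials.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(values) + noise

random.seed(7)
# The absolute noise scale is identical in both cases, but the relative
# error shrinks as the dataset grows, since one user affects less output.
small = noisy_sum([3] * 10, sensitivity=5, epsilon=0.5)
large = noisy_sum([3] * 10_000, sensitivity=5, epsilon=0.5)
print(small, large)
```

This matches the reviewer's intuition: results "really close to 100%" are exactly what you expect once the true aggregate dwarfs the fixed noise scale.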
-- Thanks, Nick Jordan From: david.m.lundgren SpamElide on behalf of David Lundgren Sent: Tuesday, April 19, 2011 2:52 AM To: Gupta, Indranil Subject: 525 review 04/19 PeerReview: Practical Accountability for Distributed Systems Haeberlen et al. introduce their system, PeerReview, for observable Byzantine fault detection using accountability in distributed systems. Accountability is defined in terms of a tamper-evident record that provides undeniable proof of all node behavior. Using this record, misbehaving nodes are detected and false positives are prevented. The authors assume that the state machine of a node is deterministic, that all messages are eventually received, and that secure hash functions are used. Nodes cooperate to maintain a global tamper-evident record of the system's activity. Nodes frequently replay the logs and compare the results with authenticators to detect node failure. A challenge/response protocol is described to account for message delivery latency due to network constraints. An evidence transfer mechanism is also discussed to disseminate faulty node detections to all network nodes. The completeness requirements are relaxed and randomized consistency checking is discussed. Finally, PeerReview is evaluated on three distributed systems: an overlay multicast, a network file system, and a peer-to-peer (P2P) system. Pros, Cons, Comments, and Questions: - Scalability is a concern with PeerReview. To maintain completeness, system overhead grows quadratically with the size of the witness group (which must be of size O(n)). Decreasing the witness set size and using probabilistic consistency checking reduces this overhead substantially. It seems reasonable to assume there might be a middle ground between the two extremes, one that requires linear overhead while maintaining completeness. - The static network membership assumption was troubling in the context of evaluating PeerReview in a peer-to-peer service (ePOST).
P2P systems are often characterized by high rates of churn, and I would like to see an analysis of churn rate's effect on PeerReview performance. - It is unclear how rapidly faulty node detection is propagated to all non-faulty nodes. Furthermore, in the consistency protocol, it is not clear that a node i will always be able to communicate with the witnesses w(j) of another node j. - The ternary flag of suspected/faulty/functioning allows for arbitrary network delays, allowing suspected nodes to transition to functioning states upon message receipt. - The use of local/peer witness groups to provide accountability is a nice solution to the potential bottleneck of a centralized, omniscient authority. ------------------------------------------------------------------------- The Byzantine Generals Problem Lamport et al. describe the Byzantine Generals Problem (BGP), an agreement problem in which processes (generals/lieutenants) can malfunction in an adversarial and malicious manner. The problem has two explicit criteria and one implicit criterion. Explicitly, all loyal (correctly functioning) lieutenants must agree on the same order; and if the commanding general is loyal, then every loyal lieutenant must obey this command. Implicitly, this process should halt. The authors show that with oral messages, more than two-thirds of the lieutenants/generals must be loyal. A simple 3-player example is shown where a loyal lieutenant cannot distinguish the case of a malicious commander sending conflicting messages from the case of a malicious lieutenant misreporting a loyal commander's message. The authors then expound upon this result, showing that the approximate agreement problem is as hard as the exact agreement problem. Solutions for BGP using signed and unsigned messages are presented. The signed message solution is then generalized to undirected graphs with fully connected loyal lieutenants. Lamport et al.
close with some observations on how BGP and their associated solutions may be implemented pragmatically. Pros, Cons, Comments, and Questions: - The problem formulation is concise and elegant, and the narrative form seems to improve the readability of the paper (although I have read that Lamport did this somewhat sardonically). - The (message) complexity of the algorithms seems to be prohibitively large in N. This seems to imply that the presented algorithms for Byzantine fault tolerance do not scale to arbitrarily large systems. - Lamport et al. prove that their signed message algorithm ``generalizes'' to specific instances of regular graphs. I am curious as to what other network topologies the SM algorithm is valid for. - It seems the algorithms discussed are not resilient to timing attacks. From: Tengfei Mu Sent: Tuesday, April 19, 2011 1:40 AM To: Gupta, Indranil Subject: 525 review 04/19 1. The Byzantine Generals Problem The paper introduces the Byzantine Generals Problem, an introductory framing for general computer system fault-tolerance problems. The Byzantine Generals Problem is posed as follows: An army must collectively decide whether or not to attack a city that they have surrounded. The army is broken into divisions and a general leads each division. The generals, who observe the enemy, can communicate with each other only by message. The key issue is that some of the Byzantine generals can be traitors and can disobey the orders sent by a general. The army then needs to carry out the correct plan while tolerating the traitors. The authors show that the army will reach consensus and "do the right thing" if fewer than one-third of the generals are traitors. The paper provides intuitive reasoning behind this for a three-generals problem, and then proves this result for any number of generals. These results show us how many faults a system can handle before the system delivers the wrong result. Pro: 1.
The theoretical bounds are still very useful for today's computer systems. Con: 1. The paper does not consider practical system implementation. 2. The fault-tolerance solution here cannot be applied to asynchronous systems, since the absence of a message needs to be detected. 2. PeerReview: Practical Accountability for Distributed Systems The paper presents a general and practical system for detecting node faults in distributed systems. PeerReview is designed for distributed networks, and the experimental evaluation shows it can be used widely in network file systems, peer-to-peer email, overlay multicast and a number of other distributed systems. PeerReview stores records of messages sent and received, as well as the inputs and outputs of applications. The nodes regularly request the log files from other nodes and compare the records. When any deviations are found, the other node is considered faulty. Also, if a node does not respond to the requests over time, it is initially suspected and eventually exposed as faulty. Pro: 1. PeerReview guarantees completeness and accuracy. Con: 1. PeerReview assumes that application behavior is deterministic. 2. PeerReview incurs significant overhead in CPU and disk space. From: wzhou10 SpamElide Sent: Tuesday, April 19, 2011 12:54 AM To: Gupta, Indranil Subject: CS525 Review 04/19 Review of “Byzantine Generals Problem” Core Idea: This paper discusses the problem of agreement in the presence of non-crash/Byzantine failures. The problem can be abstracted through a battlefield analogy as follows: a set of n generals has to make an agreement to attack the enemy or to retreat. Each general is either loyal or a traitor. The goal is that all loyal generals agree on a common decision. One of the generals is the commander, and the other n-1 generals are lieutenants. The commander sends an order to the lieutenants; generals communicate with messages.
Byzantine agreement is achieved if 1) all loyal lieutenants obey the same order, and 2) if the commander is loyal, then every loyal lieutenant obeys the order he issues. An algorithm to achieve this agreement using oral messages (OM) is proposed based on the following assumptions: 1) every message is delivered correctly; 2) the receiver of a message knows the sender; 3) the absence of a message can be detected. It’s proved that when there are fewer than n/3 traitors, this agreement is possible. That is, the system can tolerate up to ceil(n/3)-1 Byzantine failures. The algorithm is: 1) the commander sends out the order to the n-1 lieutenants; 2) each lieutenant sends the value received from the commander to the other n-2 lieutenants; 3) each lieutenant takes the majority of the values it received. When a stronger assumption holds, that no one can forge the commander's order, another algorithm, SM, is proposed. In this case, any number of failures can be tolerated. The correctness of these algorithms is presented too. For missing communication paths, the authors also presented an algorithm. They mainly considered regular graphs in the proof. Pros: 1. The Byzantine Generals problem has been one of the most classical distributed problems for decades. This paper gives a clear definition and explanation of the problem, very educational. I like the way it proves the OM algorithm: first use a three-general case, then let each general represent a set of generals. 2. The theoretical bound is useful in today's distributed systems. 3. The SM algorithm is insensitive to message failure. Cons: 1. I'm not sure if the assumptions are realistic enough. 2. Could the traitors conduct a man-in-the-middle attack? 3. The overhead of the proposed algorithms is a little high. 4. The SM algorithm assumes detection of missing messages. Does this require synchronization? 5. Only regular graphs are considered in the case of non-completely-connected graphs.
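The three-step OM procedure summarized above can be written as a toy recursive simulation. This is an illustrative reconstruction, not Lamport et al.'s pseudocode; the traitor behavior (sending conflicting values) is one assumed strategy among many an adversary could use.

```python
def om(commander, lieutenants, value, m, traitors):
    """Toy OM(m): return {lieutenant: decided value} for True/False orders."""
    # Step 1: the commander sends its value to every lieutenant;
    # a traitorous commander sends conflicting values to sow disagreement.
    sent = {}
    for i, lt in enumerate(lieutenants):
        sent[lt] = bool(i % 2) if commander in traitors else value
    if m == 0:
        return sent
    # Step 2: each lieutenant relays its received value to the others
    # by acting as the commander in OM(m-1).
    relayed = {
        lt: om(lt, [x for x in lieutenants if x != lt], sent[lt], m - 1, traitors)
        for lt in lieutenants
    }
    # Step 3: each lieutenant takes the majority of its own value and
    # the values the other lieutenants relayed to it.
    decisions = {}
    for lt in lieutenants:
        votes = [sent[lt]] + [relayed[j][lt] for j in lieutenants if j != lt]
        decisions[lt] = sum(votes) > len(votes) / 2
    return decisions

# n = 4 generals, m = 1 traitor: the loyal lieutenants still agree.
print(om("C", ["L1", "L2", "L3"], True, 1, traitors={"L3"}))  # L1, L2 decide True
print(om("C", ["L1", "L2", "L3"], True, 1, traitors={"C"}))   # all loyal agree
```

Running the same simulation with n = 3 and one traitor shows the loyal parties disagreeing, which is the intuition behind the 3m+1 bound.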
Review of “PeerReview: Practical Accountability for Distributed Systems” Core idea: This paper develops PeerReview to defend against Byzantine failures by providing accountability in distributed systems. PeerReview ensures that a faulty node is eventually detected, and a correct node cannot be falsely accused. The core idea of PeerReview is that each node maintains a tamper-evident log of the passing messages along with the input and output of applications. Each node consists of a state machine, a detector module and an application. The state machine represents the functions that need to be checked. The detector module implements PeerReview. It observes all input and output of the state machine, and communicates with other nodes' detectors. State machines are assumed deterministic, and nodes can sign messages. Each node also has a set of witness nodes which audit its log periodically. If the witness nodes find any misbehavior, they generate evidence and send it to other nodes. Other nodes then check the evidence and report the fault. Pros: 1. PeerReview provides accountability for all nodes in a network to defend against Byzantine failures. It's ensured that eventually faulty nodes are detected and no correct nodes are accused. That is, it guarantees completeness and accuracy. The idea is simple and intuitive. 2. A faulty node is detected with high probability and the message complexity is O(log n). 3. No modification of lower layers is required. Cons: 1. It depends a lot on its assumptions, but some of the assumptions are not quite realistic, such as that applications are deterministic. 2. Storage of logs is a problem, since each node's log grows linearly with its message traffic. Also, network traffic is heavier and signing of messages increases latency.
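The tamper-evident log at the heart of PeerReview can be illustrated with a minimal hash chain. This is a simplified sketch under stated assumptions: real PeerReview log entries also carry sequence numbers, entry types, and public-key signatures, none of which are modeled here.

```python
import hashlib

class TamperEvidentLog:
    """Append-only log where each entry's hash covers the previous hash,
    so altering any past entry invalidates every later recorded hash."""

    def __init__(self):
        self.entries = []          # list of (message, hash) pairs
        self.top = b"\x00" * 32    # current top hash of the chain

    def append(self, message: bytes):
        self.top = hashlib.sha256(self.top + message).digest()
        self.entries.append((message, self.top))

    def verify(self) -> bool:
        """Replay the chain (as a witness would) and check every hash."""
        h = b"\x00" * 32
        for message, recorded in self.entries:
            h = hashlib.sha256(h + message).digest()
            if h != recorded:
                return False
        return True

log = TamperEvidentLog()
log.append(b"sent m1 to node B")
log.append(b"received m2 from node C")
print(log.verify())  # True
log.entries[0] = (b"forged entry", log.entries[0][1])
print(log.verify())  # False: the rewrite is detected on replay
```

This is why a node cannot quietly rewrite history once it has shown any later hash to a witness: the old top hash no longer matches a replay of the doctored log.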
Best, ~Wenxuan From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng Sent: Tuesday, April 19, 2011 12:01 AM To: indy SpamElide Subject: 525 review 04/19 CS 525 - Review: 04/19 In Byzantium The Byzantine Generals problem, L. Lamport et al, TOPLAS 1982 This paper dealt with the toughest threat model, i.e., Byzantine faults. Under this model, opponents can perform almost any behavior and have full knowledge of the algorithm and the implementation. The only limitations are computing power constrained by hardware and lack of knowledge of private information. For example, researchers usually assume that Byzantine opponents cannot break one-way hash functions and do not know non-faulty nodes' private keys. Otherwise, it would be impossible for a node to “sign” a message. The paper focused on the most basic primitive in almost any system, agreement among non-faulty nodes. This is called the Byzantine agreement (or consensus) problem in the literature. There are several contributions. First, the fault model introduced in the paper is very strong and is widely used in any area related to security, since if a system is robust against Byzantine faults, then it can also tolerate any other fault, such as a crash failure. Second, the paper briefly gave an intuition on the lower bound of the ratio of faulty nodes to the total number of nodes. Third, two recursive algorithms that are intuitively simple were proposed. One requires nodes' signatures on each message while the other has no such requirement. Basically, the spirit of the algorithms was to obtain messages in as many ways as possible, i.e., through different paths. After obtaining enough messages, each node just performs a majority vote to reach agreement. Since the number of faulty nodes is limited, such a scheme works. Finally, the paper presented results on connectivity among nodes. Comments/questions/critiques: The paper used an interesting analogy, but this made it seem somewhat informal.
However, maybe this was the way papers were written in the old days. The fault model is unrealistically strong. I am wondering whether any practical system is really built (or even claimed) to withstand Byzantine faults. The warning about nonrigorous reasoning when trying to prove something in the paper is very true and should be kept in mind at all times. Sometimes, there are just way too many traps while seeking something totally unknown. PeerReview: practical accountability for distributed systems, A. Haeberlen, P. Kouznetsov, P. Druschel, SOSP 2007 The paper adopted the notion of accountability to build a practical log-based distributed system that can tolerate Byzantine faults. The design proposed in the paper was very general and was able to detect Byzantine faults and provide accountability for any system that is essentially a set of deterministic state machines. Most previous Byzantine-related papers dealt with agreement on something, either a broadcast from a source or an order of performed tasks. This paper focused on the detection problem, i.e., identifying those faulty behaviors that directly or indirectly manipulate a message that is observed by some non-faulty node. The first contribution of the paper was to distinguish between ideal and non-ideal completeness and accuracy; otherwise, with such strong ideal requirements, it is likely that nothing could be solved. Second, the paper proposed a general working scheme. Though the log-based system was not a very innovative idea, the paper used it together with commitment, consistency, audit, challenge/response and evidence transfer protocols to implement a fault-detection system. The key idea was to require every node to keep a log that proves its behavior. Further, a node needs to submit such proof to its witnesses (other nodes monitoring the node). Third, a relaxed version with probabilistic guarantees was also presented. This was done by assigning fewer witnesses.
Finally, the practical implementation issues such as selecting appropriate parameters and keeping the log at a moderate size were discussed, and some applications were presented and analyzed. Comments/questions/critiques: The idea is simple and intuitive, just monitoring and using logs to prove nodes' correctness. However, the way the paper is presented made it seem complicated. Moreover, I was expecting more theoretical analysis of the overall implementation (at least big-O notation for complexity). For example, message complexity was mentioned for the consistency protocol but not the others. Also, log truncation was only briefly discussed. I would like to see whether there is any tradeoff between reliability and storage space for the log. While designing a practical system, one guideline is to make the common case fast. However, PeerReview does not consider this. A node needs to keep everything to prove its correctness, even with some (mysterious) prior knowledge about the low probability of faults. If such prior knowledge could be incorporated into PeerReview, the system would be more practical. Moreover, a hybrid of the log-based system and a traditional agreement protocol such as Lamport's algorithm might be useful, since while performing agreement on one particular node's correctness among n nodes, roughly O(n^2) bits need to be exchanged. On the other hand, in PeerReview, a challenged log needs to be fetched from witnesses, which might be greater than the O(n^2) bits required if we instead performed such agreement among a limited set of nodes. Since a practical system is considered, a mechanism for redeeming a node's correctness seems to be necessary. In PeerReview, once a node is exposed, it is always exposed. However, in the real world, some mistakes should be allowed, because reckless user behaviors or random machine errors are unavoidable.
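The log-as-proof mechanism critiqued above (a node submits proof of its behavior to witnesses) can be sketched minimally. This is a hypothetical reconstruction: an HMAC under a per-node key stands in for the public-key signatures PeerReview actually uses, and the entry format is invented for illustration.

```python
import hashlib
import hmac

def log_hash(prev_hash: bytes, seq: int, message: bytes) -> bytes:
    """Recursively defined top hash binding a message into the log."""
    return hashlib.sha256(prev_hash + seq.to_bytes(8, "big") + message).digest()

def make_authenticator(key: bytes, seq: int, top_hash: bytes) -> bytes:
    """Receiver's signed statement that the message is in its log."""
    return hmac.new(key, seq.to_bytes(8, "big") + top_hash, hashlib.sha256).digest()

def check_authenticator(key: bytes, seq: int, top_hash: bytes, auth: bytes) -> bool:
    """A witness recomputes the signature to verify the claimed log state."""
    return hmac.compare_digest(make_authenticator(key, seq, top_hash), auth)

key = b"node-B-secret"                      # stand-in for B's signing key
top = log_hash(b"\x00" * 32, 1, b"hello")   # B logs the received message
auth = make_authenticator(key, 1, top)      # B returns this to the sender
print(check_authenticator(key, 1, top, auth))   # True
print(check_authenticator(key, 2, top, auth))   # False: wrong sequence number
```

The sender can later hand this authenticator to a witness, which fetches B's log and replays it; if B's log does not reproduce the signed top hash, that mismatch is itself the evidence of misbehavior.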
From: Agarwal, Rachit Sent: Monday, April 18, 2011 11:50 PM To: Gupta, Indranil Subject: 525 review 04/19 Rachit Agarwal ---- PeerReview: Practical Accountability for Distributed Systems The authors argue the importance of accountability in practical distributed systems and present a system that tries to achieve accountability in a scalable fashion. While it is an interesting direction of work, there are a lot of catches in the paper. Some of my thoughts/comments: 1. The assumption of nodes running a deterministic application is a deal-breaker. I am not aware of many systems that run deterministic protocols, especially given the inclination towards randomized schemes. It is not clear that there is a simple modification to the basic PeerReview system that would allow it to handle non-determinism. 2. The next catch is the unique ID names. I find it surprising that people still assume that nodes cannot tamper with their IDs. Consider a system (with no centralized server) where a node cannot tamper with its ID. How does a new node join the system then? Even if there is a centralized server, how does it differentiate between a node that is trying to tamper with its ID and a node that is legitimately trying to join the system? 3. How important is the system, really? Is it not possible that a node perfectly follows the syntax of the protocol and is still able to cause extreme damage to the functioning of the system? How about nodes that tamper with messages while relaying them? 4. The authors talk about the importance of the system in the presence of multiple authoritative domains. Why would such domains reveal their logs to other domains? And the list continues ..... ---- The Byzantine Generals Problem A classic paper in the field of consensus. Some ideas/thoughts on the paper: 1. What do these results tell us about distributed systems that are more application-layer dominated? In particular, how can these results be modified in the case of virtualization and other OS-level hacks? 2.
Can one reduce the messaging overhead while providing slack guarantees? 3. How does bandwidth play a role? In particular, the results in this paper assume that all links have infinite capacity; what about practical networks, which have finite link bandwidths? 4. Can one extend these results further with lower/upper bounds if we also want to detect the culprit? ---- -- Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Web: http://agarwa16.wikidot.com ---- Sing as if No one is Listening, Dance as if No one is Watching, Dream as if You'll Live Forever, Live as if You'll Die Today !! From: mark overholt Sent: Monday, April 18, 2011 8:37 PM To: Gupta, Indranil Subject: 525 review 04/19 Mark Overholt CS525 Review 04/19/2011 The Byzantine Generals Problem Summary: The Byzantine Generals Problem can be abstracted through a battlefield analogy as follows: a group of generals (one of them the commander) gathers around the enemy's position. Among these generals, one or more are traitors. The goal of the Byzantine Generals Problem is for the loyal generals to reach a consensus on whether they should attack the enemy position or retreat, irrespective of what the traitor generals say. In the paper, the authors propose solutions for reaching consensus in such a situation. Such a solution can easily be extended to any distributed system showing similar characteristics. The problem to be solved is as follows: We consider a set of generals. Each general is either loyal or a traitor. In the agreement problem, the goal is that all loyal generals decide on a common plan. In this setting, traitors can be considered the Byzantine faulty nodes of the system, while loyal generals are fault-free nodes. We assume that out of a total of n generals, one of them is the commanding one and the other n-1 are lieutenant generals. The commanding general sends an order to the lieutenant generals. In the Byzantine Generals Problem, we require that: 1.
All loyal lieutenant generals obey the same order and 2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends. This paper proves that the above-mentioned agreement is impossible if 1/3 or more of the generals are traitors. In other words, it proves that if m traitors are present, agreement is possible only if the total number of generals is 3m+1 or more. Discussion: Pros: The authors nicely showed the theoretical lower bound on the number of generals. The signed message algorithm is insensitive to message communication failure. Cons: Are the assumptions made in this paper realistic? Many distributed systems experience a very high churn rate, where it is hard to know the number of nodes a priori. In that case, it is hard to say whether a system is robust to Byzantine faults or not. What would be the impact of a man-in-the-middle attack? The proposed solutions suffer from high time and message complexity. Is it possible to distinguish between failure of communication and failure of nodes? PeerReview: practical accountability for distributed systems Summary: PeerReview presents a system architecture that ensures that nodes are made accountable for their actions and allows correct nodes in the system to detect bad nodes whose misbehavior can be observed through the messages they receive. The application which is to be made accountable runs over a detection module which keeps track of all the messages that are received and sent by the application running above, and logs them securely in a non-repudiable append-only log maintained by each node. This log is presented by the detection module when another node requires verification of the actions taken by this node. The log serves as the proof for the actions of the node. PeerReview assumes that the state machine of the application is deterministic and thus that replaying the messages in the log would allow it to deduce whether the application is behaving correctly or not.
A reference implementation of the application at each node is maintained, which is used to check for misbehavior. To ensure that every message received by a node is logged, PeerReview uses a commitment protocol which requires the receiver to send an authenticator to the sender containing the log entry for the message. Each log entry contains the message, sequence number and type of the message, associated with a recursively defined hash value and authenticator, making the log tamper-evident. Thus, PeerReview guarantees that eventually every bad node is detected and no node is falsely accused. Discussion: Pros: It provides guarantees for eventual detection of faulty nodes and ensures that no correct node is falsely accused. It requires no difficult modeling of the protocol under question and thus can be done in an application-independent manner. Cons: It assumes that the applications are deterministic. This assumption often does not hold in practice, especially if the algorithm/protocol itself is randomized. Although the authors say random values can be regenerated by configuring the seed for them, this is not a convincing argument that all sources of randomness associated with the program are captured. Although PeerReview can be used as a mechanism to detect malfunctioning/bad nodes, it cannot do anything to detect bugs in the system. Thus it only provides something like syntactic detection, but not the semantic detection which could presumably be obtained from a model of the protocol. PeerReview also does not talk about the case in which nodes perform their functions properly but intercept or tamper with messages not addressed to them but routed through them. From: iitb.ankit SpamElide on behalf of Ankit Singla Sent: Monday, April 18, 2011 7:34 PM To: Gupta, Indranil Subject: 525 Review 04/19 1.
Byzantine Generals ------------------------------------- Summary: The paper discusses a classic problem related to consensus in distributed systems by framing it in the interesting setting of several units of an army at war coordinating a retreat/attack decision. It presents a simple formulation of the problem with the requirement that all non-faulty processes come to the same decision and, if the leader process is non-faulty, every non-faulty process uses its decision. The paper proves that satisfying these conditions is impossible in a system with fewer than 3m + 1 processes for m faulty processes. It further gives an algorithm for satisfying the conditions if there are 3m + 1 or more processes. Comments: The proofs are very simple and elegant, particularly the impossibility of solving the Byzantine Generals problem with fewer than 3m + 1 processes. It's fairly intuitive that signed messages make the task easier, but a formal proof of the fact puts the matter to rest. 2. PeerReview ------------------------- Summary: The paper discusses the problem of detecting faults, identifying faulty nodes and convincing others about a node being faulty/non-faulty. The paper limits itself to faults which are visible to some correctly working node. It also restricts fault detection to “suspicion” in scenarios like long delays, where deterministically distinguishing a fault from a network delay is not possible without synchrony. The solution is based on tamper-evident logging of each node's messages locally at the node. Other nodes act as ‘witnesses’, check these logs for correctness, and make the status (faulty/non-faulty) known to the rest of the system. Comments: That the paper discusses accountability and not prevention separates it pretty well from most other literature in fault-tolerance research. For a heterogeneous system, for every node to have a reference implementation of every other node's state machine is a significant overhead.
(For systems running instances of the same process, this is a non-issue). The tamper-evident logs are a pretty smart idea I think. I liked the clear and concise discussion of PeerReview protocol features that tackle nodes attempting to lie about their own or other nodes’ performance. Discussing real applications also helps understand the system better. Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Curtis Wang Sent: Monday, April 18, 2011 12:00 AM To: Gupta, Indranil Subject: CS 525 Reviews Hi Professor, I just realized that I made a mistake and submitted the wrong reviews last week on 4/14--I actually read two papers about the "In Byzantium" topic when the topic was actually "Old Wine...". I apologize for the inconvenience. Also, even though the lectures were covered last week, would it still be possible for me to write a review for the actual 4/14 lecture topic? Thanks for your time. Best regards, Curtis