Due to the explosion in data volumes and in the number of machines available to process them, data analytics frameworks have become crucial components in developing new technologies and generating new discoveries. Resource scheduling remains a key building block of modern data analytics frameworks. As data volumes increase, jobs from many users and applications, each consisting of many tasks, contend for the same pool of shared resources. Consequently, today's cluster schedulers must handle multiple resource types, consider jobs with complex structures, and allow job-specific constraints. Furthermore, schedulers must provide performance isolation between different users and groups through fair resource sharing, while ensuring performance and efficiency.

We first present the design of a multi-resource packing cluster scheduler called Tetris, which efficiently packs tasks onto machines based on their requirements across all resource types. Doing so avoids resource fragmentation as well as over-allocation of resources that are not explicitly allocated. Tetris combines packing heuristics with heuristics that improve average job completion time, and shows that achieving desired levels of fairness can coexist with improving cluster performance.

Given that users of data analytics jobs observe the outcome of performance isolation only when their jobs complete, and care less about instantaneous fair-share guarantees, we explore an altruistic, long-term scheduling approach called Carbyne, in which jobs yield fractions of their allocated resources without impacting their own completion times. The leftover resources donated through altruism provide an additional degree of freedom in cluster scheduling that can be used to further improve secondary objectives.

Although a variety of frameworks for expressing and running large-scale data analytics exist today, they share a common attribute: they are compute-centric in nature, so key details of job execution depend on the physical structure of the data-parallel computation. Driven by these observations, we introduce a fast and flexible data analytics framework called F2, which separates computation from data, making them equal first-class citizens. It then enables data-driven computation: changes to execution logic, degrees of parallelism, and task scheduling decisions are all triggered by the relevant intermediate data becoming available.
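
To make the packing idea concrete, below is a minimal Python sketch of a multi-resource alignment-score heuristic in the spirit of Tetris: each runnable task that fits on a machine is scored by the dot product of its demand vector and the machine's free-resource vector, with a small bias toward tasks from jobs with little remaining work to also help average job completion time. The class names and the `delay_weight` knob are illustrative assumptions, not Tetris's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    demand: tuple[float, ...]   # per-resource demand, e.g. (cpu, mem, disk, net)
    remaining_work: float       # estimated work left in the parent job

def fits(task: Task, available: tuple[float, ...]) -> bool:
    # A task may be placed only if no resource would be over-allocated.
    return all(d <= a for d, a in zip(task.demand, available))

def alignment_score(task: Task, available: tuple[float, ...]) -> float:
    # Dot product of demand and free resources: higher scores favor tasks
    # whose demands line up with what the machine has left, which reduces
    # fragmentation.
    return sum(d * a for d, a in zip(task.demand, available))

def pick_task(pending: list[Task], available: tuple[float, ...],
              delay_weight: float = 0.1) -> Task | None:
    # Combine the packing score with a shortest-remaining-work bias,
    # mirroring the trade-off between packing and average job completion
    # time; delay_weight is an illustrative knob, not a value from Tetris.
    candidates = [t for t in pending if fits(t, available)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda t: alignment_score(t, available)
                             - delay_weight * t.remaining_work)
```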
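
The altruism idea can likewise be sketched, assuming a hypothetical `altruistic_share` helper: at each allocation step, a job keeps only the portion of its fair share it needs to preserve its current estimated completion time and donates the surplus as leftover, which the scheduler can then redistribute (for example, toward jobs closest to finishing) to improve secondary objectives such as average job completion time. This is purely illustrative and not Carbyne's actual interface.

```python
def altruistic_share(fair_share: dict[str, float],
                     needed_now: dict[str, float]) -> tuple[dict, dict]:
    # Split a job's instantaneous fair share into the part it keeps
    # (capped by what it needs right now to stay on schedule) and the
    # part it yields as leftover for the scheduler to redistribute.
    keep = {r: min(fair_share[r], needed_now.get(r, 0.0)) for r in fair_share}
    leftover = {r: fair_share[r] - keep[r] for r in fair_share}
    return keep, leftover

# Usage: a job entitled to 8 cores / 32 GB that only needs 2 cores / 8 GB
# to preserve its completion time donates 6 cores / 24 GB as leftover.
keep, leftover = altruistic_share({"cpu": 8, "mem": 32}, {"cpu": 2, "mem": 8})
```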
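
One ingredient of data-driven computation, choosing parallelism from the data rather than from a fixed physical plan, can be sketched as follows: the number of consumer tasks for a stage is decided only after an intermediate partition materializes and its size is known. `Partition` and `TARGET_BYTES_PER_TASK` are hypothetical names for illustration, not part of F2's API.

```python
from dataclasses import dataclass

TARGET_BYTES_PER_TASK = 256 * 1024 * 1024  # illustrative per-task input size

@dataclass
class Partition:
    size_bytes: int  # known only once the intermediate data is produced

def plan_consumers(partition: Partition) -> int:
    # Data-driven parallelism: pick the downstream task count from the
    # observed size of the materialized partition (ceiling division),
    # instead of fixing it up front in the compute-centric plan.
    return max(1, -(-partition.size_bytes // TARGET_BYTES_PER_TASK))
```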