PhD Student | Coffee Drinker | Meme Connoisseur

University of British Columbia

Biography

Hi! I am a PhD candidate in Computer Science at the University of British Columbia. I am a member of the Systopia Lab and am supervised by Prof. Margo Seltzer. My research focuses on graph data management systems.

Previously, I worked with Prof. Ivan Beschastnikh for my master's degree. My thesis explored trusted execution for cross-platform data privacy. Prior to that, I worked at NetApp Inc., building tools and utilities that help Linux hosts work seamlessly with NetApp’s Data ONTAP.

Interests

  • Operating Systems
  • Distributed Systems
  • Data Processing at Scale

Education

  • PhD in Computer Science

    University of British Columbia

  • MSc in Computer Science, 2019

    University of British Columbia

  • BEng in Computer Science, 2013

    Birla Institute of Technology and Science, India

  • MSc in Biological Sciences, 2013

    Birla Institute of Technology and Science, India

Experience

PhD Candidate

University of British Columbia

Jan 2020 – Present Vancouver, BC
  • Designed FlexoGraph, an ACID-compliant graph database built on the WiredTiger storage engine that performs better than existing specialized graph processing systems and graph databases. It provides multiple optimized data layouts, persists algorithmic state to bootstrap future computations on new snapshots, and uses lockless algorithms.

  • Studied the implications of translation architecture on the performance of SMR drives and showed that a host-based log-structured translation layer can reduce the p99 latency by 50×.

  • Benchmarked graph processing systems to understand the latent impact of the statistical properties of datasets on the performance of these systems, and identified a set of best practices for benchmarking graph processing systems.

Graduate Research Assistant

University of British Columbia

Jan 2018 – Oct 2019 Vancouver, BC

I worked on the Trusted Capsules project. Trusted Capsules provide graduated access control on remote devices.

  • Used Linaro OP-TEE to manage fine-grained and trackable access to data on remote devices by linking the data to its access policy and encrypting them together.
  • Leveraged FUSE to intercept operations on encrypted files and facilitate their on-demand decryption and re-encryption using the trusted application running in the Secure World.
  • Built the prototype in C on a LeMaker HiKey board with ARM TrustZone and Linux 4.15.

Member of Technical Staff - II

NetApp Inc.

Jul 2015 – Jul 2017 Bangalore, India

Responsibilities included:

  • Worked on the Unified Host Utilities Kits for Linux and Unix: tools for checking the health of storage on the host OS when connected to NetApp storage controllers. The kits provide path and state information for all NetApp LUNs present on a host by issuing queries to the Host Bus Adapter API libraries.
  • Handled infrastructure orchestration and configuration management for the interop QA infrastructure.
  • Designed and developed the SAN Host Remediation Tool, which automates the tasks to be performed on hosts when storage migrates from NetApp 7-Mode Data ONTAP to Clustered Data ONTAP. It supports all major host OS variants.
  • Designed and developed iLAB, a framework that handles dynamic testbed creation, resource allocation and initialization, test execution, and testbed tear-down. Increased execution efficiency by 95%.
  • Wrote Python and Perl scripts and libraries to test the interoperability of new Linux host and Data ONTAP features Depths, etc.

Research

My research is focussed on graph processing systems and graph data management.

Graph databases (e.g., Neo4j or ArangoDB) provide superior performance for storing and managing an evolving graph; however, they underperform on long-running whole-graph analytics tasks. On the other hand, specialized graph processing systems (e.g., GraphX, PowerLyra, GraphChi, and Ligra) offer strong whole-graph analytics performance; however, they must first construct a favorable in-memory graph representation using expensive extract, transform, load (ETL) pipelines and preprocessing steps.
I address this central dichotomy of graph data management research: graph databases allow for persistent storage of graphs in a native representation, but they are outperformed by specialized graph processing systems, which need a costly ingest and preprocessing phase before they can execute graph analytics queries efficiently.

My research asks whether it is possible to build a persistent graph database that combines the OLTP performance expected of a database with the OLAP performance expected of a purpose-built graph processing system.
I have built a hybrid implementation, Flexograph, using the WiredTiger key/value store. Our preliminary analysis shows that Flexograph performs robustly on analytics tasks compared to other out-of-core and in-memory systems once their preprocessing times are factored in.

Service

Academic Service and Volunteering

Systems Lab Representative

Program Committee Member

Tuesday Tea Czar

Teaching

Advanced Operating Systems

University of British Columbia

September 2019 – December 2019 Vancouver, BC

I was the Teaching Assistant for CPSC 508: Advanced Operating Systems. This seminar-style course introduces students to the theory and practice of conducting systems research. The papers discussed cover the history of operating systems research, with a special emphasis on understanding what constitutes systems research and how it has evolved.

As a Teaching Assistant, I helped students with their assignments and term projects, from defining scope to finding the right compute resources for them to use. I also ran tutorials to bring the undergraduate students up to speed so that they could have a richer learning experience.

Graduate Teaching Assistant - Software Engineering Course

University of British Columbia

September 2017 – April 2018 Vancouver, BC

I was the Teaching Assistant for CPSC 319: Software Engineering Project at UBC from September 2017 to April 2018. This course gives undergraduate students an opportunity to design, implement, and test a large software system for an industry sponsor. The course focuses on applying the waterfall SDLC methodology to the solution from inception to production, producing key documentation artefacts while working in a team.

I acted as the industry sponsor liaison and “engineering manager” for the teams. Over the course of the year, I oversaw two projects and managed about 40 students.

  • Managed two teams implementing a self-service tool for Uniserve clients and IT support staff. The tool allows the IT staff to manage their devices and see real-time information about their network health and usage trends.

  • Supervised two student teams implementing a self-checkout and payment portal for ChainXY.

Projects

Analysing Snort with KLEE

Understanding the usability of KLEE by using it to analyse a large intrusion detection system

Revelio

A tool for static analysis of Python code to detect known vulnerabilities

Trusted Capsules

ARM TrustZone-backed data privacy on remote devices

Recent & Upcoming Talks

Cross-platform Data Integrity and Confidentiality with Graduated Access Control