Research Statement
Introduction
There is a certain set of problems I am interested in solving, questions I want to answer, and tools I want to use, and these dictate my research interests. I have limited experience performing formal research, but I do think I possess a suitably rich history of exploration through academic projects. I would like to begin by talking about these projects and then explain how they inform my research interests.
Bachelor's thesis
During my Bachelor's in 2017, I was interested in building a smarter to-do application that could self-organize its contents by intelligently inferring metadata from the text of each to-do item. An MVP demo for this tool can be found here.
The approach I used to create this tool formed the basis of my bachelor's thesis in 2018, titled "Intent Recognition from Textual User Utterances", where I demonstrated how a hierarchy of classifiers could be trained to recognize a user's intent from textual utterances. I made use of the crowd-sourced Snips NLU dataset, and the approach achieved an overall accuracy of 95.16% while using only rudimentary text classification techniques such as Bayesian text classification and Logistic Regression. The work also studied how the choice of text classifier affects accuracy, and how much accuracy improves when classifiers are arranged in a hierarchy. Here is a poster that summarizes the work.
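A minimal sketch of the hierarchy-of-classifiers idea, assuming scikit-learn and a handful of toy utterances (the actual thesis used the Snips NLU dataset and compared several classifier choices):

```python
# Two-level hierarchy of text classifiers for intent recognition.
# Level 1 predicts a coarse intent group; level 2 picks the fine-grained
# intent within that group. Utterances, groups, and intents are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ("play some jazz music", "media", "PlayMusic"),
    ("put on my workout playlist", "media", "PlayMusic"),
    ("play the trailer for the new movie", "media", "PlayVideo"),
    ("show me that cat video again", "media", "PlayVideo"),
    ("add milk to the shopping list", "lists", "AddToList"),
    ("put bread on my grocery list", "lists", "AddToList"),
    ("remind me to call mom tomorrow", "lists", "SetReminder"),
    ("set a reminder for the dentist appointment", "lists", "SetReminder"),
]

texts = [t for t, _, _ in train]
groups = [g for _, g, _ in train]

# Level 1: coarse intent group.
group_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
group_clf.fit(texts, groups)

# Level 2: one specialized classifier per group.
intent_clfs = {}
for group in set(groups):
    subset = [(t, i) for t, g, i in train if g == group]
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit([t for t, _ in subset], [i for _, i in subset])
    intent_clfs[group] = clf

def predict_intent(utterance: str) -> str:
    group = group_clf.predict([utterance])[0]
    return intent_clfs[group].predict([utterance])[0]

print(predict_intent("put eggs on the shopping list"))
```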
Master's Projects
I joined the master's in computer science program here in the Fall of 2018. Here's what I worked on -
- In 2018, under Dr. Dorodchi's guidance, I worked on an application for my Software System Design & Implementation course that would transcribe discussions happening at each table in a classroom. The transcriptions were then analyzed using NLP to generate useful insights. An MVP demo for this tool can be found here.
- In 2018, under the guidance of Prof. Welch and Joshua Shabtai from Lowe's Innovation Labs, we worked on a Visual Search Prototype for my Computer Vision class. Lowe's often has customers walk in with a random part wanting to buy more of the same SKU, but employees have trouble identifying the part. We proposed an approach similar to the one used in my bachelor's thesis, where a hierarchy of classifiers would be used: the first set of classifiers would infer the product category, while the next set would be specialized to infer the SKU within that product category. An MVP demo for this can be found here.
- In 2018, under Dr. Zadrozny's guidance, I worked on a solution to SemEval-2019 Task 3: EmoContext for my Modern Data Science class. The approach combined pre-trained word vectors with custom embeddings, supplied to a novel Bi-LSTM-based neural network. Custom embeddings were trained and used because our corpus contained many textual emoticons (such as :( , :p , etc.) and Unicode emojis (😍, 😁, etc.). You can read more about it here. A minimal sketch of this dual-embedding idea appears after this list.
- In 2019, for my Intelligent Robotics course, under Dr. Akella, I worked on "Robotic Arm Manipulation using Quaternion Neural Networks", where I studied the efficacy of using quaternion neural networks for inverse kinematics. The input to the neural network was the quaternions describing the current orientation of each link in the robotic arm, along with the desired end-effector coordinates. The output of the neural network was the translational and rotational components of the movement each link should make such that the end-effector ends up at the desired position. Quaternion neural networks were able to fit the function but performed no better than regular neural networks. You can read more about it here.
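As referenced above, here is a minimal sketch (in PyTorch) of one plausible way to combine frozen pre-trained word embeddings with trainable custom embeddings for emoticons/emojis and feed them to a Bi-LSTM classifier. Vocabulary sizes, dimensions, and the classifier head are illustrative, not the actual EmoContext model from the project.

```python
# Combine frozen pre-trained word embeddings with trainable emoji embeddings,
# concatenate them per token, and classify the sequence with a Bi-LSTM.
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    def __init__(self, pretrained_vectors, emoji_vocab_size, emoji_dim=32,
                 hidden=128, num_classes=4):
        super().__init__()
        # Frozen embeddings initialized from pre-trained word vectors.
        self.word_emb = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
        # Trainable embeddings for emoticons/emojis absent from the pre-trained vocab.
        self.emoji_emb = nn.Embedding(emoji_vocab_size, emoji_dim, padding_idx=0)
        self.lstm = nn.LSTM(pretrained_vectors.size(1) + emoji_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, word_ids, emoji_ids):
        # Each token position carries a word id and an emoji id (0 if not an emoji).
        x = torch.cat([self.word_emb(word_ids), self.emoji_emb(emoji_ids)], dim=-1)
        _, (h, _) = self.lstm(x)
        # Concatenate the final forward and backward hidden states.
        return self.out(torch.cat([h[0], h[1]], dim=-1))

# Toy usage: a batch of 2 sequences of length 5.
pretrained = torch.randn(1000, 100)  # stand-in for real pre-trained word vectors
model = EmotionClassifier(pretrained, emoji_vocab_size=50)
logits = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 50, (2, 5)))
print(logits.shape)  # torch.Size([2, 4])
```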
In 2019, for my Machine Learning course, under Dr. Minwoo Lee, I worked on "Semi-supervised Natural Language Understanding", where I used generative language models to solve the natural language understanding task of user intent identification. The process used to train the language model (LM) was similar to the first step from "Training language models to follow instructions with human feedback (2022)" (p. 3), except that instead of a human labeler demonstrating desired output behavior, the prompt and the desired output behavior were computer-generated from a labeled dataset. Performance was evaluated by using the Bilingual Evaluation Understudy (BLEU) score to compare LM-generated output with the computer-generated outputs from the labeled dataset. The labeled dataset used was the same one from my bachelor's thesis - Snips NLU.
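A small sketch of the data preparation and evaluation described above: (prompt, target) pairs are generated from a labeled intent dataset, and LM outputs are scored against the targets with BLEU. The prompt template, slot format, and example below are illustrative, not the exact format used in the project.

```python
# Turn a labeled Snips-style example into a prompt/target pair, then score a
# (pretend) LM continuation against the target with BLEU.
from nltk.translate.bleu_score import sentence_bleu

def make_example(utterance, intent, slots):
    """Generate one computer-made prompt/target pair from a labeled example."""
    prompt = f"utterance: {utterance}\nintent and slots:"
    target = f"intent={intent}; " + "; ".join(f"{k}={v}" for k, v in slots.items())
    return prompt, target

prompt, target = make_example(
    "play some jazz by miles davis",
    "PlayMusic",
    {"genre": "jazz", "artist": "miles davis"},
)

# Pretend the trained LM produced this continuation for `prompt`.
lm_output = "intent=PlayMusic; genre=jazz; artist=miles davis"

score = sentence_bleu([target.split()], lm_output.split())
print(round(score, 3))
```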
I write about this project separately, because it is this work that inspires most of my current and future interests.
I am also aware that none of these projects possess the rigor needed to be published, but I was just learning and exploring at the time.
Current and Future Research interests
While my background is in machine learning, I will have spent three years teaching as an adjunct by the end of Fall 2022. Most of my current interests are in the use of machine learning for teaching and learning.
Teaching programming in the age of LM-based autocomplete
There has been a lot of interest in studying the code generation capabilities of generative Language Models. This has led to commercial tools like Amazon CodeWhisperer and GitHub Copilot, which, coincidentally, is trying its hardest right now to help me write this statement.
LM-based autocomplete has the potential to disrupt assessments in undergraduate computer science, as assessments for these courses tend to be heavily based on programming assignments. Copilot, for instance, was able to complete most of my Data Structures programming assignments with minimal human intervention.
Imagining a future where such tools are prevalent, and where their use by students cannot be regulated, I am interested in studying how such tools can be incorporated into undergraduate computer science in a manner that doesn't allow students to sidestep the need to meet their learning outcomes.
Let us also look at a student's workflow when using such a tool:
1. Write a comment describing what needs to happen (not always necessary)
2. Trigger code generation
3. Select a generated solution
4. Modify if necessary

Step 1 is generally done by the instructors, or not required at all, as the LM can infer what it needs from class names, method names, etc. Steps 3 and 4 are done by the students, and in my experience, step 4 is only needed 10-30% of the time when using Copilot.
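To make the workflow concrete, here is a hypothetical illustration of steps 1-3: the comment acts as the prompt, and the function body is the kind of completion a student would be choosing among. The example was written by hand for illustration; it is not actual Copilot output.

```python
# Step 1: the comment below is the "prompt" (often supplied in the
# instructor's starter code). Steps 2-3: the tool proposes completions and
# the student selects one; the body shown stands in for such a completion.

# Reverse a singly linked list and return the new head.
def reverse_list(head):
    prev = None
    current = head
    while current is not None:
        next_node = current.next
        current.next = prev  # flip the pointer to point backwards
        prev = current
        current = next_node
    return prev
```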
So step 3 is where the student does most of the work. Selecting the appropriate option is quite different from thinking up and writing code; it turns a program-writing assignment into a program (reading) comprehension assignment. All this leads to some interesting questions:
- Is it better for the student that most introductory assignments are program reading comprehension instead of writing?
- If it's better, how can we adjust the curriculum or assessments to optimize for this? (New tools or activities that promote program comprehension)
- If it's worse, how can we adjust the curriculum or assessments to avoid students using them?
- How should the learning outcomes for introductory computer science courses change with the presence of such tools in the classroom?
LM-assisted generation of short activities and assessments for undergraduate computer science
As a thought exercise, if we were to assume that every student's life situation and level of motivation/engagement is ideal for success, would all of them pass with an A?
Every student learns at a different pace, so for a sufficiently difficult course, it is likely that some students' weekly progress will be slow enough that, as the weeks go by, they keep falling behind until they reach a point where they can no longer score an A or pass the course.
When a student is stuck on an assignment in a given week, there is not much they can do outside of asking the instructional team for help. The instructional team gives them feedback, and they use the feedback to complete the assignment. For these students who complete assignments with help, it is unclear whether they will be able to generalize the feedback and apply it successfully to a different assignment testing the same learning outcome.
Instructors only prepare a limited number of assignments, and there is not enough time in the week, especially for struggling students, to retest them on a different assignment.
With 1-3 medium-to-large assignments per week, it seems to me that even when every student's life situation and level of motivation/engagement is ideal, and the instructional team is ideal, we cannot guarantee their success.
I am interested in investigating the possibility of using Language Models to generate short assessments that take less than three minutes to finish and surgically assess a single learning objective. Students would perform 20-40 such short activities each week, allowing for multiple iterations of the feedback loop while still leaving "untainted" activities that can be used for assessment.
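A rough sketch of how one such short, single-objective item might be requested from an LM. The template, the learning objective, and the `generate` call are all hypothetical placeholders, not an existing API or the format I would necessarily use.

```python
# Build a prompt asking an LM for one short assessment item that targets
# exactly one learning objective. `generate` is a placeholder for whatever
# model or API ends up being used.
PROMPT_TEMPLATE = """You are writing one practice question for an introductory
programming course. Target exactly one learning objective and keep the
question answerable in under three minutes.

Learning objective: {objective}
Question format: {fmt}

Write the question, four answer options, and mark the correct option."""

def make_assessment_prompt(objective: str, fmt: str = "multiple choice") -> str:
    return PROMPT_TEMPLATE.format(objective=objective, fmt=fmt)

prompt = make_assessment_prompt("trace the value of a variable through a for loop")
# item = generate(prompt)  # hypothetical LM call
print(prompt)
```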
I imagine these activities being presented to students the way flash cards generally are - something they can work on from their phones. There are also other benefits:
- Multiple short assessments allow the incorporation of adaptive learning algorithms, distributed practice, etc.
- They allow for a quick reflection opportunity before presenting feedback - "I am confident I am correct", "I am unsure", etc.
- They allow for mastery grading, specification grading, etc.
- They allow for more accurate measurement of student engagement; instead of finding out at the deadline that a student did not work on the assignment at all, we can notice patterns of inactivity early. This is important for asynchronous course delivery.
- If there is one large assignment per week and a student cheats on it, a small time investment in cheating yields a large reward. Sometimes I feel students cheat because that is the only pathway to success they can envision. If there are multiple short assignments, the "cheating overhead" adds up across individual assignments, making cheating unfeasible.
Instructional Technologies
- In introductory computer science, the largest source of feedback is compiler errors. Can LMs be finetuned to treat source code + syntax error messages as a prompt and consistently generate friendly explanations of the compiler error? (A sketch of what one such fine-tuning record might look like appears after this list.)
- Similarly, syntax tree visualizers are helpful to students trying to understand program structure. Can an LM be finetuned to take in source code along with natural language questions from the student about the syntax, and generate a friendly explanation?
- Reflecting on and answering essay-style questions/prompts is an effective method that is often excluded from coursework due to the grading overhead and the infeasibility of providing timely feedback. Can LMs be finetuned to speed up the process by assisting a human grader in providing feedback, or independently provide feedback?
- Course policies can have a huge effect on student success. Flexible late policies can help "revive" students who temporarily could not engage with the course, but for other students they can be a detriment - students not taking deadlines seriously can lead to them having too much to do at the end of the semester. Similarly for attendance policies. Can course policies be adapted to individual students and independently enforced? Say attendance is required for some and not required for others. Can their Canvas activity/attendance from previous semesters be used to determine which policy is appropriate for them? Can we let students decide which day of the week certain categories of coursework should be due?
- Asynchronous courses allow remarkable flexibility for students, but students taking them have historically performed poorly at CCI compared to other course delivery formats. Synchronous courses have a lot of innate inefficiencies, so if anything, asynchronous courses should perform better. What causes this? What tools or technologies do students need access to in order to overcome this?
- For web pages and applications, the effects of page load time, or time until the page is interactive, on user engagement/retention are well studied. How does a slow LMS like Canvas, or the other tools students are required to use, affect student engagement?
- Students nowadays are not used to having to remember information - addresses, contacts, notes, email; everything can be searched through. Every major operating system now includes a system-wide search. How does a course-wide search - say, one capable of searching through slides, e-books, lecture transcripts, announcements, uploaded files, and assignment instructions - affect student performance?
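As referenced in the first item above, here is a sketch of what one fine-tuning record for the compiler-error idea might look like, assuming a simple prompt/completion (JSONL) setup. The field names, prompt layout, and hand-written explanation are illustrative assumptions, not an established format.

```python
# One hypothetical training record: source code plus the raw error message
# form the prompt, and a friendly explanation is the completion target.
import json

record = {
    "prompt": (
        "Source code:\n"
        "for i in range(10)\n"
        "    print(i)\n\n"
        "Error message:\n"
        "SyntaxError: expected ':'\n\n"
        "Friendly explanation:"
    ),
    "completion": (
        " The `for` statement on line 1 is missing a colon at the end. "
        "Python needs `for i in range(10):` so it knows where the loop "
        "header stops and the body begins."
    ),
}

# Append the record to a JSONL file of fine-tuning examples.
with open("compiler_error_examples.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```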
Other interests
- Students need to learn multiple programming paradigms or software architectural patterns (concurrent programming, parallel programming, JavaScript web frameworks, backend frameworks, entity-component systems, etc.) to be productive in industry and in upper-level courses. Which programming paradigms or software architectural patterns show up most often? Which of them can be taught in introductory CS, and how?
- How does the presence of a simple, ergonomic package manager, along with easy-to-use build tools, for introductory CS programming languages affect student confidence ("I can make a program that does this!") or retention (staying in the major or leaving)?