I’m Ben McMorran, a rising junior studying computer science at Worcester Polytechnic Institute. I’ve spent the last twelve weeks as a software engineering intern on the Teaching and Learning (TNL) team. This is my second summer as an intern here at edX. While there are many tasks I worked on during my time here, there are two main projects that I’d like to highlight.
API and Front End Development for Teams
The first project I worked on was the Teams feature for the LMS, which is still in development. This feature will make it easier for students to connect and converse with each other in small groups, and it will increase the virality of edX courses. Development for this feature included front-end work using Backbone and API implementation with the Django REST Framework (DRF). I was already familiar with Backbone from improving the course publishing workflow last summer, but the API work was new to me.
There was a strong focus throughout development on creating decoupled, reusable components. One example of this is the way we designed pagination controls for team and topic listings. We developed several generic, reusable paging controls that work with the pagination data DRF returns.
These controls, as pictured above, will be easy to integrate with other edX API endpoints in future development.
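To show what such controls consume, here is a minimal Python sketch that derives paging-control state from a page of results shaped like DRF's `PageNumberPagination` response (`count`, `next`, `previous`, `results`). The real controls were Backbone views. The `PagingState` class, the `paging_state` function, the example URLs, and the header wording are hypothetical. The page size is passed in explicitly because DRF's default page response does not include it.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PagingState:
    """Everything a hypothetical paging header/footer view needs to render."""
    current_page: int
    total_pages: int
    first_index: int  # 1-based position of the first item on this page (0 if empty)
    last_index: int   # 1-based position of the last item on this page (0 if empty)
    total_count: int
    has_previous: bool
    has_next: bool

    def header_text(self) -> str:
        if self.total_count == 0:
            return "Showing 0 results"
        return (f"Showing {self.first_index}-{self.last_index} "
                f"out of {self.total_count} total")


def paging_state(page: Dict[str, Any], page_number: int, page_size: int) -> PagingState:
    """Build control state from a DRF-style page: {count, next, previous, results}."""
    count = page["count"]
    on_this_page = len(page["results"])
    # An empty collection still renders as "page 1 of 1".
    total_pages = max(1, math.ceil(count / page_size))
    first = (page_number - 1) * page_size + 1 if on_this_page else 0
    last = first + on_this_page - 1 if on_this_page else 0
    return PagingState(
        current_page=page_number,
        total_pages=total_pages,
        first_index=first,
        last_index=last,
        total_count=count,
        # DRF sets next/previous to None when there is no such page.
        has_previous=page["previous"] is not None,
        has_next=page["next"] is not None,
    )


if __name__ == "__main__":
    page = {"count": 45,
            "next": "https://example.com/api/teams/?page=3",
            "previous": "https://example.com/api/teams/?page=1",
            "results": list(range(10))}
    print(paging_state(page, page_number=2, page_size=10).header_text())
```

Because the sketch depends only on the generic page envelope, the same state object could drive controls for any endpoint that paginates this way.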
The expandable fields I created to support the Team API are another example of reusable code. Clients are able to specify which fields they would like more information about as part of the initial request. For example, a request for team information could specify that the users field should be expanded. Instead of only providing usernames, the response would then include details about each user on the team. This saves the client extra requests when it needs those details, and keeps the response small when it does not. Expandable fields are easy to integrate with any DRF API: declare the field as an ExpandableField and provide serializers for its collapsed and expanded states. As the edX platform grows, this focus on reusable components will only become more important.
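The expansion idea can be shown without DRF. The following framework-free Python sketch illustrates the mechanism and is not the edX `ExpandableField` itself. The `Team` and `User` types, the `expand` query parameter name, and its comma-separated format are all hypothetical. Each expandable field holds a collapsed and an expanded serializer and picks one based on the field names the client asked to expand.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set
from urllib.parse import parse_qs, urlparse


@dataclass
class User:
    username: str
    name: str
    country: str


@dataclass
class Team:
    name: str
    users: List[User]


@dataclass
class ExpandableField:
    """Hypothetical stand-in: one serializer per state, chosen per request."""
    collapsed: Callable[[Any], Any]
    expanded: Callable[[Any], Any]

    def serialize(self, field_name: str, value: Any, expand: Set[str]) -> Any:
        serializer = self.expanded if field_name in expand else self.collapsed
        return serializer(value)


def parse_expand(url: str) -> Set[str]:
    """Read a hypothetical ?expand=users,topic parameter into a set of field names."""
    values = parse_qs(urlparse(url).query).get("expand", [])
    return {name.strip() for value in values for name in value.split(",") if name.strip()}


USERS_FIELD = ExpandableField(
    # Collapsed: usernames only, keeping the response small.
    collapsed=lambda users: [u.username for u in users],
    # Expanded: full user details, saving the client one request per user.
    expanded=lambda users: [
        {"username": u.username, "name": u.name, "country": u.country} for u in users
    ],
)


def serialize_team(team: Team, expand: Set[str]) -> Dict[str, Any]:
    return {
        "name": team.name,
        "users": USERS_FIELD.serialize("users", team.users, expand),
    }


if __name__ == "__main__":
    team = Team("Study Group", [User("ada", "Ada Lovelace", "GB")])
    print(serialize_team(team, parse_expand("/teams/1?expand=users")))
    print(serialize_team(team, parse_expand("/teams/1")))
```

Because the choice is made per field, adding expansion to another endpoint only means declaring another field with its two serializers.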
Discussion Forums Performance Improvements
I also spent several weeks improving the performance of our discussion forums. We use New Relic to monitor the servers running edx.org. Earlier this summer, the monitoring captured a trace that showed it was taking over 40 seconds to post a comment in one specific course, prompting further investigation.
I loaded the problematic course into my local development environment and tried posting a comment. Profiling revealed that the server was spending the vast majority of the time emitting an analytics event, which includes the topic of the discussion, if applicable. The topic of a discussion component provides a way to filter and group discussion threads. For example, all of the threads in an inline discussion component have the same topic.
In the discussion app, comments are created based on a discussion id used by the comments service. However, the discussion topic for a particular comment is stored in the discussion module as part of the course. Discussion modules know their associated discussion id, but there was no efficient way of obtaining the discussion’s topic if you only knew the discussion id. The problematic course had almost 1000 discussion modules. Creating the analytics event loaded every single one to discover the discussion topic!
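To make the cost concrete, here is a toy Python model of the lookup pattern described above. It is a sketch, not the edX code. `ToyModulestore` is a hypothetical stand-in for course storage that counts how many modules it materializes. With only a discussion id in hand and nothing indexed by that id, finding the topic means fetching every discussion module and scanning for a match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiscussionModule:
    discussion_id: str
    topic: str


class ToyModulestore:
    """Hypothetical stand-in for course storage that counts module loads."""

    def __init__(self, modules: List[DiscussionModule]):
        self._modules = modules
        self.loads = 0

    def get_discussion_modules(self) -> List[DiscussionModule]:
        # Fetching by type materializes every discussion module in the course.
        self.loads += len(self._modules)
        return list(self._modules)


def topic_for_discussion(store: ToyModulestore, discussion_id: str) -> Optional[str]:
    """Original-style lookup: nothing is keyed by discussion id, so scan everything."""
    for module in store.get_discussion_modules():
        if module.discussion_id == discussion_id:
            return module.topic
    return None


if __name__ == "__main__":
    store = ToyModulestore(
        [DiscussionModule(f"d{i}", f"Week {i // 10 + 1}") for i in range(1000)]
    )
    topic_for_discussion(store, "d42")
    print(store.loads)  # all 1000 modules were loaded to answer one lookup
```

The cost grows linearly with the number of discussion modules and is paid on every comment, which is why one large course was hit so hard.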
My first thought was to add an index on the discussion id. This proved to be problematic because there are several persistence mechanisms for courses in the edX platform (old mongo, split mongo, and XML courses), and adding a new index would have required drastic changes to each of them. Instead, I created a mapping of discussion ids to their associated modules. This mapping is cached in the MySQL database when a course is published. Because course data is accessed often but changes rarely, the relatively high cost of building the mapping by traversing the entire course is acceptable: it is paid only when the course is published.
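Below is a minimal Python sketch of the publish-time mapping, using the standard-library `sqlite3` module as a stand-in for MySQL. The table name `discussion_id_map`, the toy dict-based course tree, and every function name are hypothetical. It illustrates the trade-off described above (one full traversal per publish, one row read per lookup), not the code that landed in edx-platform. MySQL would use `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` in place of SQLite's `INSERT OR REPLACE`.

```python
import json
import sqlite3
from typing import Any, Dict, Iterator, Optional


def create_schema(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS discussion_id_map "
        "(course_id TEXT PRIMARY KEY, mapping TEXT NOT NULL)"
    )


def iter_blocks(block: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Depth-first traversal of a toy course tree (hypothetical structure)."""
    yield block
    for child in block.get("children", []):
        yield from iter_blocks(child)


def build_discussion_map(course: Dict[str, Any]) -> Dict[str, str]:
    """Expensive: walks the whole course once, mapping discussion id -> block location."""
    return {
        block["discussion_id"]: block["location"]
        for block in iter_blocks(course)
        if block.get("category") == "discussion"
    }


def on_course_published(db: sqlite3.Connection, course_id: str, course: Dict[str, Any]) -> None:
    """Rebuild and cache the mapping; runs only when the course is published."""
    mapping = json.dumps(build_discussion_map(course))
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO discussion_id_map (course_id, mapping) VALUES (?, ?)",
            (course_id, mapping),
        )


def location_for_discussion(db: sqlite3.Connection, course_id: str,
                            discussion_id: str) -> Optional[str]:
    """Cheap: one primary-key read instead of loading every discussion module."""
    row = db.execute(
        "SELECT mapping FROM discussion_id_map WHERE course_id = ?", (course_id,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]).get(discussion_id)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    course = {"location": "course", "children": [
        {"location": "unit-1", "children": [
            {"location": "disc-a", "category": "discussion", "discussion_id": "abc123"},
        ]},
    ]}
    on_course_published(db, "demo-course", course)
    print(location_for_discussion(db, "demo-course", "abc123"))
```

Storing the whole mapping as one JSON value per course keeps invalidation simple: each publish overwrites the row, so a lookup never sees a mix of old and new entries.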
With my fix implemented, I needed to verify it through load testing. This process was brand new to me. While it wasn’t challenging in itself, it took me a while to get up to speed. I ran the existing forums Locust tests against the problematic course before and after my fix was applied.
Before the fix, creating a comment took about 20 seconds on average over the half hour of load testing. Notice the huge number of MongoDB queries, 1,320, in the breakdown table, as every discussion module in the course is loaded.
After the fix, creating a comment took about four seconds on average over the half hour of load testing. Notice that the number of MongoDB queries is now only 6.75 on average.
With the fix, response times were about five times faster (roughly 20 seconds down to four), and MongoDB queries dropped from 1,320 to 6.75, a reduction of nearly 200 times. The fix is now in the edx-platform master branch and should be deployed to edx.org soon.
My experience as an edX intern was fantastic. Embedded on the TNL team, I felt like a full-time employee. I was able to take on real tickets and see the impact of my work on the platform. Working on an open source project is awesome. I’d like to thank Andy Armstrong, Christina Roberts, the entire TNL team, and edX for making this a great summer!