I am sure every programmer has turned to Stack Overflow whenever they are stuck on a coding problem.

Some developers use Stack Overflow's built-in search, while others rely on search engines such as Google or Bing to find answers on the site, but finding the right answers on the Q&A platform isn't always easy.

To address this issue, a team of computer science researchers has come up with a solution: CROKAGE – the Crowd Knowledge Answer Generator.

This service takes the description of a programming task as a query and then fetches relevant and detailed programming solutions that contain both code snippets and their explanations.

Developing CROKAGE

The researchers recognized that developers searching for solutions face a lexical gap between their query (a natural-language task description) and the content of the answers they need, which is often expressed in actual code.

As a result, developers often have to browse dozens of pages to piece together a complete solution.

To narrow this gap, the researchers trained a word-embedding model with fastText, using millions of Stack Overflow Q&A threads as training data.
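
To see why embeddings help, here is a minimal Python sketch that compares a query and an answer by the cosine similarity of their averaged word vectors. The tiny three-dimensional vectors are made up for illustration (a real fastText model learns vectors with hundreds of dimensions from the Stack Overflow corpus). Averaging word vectors is just one simple way to compare texts and is not necessarily what CROKAGE does.

```python
import math

# Made-up 3-dimensional vectors standing in for embeddings a fastText model
# might learn from Stack Overflow text (illustrative values only).
EMBEDDINGS = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "bufferedreader": [0.85, 0.15, 0.05],
    "sort": [0.0, 0.9, 0.2],
    "list": [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(tokens):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def semantic_similarity(query_tokens, answer_tokens):
    """Compare two token lists by the cosine of their mean word vectors."""
    q = mean_vector(query_tokens)
    a = mean_vector(answer_tokens)
    if q is None or a is None:
        return 0.0
    return cosine(q, a)


print(round(semantic_similarity(["read", "file"], ["bufferedreader"]), 3))  # prints 1.0
print(round(semantic_similarity(["read", "file"], ["sort", "list"]), 3))  # prints 0.238
```

The query "read file" scores close to 1.0 against an answer that mentions only `bufferedreader`, even though the two share no words. This is the kind of lexical gap that embeddings are meant to bridge.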

Schematic diagram of CROKAGE. A) Corpus Preparation, B) Building Models, Maps and Indices, C) Searching for Relevant Answers, and D) Composition of Programming Solutions

A key feature of CROKAGE is query expansion: it augments the natural-language query (the task description) with the names of relevant open-source library classes and functions, mined from Stack Overflow.
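
Query expansion can be sketched as a lookup from query words to associated API names. The `API_HINTS` table and the `expand_query` function below are hypothetical. CROKAGE mines its word-to-API associations from Stack Overflow, whereas these Java entries are hard-coded examples.

```python
# Hypothetical word-to-API associations. CROKAGE mines such links from
# Stack Overflow; these hard-coded Java entries are illustrative only.
API_HINTS = {
    "read": ["BufferedReader", "Files.readAllLines"],
    "file": ["File", "Files.readAllLines"],
    "sort": ["Collections.sort"],
}


def expand_query(query, hints=API_HINTS, max_api_terms=3):
    """Return the query words followed by up to max_api_terms distinct API names."""
    words = query.lower().split()
    api_terms = []
    for word in words:
        for api in hints.get(word, []):
            if api not in api_terms:  # keep each API name once, in first-seen order
                api_terms.append(api)
    return words + api_terms[:max_api_terms]


print(expand_query("Read a file"))
# prints ['read', 'a', 'file', 'BufferedReader', 'Files.readAllLines', 'File']
```

Capping the number of added terms keeps the expanded query from being swamped by loosely related API names.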

How CROKAGE works

CROKAGE ranks candidate answers by a weighted combination of four factors, which draw on traditional information retrieval (IR) metrics such as TF-IDF and asymmetric relevance.
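
A weighted ranking of this kind can be pictured as a weighted sum of per-answer factor scores. In this Python sketch, only the lexical factor is actually computed, as TF-IDF cosine similarity. The other factor names (`semantic`, `api`, `other`) and all of the weights are placeholders, not the factors or values from the CROKAGE paper. Asymmetric relevance is not modeled here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse {term: weight} TF-IDF vector for each tokenized document."""
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed IDF (always >= 1).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def sparse_cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Placeholder factor names and weights (hypothetical, not CROKAGE's values).
WEIGHTS = {"lexical": 0.4, "semantic": 0.3, "api": 0.2, "other": 0.1}


def rank_answers(query_tokens, answers, extra_scores=None, weights=WEIGHTS):
    """Rank answers by a weighted sum of factor scores, best first.

    answers maps answer_id -> tokens; extra_scores maps answer_id ->
    {factor: score in [0, 1]} for the factors not computed here.
    """
    extra_scores = extra_scores or {}
    ids = list(answers)
    # IDF is computed over the query plus all candidate answers.
    vectors = tfidf_vectors([query_tokens] + [answers[a] for a in ids])
    query_vec = vectors[0]
    ranked = []
    for answer_id, answer_vec in zip(ids, vectors[1:]):
        factors = dict(extra_scores.get(answer_id, {}))
        factors["lexical"] = sparse_cosine(query_vec, answer_vec)
        total = sum(w * factors.get(name, 0.0) for name, w in weights.items())
        ranked.append((answer_id, total))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


answers = {
    "a1": ["read", "file", "bufferedreader"],
    "a2": ["sort", "list"],
}
print(rank_answers(["read", "file"], answers)[0][0])  # prints a1
```

Keeping the factors in a dictionary makes it easy to add a factor or retune the weights without touching the ranking loop.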

To tailor these factors to Stack Overflow, the researchers adopted a specialized ranking mechanism suited to software-specific documents.

A crucial part of developing CROKAGE was collecting programming functions that potentially implement the target programming task (the query), and promoting those answers containing such functions.

Under CROKAGE's approach, an answer whose code snippet uses the relevant functions and is accompanied by a sufficient explanation is a strong candidate solution.
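
This heuristic can be expressed as a simple filter. In the hypothetical Python sketch below, an answer counts as a strong candidate if its code mentions at least one relevant API name and its prose contains at least `min_words` words. The `Answer` structure and the word-count threshold are illustrative assumptions, not CROKAGE's actual criteria.

```python
import re
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    code: str         # code snippets from the answer, concatenated
    explanation: str  # the answer's prose with code removed


def is_strong_candidate(answer, relevant_apis, min_words=20):
    """Hypothetical heuristic: code uses a relevant API and prose is long enough."""
    # \b boundaries assume API names start and end with identifier characters.
    uses_api = any(
        re.search(r"\b" + re.escape(api) + r"\b", answer.code)
        for api in relevant_apis
    )
    well_explained = len(answer.explanation.split()) >= min_words
    return uses_api and well_explained


def promote(answers, relevant_apis, min_words=20):
    """Move strong candidates to the front. sorted() is stable, so the
    original order is otherwise preserved."""
    return sorted(
        answers,
        key=lambda a: not is_strong_candidate(a, relevant_apis, min_words),
    )


answers = [
    Answer("a2", "int x = 1;",
           "This answer explains the idea in detail but shows no relevant code."),
    Answer("a3", "new BufferedReader(reader)", "Use this."),
    Answer("a1", "new BufferedReader(new FileReader(path))",
           "Wrap a FileReader in a BufferedReader and call readLine in a loop."),
]
print([a.answer_id for a in promote(answers, ["BufferedReader"], min_words=5)])
# prints ['a1', 'a2', 'a3']
```

Only `a1` passes both checks. `a2` explains well but uses no relevant API, and `a3` uses the API but barely explains it.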

To ensure that the explanations in these solutions are relevant and useful, the team applied natural language processing when ranking answers by the four weighted factors.

For more details, refer to the published paper.

CROKAGE is currently available as an experimental service at http://www.isel.ufu.br:9000. It supports only Java queries for now, but its creators hope to release an expanded version soon.

Interested developers can find the project source code on GitHub.