Translate into a different language

Saturday, August 11, 2018

Even Anonymous Coders Leave Fingerprints | Security - WIRED

Researchers who study stylometry—the statistical analysis of linguistic style—have long known that writing is a unique, individualistic process, reports Louise Matsakis, staff writer at WIRED covering cybersecurity, internet law, and online culture.

Photo: Casey Chin

The vocabulary you select, your syntax, and your grammatical decisions leave behind a signature. Automated tools can now accurately identify the author of a forum post for example, as long as they have adequate training data to work with. But newer research shows that stylometry can also apply to artificial language samples, like code. Software developers, it turns out, leave behind a fingerprint as well.

Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt's former PhD student and now an assistant professor at George Washington University, have found that code, like other forms of stylistic expression, are not anonymous. At the DefCon hacking conference Friday, the pair will present a number of studies they've conducted using machine learning techniques to de-anonymize the authors of code samples. Their work could be useful in a plagiarism dispute, for instance, but it also has privacy implications, especially for the thousands of developers who contribute open source code to the world.

How To De-Anonymize Code
Here's a simple explanation of how the researchers used machine learning to uncover who authored a piece of code. First, the algorithm they designed identifies all the features found in a selection of code samples. That's a lot of different characteristics. Think of every aspect that exists in natural language: There's the words you choose, which way you put them together, sentence length, and so on. Greenstadt and Caliskan then narrowed the features to only include the ones that actually distinguish developers from each other, trimming the list from hundreds of thousands to around 50 or so...

Plagiarism and Privacy Implications
Caliskan and Greenstadt say their work could be used to tell whether a programming student plagiarized, or whether a developer violated a noncompete clause in their employment contract. Security researchers could potentially use it to help determine who might have created a specific type of malware.
Read more...

Source: WIRED


If you enjoyed this post, make sure you subscribe to my Email Updates!

0 comments: