CSC News

October 05, 2015

Menzies Makes a PROMISE He Doesn’t Plan to Keep, Not All to Himself Anyway.

The slithery pitch of the snake oil salesman always rings true – if it sounds too good to be true, it probably is. Thanks to today’s 24-hour access to information, it can be rather hard to sell someone a “cure-all.” For something to be accepted as legitimate today, the information has to be online, easily accessible and current. Even before the advent of online information, great discoveries were made through the sharing of information, ideas, and research. Without information sharing, where would we be?
That is the question Dr. Tim Menzies, NC State professor of computer science, asked himself and a colleague almost a decade ago, when he realized that in software engineering, computer data mining was a rarity. In contrast to other professions that shared data openly, the professors discovered, only four percent of the 154 data sets they found were online as a resource for fellow computer scientists.
This difference in attitudes toward sharing data in software engineering led to the publication “On the Shoulders of Giants.” Menzies, along with fellow professors Dr. Earl Barr, Dr. Christian Bird, Dr. Eric Hyatt and Dr. Gregorio Robles, discussed two longstanding hurdles facing data sharing in the industry: the fear of being scooped, and the time and energy it takes to record the data. Their hope was to discover ways to lessen the risk of sharing in favor of the benefits to the field itself.
“I’m just an old hippie at heart and my philosophy is to share everything,” Menzies said with a chuckle. “If you think about it, the lack of data sharing is pretty tragic. Just think where we could be if we all had a different mindset.”
Menzies cited other sciences for their innovative ways to share, yet still garner credit for work well done. Astronomy, for example, allows authors to “own” their work for six months, after which they are required to share it with the public. Even a quick Google search of the professional body Science 2.0 shows 5,954,000 results from all sorts of disciplines.
“Even though there is great evidence of data sharing from other disciplines, it doesn’t mean that everyone wants to play by the rules,” he said.  “The Hubble telescope actually caused a big problem by trying to hold onto its data longer than it was allowed.”
Different theories abound as to why data mining is missing in computer science. Some blame the speed at which the discipline changes, rendering old data irrelevant before it is worthwhile to post. Others say that the more models are shared, the more variances increase. Regardless of the mindsets, Menzies and his colleagues set out to change things by creating the PROMISE data repository. It started as a grassroots effort 10 years ago and is growing by leaps and bounds today, thanks to support from NC State’s computer science department and its upgrade to terabyte size.
Specializing in software engineering datasets, the repository offers free, long-term storage for research artifacts. Membership in the valuable repository is by committee invitation. The “cost” of membership is the use of one graduate student who can devote time each month to the maintenance of the site. Seems like a mighty small price to pay for the value of sharing and storing one’s hard-earned work.
“Sharing is a good thing,” Menzies said. “We can’t convince people to change their mentality of not sharing from the past, we’ve got to lead from the front. A case in point is our students that are working on PROMISE have been traveling all over the world and have been sharing their work on the repository which is raising interest in participation and changing minds about the need for sharing data.”
Plans are for the site to continue to grow, as Menzies actively looks for corporate sponsors for an upgrade to a petabyte, so the information can continue to be accessible to the computer science community.
As a passionate advocate of sharing lessons learned, Menzies thinks data storage is essential to the future of computer science.
“It’s not like when Newton was hit on the head that he just blurted out F = ma,” he said. “There is a continuing molding and revision of our ideas and if our community recognizes the efforts and we become more data centric, we have more evidence to back up our discussions and conclusions. It’s like the philosophy of the snake oil salesman, with the data at our disposal, he can’t sell us the snake oil anymore because now you can look it up and see what is in that snake oil before you buy it.”
For more information on the PROMISE data repository and how to donate data, please visit
