Join KDnuggets’ online analytics and data mining community for research on data mining, predictive modeling, database analysis, and social network analytics. Subscribe to the KDnuggets RSS feed on data mining, analytics, and big data, and read articles such as "Applying Euclidean Distance Clustering to Social Network Data." Find data mining webinars and webcasts, and follow data mining and analytics forums on KDnuggets. Download sample datasets from the data repositories listed on KDnuggets to test software and develop predictive models. KDnuggets also lists data mining and analytics software, analytics jobs, and courses and other education in analytics and data mining. Dr. Gregory Piatetsky-Shapiro founded KDnuggets (as an email newsletter in 1993, on its own domain since 1997) and helped launch the Knowledge Discovery and Data Mining (KDD) series.
Watch an interview with Dr. Piatetsky-Shapiro from the Knowledge Discovery and Data Mining conference KDD-07. Dr. Piatetsky-Shapiro has written books on data mining and knowledge discovery and teaches analytics and data mining courses. His bio summarizes his data science career, research, and analytics affiliations. He also wrote "Estimating Campaign Benefits and Modeling Lift." Below, Dr. Piatetsky-Shapiro discusses KDnuggets, data mining, and analytics.
Question 1: What is KDnuggets, and how long have you run this website?
Dr. Piatetsky-Shapiro: KDnuggets is a leading site for analytics and data mining, and it was recently featured on CNN and in Forbes (among the Top Influencers in Big Data). I started publishing the Knowledge Discovery Nuggets email back in 1993 as a way to communicate with researchers who attended the Knowledge Discovery in Data (KDD-93) workshop, which I organized. In 1994 I created what was then the second website on Knowledge Discovery and Data Mining (at GTE), and when I left GTE in 1997 I chose the domain name KDnuggets, which stands for Knowledge Discovery Nuggets: short items relevant to the field. KDnuggets covers news, software, jobs, competitions, courses, data, education, webinars, and other things relevant to analytics and data mining. The number of subscribers grew from 50 in 1993 to about 12,000 via email and about 8,000 via social networks. Also, in 2012 KDnuggets.com averaged 55,000 unique visitors per month, an all-time high. If I had known KDnuggets would be so successful, I would have chosen an easier-to-spell name! KDnuggets evolves with technology. I now publish news daily, in blog style, at www.kdnuggets.com/news. KDnuggets also has an RSS feed, a facebook.com/kdnuggets page, and a LinkedIn group. I am probably most active on twitter.com/kdnuggets, which is a great way for me to publish relevant links and items.
Question 2: Does an analytics professional have to subscribe to KDnuggets? Are there specific analytics resources that subscribers have access to?
Dr. Piatetsky-Shapiro: I hope that analytics professionals do subscribe to KDnuggets. There is too much information out there, and I view my role as editor as selecting what is most interesting and important. Subscription is free, and subscribers get KDnuggets News in their email about 2-3 times.
Question 3: What are the two most popular sections of the KDnuggets website? Can you provide a link to each and briefly describe them (in 2-3 sentences)?
- KDnuggets Software: a directory of data mining software for classification, clustering, visualization, text mining, and many other tasks
- KDnuggets Jobs: an active jobs board, with jobs from top companies, from Apple and Amazon to Yahoo and Yandex
Question 4: You conduct regular polls of KDnuggets visitors about various analytics issues. How do you decide which topics to survey? Do you summarize poll results for KDnuggets readers? Which KDnuggets poll result has surprised you the most?
Dr. Piatetsky-Shapiro: I try to choose poll topics that are interesting to me and my readers and have current relevance. For example, the latest poll asks, "Was Target wrong in using analytics to find pregnant women?" I always summarize the poll results in the next issue; here is the summary of the previous poll on analytics/data mining salary by region. Your readers may view all past KDnuggets polls.
Question 5: KDnuggets lists free datasets that analytics professionals can download and use for modeling exercises. Can you provide links to three of the more popular datasets? And, perhaps, a social media dataset?
Dr. Piatetsky-Shapiro: The KDnuggets directory of datasets is a great resource for analytics professionals. I helped start the first KDD Cup in 1997, when I was the Chair of the KDD (Knowledge Discovery and Data Mining) conference steering committee, and I am pleased that the KDD Cup gave rise to so many competitions. The KDD Cup datasets are very popular. For data mining students, the UCI Machine Learning Repository is probably the most useful. There are also Big Data datasets, such as the Netflix Prize data, and you can also see Hilary Mason's collection of datasets. The Infochimps Data Marketplace and the Koblenz Network Collection have lots of social media datasets.
Analytics and Social Media Data
Question 6: Is it more or less difficult to build a predictive model with social media data? Why or why not?
Dr. Piatetsky-Shapiro: Social media is very noisy, and the behavior people are trying to predict with it is usually influenced by many other factors. For example, Twitter may be a useful source for predicting stock prices or movie popularity, but predictions will be better if other factors, such as the economy and other stocks, are included.
Question 7: Can the same analytics techniques (e.g., regression, inductive decision trees) that are applied to traditional datasets (e.g., a customer database) be used on a social media dataset, say, blog or web log data?
Dr. Piatetsky-Shapiro: Regression and decision trees can be applied to social media if the goal is to generate predictions. However, social networks raise many other questions, such as which nodes (people) are most influential, how to measure influence, and what the important communities are. These questions require new algorithms.
Question 8: What new analytics challenges do analytics professionals face due to the rise in popularity of social media?
Dr. Piatetsky-Shapiro: Probably the biggest challenge is keeping up with it! However, if analytics professionals subscribe to KDnuggets (www.kdnuggets.com/news/subscribe.html) or follow twitter.com/kdnuggets, keeping up will be easier. Social media is also a great opportunity for new research and new business. 2012 is the year of Big Data, and social media plays a big role, both in generating Big Data and in helping to understand it.
Thank you for taking the time to speak about your online marketing analytics and data mining community, KDnuggets.
Dr. Piatetsky-Shapiro: Thank you for having me.