choward

Digg: Dupe Detection Updates Are Here

Hi all,

After much anticipation, we are finally releasing several major updates to our dupe detection technology and content submission process that should go a long way toward eliminating duplicate submissions.

To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. The most common are identical stories from the same site submitted under different URLs. Our R&D team came up with a solution that identifies these types of duplicates by using a document similarity algorithm. Look for a separate tech blog post on how this works; so far it has proven to be a reliable way of identifying identical content from the same source.
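For the curious, here is a rough illustration of what a document similarity check can look like. This is not Digg's algorithm, which this post does not describe. It is a minimal sketch that assumes one common technique (word shingles compared with Jaccard similarity), and the function names, the shingle size of 3 and the 0.8 threshold are all made up for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    union = a | b
    if not union:
        return 0.0  # nothing to compare, so do not call it a match
    return len(a & b) / len(union)


def looks_like_duplicate(doc_a, doc_b, threshold=0.8):
    """Hypothetical check: flag two documents whose shingle sets mostly overlap.

    The threshold is an illustrative assumption, not a value from the post.
    """
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

Because the comparison looks only at the text, two copies of the same article reached through different URLs score as near-identical, which is the case described above.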

Another common type of duplicate is the same (or similar) story covered on different sites. Because this enters more subjective territory, we focused on doing a better job of detecting dupes with similar descriptive information. By leveraging Digg’s improved search technology, released a couple of months back, we now match stories with similar titles and descriptions with much higher accuracy than before.

Most importantly, we made changes to an often-cumbersome submission process. We moved the duplicate check immediately after the URL entry, *before* we ask for descriptive information. This eliminates the need to describe your submission before checking for dupes. In addition, the lag time from when a story is submitted to when it’s available in our duplicate checker is now a few seconds. These changes may take some time to adjust to, but we anticipate that they’ll help to eliminate dupes and encourage folks to Digg previously submitted content.

While we pilot the new dupe detection system, we will continue to block only submissions of the exact same URL within a 30-day period. We’ll also be monitoring when certain Diggers choose to bypass high-confidence duplicates and will use this data to continue to improve the process going forward. As always, please share any feedback you have on these updates.
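To make the exact-URL rule concrete, here is a small sketch of a 30-day block. It is an illustration only: the `SubmissionLog` class and its `try_submit` method are hypothetical names, the in-memory dictionary stands in for whatever storage is really used, and treating a resubmission on day 30 as allowed is an assumption about where the window ends.

```python
from datetime import datetime, timedelta

# Window taken from the post; how its boundary is handled is an assumption.
BLOCK_WINDOW = timedelta(days=30)


class SubmissionLog:
    """Hypothetical in-memory record of when each exact URL was last accepted."""

    def __init__(self):
        self._last_accepted = {}

    def try_submit(self, url, now):
        """Return True and record the URL, or False if it is still blocked.

        URLs are compared as exact strings, with no normalization, so a
        variant such as an added query string counts as a different URL.
        """
        previous = self._last_accepted.get(url)
        if previous is not None and now - previous < BLOCK_WINDOW:
            return False
        self._last_accepted[url] = now
        return True
```

Since only exact matches are blocked, the same story under a slightly different URL passes this check, which is why the similarity-based detection described above matters.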

Cheers,
Chris

Daniel Burka

Some (small but important) Digg Updates

We’ve just made a few small but significant changes to Digg. For the past few years, all of the content on Digg has been licensed as public domain. Comments, story titles, story descriptions, and all of the other user-contributed content on the Digg site are explicitly put into the public domain so that others can do great things with them. This is good for the internet and good for society.

As of today, we’ve taken that one step further by upgrading our public domain waiver to Creative Commons Zero (CC0). The CC0 waiver expresses that content posted on Digg is public domain even internationally. A minor point maybe, but our previous public domain dedication was only clear within the USA. When a friend from Creative Commons suggested that we move to a CC0 waiver, to even more clearly affirm our intentions, it seemed obvious. And, as we try to always do when we change something that affects the content that you (our users) submit to Digg, we’re trying to keep you informed about it.

To reflect this change, we’ve updated the language of our Terms of Use agreement. See Section #6 of the TOU to review the new wording. The notice in the footer on every page of Digg has also been updated.

Go forth and keep doing all of the wonderful commenting, submitting, and voting – even more so within the public domain than you were before.

Daniel

Jen Burton

Attention Ladies and Gentlemen: shouts have left the building

Hey all –

We’ve been working on adding new ways for you to share the content you find on Digg – Facebook Connect launched earlier this month, and in April we added the ability to share stories via Facebook and Twitter directly from the DiggBar.

Starting today, you’ll notice a few more changes to sharing options on Digg. We’ve listened to your feedback, crunched some user data, and decided to remove shouts. As some of you know, shouts have been a controversial feature since their inception, and considering the ever-changing landscape of the social web, we’ve elected to remove them in favor of more popular options. We’ve added easier access to sharing via email, Facebook and Twitter. As always, we want to encourage sharing and communication within our community and will continue to look into features that address these needs.

On the homepage you can now mouse over or click on the “share” link to open a dialog box that offers sharing via email, Facebook or Twitter. For example, if you click on the email icon, we’ll open a new mail message from your default email client and all you’ll need to do is enter email addresses. If you click on the Twitter icon, we’ll open Twitter in a browser window and populate the update field with the story title and URL (note that you’ll need to be logged in to Twitter at the time).

On the story list pages, you’ll see those icons directly under the story description (no need to click on “Share”).

A few of you may also notice that we removed the ‘blog this’ feature, which had really low usage. We think these changes better reflect how folks want to share content, and while we understand that some folks will miss the shout feature, we hope that you’ll give these new options a try.

As always, we’ll continue to iterate on features based on your feedback.

Have a good one -

Jen

Daniel Burka

Digg Laptop and iPhone Skins

Our friends at Infectious have manufactured some sweet Digg-themed vinyl skins for all kinds of laptops and iPhones. There are two designs for each – a simple one with a single logo and a skin with scattered logos all over the place.

Infectious makes really great, high-quality products and these skins are no exception. They’re tough enough to protect your laptop or phone and they’re easy to apply. But they’re easy to remove too, so should you ever get tired of the design (though I know I won’t!), you can peel it off without leaving any messy residue behind.

If you want to see one of these skins in the wild, we’ll be giving some away at our next Digg Meetup in Seattle on May 21st. If you’re in the area we’d love to see you there!

Anton Kast

Update on Digg’s Promotional Algorithm

Hi all,

We’re constantly tweaking the Digg algorithm to ensure that a unique and diverse set of Diggs drives the content on the homepage. We’ve made a few notable enhancements to our promotional algorithm recently, to ensure that all Diggers have a fair chance at getting their submitted stories promoted to the homepage. We felt it was important to call these out.

In addition to ongoing tweaks, we’ve taken steps to prevent abusive Digging behavior in an effort to improve the quality of content that is promoted to the homepage. Only a very small portion of the Digg community will experience these limits, well under 1%, and the vast majority of folks won’t be affected by them at all.

You may notice a few subtle shifts in promotion dynamics as we release this logic into the wild, but fear not: our team is making adjustments in real time. As always, let us know what you think and keep the feedback coming.

Thanks,
Anton

PS: For an insider peek into how the algorithm works, check out Digg’s Community Manager Jen Burton, as she provides investigative insight on the inner workings of the infamous Digg Algorithm. Special thanks to Hammer for his cameo role.

Anton Kast

Dupes & ongoing updates to Digg’s promotional algorithm

Hi everyone,

We wanted to address some complaints about weaknesses in our duplicate detection mechanism and provide some insights on upcoming changes to the Digg promotional algorithm. I head up Research and Development at Digg and my team is responsible for many of the advances in the promotional algorithm and the logic that powers features like the Recommendation Engine and Search.

Duplicate submissions have been an ongoing issue, and we are working on several new tools that will help address this. Improvements in duplication detection are underway and expected soon. We’re also working on a new system that will, among other things, allow us to track users who abusively submit duplicate content. While we haven’t fully hammered out all the details, the tool will likely include warnings and limits on duplicate submissions.

Another area of recent community debate has focused on home page diversity, and the concentration of certain popular submitters. Our goal is to give each person a fair chance at getting his or her submission promoted to the home page. Digg’s promotional algorithm aims to ensure that the most popular content Dugg by a diverse, unique group of Diggers reaches the home page. Since Digg began over four years ago, we’ve been making ongoing tweaks to the promotional algorithm. We spend a lot of time analyzing the data and improving the system. While most of these changes go unnoticed, we will be testing different approaches to increase submitter diversity in the upcoming months.

We are also developing new features and Digg experiences that will encourage participation and discovery of content outside of the home page proper. These changes will contribute to a much broader platform for the Digg community to share and discuss stories.

We are always testing and refining, and welcome your thoughts and feedback.

Thanks,
Anton

Jen Burton

New Twitter feeds!

Hey all –

We’re super stoked to announce that we’ve created Twitter feeds across several Digg topics, so you can get popular stories delivered right to Twitter. This provides another way to customize the types of stories you receive from Digg. The Twitter feeds will update as stories become popular, so when, say, a new story about green technology gets promoted to the Digg homepage, the Digg environment Twitter feed will post to your stream. These feeds were created using the Twitter API and are kept active by the likes of you.

We’ve created several official Digg news feeds, so hopefully there’s something for every Twitter enthusiast:

Digg Homepage: Updates as every new story is promoted

Digg 2000: Stories that reach 2,000 or more Diggs

Technology: Popular stories from Digg.com/technology

Apple: Popular stories from Digg.com/apple

Software: Popular stories from Digg.com/software

World & Business: Popular stories from Digg.com/world_business

Political News: Popular stories from Digg.com/politics

Science: Popular stories from Digg.com/science

Environment: Popular stories from Digg.com/environment

Offbeat: Popular stories from Digg.com/offbeat

Gaming: Popular stories from Digg.com/gaming

Images: Popular images from Digg.com/images

Video: Popular videos from Digg.com/video

Entertainment: Popular stories from Digg.com/entertainment

Sports: Popular stories from Digg.com/sports
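If it helps to see how the feeds relate to one another, here is a tiny sketch of which feeds a popular story would qualify for. It is illustrative only and not how the feeds are actually built: `feeds_for_story` is a hypothetical helper, the topic keys are taken from the Digg.com paths listed above, and the feed labels are the list names rather than real Twitter account names.

```python
# Topic path (as in Digg.com/<topic>) mapped to the feed label from the list above.
TOPIC_FEEDS = {
    "technology": "Technology",
    "apple": "Apple",
    "software": "Software",
    "world_business": "World & Business",
    "politics": "Political News",
    "science": "Science",
    "environment": "Environment",
    "offbeat": "Offbeat",
    "gaming": "Gaming",
    "images": "Images",
    "video": "Video",
    "entertainment": "Entertainment",
    "sports": "Sports",
}


def feeds_for_story(topic, digg_count):
    """Hypothetical helper: list the feeds a popular story qualifies for."""
    feeds = ["Digg Homepage"]  # every promoted story appears here
    if digg_count >= 2000:
        feeds.append("Digg 2000")  # stories with 2,000 or more Diggs
    label = TOPIC_FEEDS.get(topic)
    if label is not None:
        feeds.append(label)
    return feeds
```

For instance, a popular green technology story filed under Digg.com/environment would show up in the Digg Homepage feed and the Environment feed, and in Digg 2000 as well once it passes 2,000 Diggs.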

As always, let us know what you think. Of course, don’t forget to check out Digg updates on Twitter for other news and announcements!

Have a good one,
Jen

Jen Burton

Digg: Update on Script Abuse

Hey all –

Digg enforces its Terms of Use so that Digg remains a vibrant community of people committed to sharing and discovering great content. Everyone who uses Digg agrees to abide by the TOU, which maintains a positive experience on Digg for all community members by prohibiting spam, porn, gaming, hate speech, etc.

In many cases, Digg gives users who violate the TOU a second and sometimes even a third chance. When people continue to violate the TOU, or where a first-time violation is egregious, Digg is reluctantly left with no option but to ban the user.

A couple of weeks ago we posted a blog entry regarding script usage, reminding members of the community that it violates the Digg TOU. We also rolled out some changes that warned users when unnatural Digging activity was detected. Since that post, we have analyzed our logs regularly to clearly identify script use over an extended period of time.

To protect the privacy of individual users, we never speak to specific instances of user bans. We can say that we have banned a small number of users for script use over the past several weeks. Some of them are active users who are well known within the Digg Community. While we’ll sincerely miss the contributions of these individuals and are never happy about playing policeman, we believe that the larger Digg community is adversely impacted by people who choose to violate the TOU.

Please don’t hesitate to contact us at support@digg.com with questions or feedback. We’re continuously researching and investigating this process, so don’t be shy and let us know what you think.

Have a good one,
Jen

Jen Burton

How Digg REALLY works

Hey all,

We’re growing up so fast…sort of. Today we launched two new official Digg Blogs: Community (this one) and Technology. We’ll still use our main blog for big news and other important announcements, but because we love you and you love us, we thought we’d add a few new sections to keep you in the know.

We leave you with a little glimpse of some of the people here at the Digg offices. We have crooning QA Managers, Product Managers with crabs, and a little secret about what really powers the Digg Recommendation Engine. We are calling this video series ‘Safe for Work.’ Hope you like the first installment.

Keep an eye on this space, more to come!

- Jen