Google Crunches One Trillion Pieces of Data With Single Click

Product News Monday, August 27, 2012: EngNet - Engineering Network

BY CADE METZ for Wired Magazine.

Yes, Google treats its latest data center technologies as the most important of trade secrets. But when these creations get a little older, the company is happy to at least describe them to the rest of the world. Sometimes.

“We try to be as open as possible without losing our competitive advantage,” Urs Hölzle, the Grand Poobah of Google’s data centers, told us earlier this summer, as we discussed the research papers that occasionally emerge from Google, providing a peek into its internal infrastructure.

These papers often foreshadow where the rest of the world is going. Nearly a decade ago, Google released two papers that gave rise to Hadoop — one of the world’s most important open source software projects — and a 2010 paper describing a shockingly powerful data analysis tool called Dremel just spawned a new project poised to reinvent Hadoop.

So, there may be a piece of the future in a new Google research paper published earlier this month. It describes a tool capable of processing a trillion pieces of information with a single mouse click. According to the paper, this tool is 10 to 100 times faster than traditional databases that afford similar types of information analysis.

Part of a larger Google data analysis platform called PowerDrill, the tool has been used inside the company since 2008, and it serves as an alternative to Dremel. According to Hölzle, Dremel can run queries on a petabyte of data (about a million gigabytes) in about three seconds. The PowerDrill tool, referred to only as a new breed of “column-store,” can’t handle quite as much data, but it can handle a lot. And it’s even faster.

According to the paper, the tool can process 782 billion cells of data in about 30 to 40 seconds — or about 2 seconds per query. Google says that’s “several orders of magnitude” faster than Dremel’s approach.
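The two timings quoted above only fit together if a single click fans out into several queries. As a rough back-of-the-envelope sketch, using only the figures quoted in this article and assuming (our assumption, not something the article states) that the queries run one after another, the arithmetic looks like this. The function and constant names are made up for illustration.

```python
# Figures as quoted in the article; nothing here is taken from the paper directly.
CELLS_PER_CLICK = 782e9        # cells processed per mouse click
CLICK_SECONDS = (30.0, 40.0)   # reported time range for one click
SECONDS_PER_QUERY = 2.0        # reported approximate time per query


def implied_queries(click_seconds: float, seconds_per_query: float) -> float:
    """Queries per click, ASSUMING the queries run one after another."""
    return click_seconds / seconds_per_query


def throughput(cells: float, click_seconds: float) -> float:
    """Average cells processed per second over one click."""
    return cells / click_seconds


fast, slow = CLICK_SECONDS
queries_range = (implied_queries(fast, SECONDS_PER_QUERY),
                 implied_queries(slow, SECONDS_PER_QUERY))
throughput_range = (throughput(CELLS_PER_CLICK, slow),
                    throughput(CELLS_PER_CLICK, fast))

print(f"implied queries per click: {queries_range[0]:.0f} to {queries_range[1]:.0f}")
print(f"cells per second: {throughput_range[0]:.2e} to {throughput_range[1]:.2e}")
```

Under that assumption, one click works out to roughly 15 to 20 queries, and an average of roughly 20 to 26 billion cells per second. If the queries actually overlap in time, the number of queries per click could be higher.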

Like the rest of Google’s sweeping software platform, the tool operates across thousands of servers. But unlike others, it focuses on storing data in server memory, as opposed to on disk. “Dremel is designed to analyze many different datasets,” says Tomer Shiran, one of the first employees at MapR, a company at the heart of the movement to duplicate Google’s internal infrastructure, “but this new system is optimized to run in memory, and that means you can achieve really, really low latency.”

In short, the tool provides instant access to the data you need to access the most often. “If you have, say, four datasets that are central to your business,” Shiran says, “this is where you would store them.” The system uses various compression techniques, he says, to pack as much data as possible into memory.
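The article does not say which compression techniques the tool uses. Purely as an illustration of the general idea, here is a minimal sketch of a column layout with dictionary encoding, a technique commonly used in column-stores. It is not Google's implementation, and the sample data and the names `dictionary_encode` and `sum_where` are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = []   # id -> original value, in order of first appearance
    index = {}        # original value -> id
    ids = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def sum_where(dictionary, ids, metric, wanted):
    """Sum a numeric column over the rows whose encoded column equals `wanted`."""
    if wanted not in dictionary:
        return 0
    target = dictionary.index(wanted)
    # Compare small integers instead of the original (possibly long) values.
    return sum(m for i, m in zip(ids, metric) if i == target)


# Hypothetical sample data in the usual row-by-row layout.
rows = [
    {"country": "US", "clicks": 3},
    {"country": "DE", "clicks": 5},
    {"country": "US", "clicks": 2},
    {"country": "FR", "clicks": 1},
    {"country": "US", "clicks": 4},
]

# Row layout -> column layout: one list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}

country_dict, country_ids = dictionary_encode(columns["country"])
us_clicks = sum_where(country_dict, country_ids, columns["clicks"], "US")
print(country_dict, country_ids, us_clicks)
```

Each distinct country is stored once and every row holds only a small integer, so a column with many repeated values takes far less memory than the raw strings. A query that touches only `country` and `clicks` also never has to read any other field.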

Shiran oversees the new open source project that seeks to clone Dremel. It’s called Drill, not to be confused with PowerDrill. He and MapR have no immediate plans to duplicate the PowerDrill column-store. But we wouldn’t be surprised if they did.

 

Cade Metz is the editor of Wired Enterprise. Got a NEWS TIP related to this story -- or to anything else in the world of big tech? Please e-mail him: cade_metz at wired.com.
