Thursday, July 7, 2022
SCIENMAG: Latest Science and Health News

CAREER Award: Playing catch-up with data storage

June 9, 2022
in Technology and Engineering

The total amount of data created, captured, copied and consumed globally in 2020 exceeded 64 trillion gigabytes, and German market research firm Statista projects that by 2025 the total data created could surpass 180 trillion gigabytes. To put that in perspective, with just one gigabyte, you could send 350,000 emails, view 600 web pages or stream 200 songs.

Assistant Professor Farzad Farnoud

Credit: Tom Cogill

This data revolution has transformed scientific research, especially in the physical and life sciences, including health care. The problem is there’s only enough storage capacity for about 10% of the data produced globally. New algorithms and architectures for networked, data-intensive computing are needed to store, process and use that data.

Farzad Farnoud Hassanzadeh, an assistant professor of electrical and computer engineering and computer science at the University of Virginia School of Engineering and Applied Science, has earned a prestigious National Science Foundation CAREER award to meet this need. He will use his $560,000 five-year award to develop new models and data compression algorithms that will make the storage and analysis of large data sequences more efficient and accurate.

The CAREER program, one of the NSF’s most prestigious awards for early-career faculty, recognizes the recipient’s potential for leadership in research and education. Farnoud leads the information processing and storage lab, whose members solve problems at the intersection of information theory, computational biology and machine learning — a research strength of the Charles L. Brown Department of Electrical and Computer Engineering.

“From an information theory perspective, data is just a sequence of symbols, which could be letters, DNA symbols or bytes,” Farnoud said. “The contents of a book are a sequence of letters. Spelling and grammar rules help us anticipate which letters naturally follow to form a word, and which words naturally follow to form a sentence. If you can predict the next word well, you can compress the sequence very well. We can prove this mathematically.”

For short sequences of data, models that predict the data that will probably come next, called probabilistic models, can be very helpful. But the models struggle to find and analyze patterns that emerge in long sequences. In the example of words and sentences in a book, what comes next depends on a small number of previous letters or words, not what appeared two pages before.
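The link between prediction and compression can be illustrated with a small sketch. It is not Farnoud's method, only a textbook-style adaptive model: it predicts each character from the few characters before it and adds up the bits an ideal coder would spend, which is minus the base-2 logarithm of the predicted probability. The function name, the add-one smoothing and the sample text are assumptions made for illustration.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, order: int) -> float:
    """Total bits an ideal coder would spend on `text` when each character
    is predicted from the `order` characters before it (adaptive counts)."""
    # Assumption: the alphabet is known in advance to encoder and decoder.
    alphabet_size = len(set(text))
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, symbol in enumerate(text):
        context = text[max(0, i - order):i]
        # Add-one smoothing keeps unseen symbols at a nonzero probability.
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after "coding" the symbol, as a decoder would.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


sample = "the cat sat on the mat. " * 50  # toy text, not real data
for order in (0, 1, 3):
    print(order, round(ideal_code_length_bits(sample, order)))
```

On this toy text, looking back three characters should need far fewer bits than looking back none, because the recent context makes the next character much easier to predict.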

“I am interested in patterns that emerge at long ranges,” Farnoud said, referring to a data feature called long-range dependence. If you imagine a sequence, long-range dependence is how far back you need to go to describe the probabilistic characteristics of the symbol you are observing.

“Let’s say the sequence itself is a terabyte of data, or 10 to the 12th power bytes,” Farnoud said. “It is normal to assume that a byte’s probabilistic characteristics depend on maybe the previous 10 bytes. But if long-range dependencies exist, the byte’s probabilistic characteristics may depend on a million bytes before it.”
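A toy experiment can make long-range dependence concrete. The sketch below builds an artificial bit sequence in which each bit is a noisy copy of the bit 1,000 positions earlier, then measures how uncertain a bit remains once you know the bit at a given distance behind it. The generator, the lag of 1,000 and the 10% flip rate are all assumptions chosen for illustration, not values from Farnoud's work.

```python
import math
import random
from collections import Counter


def conditional_entropy_bits(seq, distance):
    """Empirical H(X_i | X_{i-distance}) in bits per symbol."""
    past = seq[:-distance]
    current = seq[distance:]
    pair_counts = Counter(zip(past, current))
    past_counts = Counter(past)
    n = len(current)
    h = 0.0
    for (p_sym, _c_sym), count in pair_counts.items():
        h -= (count / n) * math.log2(count / past_counts[p_sym])
    return h


def make_sequence(length, lag, flip_prob, seed=0):
    """Toy source: each bit copies the bit `lag` positions back,
    flipped with probability `flip_prob`."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1) for _ in range(lag)]
    for i in range(lag, length):
        bit = seq[i - lag]
        if rng.random() < flip_prob:
            bit ^= 1
        seq.append(bit)
    return seq


seq = make_sequence(length=200_000, lag=1_000, flip_prob=0.1)
for d in (1, 10, 1_000):
    print(d, round(conditional_entropy_bits(seq, d), 3))
```

In this construction, the neighbouring bits say almost nothing about the next bit, so the uncertainty stays close to one full bit at distances 1 and 10. Only a model that looks back 1,000 positions sees the uncertainty drop, which is the situation short-context models miss.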

These long-range dependencies between elements of the sequence are not captured well in current models. Farnoud will apply his CAREER Award to construct probabilistic models that describe long-range dependence accurately and realistically, leveraging the statistical properties of the models to improve tasks like prediction or data compression.

With data sets this large, it would take a supercomputer to “zip” the file. Determining which patterns are meaningful provides equal insight into which patterns are redundant and can be removed during data compression. “If you want to do data compression effectively at scale, you need to be able to model these long-range dependencies and take advantage of these patterns,” Farnoud said.
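Everyday compressors already show why the reach of a model matters. DEFLATE, the algorithm behind zip and gzip files, only searches for repeats within roughly the previous 32 KiB, while LZMA (the algorithm behind .xz files) keeps a much larger history. The sketch below, using Python's standard zlib and lzma modules, compresses a random 100,000-byte block followed by an exact copy of itself. The block size is an arbitrary choice for illustration, and neither compressor is the kind of model Farnoud proposes.

```python
import lzma
import os
import zlib

block = os.urandom(100_000)   # incompressible on its own
data = block + block          # the second half repeats the first, 100,000 bytes later

deflate_size = len(zlib.compress(data, 9))  # history window of at most 32 KiB
lzma_size = len(lzma.compress(data))        # default preset uses a multi-megabyte dictionary

print("original:", len(data))
print("zlib    :", deflate_size)
print("lzma    :", lzma_size)
```

Because the repeat lies beyond zlib's window, its output should stay close to the full 200,000 bytes, whereas lzma can reference the first copy and should end up near 100,000 bytes.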

In addition to theoretical and scientific advances, Farnoud’s data-compression methods could enable large-scale data storage systems to operate more efficiently, requiring less hardware, computing resources and electrical power.

The same limitations of existing models also give rise to challenges when analyzing data, specifically genomic data, which is generated over billions of years through evolutionary processes. For example, repeats are a prevalent feature of genomes and can be better analyzed by models that can handle long-range dependence. Farnoud will develop better statistical algorithms to analyze these sequences. 
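As a rough illustration of why repeats call for long-range models, the sketch below scans a DNA string for substrings of length k that occur more than once and reports how far apart the copies are. The function name and the toy sequence are made up for this example; real genomes are billions of symbols long and their repeats are rarely exact.

```python
from collections import defaultdict
from typing import Dict, List


def repeat_distances(dna: str, k: int) -> Dict[str, List[int]]:
    """Map each length-k substring that occurs more than once to the gaps
    between its consecutive occurrences."""
    positions = defaultdict(list)
    for i in range(len(dna) - k + 1):
        positions[dna[i:i + k]].append(i)
    return {
        kmer: [b - a for a, b in zip(pos, pos[1:])]
        for kmer, pos in positions.items()
        if len(pos) > 1
    }


# Toy sequence (not real genomic data): one 7-letter motif, a spacer, the motif again.
toy = "GATTACA" + "CCGGTTAACC" + "GATTACA"
print(repeat_distances(toy, k=7))
```

Each reported gap is a lower bound on how far back a model would need to look to recognize the second copy as a repeat of the first.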

“There are statistical tests that biologists and phylogenomic scientists use, to determine if two organisms are related and how many mutations are needed for an evolutionary event to happen, for certain diseases to develop,” Farnoud said. “Those types of studies would benefit from having these more accurate models and hypothesis testing and prediction tools.”



© 2022 Scienmag- Science Magazine: Latest Science News.
