Does anyone know what was/is used as the DBMS for the infamous NSA call database?

Developer https://www.devze.com 2022-12-22 10:29 Source: Internet
Another question on SO suddenly got me wondering what the largest database in the world is (and how big it could be). A quick Google search turned up this: the NSA call database, created by the U.S. National Security Agency. Supposedly this database contains over 1.9 trillion records containing details relating to phone calls placed through AT&T and Verizon from as far back as 2001.

Does anyone have any idea what kind of DB system was used for this database? 1.9 trillion records seems to me like a lot more than even your typical large-scale commercial databases would have. But maybe I'm wrong. I also didn't research this extensively by any means, so perhaps the claim that the NSA call database is the biggest in the world is flat-out false.

Still, I'm interested to know what kind of DBMS, if any, could reasonably deal with this many records.


1.9 trillion rows multiplied by, say, 8000 bytes/row is, ummm, 15 petabytes? Did I do that arithmetic right? That's just one order of magnitude bigger than several well-known business databases. Googling "petabyte databases" gave me

  • eBay: one 2+ petabyte data warehouse and one 6+ petabyte data warehouse (2009)
  • Facebook: 2+ petabyte data warehouse (2010)
  • Walmart: 2+ petabyte data warehouse (2010)
  • Bank of America: 1+ petabyte data warehouse (2010)
  • Dell: 1+ petabyte data warehouse (2010)

1.9 trillion rows are easily (cough) row-addressable in the range of a 64-bit unsigned int.
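For anyone who wants to double-check the back-of-the-envelope numbers above, here is a minimal Python sketch. The 8000 bytes/row figure is only the guess used in this answer, not a known property of the NSA database, and the constant names are made up for illustration.

```python
# Rough size estimate; the row width is the answer's guess, not a real figure.
ROWS = 1_900_000_000_000      # 1.9 trillion records (from the question)
BYTES_PER_ROW = 8_000         # assumed row width

PETABYTE = 10**15             # decimal petabyte
PEBIBYTE = 2**50              # binary pebibyte, for comparison

total_bytes = ROWS * BYTES_PER_ROW
total_pb = total_bytes / PETABYTE
total_pib = total_bytes / PEBIBYTE

# Largest value an unsigned 64-bit integer can hold.
MAX_U64 = 2**64 - 1
fits_in_u64 = ROWS <= MAX_U64
headroom = MAX_U64 // ROWS    # how many such tables one u64 key space could index

print(f"{total_pb:.1f} PB ({total_pib:.1f} PiB)")
print(f"row count fits in u64: {fits_in_u64}, headroom factor: {headroom:,}")
```

So the arithmetic holds up: roughly 15 decimal petabytes under that row-width assumption, and the row count is millions of times smaller than the range of a 64-bit unsigned key.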

Physicists and astronomers seem to have the biggest targets. Stanford needs to manage about 155 petabytes of data for their Large Synoptic Survey Telescope. An astronomy project down the street from me generates about 10 petabytes a day, but they don't store nearly that much.

Heck, I almost forgot the point of the question. Greenplum and Teradata showed up the most often. But I don't think anybody who knows what the NSA actually uses will talk about it.

@Tomislav Nakic-Alfirevic: An awk program to print every 1000th line:

NR % 1000 == 0 {print $0}

Do you think the NSA would pay me for that? My house needs a new roof.
