large-data
Clustering of 10's of millions of high dimensional data
I have a set of 50 million text snippets and I would like to create some clusters out of them. The dimensionality might be somewhere between 60k-100k. The average text snippet length wo [details]
2023-04-05 03:26 Category: Q&A
150M records order by name
I have a dataset of around 150 million records that's generated daily; it contains: [details]
2023-04-05 01:13 Category: Q&A
RDBMS for extremely large data sets - what are people using?
I have to perform some serious data mining on very large data sets stored in a MySQL DB. However, queries that require a bit more than a basic SELECT * FROM X WHERE ... tend to become rather [details]
2023-04-02 05:42 Category: Q&A
java: very large trees?
The objective is to build very large trees. By very large I mean hundreds of millions of nodes, fitting in a few gigabytes. [details]
2023-04-01 23:33 Category: Q&A
INSERT IGNORE or INSERT WHERE NOT IN
I have a 9-million-row table and I'm struggling to handle all this data because of its sheer size. What I want to do is import a CSV into the table without overwriting data. [details]
2023-04-01 10:56 Category: Q&A
Hibernate Stored Procedure invocation leads to OutOfMemory
I am using Hibernate's named Query to execute a stored procedure returning a very large dataset (over 2 million rows). The DB is Oracle 11g. [details]
2023-03-28 12:08 Category: Q&A
Python: memory efficient list with list.sort(cmp=myfnc)
What is the best way to improve this code: def my_func(x, y): ... do smth ... return cmp(x', y') [details]
2023-03-26 13:04 Category: Q&A
High-performance multi-tier tag filtering
I have a large database of artists, albums, and tracks. Each of these items may have one or more tags assigned via glue tables (track_attributes, album_attributes, artist_attributes). There are severa [details]
2023-03-26 00:14 Category: Q&A
Database design - large fields
Let's say, for example, that I have a list of articles in a blog. Each article has one image, each image has one thumbnail. [details]
2023-03-25 09:26 Category: Q&A
Is there any way to load result query in memory?
I have a huge database (2.1 billion rows) and I need to perform some calculations to extract some statistical results. To my understanding, it's obvious that it is not wise to perform the calculation [details]
2023-03-24 07:17 Category: Q&A