Do you think that using MongoDB, a JSON document database, to store application log files is a good idea, and why?
The only advantage I see is the schema abstraction, but I think it's also a weakness: we cannot ensure the integrity of a log file.
Obviously I'm biased (I work on MongoDB) but I think it works very well for logs.
Reasons:
- It's fast for inserts and updates... you can do thousands per second
- As well as normal queries, you can run analytics and generate reports using JavaScript. You could have a cron job running nightly which does nice MapReduce things to your logs.
- You can use capped collections, which are collections that act like queues, to keep only the latest N KBs/MBs/GBs of logs
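To make the capped-collection point concrete, here is a minimal sketch in Python. It assumes `db` behaves like a pymongo `Database` (it uses `list_collection_names`, `create_collection` with the `capped` and `size` options, and `db[name]`). The function name, the collection name `app_logs` and the 64 MB default are made up for illustration.

```python
def ensure_capped_log_collection(db, name="app_logs", max_bytes=64 * 1024 * 1024):
    """Return a capped collection for logs, creating it if it is missing.

    `db` is assumed to behave like a pymongo Database. The name and the
    size default are hypothetical values chosen for this sketch.
    """
    if name in db.list_collection_names():
        # Already there: reuse it rather than trying to create it again.
        return db[name]
    # capped=True makes the collection fixed-size; size is the cap in bytes.
    # Once the cap is reached, the oldest documents are overwritten.
    return db.create_collection(name, capped=True, size=max_bytes)
```

With a running server you would pass something like `MongoClient()["logging"]` from pymongo as `db`; the database name there is also just an example.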
I'm not sure what you mean by "ensure the integrity of a log file"... do you mean you are worried about not knowing what fields the document you're pulling out has? If so, I think you'll find it's no harder than dealing with null fields in a relational database, and much more flexible.
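As a rough illustration of the missing-fields point, the sketch below tallies log documents by a field that only some of them carry. The documents are plain dicts, as a driver would hand them back; the field names and the `(absent)` label are invented for the example.

```python
from collections import Counter

def count_by_field(docs, field, missing="(absent)"):
    """Count documents per value of `field`, grouping absent/null together."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)  # None if the field is missing or stored as null
        counts[missing if value is None else value] += 1
    return counts

# Hypothetical log documents with differing shapes.
sample_logs = [
    {"level": "ERROR", "user": "ann"},
    {"level": "INFO"},                 # no "user" field at all
    {"level": "ERROR", "user": None},  # "user" stored as null
]
per_user = count_by_field(sample_logs, "user")
```

A document without the field and a document with a null field end up in the same bucket, which is about the same handling a nullable column would need.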
See also: the MongoDB blog post on logging.
I'm using MongoDB to store logs from many applications and it's working out very well so far.
You might want to take a look at the slides from my presentation, Logging Application Behavior to MongoDB, which I gave at Mongo SV and at the last MongoDB SF Meetup. They give more background on why I think it is good for logging, as well as info on libraries for Java, Python, Ruby, PHP and C# that support logging to MongoDB.
I'm now the main committer on log4mongo-java, Log4J appenders for MongoDB. So, it's probably not too surprising that that's what I'm using.
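For readers who have not used an appender, the idea can be sketched in Python with the standard `logging` module. This is not log4mongo-java's API, only an analogous toy: `MongoHandler` and the document field names are hypothetical, and `collection` is assumed to expose `insert_one` the way a pymongo `Collection` does.

```python
import logging
from datetime import datetime, timezone

class MongoHandler(logging.Handler):
    """Hypothetical handler that stores each log record as one document."""

    def __init__(self, collection, level=logging.NOTSET):
        super().__init__(level)
        # Anything with an insert_one(doc) method, e.g. a pymongo Collection.
        self.collection = collection

    def emit(self, record):
        try:
            doc = {
                "ts": datetime.fromtimestamp(record.created, tz=timezone.utc),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),  # applies %-style arguments
            }
            self.collection.insert_one(doc)
        except Exception:
            # Standard logging convention: never let logging crash the app.
            self.handleError(record)
```

Attaching it is the usual `logger.addHandler(MongoHandler(collection))`. Because the collection is passed in, the handler can be exercised with a stand-in object before pointing it at a real database.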
With respect to log integrity, I assume you mean confidence that it hasn't been modified after it was written. One option you have, at least with log4mongo-java, is to store logging events in a database that requires authentication. That would limit, to some degree, the number of users who could add, delete or update events.
In addition, you could set up a replication slave that is tightly locked down. Frequent backups of the slave would at least limit the time during which the set of logged events could be modified.