I've read conflicting information as to whether or not the WADLogsTable table used by the DiagnosticMonitor in Windows Azure will automatically prune old log entries.
I'm guessing it doesn't, and will instead grow forever - costing me money. :)
If that's the case, does anybody have a good code sample as to how to clear out old log entries from this table manually? Perhaps based on timestamp? I'd run this code from a worker role periodically.
The data in tables created by Windows Azure Diagnostics isn't deleted automatically.
However, Windows Azure PowerShell Cmdlets contain cmdlets specifically for this case.
PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>] [-To <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]
You need to specify the -ToUtc parameter; all logs before that date will be deleted.
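For example, an invocation that removes everything older than a week might look like this (the account name and key below are placeholders):

PS D:\> Clear-WindowsAzureLog -StorageAccountName "mystorageaccount" `
                              -StorageAccountKey "<account-key>" `
                              -ToUtc ([DateTime]::UtcNow.AddDays(-7))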
If the cleanup task needs to run on Azure inside a worker role, the C# code behind the cmdlets can be reused. The PowerShell Cmdlets are published under the permissive MS Public License.
Basically, only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.
Here is an updated version of Chriseyre2000's function. It performs much better when you need to delete many thousands of records, because it searches by PartitionKey and works through the range in chunked steps. And remember that the best place to run it is near the storage (in a cloud service).
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");
    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WAD tables use PartitionKey = "0" + ticks, so a key comparison
        // avoids a full-table scan.
        string partitionKey = "0" + dt.Ticks;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        // Project no columns: PartitionKey, RowKey and ETag come back anyway,
        // and that is all a delete needs.
        query.SelectColumns = new List<string>();

        var items = cloudTable.ExecuteQuery(query).ToList();

        // Split into chunks of at most 100 entities: a table batch
        // accepts at most 100 operations.
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = Math.Min(chunkSize, items.Count - index);
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        // All operations in one batch must share a PartitionKey, so group per key.
        foreach (var chunk in chunkedList)
        {
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
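For example, a call that walks from 30 days ago up to 7 days ago in one-hour steps (the step size here is just an illustration; pick one that matches your logging volume):

// Delete everything between 30 and 7 days old, one hour at a time.
TruncateDiagnostics(storageAccount,
    DateTime.UtcNow.AddDays(-30), DateTime.UtcNow.AddDays(-7),
    dt => dt.AddHours(1));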
You could just do it based on the timestamp, but that would be very inefficient since the whole table would need to be scanned. Here is a code sample that might help, where the partition key is generated to prevent a "full" table scan: http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
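As a minimal sketch of that idea, relying on the WAD convention that PartitionKey is the string "0" followed by the event time's UTC tick count:

// Build a PartitionKey bound from a DateTime so the query range-scans
// the key instead of filtering every row on Timestamp.
static string ToWadPartitionKey(DateTime utcTime)
{
    return "0" + utcTime.Ticks;
}

// Rows written within the last hour:
var query = new TableQuery
{
    FilterString = TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.GreaterThanOrEqual,
        ToWadPartitionKey(DateTime.UtcNow.AddHours(-1)))
};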
Here is a solution that truncates based upon a timestamp (tested against SDK 2.0).
It does use a table scan to get the data, but if run, say, once per day, it would not be too painful:
/// <summary>
/// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
/// </summary>
/// <param name="storageAccount"></param>
/// <param name="keepThreshold"></param>
public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    try
    {
        CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
        CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

        TableQuery query = new TableQuery();
        query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
        var items = cloudTable.ExecuteQuery(query).ToList();

        // Group deletes by PartitionKey: all operations in a batch must share one.
        Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
        foreach (var entity in items)
        {
            TableOperation tableOperation = TableOperation.Delete(entity);
            if (!batches.ContainsKey(entity.PartitionKey))
            {
                batches.Add(entity.PartitionKey, new TableBatchOperation());
            }
            TableBatchOperation batch = batches[entity.PartitionKey];
            batch.Add(tableOperation);

            // A batch accepts at most 100 operations, so flush full batches as we go.
            if (batch.Count == 100)
            {
                cloudTable.ExecuteBatch(batch);
                batch.Clear();
            }
        }

        foreach (var batch in batches.Values)
        {
            if (batch.Count > 0)
                cloudTable.ExecuteBatch(batch);
        }
    }
    catch (Exception ex)
    {
        Trace.TraceError("Truncate WADLogsTable exception: {0}", ex);
    }
}
Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and PartitionKey querying. It's designed to run continuously within a Worker Role in my case. This one may be a bit easier on memory if you have a lot of entries to clean up.
static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously.
    /// async void by design: fire-and-forget, started once from OnStart.
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a week's worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );

                    // materialize the page, otherwise the lazy query would execute
                    // once for Count() and again for the foreach below
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();
                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );

                        // can have only 100 per batch; anything skipped here is
                        // picked up again on the next pass of the loop
                        if ( batches[entity.PartitionKey].Count < 100 )
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}
To start the process, just call this from the OnStart method in your worker role.
// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();
If you don't care about any of the contents, just delete the table. Azure Diagnostics will just recreate it.
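A sketch of that, using the storage client library (note that recreating a just-deleted table can fail with 409 Conflict for a short while, so expect a brief gap in logging):

// Drop the whole table; Azure Diagnostics recreates it on its next transfer.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
tableClient.GetTableReference("WADLogsTable").DeleteIfExists();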
Slightly updated Chriseyre2000's code:
- uses ExecuteQuerySegmented instead of ExecuteQuery
- observes the TableBatchOperation limit of 100 operations
- purges all Azure diagnostics tables
public static void TruncateAllAzureTables(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    TruncateAzureTable(storageAccount, "WADLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADCrashDump", keepThreshold);
    TruncateAzureTable(storageAccount, "WADDiagnosticInfrastructureLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADPerformanceCountersTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADWindowsEventLogsTable", keepThreshold);
}

public static void TruncateAzureTable(CloudStorageAccount storageAccount, string aTableName, DateTime keepThreshold)
{
    const int maxOperationsInBatch = 100;
    var tableClient = storageAccount.CreateCloudTableClient();
    var cloudTable = tableClient.GetTableReference(aTableName);
    var query = new TableQuery { FilterString = $"Timestamp lt datetime'{keepThreshold:yyyy-MM-ddTHH:mm:ss}'" };
    TableContinuationToken continuationToken = null;
    do
    {
        var queryResult = cloudTable.ExecuteQuerySegmented(query, continuationToken);
        continuationToken = queryResult.ContinuationToken;
        var items = queryResult.ToList();

        // Group delete operations by PartitionKey, splitting each group into
        // batches of at most 100 operations.
        var batches = new Dictionary<string, List<TableBatchOperation>>();
        foreach (var entity in items)
        {
            var tableOperation = TableOperation.Delete(entity);
            if (!batches.TryGetValue(entity.PartitionKey, out var batchOperationList))
            {
                batchOperationList = new List<TableBatchOperation>();
                batches.Add(entity.PartitionKey, batchOperationList);
            }
            var batchOperation = batchOperationList.FirstOrDefault(bo => bo.Count < maxOperationsInBatch);
            if (batchOperation == null)
            {
                batchOperation = new TableBatchOperation();
                batchOperationList.Add(batchOperation);
            }
            batchOperation.Add(tableOperation);
        }

        foreach (var batch in batches.Values.SelectMany(l => l))
        {
            cloudTable.ExecuteBatch(batch);
        }
    } while (continuationToken != null);
}
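Usage, assuming a two-week retention window and the connection-string setting name from the earlier answer:

var storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("CloudStorageConnectionString"));
TruncateAllAzureTables(storageAccount, DateTime.UtcNow.AddDays(-14));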