开发者

Optimize code: Linq and foreach loop 15k records

开发者 https://www.devze.com 2023-01-14 19:52 出处:网络
this is my code void fixInstellingenTabel(object source, ElapsedEventArgs e) { NASDataContext _db = new NASDataContext(开发者_开发技巧);

this is my code

void fixInstellingenTabel(object source, ElapsedEventArgs e)
{
    NASDataContext _db = new NASDataContext(开发者_开发技巧);

    List<Instellingen> newOnes = new List<Instellingen>();

    List<InstellingGegeven> li = _db.InstellingGegevens.ToList();
    foreach (InstellingGegeven i in li) {
        if (_db.Instellingens.Count(q => q.INST_LOC_REF == i.INST_LOC_REF && q.INST_LOCNR == i.INST_LOCNR && q.INST_REF == i.INST_REF && q.INST_TYPE == i.INST_TYPE) <= 0) {
            // There is no item yet. Create one.
            Instellingen newInst = new Instellingen();
            newInst.INST_LOC_REF = i.INST_LOC_REF;
            newInst.INST_LOCNR = i.INST_LOCNR;
            newInst.INST_REF = i.INST_REF;
            newInst.INST_TYPE = i.INST_TYPE;
            newInst.Opt_KalStandaard = false;
            newOnes.Add(newInst);
        }
    }
    _db.Instellingens.InsertAllOnSubmit(newOnes);
    _db.SubmitChanges();
}

basically, the InstellingGegevens table gest filled in by some procedure from another server. the thing i then need to do is check if there are new records in this table, and fill in the new ones in Instellingens.

this code runs for like 4 minutes on 15k records. how do I optimize it? or is the only way a Stored Procedure?

this code runs in a timer, running every 6h. IF a stored procedure is best, how to I use that in a timer?

        Timer Tim = new Timer(21600000); //6u
        Tim.Elapsed += new ElapsedEventHandler(fixInstellingenTabel);
        Tim.Start();


Doing this in a stored procedure would be a lot faster. We do something quite similar, only there is about 100k items in the table, it's updated every five minutes, and has a lot more fields in it. Our job takes about two minutes to run, and then it does updates in several tables across three databases, so your job would reasonably take only a couple of seconds.

The query you need would just be something like:

create procedure UpdateInstellingens as

insert into Instellingens (
  INST_LOC_REF, INST_LOCNR, INST_REF, INST_TYPE, Opt_KalStandaard
)
select q.INST_LOC_REF, q.INST_LOCNR, q.INST_REF, q.INST_TYPE, cast(0 as bit)
from InstellingGeven q
left join Instellingens i
  on q.INST_LOC_REF = i.INST_LOC_REF and q.INST_LOCNR = i.INST_LOCNR
  and q.INST_REF = i.INST_REF and q.INST_TYPE = i.INST_TYPE
where i.INST_LOC_REF is null

You can run the procedure from a job in the SQL server, without involving any application at all, or you can use ADO.NET to execute the procedure from your timer.


One way you could optimise this is by changing the Count(...) <= 0 into Any(). However, an even better optimisation would be to retrieve this information in a single query outside the loop:

var instellingens = _db.Instellingens
    .Select(q => new { q.INST_LOC_REF, q.INST_LOCNR, q.INST_REF, q.INST_TYPE })
    .Distinct()
    .ToDictionary(q => q, q => true);

(On second thought, a HashSet would be most appropriate here, but there is unfortunately no ToHashSet() extension method. You can write one of your own if you like!)

And then inside your loop:

if (instellingens.ContainsKey(new { q.INST_LOC_REF, q.INST_LOCNR,
                                    q.INST_REF, q.INST_TYPE })) {
    // There is no item yet. Create one.
    // ...
}

Then you can optimise the loop itself by making it lazy-retrieve:

// No need for the List<InstellingGegeven>
foreach (InstellingGegeven i in _db.InstellingGegevens) {
    // ...
}


What Guffa said, but using Linq here is not the best course if performance is what you are after. Linq, like every other ORM, sacrifices performance for usability. Which is usually a great tradeoff for typical application execution paths. On the other hand, SQL is very, very good at set based operations so that really is the way to fly here.

0

精彩评论

暂无评论...
验证码 换一张
取 消