Show duplicates in Mathematica

https://www.devze.com 2022-12-10 01:41 Source: web
In Mathematica I have a list:

x = {1,2,3,3,4,5,5,6}

How can I make a list of the duplicates? Like:

{3,5}

I have been looking at the "Lists as Sets" documentation to see whether there is something like Except[] for lists, so that I could do:

unique = Union[x]
duplicates = MyExcept[x,unique]

(Of course, if x contained more than two copies of an element, say {1,2,2,2,3,4,4}, the output would be {2,2,4}, but an additional Union[] would solve this.)

But there wasn't anything like that (if I understood all the functions there correctly).

So, how can I do that?


There are lots of ways to do list extraction like this; here's the first thing that came to my mind:

Part[Select[Tally@x, Part[#, 2] > 1 &], All, 1]

Or, more readably in pieces:

Tally@x
Select[%, Part[#, 2] > 1 &]
Part[%, All, 1]

which gives, respectively,

{{1, 1}, {2, 1}, {3, 2}, {4, 1}, {5, 2}, {6, 1}}
{{3, 2}, {5, 2}}
{3, 5}

Perhaps you can think of a more efficient (in time or code space) way :)

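By the way, Tally does not need the list to be sorted: it counts every occurrence and reports elements in order of first appearance. Run Sort first only if you want the duplicates returned in sorted order.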
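A variation on the same Tally idea, as a small sketch: Cases can filter on the count and extract the element in one step. The input list here is made up for illustration and is deliberately unsorted; Tally reports elements in order of first appearance.

```mathematica
(* Illustrative input, not from the question; note that it is unsorted *)
x = {5, 3, 1, 5, 2, 3, 6, 4};

Tally[x]
(* {{5, 2}, {3, 2}, {1, 1}, {2, 1}, {6, 1}, {4, 1}} *)

(* Keep the element of every {element, count} pair whose count exceeds 1 *)
Cases[Tally[x], {elem_, count_ /; count > 1} :> elem]
(* {5, 3} *)
```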


Here's a way to do it in a single pass through the list:

collectDups[l_] := Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

For example:

collectDups[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}] --> {1, 1, 4, 4, 4, 2}

If you want the list of unique duplicates -- {1, 4, 2} -- then wrap the above in DeleteDuplicates, which is another single pass through the list (Union is less efficient as it also sorts the result).

collectDups[l_] := 
  DeleteDuplicates@Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

Will Robertson's solution is probably better just because it's more straightforward, but I think if you wanted to eke out more speed, this should win. But if you cared about that, you wouldn't be programming in Mathematica! :)
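To make the trick easier to follow, here is the first collectDups again with comments, as a sketch. The names collectDupsAnnotated and seen are mine, and I used Module instead of Block; otherwise it is meant to behave the same way.

```mathematica
(* Hypothetical annotated variant of collectDups; names are illustrative *)
collectDupsAnnotated[l_List] :=
 Module[{seen},
  (* First call for a value n: record it by defining seen[n] = n, then
     return an empty Sequence so nothing is added to the result. *)
  seen[n_] := (seen[n] = n; Unevaluated@Sequence[]);
  (* Later calls for the same n hit the specific rule seen[n] = n,
     so every repeat after the first occurrence is kept. *)
  seen /@ l
 ]

collectDupsAnnotated[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}]
(* {1, 1, 4, 4, 4, 2} *)
```

The Unevaluated wrapper is there because a bare Sequence[] would be spliced out of the compound expression itself, which would make the first call return n instead of nothing.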


Here are several faster variations of the Tally method.

f4 uses "tricks" given by Carl Woll and Oliver Ruebenkoenig on MathGroup.

f2 = Tally@# /. {{_, 1} :> Sequence[], {a_, _} :> a} &;

f3 = Pick[#, Unitize[#2 - 1], 1] & @@ Transpose@Tally@# &;

f4 = # ~Extract~ SparseArray[Unitize[#2 - 1]]["NonzeroPositions"] & @@ Transpose@Tally@# &;

Speed comparison (f1 included for reference):

a = RandomInteger[100000, 25000];

f1 = Part[Select[Tally@#, Part[#, 2] > 1 &], All, 1] &;

First@Timing@Do[#@a, {50}] & /@ {f1, f2, f3, f4, Tally}

SameQ @@ (#@a &) /@ {f1, f2, f3, f4}

Out[]= {3.188, 1.296, 0.719, 0.375, 0.36}

Out[]= True

It is amazing to me that f4 has almost no overhead relative to a pure Tally!
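For anyone puzzling over f3 and f4, here is a step-by-step sketch on the list from the question. The variable names vals and counts are only for illustration.

```mathematica
x = {1, 2, 3, 3, 4, 5, 5, 6};

(* Split the tally into distinct values and their counts *)
{vals, counts} = Transpose@Tally[x]
(* {{1, 2, 3, 4, 5, 6}, {1, 1, 2, 1, 2, 1}} *)

(* counts - 1 is zero for singletons; Unitize maps any nonzero value to 1 *)
mask = Unitize[counts - 1]
(* {0, 0, 1, 0, 1, 0} *)

(* f3: pick the values where the mask is 1 *)
Pick[vals, mask, 1]
(* {3, 5} *)

(* f4: get the positions of the nonzero mask entries, then extract them *)
SparseArray[mask]["NonzeroPositions"]
(* {{3}, {5}} *)
Extract[vals, SparseArray[mask]["NonzeroPositions"]]
(* {3, 5} *)
```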


Using a solution like dreeves's, but returning only a single instance of each duplicated element, is a bit on the tricky side. One way of doing it is as follows:

collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

This doesn't precisely match the output produced by Will Robertson's (IMO superior) solution, because elements appear in the returned list in the order in which they are first found to be duplicates. I'm not sure whether it can really be done in a single pass; all the ways I can think of involve, in effect, at least two passes, although one of them might only be over the duplicated elements.
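A small example of that ordering difference, as a sketch. The definition is repeated so the snippet runs on its own, and the input list is made up for illustration.

```mathematica
collectDups1[l_] :=
  Module[{i, j},
    (* first sighting of n: arrange for the next sighting to go through j *)
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    (* second sighting of n: emit n once, then emit nothing afterwards *)
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

(* 5 is seen twice before 3 is, so it comes first in the result *)
collectDups1[{3, 5, 5, 3}]
(* {5, 3} *)

(* The Tally-based version orders by first appearance instead *)
Part[Select[Tally@{3, 5, 5, 3}, Part[#, 2] > 1 &], All, 1]
(* {3, 5} *)
```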


Here is a version of Robertson's answer that uses 100% "postfix notation" for function calls.

identifyDuplicates[list_List, test_:SameQ] :=
 list //
    Tally[#, test] & //
   Select[#, #[[2]] > 1 &] & //
  Map[#[[1]] &, #] &

Mathematica's // is similar to the dot for method calls in other languages. For instance, if this were written in C# / LINQ style, it would resemble

list.Tally(test).Where(x => x[2] > 1).Select(x => x[1])

Note that C#'s Where is like MMA's Select, and C#'s Select is like MMA's Map.

EDIT: added optional test function argument, defaulting to SameQ.

EDIT: here is a version that addresses my comment below and reports every member of each group of equivalent elements. It takes a projector function; two elements of the list are considered equivalent if the projector gives the same value for both. In effect, this finds the equivalence classes with at least a given number of members:

reportDuplicateClusters[list_List, projector_: (# &), 
  minimumClusterSize_: 2] :=
 GatherBy[list, projector] //
  Select[#, Length@# >= minimumClusterSize &] &

Here is a sample that checks pairs of integers on their first elements, considering two pairs equivalent if their first elements are equal:

reportDuplicateClusters[RandomInteger[10, {10, 2}], #[[1]] &]
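Since the random sample gives different output on every run, here is a deterministic sketch of the same call. The pairs list is made up for illustration, and the definition is repeated so the snippet runs on its own.

```mathematica
reportDuplicateClusters[list_List, projector_: (# &),
  minimumClusterSize_: 2] :=
 GatherBy[list, projector] //
  Select[#, Length@# >= minimumClusterSize &] &

(* Illustrative data: pairs grouped by their first element *)
pairs = {{1, "a"}, {2, "b"}, {1, "c"}, {3, "d"}, {2, "e"}, {1, "f"}};

reportDuplicateClusters[pairs, First]
(* {{{1, "a"}, {1, "c"}, {1, "f"}}, {{2, "b"}, {2, "e"}}} *)

(* Only clusters with at least three members *)
reportDuplicateClusters[pairs, First, 3]
(* {{{1, "a"}, {1, "c"}, {1, "f"}}} *)
```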


This thread seems old, but I've had to solve this myself.

This is kind of crude, but does this do it? (It compares each element with its neighbour, so tt has to be a sorted list of integers.)

Union[Select[Table[If[tt[[n]] == tt[[n + 1]], tt[[n]], ""], {n, Length[tt] - 1}], IntegerQ]]
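A sketch of that one-liner in use, assuming tt is sorted first, next to a Split-based variant of the same neighbour comparison. The Split version is my own suggestion rather than part of the answer above, and the input is made up for illustration.

```mathematica
(* Sorting puts equal elements next to each other *)
tt = Sort[{4, 1, 3, 5, 3, 2, 5, 6}];
(* tt is now {1, 2, 3, 3, 4, 5, 5, 6} *)

Union[Select[Table[If[tt[[n]] == tt[[n + 1]], tt[[n]], ""], {n, Length[tt] - 1}], IntegerQ]]
(* {3, 5} *)

(* Split groups runs of identical neighbours; keep runs longer than 1 *)
First /@ Select[Split[tt], Length[#] > 1 &]
(* {3, 5} *)
```

The Split form does not rely on IntegerQ, so it should also work for sorted lists of non-integer elements.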


Given a list A:
Get the non-duplicate values in B:
B = DeleteDuplicates[A]
Get the duplicate values in C:
C = Complement[A, B]
Get the non-duplicate values from the duplicate list in D:
D = DeleteDuplicates[C]

So for your example:
A = {1, 2, 2, 2, 3, 4, 4}
B = {1, 2, 3, 4}
C = {2, 2, 4}
D = {2, 4}

So your answer would be DeleteDuplicates[Complement[x, DeleteDuplicates[x]]], where x is your list. I don't know Mathematica, so the syntax may or may not be perfect here; I'm just going by the docs on the page you linked to.
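One caveat, going by the documentation: Mathematica's Complement treats its arguments as sets, so the Complement step returns an empty list rather than the surplus copies. A minimal sketch of a multiset version of the same idea, using Gather and Rest (my choice of functions, not part of the answer above):

```mathematica
x = {1, 2, 2, 2, 3, 4, 4};

(* Set complement: every element of x also occurs in DeleteDuplicates[x] *)
Complement[x, DeleteDuplicates[x]]
(* {} *)

(* Multiset version: group equal elements, drop one copy from each group *)
extras = Flatten[Rest /@ Gather[x], 1]
(* {2, 2, 4} *)

DeleteDuplicates[extras]
(* {2, 4} *)
```

The name extras is illustrative. Capital C and D are also best avoided as variable names, since both are built-in protected symbols.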


Another short possibility is

Last /@ Select[Gather[x], Length[#] > 1 &]