Efficiently merge lists chronologically without duplicates?Selecting a list from a list of lists
PhD student with mental health issues and bad performance
Why did a party with more votes get fewer seats in the 2019 European Parliament election in Denmark?
You've spoiled/damaged the card
Are the AT-AT's from "Empire Strikes Back" a deliberate reference to Mecha?
How do I calculate APR from monthly instalments?
Funtion to extract float from different price patterns
Can you please explain this joke: "I'm going bananas is what I tell my bananas before I leave the house"?
How do photons get into the eyes?
Why don't B747s start takeoffs with full throttle?
How can I instantiate a lambda closure type in C++11/14?
C SIGINT signal in Linux
Adding two lambda-functions in C++
Calling GPL'ed socket server inside Docker?
What are the words for people who cause trouble believing they know better?
How to pass a regex when finding a directory path in bash?
Company did not petition for visa in a timely manner. Is asking me to work from overseas, but wants me to take a paycut
Java guess the number
Is it legal in the UK for politicians to lie to the public for political gain?
How bad would a partial hash leak be, realistically?
Pronoun introduced before its antecedent
What happened to all the nuclear material being smuggled after the fall of the USSR?
Traffic law UK, pedestrians
Is there any word or phrase for negative bearing?
Why is the relationship between frequency and pitch exponential?
Efficiently merge lists chronologically without duplicates?
Selecting a list from a list of lists
$begingroup$
Prepare some dummy data to investigate:
alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]
we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?
A tiny example of the above would be:
alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;
so that the function merge returns:
merge[Table[sublists[i],i,1,3]]
a1,a3,a2,a5
Note that merge was not given the actual alist.
EDIT:
Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by
alist = Table[a[i], i, 1, 10000];
This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example
seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length
10984
Repeating the steps seems to consistently return a seq that is longer than 10000, while
seq // DeleteDuplicates // Length
10000
To check that all sublists contain a[i] elements in strictly increasing order, we can do:
FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]
True
The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...
list-manipulation function-construction
$endgroup$
add a comment |
$begingroup$
Prepare some dummy data to investigate:
alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]
we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?
A tiny example of the above would be:
alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;
so that the function merge returns:
merge[Table[sublists[i],i,1,3]]
a1,a3,a2,a5
Note that merge was not given the actual alist.
EDIT:
Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by
alist = Table[a[i], i, 1, 10000];
This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example
seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length
10984
Repeating the steps seems to consistently return a seq that is longer than 10000, while
seq // DeleteDuplicates // Length
10000
To check that all sublists contain a[i] elements in strictly increasing order, we can do:
FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]
True
The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...
list-manipulation function-construction
$endgroup$
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknownalist)
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr note that there is aSortafter theRandomSampleof the integers inrange. This picks elements fromalistin an ordered way.
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
then..a3, a1, a2,a5is also a solution, right?
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr that's right, without knowingalistthere are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
$endgroup$
– Kagaratsch
8 hours ago
add a comment |
$begingroup$
Prepare some dummy data to investigate:
alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]
we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?
A tiny example of the above would be:
alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;
so that the function merge returns:
merge[Table[sublists[i],i,1,3]]
a1,a3,a2,a5
Note that merge was not given the actual alist.
EDIT:
Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by
alist = Table[a[i], i, 1, 10000];
This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example
seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length
10984
Repeating the steps seems to consistently return a seq that is longer than 10000, while
seq // DeleteDuplicates // Length
10000
To check that all sublists contain a[i] elements in strictly increasing order, we can do:
FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]
True
The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...
list-manipulation function-construction
$endgroup$
Prepare some dummy data to investigate:
alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]
we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?
A tiny example of the above would be:
alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;
so that the function merge returns:
merge[Table[sublists[i],i,1,3]]
a1,a3,a2,a5
Note that merge was not given the actual alist.
EDIT:
Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by
alist = Table[a[i], i, 1, 10000];
This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example
seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length
10984
Repeating the steps seems to consistently return a seq that is longer than 10000, while
seq // DeleteDuplicates // Length
10000
To check that all sublists contain a[i] elements in strictly increasing order, we can do:
FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]
True
The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...
list-manipulation function-construction
list-manipulation function-construction
edited 7 hours ago
Kagaratsch
asked 9 hours ago
KagaratschKagaratsch
5,16941351
5,16941351
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknownalist)
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr note that there is aSortafter theRandomSampleof the integers inrange. This picks elements fromalistin an ordered way.
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
then..a3, a1, a2,a5is also a solution, right?
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr that's right, without knowingalistthere are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
$endgroup$
– Kagaratsch
8 hours ago
add a comment |
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknownalist)
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr note that there is aSortafter theRandomSampleof the integers inrange. This picks elements fromalistin an ordered way.
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
then..a3, a1, a2,a5is also a solution, right?
$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr that's right, without knowingalistthere are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown
alist)$endgroup$
– kglr
8 hours ago
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown
alist)$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr note that there is a
Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@kglr note that there is a
Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
then..
a3, a1, a2,a5 is also a solution, right?$endgroup$
– kglr
8 hours ago
$begingroup$
then..
a3, a1, a2,a5 is also a solution, right?$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr that's right, without knowing
alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@kglr that's right, without knowing
alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.$endgroup$
– Kagaratsch
8 hours ago
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.
First, construct a directed graph with edges defined by neighbors in the sublists:
allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]
Next, look at the GraphDistanceMatrix and count how many times the symbol ∞ appears in each row:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
Next, sort the graph vertices according to this infcount:
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
All together in one function:
merge[L_] :=
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]
example
The given example is
allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)
Construct the directed neighbor graph:
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]

Have a look at the graph distance matrix:
VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)
count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)
sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)
$endgroup$
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
add a comment |
$begingroup$
Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]
a3, a1, a2, a5
Also
Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]
a1, a3, a2, a5
In case this produces an output with duplicates we can use
DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]
$endgroup$
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I applyDeleteDuplicatesto it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping withDeleteDuplicatesfixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to theSortcommand before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f199518%2fefficiently-merge-lists-chronologically-without-duplicates%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.
First, construct a directed graph with edges defined by neighbors in the sublists:
allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]
Next, look at the GraphDistanceMatrix and count how many times the symbol ∞ appears in each row:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
Next, sort the graph vertices according to this infcount:
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
All together in one function:
merge[L_] :=
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]
example
The given example is
allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)
Construct the directed neighbor graph:
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]

Have a look at the graph distance matrix:
VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)
count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)
sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)
$endgroup$
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
add a comment |
$begingroup$
I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.
First, construct a directed graph with edges defined by neighbors in the sublists:
allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]
Next, look at the GraphDistanceMatrix and count how many times the symbol ∞ appears in each row:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
Next, sort the graph vertices according to this infcount:
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
All together in one function:
merge[L_] :=
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]
example
The given example is
allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)
Construct the directed neighbor graph:
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]

Have a look at the graph distance matrix:
VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)
count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)
sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)
$endgroup$
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
add a comment |
$begingroup$
I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.
First, construct a directed graph with edges defined by neighbors in the sublists:
allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]
Next, look at the GraphDistanceMatrix and count how many times the symbol ∞ appears in each row:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
Next, sort the graph vertices according to this infcount:
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
All together in one function:
merge[L_] :=
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]
example
The given example is
allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)
Construct the directed neighbor graph:
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]

Have a look at the graph distance matrix:
VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)
count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)
sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)
$endgroup$
I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.
First, construct a directed graph with edges defined by neighbors in the sublists:
allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]
Next, look at the GraphDistanceMatrix and count how many times the symbol ∞ appears in each row:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
Next, sort the graph vertices according to this infcount:
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
All together in one function:
merge[L_] :=
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]
example
The given example is
allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)
Construct the directed neighbor graph:
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]

Have a look at the graph distance matrix:
VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)
count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:
infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)
sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):
alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)
edited 45 mins ago
answered 1 hour ago
RomanRoman
9,84511640
9,84511640
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
add a comment |
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago
add a comment |
$begingroup$
Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]
a3, a1, a2, a5
Also
Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]
a1, a3, a2, a5
In case this produces an output with duplicates we can use
DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]
$endgroup$
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I applyDeleteDuplicatesto it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping withDeleteDuplicatesfixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to theSortcommand before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
add a comment |
$begingroup$
Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]
a3, a1, a2, a5
Also
Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]
a1, a3, a2, a5
In case this produces an output with duplicates we can use
DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]
$endgroup$
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I applyDeleteDuplicatesto it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping withDeleteDuplicatesfixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to theSortcommand before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
add a comment |
$begingroup$
Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]
a3, a1, a2, a5
Also
Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]
a1, a3, a2, a5
In case this produces an output with duplicates we can use
DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]
$endgroup$
Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]
a3, a1, a2, a5
Also
Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]
a1, a3, a2, a5
In case this produces an output with duplicates we can use
DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]
edited 8 hours ago
answered 9 hours ago
kglrkglr
195k10216439
195k10216439
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I applyDeleteDuplicatesto it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping withDeleteDuplicatesfixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to theSortcommand before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
add a comment |
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I applyDeleteDuplicatesto it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping withDeleteDuplicatesfixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to theSortcommand before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply
DeleteDuplicates to it, will it give the desired ordering?$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply
DeleteDuplicates to it, will it give the desired ordering?$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with
DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?$endgroup$
– kglr
8 hours ago
$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with
DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?$endgroup$
– kglr
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to the
Sort command before the elements are picked...$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to the
Sort command before the elements are picked...$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
Then it is probably some glitch in the
ExperimentalShortestSupersequence` function.$endgroup$
– kglr
8 hours ago
$begingroup$
Then it is probably some glitch in the
ExperimentalShortestSupersequence` function.$endgroup$
– kglr
8 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago
add a comment |
Thanks for contributing an answer to Mathematica Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f199518%2fefficiently-merge-lists-chronologically-without-duplicates%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown
alist)$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr note that there is a
Sortafter theRandomSampleof the integers inrange. This picks elements fromalistin an ordered way.$endgroup$
– Kagaratsch
8 hours ago
$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago
$begingroup$
then..
a3, a1, a2,a5is also a solution, right?$endgroup$
– kglr
8 hours ago
$begingroup$
@kglr that's right, without knowing
alistthere are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.$endgroup$
– Kagaratsch
8 hours ago