Efficiently merge lists chronologically without duplicates?Selecting a list from a list of lists

PhD student with mental health issues and bad performance

Why did a party with more votes get fewer seats in the 2019 European Parliament election in Denmark?

You've spoiled/damaged the card

Are the AT-AT's from "Empire Strikes Back" a deliberate reference to Mecha?

How do I calculate APR from monthly instalments?

Funtion to extract float from different price patterns

Can you please explain this joke: "I'm going bananas is what I tell my bananas before I leave the house"?

How do photons get into the eyes?

Why don't B747s start takeoffs with full throttle?

How can I instantiate a lambda closure type in C++11/14?

C SIGINT signal in Linux

Adding two lambda-functions in C++

Calling GPL'ed socket server inside Docker?

What are the words for people who cause trouble believing they know better?

How to pass a regex when finding a directory path in bash?

Company did not petition for visa in a timely manner. Is asking me to work from overseas, but wants me to take a paycut

Java guess the number

Is it legal in the UK for politicians to lie to the public for political gain?

How bad would a partial hash leak be, realistically?

Pronoun introduced before its antecedent

What happened to all the nuclear material being smuggled after the fall of the USSR?

Traffic law UK, pedestrians

Is there any word or phrase for negative bearing?

Why is the relationship between frequency and pitch exponential?



Efficiently merge lists chronologically without duplicates?


Selecting a list from a list of lists













5












$begingroup$


Prepare some dummy data to investigate:



alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]


we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?



A tiny example of the above would be:



alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;


so that the function merge returns:



merge[Table[sublists[i],i,1,3]]



a1,a3,a2,a5




Note that merge was not given the actual alist.



EDIT:



Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by



alist = Table[a[i], i, 1, 10000];


This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example



seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length



10984




Repeating the steps seems to consistently return a seq that is longer than 10000, while



seq // DeleteDuplicates // Length



10000




To check that all sublists contain a[i] elements in strictly increasing order, we can do:



FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]



True




The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...










share|improve this question











$endgroup$











  • $begingroup$
    how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    oh i see... thanks!
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    then.. a3, a1, a2,a5 is also a solution, right?
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
    $endgroup$
    – Kagaratsch
    8 hours ago
















5












$begingroup$


Prepare some dummy data to investigate:



alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]


we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?



A tiny example of the above would be:



alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;


so that the function merge returns:



merge[Table[sublists[i],i,1,3]]



a1,a3,a2,a5




Note that merge was not given the actual alist.



EDIT:



Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by



alist = Table[a[i], i, 1, 10000];


This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example



seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length



10984




Repeating the steps seems to consistently return a seq that is longer than 10000, while



seq // DeleteDuplicates // Length



10000




To check that all sublists contain a[i] elements in strictly increasing order, we can do:



FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]



True




The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...










share|improve this question











$endgroup$











  • $begingroup$
    how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    oh i see... thanks!
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    then.. a3, a1, a2,a5 is also a solution, right?
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
    $endgroup$
    – Kagaratsch
    8 hours ago














5












5








5


1



$begingroup$


Prepare some dummy data to investigate:



alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]


we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?



A tiny example of the above would be:



alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;


so that the function merge returns:



merge[Table[sublists[i],i,1,3]]



a1,a3,a2,a5




Note that merge was not given the actual alist.



EDIT:



Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by



alist = Table[a[i], i, 1, 10000];


This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example



seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length



10984




Repeating the steps seems to consistently return a seq that is longer than 10000, while



seq // DeleteDuplicates // Length



10000




To check that all sublists contain a[i] elements in strictly increasing order, we can do:



FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]



True




The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...










share|improve this question











$endgroup$




Prepare some dummy data to investigate:



alist = RandomSample[Table[ToExpression["a" <> ToString[i]], i, 1, 10000]];
range = Range[10000];
Do[
len = RandomInteger[3000, 10000];
select = Sort[RandomSample[range, len]];
sublists[j] = alist[[select]];
, j, 1, 5000]


we have 5000 sublists of length between 3000 and 10000 elements "ai" in a particular order. I would like to have a function that merges all 5000 sublists such that all duplicates are discarded and any "ai" appear to the left of "aj" if they do so in any of the sublists. How can one do this in Mathematica most efficiently?



A tiny example of the above would be:



alist = a1,a3,a2,a5,a4;
sublists[1] = a1,a5;
sublists[2] = a3,a2,a5;
sublists[3] = a1,a2;


so that the function merge returns:



merge[Table[sublists[i],i,1,3]]



a1,a3,a2,a5




Note that merge was not given the actual alist.



EDIT:



Investigating the Experimental'ShortestSupersequence command, let us replace the definition of alist above by



alist = Table[a[i], i, 1, 10000];


This creates a list of strictly increasing a[i]. Generating the sublists from this as above, we get for example



seq = Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]];
seq//Length



10984




Repeating the steps seems to consistently return a seq that is longer than 10000, while



seq // DeleteDuplicates // Length



10000




To check that all sublists contain a[i] elements in strictly increasing order, we can do:



FreeQ[
Table[
tmp = sublists[i] /. a[x_] -> x;
tmp = tmp[[2 ;;]] - tmp[[;; -2]];
FreeQ[tmp/Abs[tmp], -1]
, i, 1, 5000]
, False]



True




The only way how some of the sublists might have some elements reversed in inconsistent order with alist is if we had ...,a[i],...,a[j],... with j<i somewhere, which is ruled out by the above test. So it seems that Experimental'ShortestSupersequence is buggy...







list-manipulation function-construction






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 7 hours ago







Kagaratsch

















asked 9 hours ago









KagaratschKagaratsch

5,16941351




5,16941351











  • $begingroup$
    how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    oh i see... thanks!
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    then.. a3, a1, a2,a5 is also a solution, right?
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
    $endgroup$
    – Kagaratsch
    8 hours ago

















  • $begingroup$
    how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    oh i see... thanks!
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    then.. a3, a1, a2,a5 is also a solution, right?
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    @kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
    $endgroup$
    – Kagaratsch
    8 hours ago
















$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
$endgroup$
– kglr
8 hours ago




$begingroup$
how do yo reconcile (a) any "ai" appear to the left of "aj" if they do so in alist and (b) alist is not known explicitly? (Since sublists are independently shuffled they don't contain any information on whether or not "ai" is before "aj" in the unknown alist)
$endgroup$
– kglr
8 hours ago












$begingroup$
@kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
$endgroup$
– Kagaratsch
8 hours ago




$begingroup$
@kglr note that there is a Sort after the RandomSample of the integers in range. This picks elements from alist in an ordered way.
$endgroup$
– Kagaratsch
8 hours ago












$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago




$begingroup$
oh i see... thanks!
$endgroup$
– kglr
8 hours ago












$begingroup$
then.. a3, a1, a2,a5 is also a solution, right?
$endgroup$
– kglr
8 hours ago




$begingroup$
then.. a3, a1, a2,a5 is also a solution, right?
$endgroup$
– kglr
8 hours ago












$begingroup$
@kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
$endgroup$
– Kagaratsch
8 hours ago





$begingroup$
@kglr that's right, without knowing alist there are several solutions. Finding any of them should be good enough though. You are right, maybe I should describe the merging better.
$endgroup$
– Kagaratsch
8 hours ago











2 Answers
2






active

oldest

votes


















2












$begingroup$

I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.



First, construct a directed graph with edges defined by neighbors in the sublists:



allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]


Next, look at the GraphDistanceMatrix and count how many times the symbol appears in each row:



infcount = Count[∞] /@ GraphDistanceMatrix[G]


Next, sort the graph vertices according to this infcount:



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]


All together in one function:



merge[L_] := 
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]


example



The given example is



allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)


Construct the directed neighbor graph:



G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]


enter image description here



Have a look at the graph distance matrix:



VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)


count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:



infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)


sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)





share|improve this answer











$endgroup$












  • $begingroup$
    Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
    $endgroup$
    – Kagaratsch
    7 mins ago


















1












$begingroup$

Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]



a3, a1, a2, a5




Also



Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]



a1, a3, a2, a5




In case this produces an output with duplicates we can use



DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]





share|improve this answer











$endgroup$












  • $begingroup$
    Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    @Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
    $endgroup$
    – kglr
    8 hours ago











  • $begingroup$
    Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    Then it is probably some glitch in theExperimentalShortestSupersequence` function.
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    Seems that way, see edit of my question for some details.
    $endgroup$
    – Kagaratsch
    7 hours ago











Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f199518%2fefficiently-merge-lists-chronologically-without-duplicates%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









2












$begingroup$

I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.



First, construct a directed graph with edges defined by neighbors in the sublists:



allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]


Next, look at the GraphDistanceMatrix and count how many times the symbol appears in each row:



infcount = Count[∞] /@ GraphDistanceMatrix[G]


Next, sort the graph vertices according to this infcount:



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]


All together in one function:



merge[L_] := 
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]


example



The given example is



allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)


Construct the directed neighbor graph:



G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]


enter image description here



Have a look at the graph distance matrix:



VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)


count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:



infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)


sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)





share|improve this answer











$endgroup$












  • $begingroup$
    Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
    $endgroup$
    – Kagaratsch
    7 mins ago















2












$begingroup$

I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.



First, construct a directed graph with edges defined by neighbors in the sublists:



allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]


Next, look at the GraphDistanceMatrix and count how many times the symbol appears in each row:



infcount = Count[∞] /@ GraphDistanceMatrix[G]


Next, sort the graph vertices according to this infcount:



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]


All together in one function:



merge[L_] := 
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]


example



The given example is



allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)


Construct the directed neighbor graph:



G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]


enter image description here



Have a look at the graph distance matrix:



VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)


count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:



infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)


sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)





share|improve this answer











$endgroup$












  • $begingroup$
    Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
    $endgroup$
    – Kagaratsch
    7 mins ago













2












2








2





$begingroup$

I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.



First, construct a directed graph with edges defined by neighbors in the sublists:



allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]


Next, look at the GraphDistanceMatrix and count how many times the symbol appears in each row:



infcount = Count[∞] /@ GraphDistanceMatrix[G]


Next, sort the graph vertices according to this infcount:



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]


All together in one function:



merge[L_] := 
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]


example



The given example is



allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)


Construct the directed neighbor graph:



G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]


enter image description here



Have a look at the graph distance matrix:



VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)


count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:



infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)


sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)





share|improve this answer











$endgroup$



I think the following works but please give it a try with your data. I've tried quite a few random examples and it has worked for all of them.



First, construct a directed graph with edges defined by neighbors in the sublists:



allsublists = Array[sublists, 5000]; (* or whatever the max count is *)
G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists)]


Next, look at the GraphDistanceMatrix and count how many times the symbol appears in each row:



infcount = Count[∞] /@ GraphDistanceMatrix[G]


Next, sort the graph vertices according to this infcount:



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]


All together in one function:



merge[L_] := 
With[G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ L)],
SortBy[Transpose[VertexList[G], Count[∞] /@ GraphDistanceMatrix[G]], Last][[All, 1]]]


example



The given example is



allsublists = Array[sublists, 3]
(* a1, a5, a3, a2, a5, a1, a2 *)


Construct the directed neighbor graph:



G = Graph[Join @@ (BlockMap[Apply[Rule], #, 2, 1] & /@ allsublists),
VertexLabels -> Automatic]


enter image description here



Have a look at the graph distance matrix:



VertexList[G]
(* a1, a5, a3, a2 *)
GraphDistanceMatrix[G]
(* 0, 1, ∞, 1,
∞, 0, ∞, ∞,
∞, 2, 0, 1,
∞, 1, ∞, 0 *)


count the infinities in each row, i.e., for each vertex we count how many other vertices are unreachable:



infcount = Count[∞] /@ GraphDistanceMatrix[G]
(* 1, 3, 1, 2 *)


sort the vertices by the number of infinities (i.e. by the number of unreachable other vertices):



alist = SortBy[Transpose[VertexList[G], infcount], Last][[All, 1]]
(* a1, a3, a2, a5 *)






share|improve this answer














share|improve this answer



share|improve this answer








edited 45 mins ago

























answered 1 hour ago









RomanRoman

9,84511640




9,84511640











  • $begingroup$
    Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
    $endgroup$
    – Kagaratsch
    7 mins ago
















  • $begingroup$
    Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
    $endgroup$
    – Kagaratsch
    7 mins ago















$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago




$begingroup$
Nice solution with graphs! I was afraid it would be too memory demanding and/or slow, but my 32GB ram turned out to be enough and it finished in just a minute or two. Amazing!
$endgroup$
– Kagaratsch
7 mins ago











1












$begingroup$

Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]



a3, a1, a2, a5




Also



Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]



a1, a3, a2, a5




In case this produces an output with duplicates we can use



DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]





share|improve this answer











$endgroup$












  • $begingroup$
    Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    @Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
    $endgroup$
    – kglr
    8 hours ago











  • $begingroup$
    Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    Then it is probably some glitch in theExperimentalShortestSupersequence` function.
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    Seems that way, see edit of my question for some details.
    $endgroup$
    – Kagaratsch
    7 hours ago















1












$begingroup$

Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]



a3, a1, a2, a5




Also



Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]



a1, a3, a2, a5




In case this produces an output with duplicates we can use



DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]





share|improve this answer











$endgroup$












  • $begingroup$
    Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    @Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
    $endgroup$
    – kglr
    8 hours ago











  • $begingroup$
    Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    Then it is probably some glitch in theExperimentalShortestSupersequence` function.
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    Seems that way, see edit of my question for some details.
    $endgroup$
    – Kagaratsch
    7 hours ago













1












1








1





$begingroup$

Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]



a3, a1, a2, a5




Also



Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]



a1, a3, a2, a5




In case this produces an output with duplicates we can use



DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]





share|improve this answer











$endgroup$



Fold[Experimental`ShortestSupersequence, sublists /@ Range[3]]



a3, a1, a2, a5




Also



Fold[Experimental`ShortestSupersequence, RotateRight[sublists /@ Range[3]]]



a1, a3, a2, a5




In case this produces an output with duplicates we can use



DeleteDuplicates @ Fold[Experimental`ShortestSupersequence, sublists /@ Range[5000]]






share|improve this answer














share|improve this answer



share|improve this answer








edited 8 hours ago

























answered 9 hours ago









kglrkglr

195k10216439




195k10216439











  • $begingroup$
    Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    @Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
    $endgroup$
    – kglr
    8 hours ago











  • $begingroup$
    Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    Then it is probably some glitch in theExperimentalShortestSupersequence` function.
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    Seems that way, see edit of my question for some details.
    $endgroup$
    – Kagaratsch
    7 hours ago
















  • $begingroup$
    Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    @Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
    $endgroup$
    – kglr
    8 hours ago











  • $begingroup$
    Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
    $endgroup$
    – Kagaratsch
    8 hours ago










  • $begingroup$
    Then it is probably some glitch in theExperimentalShortestSupersequence` function.
    $endgroup$
    – kglr
    8 hours ago










  • $begingroup$
    Seems that way, see edit of my question for some details.
    $endgroup$
    – Kagaratsch
    7 hours ago















$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago




$begingroup$
Somehow, when I apply this to the big example it returns an output that still has duplicates. If I apply DeleteDuplicates to it, will it give the desired ordering?
$endgroup$
– Kagaratsch
8 hours ago












$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago





$begingroup$
@Kagaratsch, my rough intuition is that it should not happen if the orderings in all the sublists are consistent (ie, a and b appear in the same order in all sublists that contain both) . It could happen if a and b appear in different order in two sublists. In any case wrapping with DeleteDuplicates fixes the problem. Could you post the smallest example in which the output has duplicates?
$endgroup$
– kglr
8 hours ago













$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago




$begingroup$
Hmm, I'm pretty sure the code generates all lists in consistent order due to the Sort command before the elements are picked...
$endgroup$
– Kagaratsch
8 hours ago












$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago




$begingroup$
Then it is probably some glitch in theExperimentalShortestSupersequence` function.
$endgroup$
– kglr
8 hours ago












$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago




$begingroup$
Seems that way, see edit of my question for some details.
$endgroup$
– Kagaratsch
7 hours ago

















draft saved

draft discarded
















































Thanks for contributing an answer to Mathematica Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f199518%2fefficiently-merge-lists-chronologically-without-duplicates%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Sahara Skak | Bilen | Luke uk diar | NawigatsjuunCommonskategorii: SaharaWikivoyage raisfeerer: Sahara26° N, 13° O

The fall designs the understood secretary. Looking glass Science Shock Discovery Hot Everybody Loves Raymond Smile 곳 서비스 성실하다 Defas Kaloolon Definition: To combine or impregnate with sulphur or any of its compounds as to sulphurize caoutchouc in vulcanizing Flame colored Reason Useful Thin Help 갖다 유명하다 낙엽 장례식 Country Iron Definition: A fencer a gladiator one who exhibits his skill in the use of the sword Definition: The American black throated bunting Spiza Americana Nostalgic Needy Method to my madness 시키다 평가되다 전부 소설가 우아하다 Argument Tin Feeling Representative Gym Music Gaur Chicken 일쑤 코치 편 학생증 The harbor values the sugar. Vasagle Yammoe Enstatite Definition: Capable of being limited Road Neighborly Five Refer Built Kangaroo 비비다 Degree Release Bargain Horse 하루 형님 유교 석 동부 괴롭히다 경제력

19. јануар Садржај Догађаји Рођења Смрти Празници и дани сећања Види још Референце Мени за навигацијуу