Efficient Algorithms for Destroyed Document Reconstruction
I am not certain this is the proper site for this question, but I am mainly looking for resources on this topic (and perhaps code). I was watching TV, and one of the characters had a lawyer who destroyed his documents with a paper shredder; a lab tech remarked that the shredder was special.
I am not familiar with this area of computer science/mathematics, but I am looking for information on efficient algorithms to reconstruct destroyed documents. I can imagine a naive brute-force approach fairly easily: go through all the pieces and look for edges that match. That does not sound feasible, though, because the number of combinations explodes.
Note: by "destroyed documents" I mean taking a printed document, shredding it into small pieces, and then reassembling it by determining which pieces fit together.
image-processing
asked and edited 3 hours ago by Shogun (new contributor)
– lox (3 hours ago): Can you edit your question to define "destroyed documents"?
– David Richerby (3 hours ago): You should look at the methods used to recover the Stasi (East German secret police) archives that were shredded or, mostly (oops, all the shredders broke from overuse), torn up after the fall of the Berlin Wall. The BBC has a very high-level summary.
1 Answer
Your problem is NP-complete, even for plain strips (n strips yield (2n)! possible arrangements), so in practice people use heuristics: transforms such as the Hough transform combined with morphological filters (to match the continuity of text lines, which considerably increases the cost of each comparison), or metaheuristic search such as genetic algorithms, neural-network-guided search, and ant colony optimization.
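As a concrete (and much simplified) stand-in for the text-continuity cue, one can summarize each cut edge by a per-row ink profile and compare profiles before doing any expensive matching. A minimal sketch, assuming the strips are already available as upright grayscale NumPy arrays; the threshold value and all names are illustrative, not taken from the paper:

```python
import numpy as np

def edge_ink_profile(strip, side, margin=3):
    """Fraction of dark ('ink') pixels per row in a narrow band along one cut
    edge of a grayscale strip; rows of text show up as peaks in the profile."""
    binary = strip < 128                      # crude ink/background threshold (assumed)
    band = binary[:, :margin] if side == "left" else binary[:, -margin:]
    return band.mean(axis=1)

def edge_cost(strip_a, strip_b):
    """Mean squared difference between the right-edge profile of strip_a and
    the left-edge profile of strip_b; lower cost = better text continuity."""
    a = edge_ink_profile(strip_a, "right")
    b = edge_ink_profile(strip_b, "left")
    n = min(len(a), len(b))                   # tolerate slightly uneven heights
    return float(np.mean((a[:n] - b[:n]) ** 2))
```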
For a summary of the consecutive steps and the various algorithms, I recommend An Investigation into Automated Shredded Document Reconstruction using Heuristic Search Algorithms.
The problem has nasty cases: when the document is not fully sharp (blurred, or printed at low resolution), the strips are narrow, and the cut was made by a physical cutter with dulled blades, standard merging methods such as panorama photo stitchers get lost and yield improper results. This is because information along the cut is lost and small slivers go missing entirely. If, by contrast, you have a full digital image cut into pieces, the task is only as hard as a jigsaw puzzle; a shredded physical document instead falls into approximate search.
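To make the "heuristics rather than brute force" point concrete, here is a minimal greedy baseline that chains strips left to right by always appending the unused strip whose left edge best matches the current right edge (using a pairwise cost such as the edge_cost sketch above). It is exactly the kind of local decision that can lock in a plausible-looking but globally wrong match, which is why the literature reaches for the metaheuristics listed earlier:

```python
def greedy_order(strips, edge_cost):
    """Greedy left-to-right ordering of strips by pairwise edge cost.

    strips    : list of grayscale strip images (assumed already upright)
    edge_cost : callable(strip_a, strip_b) -> float, lower is a better fit
    Returns a list of strip indices; a heuristic baseline, not an exact solver.
    """
    remaining = set(range(len(strips)))
    order = [remaining.pop()]                 # arbitrary seed strip
    while remaining:
        last = order[-1]
        best = min(remaining, key=lambda j: edge_cost(strips[last], strips[j]))
        order.append(best)
        remaining.remove(best)
    return order
```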
To make the algorithm automatic, another problem is feeding the pieces in: you can rarely provide axis-aligned strips, so a convenient way to start the process is to scan all strips as a single picture with the pieces laid out by hand. This introduces another (easy) problem: detecting the individual blobs and rotating them upright, as in the sketch below.
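A minimal sketch of that preprocessing step with OpenCV, assuming a single scan with dark pieces on a light background; the Otsu threshold and the minimum-area filter are assumptions for illustration, not something from the answer:

```python
import cv2

def extract_upright_pieces(scan_path, min_area=500):
    """Detect each shred in one scanned image and return it rotated upright."""
    gray = cv2.imread(scan_path, cv2.IMREAD_GRAYSCALE)
    # Pieces are darker than the background: invert so they become white blobs.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    pieces = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue                           # skip dust and scanner noise
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        # Rotate the whole scan so this piece becomes axis-aligned, then crop it.
        rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        upright = cv2.warpAffine(gray, rot, (gray.shape[1], gray.shape[0]),
                                 flags=cv2.INTER_CUBIC, borderValue=255)
        x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
        pieces.append(upright[y0:y0 + int(h), x0:x0 + int(w)])
    return pieces
```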
A "special" shredder yields very small rectangles instead of strips. For comparison, a security class P-1 shredder gives strips 6-12 mm wide of arbitrary length (about 1800 mm² per piece), while class P-7 gives rectangles with an area below 5 mm². When you get rectangles instead of strips, the problem grows to (4n)! permutations even assuming a single one-sided document; and if one bag contains shreds from many unrelated, text-only documents (no pictures), the problem is not really tractable.
answered 2 hours ago by Evil
– Alexander (42 mins ago): There may be (2n)! arrangements of the shredded strips, but does that still determine the time complexity? Whenever you find "matches" you can "group" them together so they behave as one "thick strip", where only the first and last edges matter for comparison against other strips. This "clumping" should reduce the search space hugely, but I don't know whether it will still be O(n!).
– Evil (25 mins ago): @Alexander This is not about the complexity per se. The true hardness comes from the fact that you are never fully sure whether a match is really good. If you look at the PDF (figure 6.1 on page 69, the tiger picture, and the following figures), there are errors. You still have to check the fitness of all edges pairwise; "grouping" several pieces seems nice, but by committing to some elements you rule out other matches that may look like a worse fit locally yet give a lower MSE overall. If exact edge matching were a viable option, it would be blazingly fast; in my answer I assume it is not possible.