
Is there really no use for MD5 anymore?



I read an article about password schemes that makes two seemingly conflicting claims:

  "MD5 is broken; it's too slow to use as a general purpose hash; etc."

  "The problem is that MD5 is fast."

I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimage attacks and MD5's low computation time.

However, I was under the impression that MD5 still can be used as a non-cryptographic hash function:

  1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.

  2. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check whether they really are identical. Using SHA-512 would make the process slower, and since we compare the files in full anyway, there is no risk from a potential false positive from MD5. (In a way, this would be creating a rainbow table where all the files are the dictionary.)

There are dedicated checksums, of course, but in my experience the likelihood of finding two different files with the same MD5 hash is very low as long as we can rule out foul play.
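To make point 2 concrete, here is a rough sketch of the duplicate-finding approach I have in mind (Python; the directory layout and `find_possible_duplicates` name are just for illustration):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_possible_duplicates(root):
    """Group files under `root` by MD5 digest.

    Files sharing a digest are only *candidate* duplicates; they should
    still be compared in full before anything is deleted.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            h = hashlib.md5()
            with open(path, "rb") as f:
                # Stream in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            groups[h.hexdigest()].append(path)
    # Only groups with more than one file need the full byte-for-byte check.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```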



When the password scheme article states that "MD5 is fast", it clearly refers to the problem that MD5 hashing is too cheap when it comes to hashing a large number of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes for comparing files that still have a reasonably low chance of collision?

















  • The collision attack is the problem; however, MD5 still has pre-image resistance. – kelalaka, 5 hours ago
















Tags: hash, md5

asked 5 hours ago by jornane

















5 Answers




















  "But what does it mean when it says that '[MD5 is] too slow to use as a general purpose hash'? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?"

BLAKE2 is faster than MD5 and is currently believed to provide 64-bit collision resistance when truncated to the same output size as MD5 (compared with roughly 30 bits of practical collision resistance for MD5, given known attacks).
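As a quick illustration (Python's `hashlib`; this snippet is an addition, not part of the original answer), BLAKE2b accepts a `digest_size` parameter, so it can produce a 16-byte digest of the same length as MD5:

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

md5_digest = hashlib.md5(data).digest()
# BLAKE2b supports variable digest sizes; 16 bytes matches MD5's output length.
b2_digest = hashlib.blake2b(data, digest_size=16).digest()

assert len(md5_digest) == len(b2_digest) == 16
```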






answered 3 hours ago by DannyNiu





















There's not a compelling reason to use MD5; however, there are some embedded systems with an MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.

The reason that MD5 started to fall out of favor with hardware people is that the word reordering of the MD5 message expansion seems simple but actually requires a lot of circuitry for demultiplexing and interconnect, so the hardware efficiency is greatly degraded compared to BLAKE. In contrast, the message expansion blocks for the BLAKE algorithms can be efficiently implemented as simple feedback shift registers.

The BLAKE team did a nice job of making it work well both in silicon and in software.






answered 1 hour ago by b degnan





















A case where using the MD5 hash would still make sense (with low risk when deleting duplicated files):

If you want to find duplicate files, you can first use CRC32.

As soon as two files return the same CRC32 hash, recompute them with MD5. If the MD5 hashes are also identical for both files, then you know that the files are very likely duplicates.

In a case of high risk when deleting files:

If you want the process to be fast: instead use, for a second hash of the files, a hash function that isn't vulnerable, e.g. SHA-2 or SHA-3. It's extremely unlikely that these would return an identical hash for different files.

If speed is no concern: compare the files byte by byte.
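A minimal Python sketch of the cheap-first-pass idea (`zlib.crc32` for the candidate pass; note that the confirmation step here is a direct byte-for-byte comparison via `filecmp`, swapped in for the answer's MD5 second stage; the directory path is a placeholder):

```python
import zlib
import filecmp
from collections import defaultdict
from pathlib import Path

def crc32_of(path, chunk_size=1 << 20):
    """Streaming CRC32 so large files are not read into memory at once."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def confirmed_duplicates(root):
    """Cheap CRC32 pass to find candidates, then a full comparison to confirm."""
    by_crc = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_crc[crc32_of(path)].append(path)
    pairs = []
    for paths in by_crc.values():
        for i, a in enumerate(paths):
            for b in paths[i + 1:]:
                if filecmp.cmp(a, b, shallow=False):  # byte-for-byte check
                    pairs.append((a, b))
    return pairs
```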














  • If you want to be 100% sure, you compare both files byte per byte. – A. Hersean, 5 hours ago

  • Why use MD5 as a second step after CRC32 instead of SHA-2/3 directly? It should make a negligible difference since, absent deliberate file corruption, CRC32 collisions should still be rare. And if you are concerned about deliberate file corruption, you don't use CRC32 or MD5 in the first place. – fkraiem, 4 hours ago

  • Why use a second step after CRC32 at all? Compare the files byte by byte if you're going to read them again completely anyhow! – Ruben De Smet, 4 hours ago

  • @RubenDeSmet I think it's because to compare them byte by byte you'd have to buffer both files up to a certain limit (because of memory constraints) and compare those. This will slow down sequential read speeds because you need to jump between the files. Whether this actually makes any real-world difference, given a large enough buffer size, is beyond my knowledge. – JensV, 32 mins ago

  • @JensV I am pretty sure that the speed difference between a byte-by-byte comparison and a SHA-3 comparison (with reasonable buffer sizes) will be trivial. It might even favour the byte-by-byte comparison. – Martin Bonner, 13 mins ago



















MD5 is used throughout the world, both at home and in the enterprise. It's the file-change detection mechanism within *nix's rsync if you opt for something other than changed-timestamp detection. It's used for backup, archiving and file transfer between in-house systems, and even between enterprises over VPNs.

Your comment that it "should not be used for integrity checking of documents" is interesting, as that's kind of what is done when transferring files (aka documents). A hacked file/document is, philosophically, a changed file/document. Yet if an adversary takes advantage of MD5's low collision resistance, carefully made changes can go unnoticed by rsync as files/documents propagate through systems, and attacks can occur. I expect that this is somewhat of a niche attack, but it illustrates the concept vis-à-vis MD5 being at the heart of many computer systems.




























        I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5s low computation time.




        There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function.




        1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



        There are two issues here:




        1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



          To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



          There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




        2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



          • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

          • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

          • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

          In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.




        2. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



        When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



        • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


        • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken, as a pseudorandom function family—so it's probably fine for security, up to the birthday bound, but there are better faster PRFs like keyed BLAKE2 that won't raise any auditors' eyebrows.


        • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision is less than $1/2^{100}$, so you can still safely skip the comparison step until you have quadrillions of files.


        • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
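A sketch of the keyed-hash variant from the list above (Python's `hashlib.blake2b` takes a `key` argument directly; the key handling here is illustrative only):

```python
import hashlib
import secrets

def keyed_digest(data, key):
    """Keyed BLAKE2b. With a secret uniform random key, equal digests imply
    equal inputs except with negligible probability, because an adversary who
    prepares colliding files in advance never sees the key."""
    return hashlib.blake2b(data, key=key, digest_size=32).digest()

# A fresh random key per deduplication run.
key = secrets.token_bytes(32)

a = keyed_digest(b"benign software package", key)
b = keyed_digest(b"erotic sushi package", key)
assert a != b
```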



        (In a way, this would be creating a rainbow table where all the files are the dictionary)




        This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.




        When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




        I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.




















          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1












          $begingroup$


          But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




          BLAKE2 is faster than MD5 and currently known to provide 64-bit collision resistence when truncated to the same size as MD5 (compare ~30 of that of MD5).






          share|improve this answer









          $endgroup$

















            1












            $begingroup$


            But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




            BLAKE2 is faster than MD5 and currently known to provide 64-bit collision resistence when truncated to the same size as MD5 (compare ~30 of that of MD5).






            share|improve this answer









            $endgroup$















              1












              1








              1





              $begingroup$


              But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




              BLAKE2 is faster than MD5 and currently known to provide 64-bit collision resistence when truncated to the same size as MD5 (compare ~30 of that of MD5).






              share|improve this answer









              $endgroup$




              But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




              BLAKE2 is faster than MD5 and currently known to provide 64-bit collision resistence when truncated to the same size as MD5 (compare ~30 of that of MD5).







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered 3 hours ago









              DannyNiuDannyNiu

              1,4051629




              1,4051629





















                  1












                  $begingroup$

                  There's not a compelling reason to use MD5; however, there are some embedded systems with a MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



                  The reason that MD5 started fall out of favor with hardware people was that the word reordering of the MD5 message expansions seems to be simple, but actually
                  they require a lot of circuits for demultiplexing and interconnect, and the hardware efficiencies are greatly degraded compared to BLAKE. In contrast, the message expansion blocks for BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



                  The BLAKE team did a nice job of making it work well on silicon and in instructions.






                  share|improve this answer









                  $endgroup$

















                    1












                    $begingroup$

                    There's not a compelling reason to use MD5; however, there are some embedded systems with a MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



                    The reason that MD5 started fall out of favor with hardware people was that the word reordering of the MD5 message expansions seems to be simple, but actually
                    they require a lot of circuits for demultiplexing and interconnect, and the hardware efficiencies are greatly degraded compared to BLAKE. In contrast, the message expansion blocks for BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



                    The BLAKE team did a nice job of making it work well on silicon and in instructions.






                    share|improve this answer









                    $endgroup$















                      1












                      1








                      1





                      $begingroup$

                      There's not a compelling reason to use MD5; however, there are some embedded systems with a MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



                      The reason that MD5 started fall out of favor with hardware people was that the word reordering of the MD5 message expansions seems to be simple, but actually
                      they require a lot of circuits for demultiplexing and interconnect, and the hardware efficiencies are greatly degraded compared to BLAKE. In contrast, the message expansion blocks for BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



                      The BLAKE team did a nice job of making it work well on silicon and in instructions.






                      share|improve this answer









                      $endgroup$



                      There's not a compelling reason to use MD5; however, there are some embedded systems with a MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



                      The reason that MD5 started fall out of favor with hardware people was that the word reordering of the MD5 message expansions seems to be simple, but actually
                      they require a lot of circuits for demultiplexing and interconnect, and the hardware efficiencies are greatly degraded compared to BLAKE. In contrast, the message expansion blocks for BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



                      The BLAKE team did a nice job of making it work well on silicon and in instructions.







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered 1 hour ago









                      b degnanb degnan

                      2,0921829




                      2,0921829





















                          0












                          $begingroup$

                          A case where the use of the MD5-hash would still make sense (and low risk of deleting duplicated files):



                          If you want to find duplicate files you can just use CRC32.



                          As soon as two files return the same CRC32-hash you recompute the files with MD5 hash. If the MD5 hash is again identical for both files then you know that the files are duplicates.




                          In a case of high risk by deleting files:



                          If you want the process to be fast: use a collision-resistant hash function instead, e.g. SHA-2 or SHA-3. It's extremely unlikely that these would return an identical hash for two different files.



                          If speed is no concern: compare the files byte by byte.
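                          The two-step CRC32-then-MD5 check described above can be sketched in Python (a minimal illustration; the chunk size and helper names are my own):

```python
import hashlib
import zlib

def crc32_of(path, chunk=1 << 16):
    """CRC32 of a file, streamed in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            crc = zlib.crc32(block, crc)
    return crc

def md5_of(path, chunk=1 << 16):
    """MD5 of a file, streamed in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(paths):
    """Group files by CRC32 first; confirm candidate groups with MD5."""
    by_crc = {}
    for p in paths:
        by_crc.setdefault(crc32_of(p), []).append(p)
    by_md5 = {}
    for group in by_crc.values():
        if len(group) < 2:
            continue  # a unique CRC32 means the file has no duplicate
        for p in group:
            by_md5.setdefault(md5_of(p), []).append(p)
    return [g for g in by_md5.values() if len(g) > 1]
```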






                          • If you want to be 100% sure, you compare both files byte by byte. – A. Hersean (5 hours ago)










                          • Why use MD5 as a second step after CRC32 instead of SHA-2/3 directly? It should make a negligible difference since, absent deliberate file corruption, CRC32 collisions should still be rare. And if you are concerned about deliberate file corruption, you don't use CRC32 or MD5 in the first place. – fkraiem (4 hours ago)






                          • Why use a second step after CRC32 at all? Compare the files byte-by-byte if you're going to read them again completely anyhow! – Ruben De Smet (4 hours ago)






                          • @RubenDeSmet I think it's because a byte-by-byte comparison has to buffer both files up to a certain limit (because of memory constraints) and compare those buffers. This slows down sequential read speeds because you need to jump between the files. Whether this actually makes any real-world difference, given a large enough buffer size, is beyond my knowledge. – JensV (32 mins ago)










                          • @JensV I am pretty sure that the speed difference between a byte-by-byte comparison and a SHA3 comparison (with reasonable buffer sizes) will be trivial. It might even favour the byte-by-byte comparison. – Martin Bonner (13 mins ago)
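                          The buffered byte-by-byte comparison discussed in these comments can be sketched as follows (a minimal illustration; the 64 KiB chunk size is arbitrary, and the standard library's `filecmp.cmp(a, b, shallow=False)` does much the same thing):

```python
import os

def files_equal(path_a, path_b, chunk=1 << 16):
    """Chunked byte-by-byte comparison of two files."""
    # A size check first: different sizes means no full read is needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk)
            block_b = fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted (sizes already matched)
                return True
```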

































                          answered 5 hours ago (edited 4 hours ago) by AleksanderRas














                          MD5 is used throughout the world, both at home and in the enterprise. It is the file-change detection mechanism within *nix's rsync if you opt for something other than timestamp-based change detection. It's used for backup, archiving and file transfer between in-house systems, and even between enterprises over VPNs.



                          Your comment that it "should not be used for integrity checking of documents" is interesting, as that is roughly what is done when transferring files (aka documents). A hacked file/document is philosophically a changed file/document. Yet if the adversary takes advantage of MD5's weak collision resistance, file/document propagation through systems can be subverted: carefully made changes can go unnoticed by rsync, and attacks can occur. I expect that this is somewhat of a niche attack, but it illustrates the concept vis-à-vis MD5 being at the heart of many computer systems.
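                          A toy illustration of the two change-detection modes mentioned above, in Python (this is not rsync's actual implementation, which also uses rolling block checksums for delta transfer; the function names are my own):

```python
import hashlib
import os

def quick_check_differs(src, dst):
    """rsync's default heuristic: size and mtime only; no contents read."""
    sa, sb = os.stat(src), os.stat(dst)
    return sa.st_size != sb.st_size or int(sa.st_mtime) != int(sb.st_mtime)

def checksum_differs(src, dst, chunk=1 << 16):
    """The -c/--checksum mode: compare whole-file hashes (MD5 in modern rsync)."""
    def digest(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.digest()
    return digest(src) != digest(dst)
```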






                          answered 1 hour ago by Paul Uszak























                                  I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimage attacks and MD5's low computation time.




                                  There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function.




                                  1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



                                  There are two issues here:




                                  1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



                                    To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



                                    There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




                                  2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



                                    • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

                                    • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

                                    • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

                                    In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.




                                  1. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



                                  When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



                                  • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


                                  • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken as a pseudorandom function family, so it's probably fine for security up to the birthday bound, but there are better, faster PRFs like keyed BLAKE2 that won't raise any auditors' eyebrows.


                                  • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision is less than $1/2^{100}$, so you can still safely skip the comparison step until you have quadrillions of files.


                                  • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
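                                  Most of these options are available directly from Python's standard library; a sketch (Poly1305 is omitted because it is not in the stdlib):

```python
import hashlib
import hmac
import os
import zlib

data = b"contents of some file"

# Strong hashes: equal digests can safely stand in for equal files.
sha512_digest = hashlib.sha512(data).hexdigest()
blake2b_digest = hashlib.blake2b(data).hexdigest()

# HMAC-MD5 under a uniform random key, used as a PRF.
key = os.urandom(32)
hmac_md5_tag = hmac.new(key, data, hashlib.md5).hexdigest()

# Keyed BLAKE2: the faster PRF that raises no auditors' eyebrows.
keyed_blake2_tag = hashlib.blake2b(data, key=key).hexdigest()

# A cheap CRC: fastest of all, but matching files must still be
# compared in full, since CRC collisions are trivial to produce.
crc = zlib.crc32(data)
```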



                                  (In a way, this would be creating a rainbow table where all the files are the dictionary)




                                  This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.




                                  When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




                                  I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.
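                                  For example, deriving a hash-table bucket index from a truncated MD5 digest might look like this (illustrative only; real hash tables use far cheaper non-cryptographic hashes, which is why MD5 is "too slow" for the job):

```python
import hashlib

def bucket_index(key: bytes, nbuckets: int) -> int:
    """Truncate an MD5 digest to a few bytes to pick a bucket."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % nbuckets
```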






                                  share|improve this answer











                                  $endgroup$

















                                    0












                                    $begingroup$


                                    I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5s low computation time.




                                    There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function.




                                    1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



                                    There are two issues here:




                                    1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



                                      To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



                                      There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




                                    2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



                                      • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

                                      • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

                                      • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

                                      In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.




                                    1. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



                                    When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



                                    • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


                                    • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken, as a pseudorandom function family—so it's probably fine for security, up to the birthday bound, but there are better faster PRFs like keyed BLAKE2 that won't raise any auditors' eyebrows.


                                    • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision is less than $1/2^{100}$, so you can still safely skip the comparison step until you have quadrillions of files.


                                    • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
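                                    The keyed-hash option above can be sketched in a few lines. This is a minimal illustration, not a hardened tool: the function name `find_duplicates` and the chunk size are my own choices, and the key must be a fresh uniform random value kept for the duration of the run. Because the adversary doesn't know the key, they can't craft files that collide under it, so groups sharing a digest can be treated as duplicates without a byte-by-byte comparison.

```python
import os
import hashlib
from collections import defaultdict

def find_duplicates(root, key):
    """Group files under `root` by keyed BLAKE2b digest.

    With a uniform random `key`, colliding inputs can't be crafted
    in advance, so any group of two or more files with the same
    digest can be reported as duplicates directly.
    """
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.blake2b(key=key)  # keyed mode, per the bullet above
            with open(path, 'rb') as f:
                # Hash in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    h.update(chunk)
            groups[h.hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

key = os.urandom(32)  # fresh random key, one per run
```

Note that the key is an input: reusing a published or guessable key would put you back in the unkeyed-MD5 situation where an adversary can plant colliding files.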



                                    (In a way, this would be creating a rainbow table where all the files are the dictionary)




                                    This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.




                                    When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




                                    I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.
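                                    To make the hash-table guess concrete, here is a sketch of that truncation, with `bucket_index` and the table size as illustrative names of my own. Only a few bits of the 128-bit digest survive, which is why a cryptographic hash is wasted effort here compared to a fast non-cryptographic one.

```python
import hashlib

def bucket_index(data: bytes, nbuckets: int) -> int:
    """Truncate an MD5 digest to a hash-table bucket index.

    Almost all of the digest is discarded; any fast
    non-cryptographic hash would serve equally well.
    """
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:8], 'little') % nbuckets

# Toy open-hashing table with 16 buckets.
table = [[] for _ in range(16)]
for word in [b'alice', b'bob', b'carol']:
    table[bucket_index(word, 16)].append(word)
```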






                                      answered by Squeamish Ossifrage



























                                          Кастелфранко ди Сопра Становништво Референце Спољашње везе Мени за навигацију43°37′18″ СГШ; 11°33′32″ ИГД / 43.62156° СГШ; 11.55885° ИГД / 43.62156; 11.5588543°37′18″ СГШ; 11°33′32″ ИГД / 43.62156° СГШ; 11.55885° ИГД / 43.62156; 11.558853179688„The GeoNames geographical database”„Istituto Nazionale di Statistica”проширитиууWorldCat156923403n850174324558639-1cb14643287r(подаци)