librelist archives

« back to archive

Hashing - what is the best solution?

Hashing - what is the best solution?

From:
Ramakrishnan Muthukrishnan
Date:
2014-01-30 @ 14:22
Hi,

I have a problem where I have this big video file. I need to chunk them
into blocks of 1024 bytes long. After that, I need to find SHA256 of
each block. But the catch is that, I need to start from the last
chunk. Append that hash with the block before that and find the combined
hash and so on.. till we hit the first block and return the value of the
one for the first block.

I thought of using Lazy ByteString (Data.ByteString.Lazy.Char8
module). But laziness does not seem to be of an advantage here as we
cannot compute the hash as we go. We need to hit the last chunk and work
from there, recursion fits nicely here. Another approach is to do a
chunking and store them as chunks in the list and then use foldr.

I am wondering if there is an elegant approach to the problem. I haven't
studied lens. Don't even know if it can be applied to this type of a
problem.

Any suggestions and help will be greatly appreciated. :) 

Thanks
Ramakrishnan

Re: Hashing - what is the best solution?

From:
Ramakrishnan Muthukrishnan
Date:
2014-01-30 @ 17:52
I solved it with foldl, it turned out to be not too bad. But criticisms
and improvements and reviews always welcome.

Here is the code for the complete program:

module Main where

import System.IO
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import System.Environment (getArgs)
import qualified Data.Digest.Pure.SHA as SHA

chunkify :: Handle -> Int -> [BLC.ByteString] -> IO [BLC.ByteString]
chunkify fileH size bxs = do
  ineof <- hIsEOF fileH
  if ineof
  then return bxs
  else do
    chunk <- BL.hGet fileH size
    chunkify fileH size (chunk:bxs)

hashChain bxs =
    let (bx':bxs') = bxs
        hDigest = SHA.sha256 bx'
        h = SHA.bytestringDigest hDigest
    in
      foldl combine hDigest bxs'
            where combine chash chunk' = 
                      SHA.sha256 $ BLC.append chunk' $ SHA.bytestringDigest chash
main = do
  args <- getArgs
  handle <- openFile (head args) ReadMode
  bxs <- chunkify handle 1024 []
  putStrLn $ "number of chunks = " ++ (show (length bxs))
  putStrLn $ "hash value = " ++ (show $ hashChain bxs)

Re: [bangalorehaskell] Re: Hashing - what is the best solution?

From:
Ashok Gautham
Date:
2014-01-31 @ 09:56
---- On Thu, 30 Jan 2014 17:52:32 +0000 Ramakrishnan Muthukrishnan  wrote ---- 

>I solved it with foldl, it turned out to be not too bad. But criticisms 
>and improvements and reviews always welcome. 

If you are using foldl, it might be a good idea to look at the 
enumerator/iteratee library. I have heard good stuff being said about it 
for high-performing foldl-based IO in #haskell.

http://hackage.haskell.org/package/iteratee (hackage) 
http://www.tiresiaspress.us/haskell/iteratee/Examples/ (examples)


>hashChain bxs = 
> let (bx':bxs') = bxs 
> hDigest = SHA.sha256 bx' 
> h = SHA.bytestringDigest hDigest 
> in 
> foldl combine hDigest bxs' 
> where combine chash chunk' = 
> SHA.sha256 $ BLC.append chunk' $ SHA.bytestringDigest chash 
>main = do 
> args <- getArgs 
> handle <- openFile (head args) ReadMode 
> bxs <- chunkify handle 1024 [] 
> putStrLn $ "number of chunks = " ++ (show (length bxs)) 
> putStrLn $ "hash value = " ++ (show $ hashChain bxs) 
>

Am I missing something here?  You are using left fold. That does this.

foldl f a [b1,b2,b3...] is 
(f (f (f a b1) b2) b3) 

So, your chunking method is
(sha256 (chunk10 ++ .. (sha256 (chunk4 ++ (sha256 (chunk3 ++ (sha256 
(chunk2 ++  hash_chunk1)))))))

But your initial question was for something that could only be solved by 
foldr. i.e. What you initially wanted was

(sha256 (chunk1 ++ .. (sha256 chunk7 ++ (sha256 (chunk8 ++ (sha256 (chunk9
++ hash_chunk10))))


Are you missing a reverse somewhere? or am I?
Also, you could write your function as

hashChain (bx:bxs) = bxs ... rather than using the let for this. Which 
again brings me to another question. What about hashChain []? While this 
is not recursive and may never happen, it is still good-practice to never 
have partially defined functions.