File Juicer
File Juicer for macOS
Overview of Formats

Search & Extract

Images

jpg, jpeg 2000, gif, png, pdf, wmf, emf, tiff, eps, pict, bmp

Video

mov, mpeg, avi, wmv

Sound

mp3, wav, System 7, au, aiff

Text

ascii, rtf, html

From:

avi, cab, cache, chm, dmg, doc, emlx, exe, ithmb, m4p, mht, mp3, pdf, pps, ppt, raw, swf, wps, xls, zip and other formats

HTML and Web Archives

Extract Images, Flash and Movies from HTML

HTML itself does not contain other file formats, at least not in the literal sense. It does, however, contain links to images, Flash and movies, which are displayed in the web page.
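To illustrate what these links look like in practice, here is a minimal Python sketch that collects the image, Flash and movie URLs an HTML page refers to, using the standard library's `html.parser`. The tag-to-attribute table (`MEDIA_ATTRS`) and the function names are assumptions made for this example only; they are not how File Juicer or any browser works internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed (not exhaustive) table of tags whose attributes point at embedded media.
MEDIA_ATTRS = {
    "img": ("src",),
    "embed": ("src",),            # typical for Flash (.swf) content
    "object": ("data",),          # also used for Flash and movies
    "video": ("src", "poster"),
    "audio": ("src",),
    "source": ("src",),
}


class MediaLinkCollector(HTMLParser):
    """Collects URLs of images, Flash and movies referenced by an HTML page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = MEDIA_ATTRS.get(tag, ())
        for name, value in attrs:
            if name in wanted and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def collect_media_links(html, base_url=""):
    """Return the media URLs in document order."""
    parser = MediaLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

An archiving tool would then download each of these URLs and store the files next to the HTML, which is essentially what a web archive preserves.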

These external files can be collected and saved together with the HTML file in a "web archive".
Firefox has a "save complete" feature that downloads all the referenced files and stores them in a folder alongside the HTML file; this is probably the best choice for archiving web pages in an open format.

Web archives are not perfect in every case. Some web pages are scripted in ways that do not support archiving; this is common on movie trailer sites.

Safari and Internet Explorer both have their own web archive formats, which do the same thing, except that the files are packed into a single web archive file designed to be opened again easily with Safari or Internet Explorer. If you have one of these archives, you can also extract the main content with File Juicer.
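As a rough sketch of what "packed into a web archive file" means for Safari, the following Python example unpacks a `.webarchive` with the standard `plistlib` module. It assumes the archive is a property list whose `WebMainResource` entry and optional `WebSubresources` list hold dictionaries with `WebResourceURL`, `WebResourceMIMEType` and `WebResourceData` keys; treat those key names as an assumption about Safari's format, not a specification. Internet Explorer's single-file archives (`.mht`) are MIME documents and are not covered here. The function name is hypothetical.

```python
import plistlib


def read_webarchive(raw):
    """Split a Safari .webarchive (given as bytes) into its stored resources.

    Returns ((url, mime_type, data), [(url, mime_type, data), ...]) for the
    main page and its subresources. Key names are assumed, see lead-in.
    """
    archive = plistlib.loads(raw)  # detects binary or XML plist automatically

    def unpack(resource):
        return (
            resource.get("WebResourceURL", ""),
            resource.get("WebResourceMIMEType", ""),
            resource.get("WebResourceData", b""),
        )

    main = unpack(archive["WebMainResource"])
    subresources = [unpack(r) for r in archive.get("WebSubresources", [])]
    return main, subresources
```

Calling it on the bytes of a saved `.webarchive` file returns the main page's URL, MIME type and HTML data, plus the same triple for each image or other file stored with it.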

Extract HTML from other file formats

HTML is mostly found in browser cache files, web archives and email attachments, but also inside other formats where it is used as rich text. File Juicer extracts the HTML if it has an opening html tag and a proper closing html tag, and it includes the doctype declaration if one is present.
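The extraction rule described above can be sketched in a few lines of Python with the standard `re` module. This is an illustration of the rule, not File Juicer's actual code: the regular expressions, the 1024-byte lookback for the doctype and the function name are assumptions of this sketch.

```python
import re

# Assumed patterns: an opening <html> tag, a closing </html> tag, and a
# doctype declaration that ends right before the opening tag.
START_TAG = re.compile(rb"<html[\s>]", re.IGNORECASE)
END_TAG = re.compile(rb"</html\s*>", re.IGNORECASE)
DOCTYPE_BEFORE = re.compile(rb"<!DOCTYPE[^>]*>\s*\Z", re.IGNORECASE)
LOOKBACK = 1024  # assumed maximum distance searched for a doctype


def extract_html_documents(data):
    """Yield each HTML document embedded in a byte string."""
    pos = 0
    while True:
        start = START_TAG.search(data, pos)
        if start is None:
            return
        end = END_TAG.search(data, start.end())
        if end is None:
            return  # no proper closing tag, so nothing more to extract
        begin = start.start()
        # Include the doctype only if it immediately precedes <html>
        # (whitespace allowed) and lies after the previous document.
        doctype = DOCTYPE_BEFORE.search(data, max(pos, begin - LOOKBACK), begin)
        if doctype:
            begin = doctype.start()
        yield data[begin:end.end()]
        pos = end.end()
```

Working on bytes rather than text lets the same scan run over binary containers such as cache files, where the HTML is surrounded by non-text data.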

Dedicated web archive application

A dedicated application gives plenty of control over how links are followed and how web pages are archived.
Web site downloaders are not always welcome on web sites served over slower internet connections.