Text is everywhere. Web pages, databases, the contents of
files: for almost any programming task you perform, you need
to process text. Cut even the most complex text-based tasks
down to size and learn how to master regular expressions,
scrape information from Web pages, develop reusable
utilities to process text in pipelines, and more.

Most information in the world is in text format, and programmers
often find themselves needing to make sense of the data
hiding within. It might be to convert it from one format to
another, or to find out information about the text as a
whole, or to extract information from it. But how do you do
this efficiently, avoiding labor-intensive manual work?

Text Processing with Ruby takes a practical approach. You'll
learn how to get text into your Ruby programs from the file
system and from user input. You'll process delimited files
such as CSVs, and write utilities that interact with other
programs in text-processing pipelines. Decipher character
encoding mysteries, and avoid the pain of jumbled characters
and malformed output.
You'll learn to use regular
expressions to match, extract, and replace patterns in text.
You'll write a parser and learn how to process Web pages to
pull out information from even the messiest of HTML.

Before long you'll be able to tackle even the most enormous and
entangled text with ease, scything through gigabytes of data
and effortlessly extracting the bits that matter.

This book requires a passing familiarity with the Ruby
programming language, and assumes that you already have Ruby
installed on your computer.