# regex for parsing dimension measurement descriptions (Python)

#### binomial-torrent

Essentially, for rows where the work_height, work_width and work_depth values are missing but the work_dimensions column contains a description of those dimensions, I want to parse that description into the work_height, work_width and work_depth columns. Based on my exploration, the descriptions come in a few formats:

• __ unit x __ unit x __ unit, e.g. 200 x 300 mm. This one should be easy.
• __ unit x __ unit \newline __ unit x __ unit, e.g. 200 x 300 mm\n400 x 760 mm. I believe these are two different dimension settings for the same image. I want to create a new image item (row) for the second setting (or the third, and so on).
• Written-out mixed fractions, e.g. 16 7/8 in (42.8 cm) or 16 7/8in (42.8cm). How should these be parsed? This is one of the hard ones. Since the unit column work_measurement_unit is generally mm, I presume the metric value is the one to parse (and even then I have to convert from cm to mm).
• A measurement description followed by the mixed fraction and the other unit in parentheses, as above, e.g. Diameter: 19 3/7 in (72.5 cm).

To access the rows above, I used this code to get the missing data:

mask = (df['work_dimensions'] != '-1') & (df['work_dimensions'].notnull()) & ((df[['work_height','work_width','work_depth']] == -1.0).sum(axis=1) == 3)
df[['work_dimensions','work_height','work_width','work_depth','work_measurement_unit']][mask]
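
For anyone wanting to try the selection without the real data, here is a minimal sketch on a toy DataFrame. The sample values are made up for illustration; only the column names and the -1 / '-1' missing-value convention come from the code above. It builds the same mask in two named steps and selects with `.loc`, which should behave the same as the chained indexing.

```python
import pandas as pd

# Toy data (hypothetical values); -1.0 and "-1" mark missing entries.
df = pd.DataFrame({
    "work_dimensions": ["-1", "200 x 300 mm", None, "16 7/8 in (42.8 cm)"],
    "work_height": [-1.0, -1.0, -1.0, 428.0],
    "work_width": [-1.0, -1.0, -1.0, -1.0],
    "work_depth": [-1.0, -1.0, -1.0, -1.0],
    "work_measurement_unit": ["mm", "mm", "mm", "mm"],
})

dim_cols = ["work_height", "work_width", "work_depth"]

# A description is present and is not the "-1" placeholder.
has_description = df["work_dimensions"].notna() & df["work_dimensions"].ne("-1")
# All three numeric dimension columns hold the -1.0 placeholder.
all_missing = df[dim_cols].eq(-1.0).all(axis=1)

mask = has_description & all_missing
missing = df.loc[mask, ["work_dimensions", *dim_cols, "work_measurement_unit"]]
print(missing)
```

Only the second toy row is selected: the first has the '-1' placeholder, the third has no description, and the fourth already has a height.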

I'm not too familiar with regexp stuff in Python or in general so any help would be appreciated!
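
A possible starting point for the formats listed above, as a rough sketch rather than a complete solution. It uses only the standard library (`re` and `fractions`). The function name `parse_dimensions`, the accepted unit spellings, and the choice to prefer the parenthesised metric value over the inch value are all assumptions made for illustration. It also assumes multiple settings are separated by real newline characters, and it returns the values in the order written, without deciding which one is height, width or depth.

```python
import re
from fractions import Fraction

# Millimetres per unit, kept as exact fractions to avoid float rounding.
# The set of accepted unit spellings is an assumption.
MM_PER_UNIT = {
    "mm": Fraction(1),
    "cm": Fraction(10),
    "m": Fraction(1000),
    "in": Fraction("25.4"),
}

# One value: mixed fraction ("16 7/8"), plain fraction ("7/8") or decimal
# ("42.8"), optionally followed by a unit.
PART = re.compile(
    r"(?P<num>\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*(?P<unit>mm|cm|in|m)?",
    re.IGNORECASE,
)
LABEL = re.compile(r"^\s*[A-Za-z][A-Za-z ]*:\s*")  # e.g. "Diameter: "
PAREN = re.compile(r"\(([^)]*)\)")                  # e.g. "(42.8 cm)"


def _to_mm(chunk):
    """Turn e.g. '200 x 300 mm' into [200.0, 300.0]; None if it does not parse."""
    values, units = [], []
    for part in re.split(r"\s*[xX×]\s*", chunk.strip()):
        m = PART.fullmatch(part.strip())
        if m is None:
            return None
        # "16 7/8" becomes Fraction(16) + Fraction(7, 8).
        values.append(sum(Fraction(p) for p in m.group("num").split()))
        units.append(m.group("unit"))
    # A unit written once at the end applies to the values before it.
    current = None
    result = []
    for value, unit in zip(reversed(values), reversed(units)):
        if unit:
            current = unit.lower()
        if current is None:
            return None  # no unit found at all
        result.append(float(value * MM_PER_UNIT[current]))
    return result[::-1]


def parse_dimensions(description):
    """Return one list of millimetre values per line of the description."""
    results = []
    for line in description.splitlines():
        line = LABEL.sub("", line).strip()
        if not line:
            continue
        inner = PAREN.search(line)
        outer = PAREN.sub("", line)
        # Try the parenthesised (metric) value first, then the text outside.
        candidates = ([inner.group(1)] if inner else []) + [outer]
        for candidate in candidates:
            dims = _to_mm(candidate)
            if dims is not None:
                results.append(dims)
                break
    return results


print(parse_dimensions("200 x 300 mm\n400 x 760 mm"))
print(parse_dimensions("16 7/8in (42.8cm)"))
```

Each inner list corresponds to one line of the description, so a description with two settings yields two lists that could become two rows. Lines that do not match any of the patterns are skipped silently, which may or may not be what you want for real data.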

#### Daniel Duffy

##### C++ author, trainer

Then "Python in a Nutshell" chapter 9.

#### Daniel Duffy

##### C++ author, trainer
There are two kinds of developer; those that know regex (Perl?) and them that don't.
It's a special area indeed.

#### ExSan

> There are two kinds of developer; those that know regex (Perl?) and them that don't.
> It's a special area indeed.

There are two kinds of developer: those that know C/C++ and them that don't.

#### Daniel Duffy

##### C++ author, trainer
The learning curve for regex can be steep.
