Essentially, for rows whose `work_height`, `work_width`, and `work_depth` dimensions are missing but that have a description of those dimensions in the `work_dimensions` column, I want to parse that description into the `work_height`, `work_width`, and `work_depth` columns. Based on my exploration, the descriptions come in a few formats:
- `__ unit x __ unit x __ unit`, e.g. `200 x 300 mm`. This one should be easy.
- `__ unit x __ unit \newline __ unit x __ unit`, e.g. `200 x 300 mm\n400 x 760 mm`. I believe these are two different dimension settings for the same image. I want to create a new image item (row) for the second setting (or the third, and so on).
- Written-out mixed fractions, e.g. `16 7/8 in (42.8 cm)` or `16 7/8in (42.8cm)`. How is this supposed to be parsed? This is one of the hard ones. Since the unit column `work_measurement_unit` is generally mm, I presume that's the unit to parse into (and even then I have to convert from cm to mm).
- A measurement description, followed by the mixed fraction and the other unit in parentheses as above, e.g. `Diameter: 19 3/7 in (72.5 cm)`.
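
To make the formats above concrete, here is a rough sketch of how they might be parsed with Python's `re` module. The function names (`parse_line`, `parse_dimensions`), the unit table, and the decision to prefer the metric value in parentheses over the inch fraction are all my own assumptions, not something the data dictates. It also assumes a single trailing unit per line, as in `200 x 300 mm`, and it does not try to decide which number is height, width, or depth.

```python
import re

# Conversion factors to millimetres (assumed set of units; extend as needed).
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}

# "200 x 300 mm" style: numbers separated by "x", with one trailing unit.
PLAIN = re.compile(
    r"^\s*(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*)\s*(mm|cm|in)\s*$"
)

# A metric value in parentheses, e.g. "(42.8 cm)" or "(42.8cm)".
PAREN_METRIC = re.compile(r"\(\s*(\d+(?:\.\d+)?)\s*(mm|cm)\s*\)")

# A whole number with an optional fraction, in inches: "16 7/8 in", "16 7/8in".
MIXED = re.compile(r"(\d+)(?:\s+(\d+)/(\d+))?\s*(in)\b")


def parse_line(line):
    """Return the measurements found in one line, converted to mm."""
    plain = PLAIN.match(line)
    if plain:
        factor = TO_MM[plain.group(2)]
        return [float(n) * factor for n in re.split(r"\s*x\s*", plain.group(1))]

    # Assumption: when a metric value is given in parentheses, trust it
    # rather than converting the inch fraction.
    metric = PAREN_METRIC.findall(line)
    if metric:
        return [float(value) * TO_MM[unit] for value, unit in metric]

    # Fallback: convert mixed fractions in inches to mm.
    result = []
    for whole, num, den, unit in MIXED.findall(line):
        value = int(whole) + (int(num) / int(den) if num else 0.0)
        result.append(value * TO_MM[unit])
    return result


def parse_dimensions(text):
    """Parse a multi-line description; one list of mm values per line."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

For example, `parse_dimensions("200 x 300 mm\n400 x 760 mm")` gives one list per line, which maps naturally onto "one row per setting". A leading label such as `Diameter:` is simply ignored here; capturing it would need an extra group.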
Code to get the rows with missing data:

    # Rows that have a dimensions description but where all three
    # numeric dimensions are missing (encoded as -1.0)
    dims = ['work_height', 'work_width', 'work_depth']
    mask = (
        df['work_dimensions'].notnull()
        & (df['work_dimensions'] != '-1')
        & (df[dims] == -1.0).all(axis=1)
    )
    df.loc[mask, ['work_dimensions'] + dims + ['work_measurement_unit']]
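
For the multi-line descriptions, one possible way to get a separate row per setting is to split on the newline and use `DataFrame.explode`. The sketch below uses a small made-up frame in place of the real data (the values are purely illustrative) and the same kind of mask as above; `expanded` is a hypothetical name.

```python
import pandas as pd

# Toy frame standing in for the real data (values invented for illustration).
df = pd.DataFrame({
    "work_dimensions": ["200 x 300 mm\n400 x 760 mm", "150 x 100 mm", "-1"],
    "work_height": [-1.0, -1.0, -1.0],
    "work_width": [-1.0, -1.0, -1.0],
    "work_depth": [-1.0, -1.0, -1.0],
    "work_measurement_unit": ["mm", "mm", "mm"],
})

dims = ["work_height", "work_width", "work_depth"]
mask = (
    df["work_dimensions"].notnull()
    & (df["work_dimensions"] != "-1")
    & (df[dims] == -1.0).all(axis=1)
)

# Split each description into its lines, then give every line its own row.
subset = df.loc[mask].copy()
subset["work_dimensions"] = subset["work_dimensions"].str.split("\n")
expanded = subset.explode("work_dimensions").reset_index(drop=True)
```

Each row of `expanded` then holds a single-line description, so a per-line parser can be applied to it with `Series.apply`.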
I'm not too familiar with regular expressions, in Python or in general, so any help would be appreciated!