This method is about one third faster than htmlParse, but it is still quite slow.
Usage
standardize_dehtmlize(
x,
as_single_string = FALSE,
as_single_string_sep = "#_|",
use_read_xml = FALSE,
...
)
Arguments
- x
object (table)
- as_single_string
If set, collapse the character values in the main column of x (i.e., x.col) into a single string. This increases performance (at least for relatively short tables). Default is FALSE.
- as_single_string_sep
Delimiter placed between the collapsed strings so that the result can be uncollapsed later. Default is "#_|".
- use_read_xml
If set, the input is parsed as XML. Default is FALSE, which means the input is parsed as HTML.
- ...
Arguments passed on to standardize_options
  - col
  Column of interest (the one we need to standardize) in the x object (if it is data.frame-like).
  - rows
  Logical vector to filter records of interest. Default is NULL, which means do not filter records.
  - omitted_rows_value
  If the rows parameter is set, merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
  - output_placement
  Where to insert the results (standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
    'omit' :: do not write results back to the table (usually used when append_output_copy is set, for temporary values)
    'prepend_to_col' :: prepend to col
    'append_to_col' :: append to col
    'prepend_to_x' :: prepend to the x data.frame-like object
    'append_to_x' :: append to the x data.frame-like object
  - x_atomic_name
  If x is a vector, use this name for the original column if it is in the results. Default is "x". If x is a table, the name of col will be used.
  - output_col_name
  Use this name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:
  - append_output_copy
  Whether to append a copy of the result vector to the x object.
  - output_copy_col_name
  How the appended copy will be named.
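
The idea behind as_single_string and as_single_string_sep can be sketched in base R. This is only an illustration of the collapse / process / uncollapse pattern, not the package's actual implementation: strip_tags is a hypothetical stand-in for the expensive parsing step, and the sample values in x are made up.

```r
# Hypothetical stand-in for the expensive cleaning step (an assumption,
# not the package's code): removes anything that looks like an HTML tag.
strip_tags <- function(s) gsub("<[^>]+>", "", s)

x   <- c("<b>Acme</b> Ltd", "Foo <i>Bar</i>", "plain")
sep <- "#_|"  # same value as the documented default of as_single_string_sep

# Collapse the whole vector into one string so the cleaner runs once.
collapsed <- paste(x, collapse = sep)

# Process the single string.
cleaned <- strip_tags(collapsed)

# Split back on the separator; fixed = TRUE treats "#_|" literally
# (otherwise "|" would be read as regex alternation).
result <- strsplit(cleaned, sep, fixed = TRUE)[[1]]

stopifnot(length(result) == length(x))
```

For this pattern to be safe, the separator must not occur in the data itself, which is presumably why the default is an unusual character sequence. Note also that base R's strsplit drops a trailing empty string, so a vector whose last element is empty would come back one element short in this sketch.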