I am working with a dataset of limnological parameters. I stacked several turbidity-related measurements into one column so I could assess their relationship with a numerical response variable. As a result, this column contains values in different units and, consequently, on different ranges. Is there a way to standardize this column into a single numerical variable called “Turbidity” that combines all the measurements regardless of their unit?
I cannot simply pool the measurements as they are, because the absolute values would then range from 0.1 to 1000, while in some units a value of 10 is already high. I also don't want to analyse each unit separately, because some units have only a few measurements, and combining them is how I'm trying to make the dataset more robust.
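To make the idea of "standardizing within each unit" concrete, here is a minimal sketch of one common option: a z-score computed separately for every unit, so each unit is centred on its own mean and scaled by its own standard deviation. The helper name `standardize_within_unit` is hypothetical, and the approach assumes that the samples measured in each unit cover a comparable turbidity gradient; units with fewer than two values (or no spread) cannot be z-scored and are returned as NaN.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_unit(values, units):
    """Hypothetical helper: z-score each value using only its own unit's group.

    Assumes every unit's measurements span a comparable range of turbidity,
    so that "1 SD above the unit mean" means roughly the same thing in each unit.
    """
    # Collect the measurements belonging to each unit.
    groups = defaultdict(list)
    for v, u in zip(values, units):
        groups[u].append(v)

    # Per-unit mean and sample SD; None marks units that cannot be z-scored.
    stats = {}
    for u, vs in groups.items():
        if len(vs) < 2 or stdev(vs) == 0:
            stats[u] = None
        else:
            stats[u] = (mean(vs), stdev(vs))

    # Standardize each value against its own unit's statistics.
    result = []
    for v, u in zip(values, units):
        if stats[u] is None:
            result.append(float("nan"))
        else:
            m, s = stats[u]
            result.append((v - m) / s)
    return result


values = [0.5, 10, 50, 100, 30, 60, 500]
units = ["x", "x", "y", "y", "z", "z", "z"]
turbidity = standardize_within_unit(values, units)
print([round(t, 3) for t in turbidity])
```

After this transformation every unit has mean 0 and standard deviation 1, so the pooled column is unitless. The trade-off is that units with very few measurements get noisy means and SDs.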
Reprex
values | unit | abundance |
---|---|---|
0.5 | x | 500 |
10 | x | 30 |
50 | y | 50 |
100 | y | 100 |
30 | z | 20 |
60 | z | 60 |
500 | z | 80 |
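Applied to the reprex above, a rank-based alternative can be sketched with pandas: each measurement is replaced by its percentile rank within its own unit, which ignores absolute magnitudes entirely and is therefore insensitive to both the unit and any skew within it. This is an assumed illustration, not a recommendation over the z-score; it also assumes that the samples in each unit represent comparable turbidity gradients, and it discards information about how far apart the readings are.

```python
import pandas as pd

# Reprex data from the question.
df = pd.DataFrame({
    "values": [0.5, 10, 50, 100, 30, 60, 500],
    "unit": ["x", "x", "y", "y", "z", "z", "z"],
    "abundance": [500, 30, 50, 100, 20, 60, 80],
})

# Percentile rank of each measurement within its own unit:
# the highest reading of each unit gets 1.0; ties receive their average rank.
df["Turbidity"] = df.groupby("unit")["values"].rank(pct=True)

print(df)
```

The resulting `Turbidity` column lies between 0 and 1 for every unit and can be related to `abundance` as a single predictor.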