Tuesday, February 14, 2012

corrupted values in database

I just started working on reports on an SQL 2000 database.
Many field contain numerical values stored as varchar. To make matters worse these strings contain both '.' and ',' as decimal sign.
So far this has really been a great pain :-(

The database and its application are bought as is, I cannot modify fieldtypes or anything else.

Can anyone give a good strategy to convert the strings back to numerical so I can calculate revenue's and such??

P.S.

In extreme cases string contain both ',' AND '.' , like:

5.608,42Use

convert (float, replace(column_name,',',''))

or convert (int,replace(replace(column_name,',',''),'.','') if you are sure the value is an integer|||Note quite what I need,

The string can contain either a . or a ,
With the conversion you give all figures with a ',' come out 100 fold higher than they should be, while those with a '.' come out right.

Some sort of validation seems to be needed to get all them right.

Right now I am focussing on prices which means the highest value is just 900.00 (or 900.00) so I have no problem in this column with both a ',' and a '.'|||The following seems to work:

SELECT
CASE
WHEN
SUBSTRING(REVERSE(RTRIM(LTRIM(PRIJS))),3,1) = ',' THEN
convert(float,replace(PRIJS,',',''))/100
WHEN
SUBSTRING(REVERSE(RTRIM(LTRIM(PRIJS))),3,1) = '.' THEN
convert(dec(9,2),PRIJS)
ELSE 0 END

not getting very good feelings though on having to do these king of conversions :-(|||No, me either! You should consider to clean your data before reporting them. You are not allowed to change the field type, but you should be allowed to update your data. So, define your required format, and update everything, that does not match your format, first. Your format may be "#.###,##" or "####.##"; I would prefer the last, because that can be converted into a number directly. You may even consider to convert everything to cents, which makes it easy for you to detect new values, that are not yet converted.

Another approach would be to identify your different formats, and make a union query, handling each of your formats in a separate branch.|||Can't do that.

Some of the fields are filled by procedures from the AS400 system.
THere is something fundamentaly wrong with the way the data is handled on the SQL database. (For which I just mailed an angry mail to all involved) ,

but doctor maybe you can help me with this one:

I have all the differnent flavors in the DB:

1.203,56
1,203.56
456,67
456.67

You see replacing the comma is okay if it comes as first one in the string.
If there is just a comma replace it by a '.'
If a '.' precedes a ',' replace ','by '.' and get rid of the '.'

Nice challenge ain't it?

Next private message next week, gotta cycle to the north this afternoon
:-)|||No, me either! You should consider to clean your data before reporting them. You are not allowed to change the field type, but you should be allowed to update your data. So, define your required format, and update everything, that does not match your format, first. Your format may be "#.###,##" or "####.##"; I would prefer the last, because that can be converted into a number directly. You may even consider to convert everything to cents, which makes it easy for you to detect new values, that are not yet converted.

I dont uderstand the format "#.###,##" . What does it exactly mean ?|||Originally posted by blom0344
Can't do that.

Some of the fields are filled by procedures from the AS400 system.
THere is something fundamentaly wrong with the way the data is handled on the SQL database. (For which I just mailed an angry mail to all involved) ,


I understand, that you don't have control of the import process, but who is preventing you from cleaning your data afterwards?

Originally posted by blom0344
I have all the differnent flavors in the DB:

1.203,56
1,203.56
456,67
456.67

You see replacing the comma is okay if it comes as first one in the string.
If there is just a comma replace it by a '.'
If a '.' precedes a ',' replace ','by '.' and get rid of the '.'

Nice challenge ain't it?


As I said, determine first your possible flavors. What is imported when your price is 456.- or 456.50? Is is 456[,|.]00 or just 456? So, actually I'm asking whether your decimal point is always on the 3rd last position? Your algorithm depends on that. Another point is whether you can expect a maximum number of digits, or not. So, can you also have, for example 1,234,567.89?

Assuming, your possible flavors are those 4 you gave me, a query like I propose would look like (your price field is called p)

SELECT p FROM T WHERE len(p)=6 AND SubString(p, 4,1)="."
UNION
SELECT replace(p,',','.') FROM T WHERE len(p)=6 AND SubString(p, 4,1)=","
UNION
SELECT replace(p,',','') FROM T WHERE len(p)=8 AND SubString(p, 6,1)="."
UNION
SELECT replace(replace(p,'.',''), ',','.') FROM T WHERE len(p)=8 AND SubString(p, 6,1)=","

Calling this union query U you can do your accumulation like

SELECT sum(cast(p as decimal(10,2))) from (<U>)|||-- replace(replace(p,'.',''), ',','.') --

Yep,

I am working with the replace on replace in my solution as well.
Does not seem to work like it should.
I want to use the CASE instead of UNION solution , cause I am going to assign it to a BO object

We'll continue next week...|||If you know you have only two decimal places, how about something along the lines of eliminating all commas/periods, then dividing by 100?

select convert(numeric (10, 2), replace(replace(value, ',', ''), '.', '')))/100.00

A bit pressed for time here, so I have not tested that code snippet. Hope it helps.|||Elegant solution, MCrowley. :cool:

...but make sure the values don't just only have two decimal places, but that the ALWAYS have two decimal places.

blindman|||Originally posted by blom0344
I just started working on reports on an SQL 2000 database.
Many field contain numerical values stored as varchar. To make matters worse these strings contain both '.' and ',' as decimal sign.
So far this has really been a great pain :-(

The database and its application are bought as is, I cannot modify fieldtypes or anything else.

Can anyone give a good strategy to convert the strings back to numerical so I can calculate revenue's and such??

P.S.

In extreme cases string contain both ',' AND '.' , like:

5.608,42

Dumb question ... but, any chance you are dealing with software that handles multi-currency? Do you have the CCSID translation turned on from the AS400 to the SQL server?|||Most certainly not a dumb question, but in this case we are talking about E-commerce orders from two euro countries, so every order amount is always just in one currency.

McCrowleys solution does indeed work for me, cause every order is calculated down to the euro-cent

No comments:

Post a Comment