Cleaning data

kean

Hi Folks,

I am hoping some of the quant gurus can share their data cleaning process. I'd really like to know whether you use only SQL to rearrange the data, use SAS, S-Plus, or R, or do it directly in the database management system.
I am using S-Plus to clean data. Please advise.

Thanks,
K
 
I'm not a quant guru just yet but will reply anyway :)

For my last project I used SAS to clean the data. I had no particular reason for choosing SAS, but it worked great for me. I had a database of about 200,000 loans. Many data points were missing, and I had to fill them with something before further analysis. SAS handled this large data set extremely fast.

Cleaning data is a rather creative process. SAS helped me minimize technical complications and concentrate more on the data itself.
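
To make the "fill them with something" step concrete, here is a minimal sketch in Python rather than SAS. The loan records, the field names (`balance`, `rate`), and the choice of the median as the fill value are all assumptions for illustration; the post does not say which fields were missing or how they were filled.

```python
from statistics import median

# Hypothetical loan records; None marks a missing data point.
loans = [
    {"loan_id": 1, "balance": 120000.0, "rate": 6.5},
    {"loan_id": 2, "balance": None, "rate": 7.0},
    {"loan_id": 3, "balance": 95000.0, "rate": None},
    {"loan_id": 4, "balance": 150000.0, "rate": 6.0},
]


def fill_missing_with_median(records, field):
    """Return copies of the records with missing `field` values
    replaced by the median of the observed values (an assumed rule)."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        # Nothing to estimate the fill value from; leave the data as-is.
        return [dict(r) for r in records]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


cleaned = fill_missing_with_median(loans, "balance")
cleaned = fill_missing_with_median(cleaned, "rate")
```

The function returns new records instead of modifying the input, so the raw data stays available for comparison. Whether a median, a mean, or a model-based value is the right fill depends on the analysis that follows.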
 
Thanks

I find R and S-Plus quite good. R is free. I used SAS before at university, but I don't have a license at home, so I chose R.

Also, R can interface with C++, which works quite OK. I tested it.

Cheers,
K


I also love the R/S-Plus platform, although it's been a while since I have worked with it. One simple way of working with data is to convert it into a data matrix and use the apply() function in combination with functions such as mean, min, max, and is.na to generate summary statistics. Once this is done, you can proceed to convert the NA (missing) values into whatever you need. Send me a PM.
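
The column-wise pattern described above can be sketched roughly as follows. This is a Python analogue, not R: `apply_columns` and `count_missing` are hypothetical helpers standing in for `apply()` over columns and a count of `is.na` results, NaN stands in for NA, and the small matrix is made up.

```python
import math
from statistics import mean

NA = float("nan")  # stand-in for a missing value

# Hypothetical data matrix: rows are observations, columns are variables.
data = [
    [1.0, 10.0],
    [2.0, NA],
    [NA, 30.0],
    [4.0, 50.0],
]


def apply_columns(matrix, func):
    """Call func on each column after dropping missing values."""
    return [func([v for v in col if not math.isnan(v)]) for col in zip(*matrix)]


def count_missing(matrix):
    """Count missing values in each column."""
    return [sum(math.isnan(v) for v in col) for col in zip(*matrix)]


col_means = apply_columns(data, mean)
col_mins = apply_columns(data, min)
col_maxs = apply_columns(data, max)
n_missing = count_missing(data)

# With the summaries in hand, replace each missing value. The column mean
# is used here only as an example of "whatever you need".
filled = [
    [col_means[j] if math.isnan(v) else v for j, v in enumerate(row)]
    for row in data
]
```

Looking at the per-column missing counts next to the min, max, and mean first helps decide whether a column is worth filling at all.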
 