issue in handling CSV data


On 2019-09-08 01:19, Sharan Basappa wrote:
> I am trying to read a log file that is in CSV format.
> 
> The code snippet is below:
> 
> ###############################
> import matplotlib.pyplot as plt
> import seaborn as sns; sns.set()
> import numpy as np
> import pandas as pd
> import os
> import csv
> from numpy import genfromtxt
> 
> # read the CSV and get into X array
> os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> X = []
> #with open("constraints.csv", 'rb') as csvfile:
> #    reader = csv.reader(csvfile)
> #    data_as_list = list(reader)
> #myarray = np.asarray(data_as_list)
> 
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> print (my_data)
> 
> my_data_1 = np.delete(my_data, 0, axis=1)
> print (my_data_1)
> 
> my_data_2 = np.delete(my_data_1, 0, axis=1)
> print (my_data_2)
> 
> my_data_3 = my_data_2.astype(np.float)
> ################################
> 
> Here is what print (my_data_2) looks like:
> ##############################
> [['"\t"81' '"\t5c']
>   ['"\t"04' '"\t11']
>   ['"\t"e1' '"\t17']
>   ['"\t"6a' '"\t6c']
>   ['"\t"53' '"\t69']
>   ['"\t"98' '"\t87']
>   ['"\t"5c' '"\t4b']
> ##############################
> 
> Finally, I am trying to get rid of the strings and get an array of numbers using NumPy's astype function. At this stage, I get an error.
> 
> This is the error:
> my_data_3 = my_data_2.astype(np.float)
> could not convert string to float: " "81
> 
> As you can see, the string "\t"81 is causing the error.
> It seems to be due to char "\t".
> 
> I don't know how to resolve this.
> 
> Thanks for your help.
> 
Are you sure it's CSV (Comma-Separated Value) and not TSV (Tab-Separated 
Value)?

Also, the values look like hexadecimal to me (5c, e1, 6a, ...).
.astype(np.float) expects decimal strings, so even without the stray
quote and tab characters a value such as 5c would fail to convert.
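
To illustrate the decimal-versus-hexadecimal point, here is a small sketch. The sample strings are taken from the output quoted above, with the quote and tab characters already removed; the variable names are only for illustration.

```python
import numpy as np

# "81" and "11" are valid decimal text, so they convert, but to 81.0 and 11.0
# rather than the hexadecimal values 0x81 == 129 and 0x11 == 17.
as_decimal = np.array(["81", "11"]).astype(float)

# "5c" is not valid decimal text, so the same conversion raises ValueError.
try:
    np.array(["5c"]).astype(float)
    hex_digits_rejected = False
except ValueError:
    hex_digits_rejected = True

# Parsing with an explicit base of 16 gives the intended values.
as_hex = [int(text, 16) for text in ["81", "5c"]]

print(as_decimal, hex_digits_rejected, as_hex)
```

So a clean conversion needs an explicit base-16 parsing step before the data reaches numpy.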

I'd probably start by reading them using the csv module, strip the
stray quote and tab characters, convert the hexadecimal strings to
integers with int(value, 16), and then pass them on to numpy.
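
A minimal sketch of that approach follows. It rests on several assumptions: the file really is comma-separated, the first two columns are the ones to drop (as in the original np.delete calls), and every remaining field is a hexadecimal number wrapped in stray quote and tab characters. The helper names parse_hex_field and load_hex_columns are hypothetical, and the sample rows are made up to match the printed output, since the actual file contents were not posted.

```python
import csv
import io

import numpy as np


def parse_hex_field(field):
    """Strip stray quotes, tabs and spaces, then parse the rest as base 16."""
    cleaned = field.strip(' \t"')
    return int(cleaned, 16)


def load_hex_columns(csvfile, skip_columns=2):
    """Read CSV rows, drop the leading columns, return a float array."""
    rows = []
    # QUOTE_NONE makes the reader treat '"' as an ordinary character, so the
    # unbalanced quotes seen in the data cannot confuse field splitting.
    for record in csv.reader(csvfile, quoting=csv.QUOTE_NONE):
        if not record:
            continue  # skip blank lines
        rows.append([parse_hex_field(f) for f in record[skip_columns:]])
    return np.array(rows, dtype=float)


# Hypothetical sample: two placeholder columns followed by two fields shaped
# like the ones shown by print (my_data_2).
sample = 'x,y,"\t"81,"\t5c\nx,y,"\t"04,"\t11\n'
my_data_3 = load_hex_columns(io.StringIO(sample))
print(my_data_3)
```

For the real file, the in-memory sample would be replaced by something like open('constraints.csv', newline=''), passed to load_hex_columns inside a with block. Note that under Python 3 the csv module expects the file to be opened in text mode, not 'rb'.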