please send an example file
please try to provide more detail as to exactly what you would want to do. I understand about removing duplicates. Is that if the entire row is a duplicate, or is it for a specific column.
And what do you mean by retain information before removing duplicat, does that mean you want a list of only the duplicate rows?
I looked at your very small example and do have a few more questions.
First let me explain how this might be accomplished.
First of all there is no such formula that will do this it will need some VBA.
The first part is relatively easy, look for duplicates in column D,B. and delete any row that has no value in column P. This will leave the row containing column P automatically.
Now is where it gets tricky.
If you want to retain data from other rows, it will significantly increase processing time/cost as now blanks anywhere have to be replaced with any potentital data. And the way excel accesses the computers CPU is not the most efficient for this kind of work.
Now the problem occurs when what happens if say for example you have 3 duplicate rows and they have 3 different values in column P or any other coluimn. How does the program prioritize which one to keep and which to delete.
One such way might be to delete 2 of the rows, but copy any extranious data into new columns. Perhaps you can say what the most important data is and those could be copied into new column to the right of the row we are keeping. But this might not look very good, but atleast the data is retained.
Other option is to copy the entire row that is being deleted to another sheet. so you have a master list of the retained data but still have a backup of the original deleted data somewhere that you can refer to.
So ultimately I would suggest we do this is multiple steps. first do the minimum, just delete any duplicates that have empty column P. THen once that is in place, we can open new questions and see if there is more that can be done with the program.
here are my questions.
1. What version of excel do you have.
2. Is this list contained in multiple workbooks or multiple sheets within a workbook.
3. Is each list unique, meaning that list 2 wont contain any duplicates of list 1, meaning that there will not be a duplicate row in the 2nd list that is a duplicate of any row in the first list.
well if you are going to import to a database
The way it is usually done is that we have a separate table for each of the ABCD unique values, then using ID#s reference that back to the table that contains all the other rows.
it depends on what software is being used.
Are you doing the database in MS Access or MySQL or something.
As far as importing the table from a file that does not take long. Then what we have to do is create tables and reports and stuff. Once it is in a data base it really doesn't matter. 100K to 600K records is nothing for a data base, many companies use DBs for 10s of millions of records.
And yes technically you could just set up filters in excel on the original data without changing data at all, but if your goal is to ultimately import this to a DB any way. \
The question is weather it has to be imported a specific way or if we create whatever DB is needed.
so now the question does each set get a unqiue number
for example if there are 5 rows with the same values, do they all get a 1, then the next 5 rows that are the same but with different values get a 2 and so on.
or does each row in the set get a value, so rows 31 to 37 get 1,2,3,4,5 and so on?
yes i understand that.
but I don't think you answered my question of how you want the numbers to be.
so just to clarify
lets say the first 5 rows are the same in the 2 columns. Then each of those 5 rows would each get #1.
Then lest say next 3 rows have same in 2 columns. Then each of those 3 rows would get #2.
lets say next 8 rows are same in 2 columns. Then each of those 8 rows wold get #3.
And so on?
ok then given this criteria you will need a macro.
this is a piece of programming code that I will write, it stays in your excel sheet and you activate it by pressing a button or going to a menu and running it. It is not automatic. Once you run it, it will add the numbers for you.
To work all of the data needs to be either in a single list, or atleast in the same workbook even on separate sheets.
I need to ask what version of excel you use?
well next I send you an additional service offer to cover the cost of this.
Then I need you to send a file. You can either put all the items in a single list, or each list in its own sheet on the same workbook.
yes, yes, and yes.
You can provide me with a dropbox link.
you can put each list on a separate sheet if you want, I believe each sheet goes to 1 million rows.
I am not sure how long it would take to run. But you may want to let it run overnight to be safe.
Yes you could use the macro over and over by copying and pasting new data in.
hey, no you do not need my information. Just set up your own account in Drop and one of the options is to share a link or share a file.
Then they give you a link to paste here.
Once I get the file I can get started
all work the same way as drop box and you don't need any information. All you have to do is upload the file and then they give you a link which you will then provide to me.
Also other people have shared dropbox links with me without my needing to give them any info.
If worse comes to worse I will do a remote login to your computer and grab the file that way.
let me know.
Sooner I get the file sooner I could get started.
hi, did you try one of those other services.
I would like to get started on this for you
i'm not finding the file. Not sure if you completed the upload process or not.
The reason that it needed to be combined into one workbook is because it is 10 times easier than dealing with multiple workbooks.
Now the only option left to us is to set up a remote session where I will download the file directly from your computer and discuss how this will work with you.
Please download and install teamviewer
its a free remote share program, if they ask at some point say it is for personal use.
hold on, 16GB? that is one large file. You sure its not 16MB?
ok, we could do a remote share and Ill just program directly into your computer, this would make things way easier given the file sizes.
If possible can you send me a small sample file maybe 1000 rows or so. including the blanks in columns D and E as you suggested.
i got the file.
Ok so just for clarification
we will insert for column D "0900" anywhere there is a blank cell
column E insert "Unknown" any blank cell.
I might as well leave column A since it is blank and we need a blank column in order to insert the numbers as requested.
Anywhere there are consecutive rows in both Column B and D. we will insert an incremental number in Column A for each set of consecutive columns B and D.
Is that pretty much it?
of so we compare column C and E then
does that bean that for step 1 as listed above
we will insert into columns E and F,
or will it sill be D and E?
sending file back with code
this should be working now
I did a small test on about 10000 rows, and it took about 3 minutes, so you can do the math. It is about as efficient as I can make it now.
Also I had to add a sort feature into it, since we are comparing consecutive colums C and E I have the macro first sort by those columns when it runs.
To get it to run you have to do the following.
First you might get some kind of security warning when you first open the file some bar at the top of excel saying "enable content" or "enable macros" something to that effect, you have to do that or it will not run.
second on the menu click the "view" tab, one of the options on the menu bar will be macros,
goto that and goto view macros
this will give you a list of all the macros in the workbook t
in this case there is only one, "addnums"
select it from the list by higlighing it and click run
then check back in an hour or 2 and it should be done.
yes, 0900 and unknown were inserted.
did you rearrange your columns You said 900 in column D and unknown column E but your graphic shows the opposit
ok, tell me exactly how you ran it , and how long it took to complete.
You rellize that if the circle is just spinning it is actually doing something even though it doesn't show anything on the screen.
well I ran it and was able to run without any problem
Here is your knew file
duplicates were not deleted, all you asked for was to have numbers added in column A and some things added to columns D and E
that is all the macro did
And yes it is complete based on what you asked for
You said """"
The CRM people say that as far as they are concerned as long as I can assign an "unique code" to each of the rows that are the same then that is all that is required."For example" if row numbers 31 to 37 are the same (that is that columns B and D (i.e., the business name and the business phone number are the same) - all we have to do is create a separate column in the spreadsheet for an unique identifier and in the case of my example above all we need to do is enter in this extra column (identifier column) say the numerial 1 (for rows 31 to 37).So could that be a formula for instance ? That is, something along the lines of "if B and D" = "B and D on the next row" then put 1 in column Z then the next formula would be an increment on the (Column Z) number 1 (i.e., in this case the numerial 2 ) etc. - or is there a better way to come up with this unique identifier for similar rows?
Those are all your words.
The CRM people say that as far as they are concerned as long as I can assign an "unique code" to each of the rows that are the same then that is all that is required.
again your words.
I confirmed this with you 3 times before starting and you confirmed it all 3 times.
So this is exactly what it does.
Then you wanted to add some info to the blank spaces in certain columns, I did that and it works fine.
As far as running, it maybe your computer is not powerful enough. I had no problem running it.
The latest file I sent you contains, the code in case you need it in the future. And I ran it for you so it is complete based on what was requested above from your RCM department.
in case you missed it here is the completed file again
delete any old copies of the file and download the new one which is complete. I don't think it could be any clearer than that.
So you should be all set based on what you told me you needed.
Sorry for any confusion.
did you not say
"The CRM people say that as far as they are concerned as long as I can assign an "unique code" to each of the rows that are the same then that is all that is required."
What am I missing?
that is all that is required
and it was confirmed 3 times by you, 3 times