1 February 2006
This patent supports ATT's gigantic Daytona database of telecommunications
data, recently accused of aiding NSA's domestic surveillance in a
suit by EFF against ATT
filed 31 January 2006.
United States Patent | 6,782,085 |
Becker , et al. | August 24, 2004 |
The present invention relates to fast data retrieval. The present invention discloses storing call detail data into two databases. A portion of the total call detail data available is mapped into an interpreted usage event (IUE) and stored in a first database that is indexed for quick data retrieval utilizing a standard database management system. The total raw call detail data is stored in a second database that is less structured, with respect to the first database, without requiring explicit indices. IUEs are retrieved from the first database in response to queries specifying one or more of the characteristics of the desired IUEs. Call detail data stored in the second database is retrieved in response to queries specifying the characteristics from one or more of the retrieved IUEs.
Inventors: | Becker; Richard A. (Morristown, NJ); Wilks; Allan R. (Scotch Plains, NJ) |
Assignee: | AT&T Corp. (New York, NY) |
Appl. No.: | 812195 |
Filed: | March 19, 2001 |
Current U.S. Class: | 379/126; 379/111 |
Intern'l Class: | H04M 015/00 |
Field of Search: | 379/111,112.01,114.03,126,133,143,114.01 707/3,4,5,100,102 |
5333183 | Jul., 1994 | Herbert. | |
5757900 | May., 1998 | Nagel et al. | 379/221. |
5907603 | May., 1999 | Gallagher et al. | 379/133. |
6385301 | May., 2002 | Nolting et al. | 379/32. |
Primary Examiner: Kuntz; Curtis
Assistant Examiner: Taylor; Barry W
FIG. 2 is a flow chart illustrating the storage of data in accordance with an exemplary embodiment of the present invention; and
FIG. 3 is a flow chart illustrating the retrieval of data in accordance with
an exemplary embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Although embodiments of the present invention are illustrated in the accompanying
Figures and are described in this Detailed Description, it is understood
that the present invention is not limited to these embodiments, but is capable
of numerous arrangements, modifications, and substitutions without departing
from the spirit or scope of the invention as defined in the claims. Various
modifications and adaptations of the present invention will be apparent to
persons skilled in the art. For example, the present invention is often described
below with reference to AMA data used by telecommunications companies; however,
the present invention is not intended to be so limited. Applicants intend
for this invention to be applicable to any type of data gathering system,
regardless of the subject matter or form of the data.
FIG. 1 illustrates a block diagram of telecommunications system 100 in accordance
with an embodiment of the present invention. System 100 includes originating
phone 102, destination phone 104, network 106, transmission lines 110 and
112, operation system 114, Call Detail Database (CDD) 116 and AMA database
118. Originating phone 102 and destination phone 104 are connected to network
106 via transmission lines 110 and 112 respectively. FIG. 1 displays phones
102, 104 as plain old telephones, but phones 102, 104 could alternatively
be cordless or wireless phones or any type of suitable telecommunication
device. Lines 110, 112 may be wired or wireless and may use any suitable
transmission medium.
Network 106 includes AMA data 108. AMA data 108 is call detail data. Call
detail is an industry-standard way of describing a call, generally comprising
information used for billing and other purposes. As described above, AMA data 108 includes
a variety of call information, including, for example, originating number,
terminating number, connect date and time, elapsed time of call, etc. AMA
data 108 also includes additional data that further describes the call, such
as the trunks used, various status indications about the call, the operator
used, and the carriers used (for example, wireless carriers, local exchange
carriers, and long distance carriers). AMA data comes in a variety of different
formats, i.e., having different structure codes, often with appended modules
that include additional information. Modules may include information about
an operator, if one was used, information about local network portability
or information about call features, such as caller identification.
When a call is placed from originating phone 102 to destination phone 104,
through a switch (not shown) in network 106, the switch generates AMA data
108 upon call completion. AMA data 108 is routed from network 106 via operation
system 114 that primarily operates to provide AMA data 108 from network 106
to various systems, most notably billing, that use AMA data 108. AMA data
108 is transmitted from operation system 114 to CDD 116 and to AMA database
118. AMA data 108 may be routed from network 106 and from operation system
114 via any number of suitable transmission protocols and media. For example,
suitable protocols include File Transfer Protocol and Transmission Control
Protocol/Internet Protocol and suitable transmission media include twisted
shielded pair wire, fiber optic cable, coaxial cable and wireless links,
although others may be used.
In a typical day, hundreds of millions of calls are completed and hundreds
of millions of AMA records are processed. AMA data 108 that is associated
with a call is sometimes referred to herein as an AMA record. On average,
an AMA record is approximately 160 bytes, depending upon the number of modules
appended. The complexity of the structure of the AMA records, as well as
the large amount of call detail data provided in each record, makes the records
expensive to store in a form that enables quick retrieval.
Thus, in accordance with an exemplary embodiment of the present invention,
to provide for quick retrieval of AMA data 108, AMA data 108 from operation
system 114 is stored in a first database, namely CDD 116, and also stored
in a second database, namely, AMA database 118, as follows.
AMA data 108 is converted, extracted, copied or mapped from its existing
structure, having its structure code, modules etc., into a flat file, e.g.,
a set of structured records, each having a fixed number of fixed format fields.
The more common or generally useful information included within each AMA
data record is mapped to produce an interpreted usage event (IUE), or CDD
record, which is stored in CDD 116. Each IUE includes a fixed number of fields,
for example, twenty to fifty fields, that include a portion of the total
AMA data available in each AMA record. These fields are each defined by a
fixed format. These fields store characteristics about the call such as the
call originating number, originating switch, terminating number, call connect
date and time, elapsed time of call, etc.
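The mapping step described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of the raw record, the field names, the structure code value and the `map_to_iue` helper are hypothetical, and only five of the fields named in the text are shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IUE:
    """Flat record with a fixed set of fields (a small, hypothetical subset)."""
    originating_number: str
    originating_switch: str
    terminating_number: str
    connect_time: datetime
    elapsed_seconds: int


def map_to_iue(ama_record: dict) -> IUE:
    """Copy the generally useful fields; appended modules are left behind."""
    return IUE(
        originating_number=ama_record["originating_number"],
        originating_switch=ama_record["originating_switch"],
        terminating_number=ama_record["terminating_number"],
        connect_time=ama_record["connect_time"],
        elapsed_seconds=ama_record["elapsed_seconds"],
    )


# Hypothetical raw record: a structure code plus a variable list of modules.
raw = {
    "structure_code": "0625",
    "originating_number": "1234567890",
    "originating_switch": "012345",
    "terminating_number": "9876543210",
    "connect_time": datetime(2001, 1, 2, 9, 5),
    "elapsed_seconds": 600,
    "modules": [{"type": "operator", "operator_id": "17"}],
}
iue = map_to_iue(raw)
```

The raw record keeps its variable-length modules; the IUE always has the same fields, which is what makes it straightforward to index.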
The IUEs of CDD 116 are indexed and retrieved by known existing technology
of a database management system (DBMS). The DBMS chosen should accommodate
the size of data to be stored. Many commercial DBMSs exist, for example,
those sold by ORACLE.RTM., which could be used depending on a user's application.
In one embodiment, CDD 116 is arranged according to a DAYTONA .RTM. DBMS
designed to accommodate the large volume of daily data and structured in
a flat file to store multiple records. (Information about the DAYTONA .RTM.
DBMS can be found at www.gtlinc.com/daytona.html.) Each IUE contains thirty-nine
fields and is approximately 160 bytes in length in a first form. To reduce
the storage requirements, the IUEs may be converted into a second form, via
encoding or other ways known in the art, and compressed into a smaller length,
e.g., thirty bytes, before being stored in CDD 116.
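One way to picture the conversion into a shorter second form is a fixed binary layout. The sketch below is an assumption for illustration, not the encoding used by the described embodiment: it packs five hypothetical fields into 28 bytes with Python's `struct` module, and assumes ten-digit telephone numbers and six-digit switch identifiers so that leading zeros can be restored.

```python
import struct

# Hypothetical layout: two unsigned 64-bit numbers, then three unsigned
# 32-bit values, little-endian, with no padding between fields.
IUE_FORMAT = "<QQIII"


def pack_iue(orig: str, term: str, switch: str,
             connect_epoch: int, elapsed: int) -> bytes:
    """Encode the textual fields as integers and pack them into bytes."""
    return struct.pack(IUE_FORMAT, int(orig), int(term), int(switch),
                       connect_epoch, elapsed)


def unpack_iue(blob: bytes) -> tuple:
    """Reverse pack_iue, restoring the assumed fixed-width digit strings."""
    orig, term, switch, connect_epoch, elapsed = struct.unpack(IUE_FORMAT, blob)
    return (str(orig).zfill(10), str(term).zfill(10), str(switch).zfill(6),
            connect_epoch, elapsed)


# 978426300 is 2001-01-02 09:05:00 UTC expressed as seconds since 1970.
blob = pack_iue("1234567890", "9876543210", "012345", 978426300, 600)
size = len(blob)
restored = unpack_iue(blob)
```

A real thirty-nine-field record would need a longer layout or further compression to approach the thirty-byte figure mentioned above.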
The IUEs are stored in CDD 116 with many of the fields indexed. The indexing
enables the quick retrieval of the IUEs; however, other methods of quick
retrieval now known or later discovered could be used. Such indexed fields
might include originating number, date, and terminating number. Thus, if
a user desires to retrieve calls placed from an originating number on a
particular date, the user enters the originating number and date as a query
(see FIG. 1) and the DBMS searches CDD 116 for IUEs having the desired
characteristics. Once these IUEs are located, the IUEs are retrieved and
displayed for the user. The indices stored in CDD 116 allow certain fields
to be searched instead of sequentially reading all the records, therefore
reducing the total search time from many hours to a few seconds.
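The effect of indexing can be illustrated with any DBMS. The sketch below substitutes SQLite (through Python's standard `sqlite3` module) for the DAYTONA or ORACLE systems named above, purely for illustration; the table name, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iue ("
    " originating_number TEXT, call_date TEXT,"
    " terminating_number TEXT, elapsed_seconds INTEGER)"
)
# Index on the fields a user is expected to query by.
conn.execute("CREATE INDEX idx_orig_date ON iue (originating_number, call_date)")
conn.executemany(
    "INSERT INTO iue VALUES (?, ?, ?, ?)",
    [
        ("1234567890", "2001-01-02", "9876543210", 600),
        ("1234567890", "2001-01-02", "5550001111", 1200),
        ("1234567890", "2001-01-03", "5550002222", 60),
        ("5559998888", "2001-01-02", "1234567890", 300),
    ],
)

QUERY = (
    "SELECT terminating_number, elapsed_seconds FROM iue "
    "WHERE originating_number = ? AND call_date = ? "
    "ORDER BY elapsed_seconds"
)
params = ("1234567890", "2001-01-02")
rows = conn.execute(QUERY, params).fetchall()

# Ask SQLite how it would run the query; the plan text names the index
# when the lookup avoids a sequential scan of the table.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
uses_index = any("idx_orig_date" in row[-1] for row in plan)
```

With four rows the difference is invisible, but the same plan applied to hundreds of millions of rows per day is what separates a lookup from a full read of the data.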
The user can input a query such that CDD 116 returns one or more IUEs. For
example, a user might input a query having call characteristics including
the originating call number, month, day, year, and hour for a residential
originating number. This query most likely would yield zero, one or a few
IUEs. Alternatively, a user might input a query having the originating call
number, month, and year for a business originating number. In this case,
the query most likely would reveal a large number of IUEs.
CDD 116 can have an extremely large capacity. For example, approximately
350 million AMA records may be received each day. Roughly the same number
of IUEs are created each day. CDD 116 may store any number of IUEs in its
indexed database. For example, current CDD 116 stores six months of IUEs
with plans to increase CDD 116 to store two years of IUEs. Notwithstanding
this large size of CDD 116, responses to typical queries for IUEs from CDD
116 can be retrieved in under a minute.
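The scale implied by these figures can be estimated with simple arithmetic. The calculation below uses the record counts and sizes stated in this description; the 182-day length of "six months" is an assumption, and index overhead is not counted.

```python
RECORDS_PER_DAY = 350_000_000   # approximate AMA records (and IUEs) per day
IUE_BYTES_FIRST_FORM = 160      # approximate IUE size before conversion
IUE_BYTES_SECOND_FORM = 30      # example size after encoding and compression
DAYS_RETAINED = 182             # assumption: six months of retention

daily_first_form = RECORDS_PER_DAY * IUE_BYTES_FIRST_FORM
daily_second_form = RECORDS_PER_DAY * IUE_BYTES_SECOND_FORM
retained_iues = RECORDS_PER_DAY * DAYS_RETAINED
retained_bytes = daily_second_form * DAYS_RETAINED

print(f"per day, first form:  {daily_first_form / 1e9:.1f} GB")
print(f"per day, second form: {daily_second_form / 1e9:.1f} GB")
print(f"IUEs retained:        {retained_iues / 1e9:.1f} billion")
print(f"retained, second form: {retained_bytes / 1e12:.2f} TB")
```

Under these assumptions the compressed IUEs alone come to roughly 10.5 GB per day and just under 2 TB for six months.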
When an IUE is retrieved in response to a query, the IUE is often a sufficient
description of the call in question. However, in the event that more data
is required, it is necessary to retrieve the AMA record that corresponds
to the IUE. For example, the information in AMA database 118 can help one
troubleshoot problems better than the information in the CDD 116, because
AMA database 118 contains more information than CDD 116. AMA database 118
contains more helpful information to solve network problems, incorrect settings,
incorrect switching operations, billing problems, etc. Thus, it is sometimes
desirable to obtain the AMA record that corresponds to the CDD record.
AMA records received from operation system 114 are also extracted, copied,
mapped or stored in AMA database 118. AMA database 118 may be compressed
for more efficient storage. The compression may be at the file level, the
record level or in any suitable manner. AMA database 118 stores AMA data
records within files that are a standard part of a computer's file system.
The files have a naming convention based upon the originating switch, the
day and the hour that the AMA record is processed. Alternate storage structures
and naming conventions may be implemented as is known to one of ordinary
skill in the art. In general, an AMA record for a call is processed shortly
after the call is completed. The day and hour of the call completion, therefore,
is approximately equal to the day and hour that the call is processed.
AMA database 118 is simpler than CDD 116 in that it does
not require explicit indices to readily pinpoint individual AMA data records
of interest. If one desires to find an AMA record in AMA database 118, one
can approximately identify the file containing a desired record by entering
a query based upon the originating switch, the date and the hour of the call
completion. (Recall that the date and hour of the call completion are often
close to the date and hour that the call is processed by AMA database 118.)
The query will retrieve a file, uncompress the file, if compressed, and search
the file sequentially for a specific AMA record. If the record is not located,
the next sequential files will then be searched until the specific AMA record
is located and retrieved.
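The sequential search just described can be sketched as below. The sketch assumes, for illustration only, that each file holds one JSON object per line and that compressed files carry a `.gz` suffix; actual AMA records are binary, and the `read_records` and `find_record` helpers and the sample records are hypothetical.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_records(path: Path):
    """Yield records from one file, decompressing it if it ends in .gz."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


def find_record(paths, wanted: dict):
    """Scan the files in order; return (path, record) for the first match."""
    for path in paths:
        if not path.exists():
            continue  # no calls were filed under this switch and hour
        for record in read_records(path):
            if all(record.get(key) == value for key, value in wanted.items()):
                return path, record
    return None


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "012345.09.gz"
    second = Path(tmp) / "012345.10.gz"
    with gzip.open(first, "wt", encoding="utf-8") as fh:
        fh.write(json.dumps({"originating_number": "5550000000",
                             "connect_time": "09:01"}) + "\n")
    with gzip.open(second, "wt", encoding="utf-8") as fh:
        fh.write(json.dumps({"originating_number": "1234567890",
                             "connect_time": "09:45",
                             "trunk": "T-7"}) + "\n")

    hit = find_record([first, second],
                      {"originating_number": "1234567890",
                       "connect_time": "09:45"})
    found_in = hit[0].name
    found = hit[1]
```

No index is consulted: the first file is read to the end without a match, and the search simply moves on to the next file in sequence.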
Records in CDD 116 contain information to access the appropriate file in
AMA database 118. Specifically, each IUE in CDD 116 includes the originating
switch, the call date, the start time and the elapsed time of the call. When
a CDD query is issued to CDD 116 and an IUE is retrieved in response to the
CDD query, there is information in the IUE sufficient to access the corresponding
file in AMA database 118. The originating switch and the call date are fields
in the IUE of CDD 116. The disconnect time can be calculated knowing the
start time and the elapsed time of the call. These three values, the originating
switch, the date and the hour of disconnect time, are combined to yield a
file name that approximately corresponds to the file in AMA database 118
which contains more information about the call of interest. An AMA query
with this information is sent to AMA database 118. Once the file in AMA database
118 is retrieved, the file is decompressed as necessary and searched to identify
the specific AMA record of interest by identifying the AMA record with the
corresponding originating number, originating switch, terminating number,
connect time, etc.
FIG. 2 illustrates a flow chart 200 in accordance with the present invention
showing the storage of data. FIG. 2 is explained with reference to a hypothetical
example. Two telephone calls are made from residential number 123-456-7890,
using originating switch 012345, on Jan. 02, 2001 between the hours of 09:00
a.m. and 10:00 a.m. Upon the completion of each call, raw AMA data 108 is
created. (Step 202). The raw AMA data is then processed (Step 204) and stored
into AMA database 118. (Step 206). The raw AMA data may be compressed using
any known compression methods prior to storage. The raw AMA data may be
compressed at the file level, at the record level or at some appropriate
level within a particular data storage arrangement. The raw AMA data is also
mapped to create records, namely IUEs or CDD records. (Step 208). The IUEs
are stored into CDD 116. (Step 210).
A user desires to retrieve information about these two calls as shown in
FIG. 3. A CDD query is run using a DBMS in CDD 116 to retrieve all IUEs regarding
residential number 123-456-7890, using originating switch 012345, on Jan.
02, 2001 between the hours of 09:00 a.m. and 10:00 a.m. (Step 302). Two IUEs
are retrieved, a first call being made at 09:05 a.m. and lasting ten minutes
and a second call being made at 09:45 a.m. and lasting twenty minutes. (Step
304).
More information is desired about these two IUEs, for example, what trunks
were used and whether operators assisted in the calls. (Question 306). (If
more information is not desired, the process would be done, as indicated
by the "No" option and Step 310). To retrieve the corresponding AMA records
for each IUE, a second query is run in AMA database 118. (Step 308). The
files of AMA database 118 are uncompressed, if applicable, and searched until
the AMA records, corresponding to the IUEs, are identified and retrieved.
As noted above, AMA records may be stored in files indexed by the originating
switch, and the date and hour that the call is processed. In this example,
the call completion hour would be "09" for the first call and "10" for the
second call. Assuming that the completed call is processed at approximately
the same time that the call was completed, i.e., within one or two hours,
the AMA query will start searching AMA database 118 at the files indexed
by date, switch, and hour. For example, when storing data in a file system,
the "date" can be used as a file directory and the "switch hour" information
can be used as a file name. This convention reduces the number of files stored
in a single directory. In this example, AMA database 118 would start searching
in file directory Jan. 02, 2001 at file 012345.09 for the first call and
in file directory Jan. 02, 2001 at file 012345.10 for the second call. These
files in AMA database 118, as well as the subsequent files, are then
uncompressed, if applicable, and searched until the two AMA records,
corresponding to the two IUEs, are identified and retrieved. (Step 308).
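The derivation of the starting file from the fields of an IUE can be sketched as follows. The `YYYY-MM-DD` directory format, the default of three files to try and the `candidate_files` helper are assumptions made for illustration; the `switch.hour` file name follows the convention in the example above.

```python
from datetime import datetime, timedelta


def candidate_files(switch: str, start: datetime, elapsed: timedelta,
                    span_hours: int = 3) -> list:
    """Return 'date/switch.hour' paths, beginning at the disconnect hour."""
    disconnect = start + elapsed
    first_hour = disconnect.replace(minute=0, second=0, microsecond=0)
    hours = (first_hour + timedelta(hours=i) for i in range(span_hours))
    return [f"{t:%Y-%m-%d}/{switch}.{t:%H}" for t in hours]


# The two calls of the example: 09:05 for ten minutes, 09:45 for twenty.
first_call = candidate_files("012345", datetime(2001, 1, 2, 9, 5),
                             timedelta(minutes=10))
second_call = candidate_files("012345", datetime(2001, 1, 2, 9, 45),
                              timedelta(minutes=20))
```

The first entry of each list is the file at which the search starts (`012345.09` and `012345.10` respectively); the remaining entries are the subsequent files to try if the record was processed later than the call was completed.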
If the second call made at 09:45 a.m. lasted only ten minutes, instead of
twenty minutes, the completion time of the call would be 09:55 a.m. Depending
upon the transmission time of the AMA data from the switch to processing,
the second call could be processed in hour 9 or hour 10. Because the processing
time is not definite, a range of AMA files in AMA database 118 needs to be
searched to retrieve the appropriate data.
In an alternate embodiment, the IUEs of CDD 116 include a field that allows
one to more quickly locate the corresponding data in AMA database 118.
Specifically, this field includes information regarding the relationship
between the time the call was completed and the time the completed call was
processed in AMA database 118. As discussed above, AMA data 108 is generally
stored in a file in AMA database 118 within the hour or so that the call
was completed. So, the actual call date and call completion time generally
reflect the date and time that the call was processed and stored in AMA database
118. In certain cases, however, the AMA data may not be stored in AMA database
118 for over a week. When this happens, part of the file index or file
identifier, i.e., the date and hour that the call is processed, does not
approximate the actual call date and actual call termination hour. In this
alternate embodiment, the IUE or CDD record, includes an additional field
that reflects whether or not the AMA file was delayed in being stored. This
field reflects the additional time, in hours, for example, that elapsed between
the call completion and call processing. To locate a file in AMA database
118 that corresponds to an IUE in CDD 116 in this embodiment, the call elapsed
time and the additional time prior to call processing are added to the call
initiation time to identify the correct index of the file with which to begin
searching.
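The calculation in this alternate embodiment amounts to one addition. In the sketch below the delay field is assumed to hold whole hours, and the function name and the 200-hour delay are hypothetical values chosen to illustrate a record stored more than a week late.

```python
from datetime import datetime, timedelta


def processing_hour(start: datetime, elapsed: timedelta,
                    delay_hours: int) -> datetime:
    """Call initiation + elapsed time + recorded delay, truncated to the hour."""
    processed = start + elapsed + timedelta(hours=delay_hours)
    return processed.replace(minute=0, second=0, microsecond=0)


call_start = datetime(2001, 1, 2, 9, 45)
call_length = timedelta(minutes=20)

on_time = processing_hour(call_start, call_length, delay_hours=0)
delayed = processing_hour(call_start, call_length, delay_hours=200)
```

With no delay the search starts at hour 10 on Jan. 02, 2001; with a 200-hour delay it starts at hour 18 on Jan. 10, 2001, so the intervening files need not be read.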
AMA database 118 may be interfaced with CDD 116 in any number of ways known
to those of ordinary skill in the art. For example, the interface between
AMA database 118 and CDD 116 may be a web-based interface such that a user
can "point-and-click" on an IUE and the corresponding AMA record will be
retrieved for the user. Alternatively, the interface is designed such that
a user selects, highlights or checks-off one or more IUEs and "points-and-clicks"
on an AMA icon to retrieve the corresponding AMA data records. In yet another
alternate embodiment, the retrieved IUEs are displayed on a screen having
an icon or "hot button" that retrieves the raw AMA data from AMA database
118 for all the retrieved IUEs using a one-step process by "clicking on"
or depressing the icon or "hot button."
Alternatively, a web-based interface may be used to directly access the raw
AMA data. If the call disconnect date and time information is known, as well
as some characteristics of the call, e.g., the originating number, terminating
number, etc., the raw AMA data may be obtained directly without requiring
the access to CDD 116.
As noted above, IUEs from CDD 116 can often be retrieved in under a
minute. With the teachings of the present invention, AMA records from AMA
database 118 can also be retrieved within a minute.
Although the present invention is illustrated with respect to AMA data, it
is understood that the present invention is not limited to these embodiments,
but is capable of numerous arrangements, modifications, and substitutions
without departing from the spirit or scope of the invention as defined in
the claims. Applicants intend for this invention to be applicable to any
type of data gathering system, regardless of the subject matter or form of
the data. For example, it is contemplated that, instead of AMA records, a
user may wish to store pictures, photographs or other types of graphical
files. To index the pictures, a user might identify the pictures by feature
sets, for example, predominant color, subject matter, brightness, or other
characteristics. In this example, the pictures are the unstructured,
difficult-to-search data that would be stored in database 118. The feature
set would be indexed and stored in database 116. In yet another embodiment,
instead of AMA records, a user may wish to store documents. A user would
compute a feature set of the document, for example, frequently used
words in the document, subject, author, etc. In this example, the documents
are the unstructured data that would be stored in database 118. The feature
set would be indexed and stored in database 116. Numerous other applications
are possible, as would be obvious to one of ordinary skill in the art.
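The document example can be sketched in the same two-store pattern. The following is a minimal illustration, assuming a feature set made up only of each document's most frequent words; the sample documents, the `feature_set` and `lookup` helpers and the in-memory dictionaries standing in for databases 116 and 118 are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def feature_set(text: str, top_n: int = 2) -> list:
    """Return the top_n most frequent words of a document."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


# Stand-in for database 118: the unstructured documents themselves.
documents = {
    "doc1": "switch switch switch trunk trunk billing",
    "doc2": "billing billing billing invoice invoice trunk",
}

# Stand-in for database 116: an index from feature to document identifiers.
feature_index = defaultdict(list)
for doc_id, text in documents.items():
    for word in feature_set(text):
        feature_index[word].append(doc_id)


def lookup(word: str) -> list:
    """Query the feature index first, then fetch the matching documents."""
    return [documents[doc_id] for doc_id in feature_index.get(word, [])]
```

A query for "trunk" returns only the first document, because "trunk" is among that document's two most frequent words but not the second document's.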